Re: FDO and LTO on ARM

2011-08-05 Thread Richard Guenther
On Thu, Aug 4, 2011 at 8:42 PM, Jan Hubicka  wrote:
>>> Did you try using FDO with -Os?  FDO should make hot code parts
>>> optimized similar to -O3 but leave other pieces optimized for size.
>>> Using FDO with -O3 gives you the opposite, cold portions optimized
>>> for size while the rest is optimized for speed.
>
> FDO with -Os still optimizes for size, even in hot parts.

I don't think so.  Or at least that would be a bug.  Shouldn't 'hot'
BBs/functions
be optimized for speed even at -Os?  Hm, I see predict.c indeed returns
always false for optimize_size :(

I thought we had just the parts that are neither cold nor hot optimized according
to optimize_size.

>  So to get reasonable
> speedups you need -O3+FDO.  -O3+FDO effectively defaults to -Os in cold
> portions of the program.

Well, but unless your training coverage is 100% all parts with no coverage
get optimized with -O3 instead of -Os.  And I bet coverage for mozilla
isn't even close to 100%.  Thus I think recommending -O3 for FDO is
usually a bad idea.

So - did you try FDO with -O2? ;)
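For the record, the usual FDO cycle with plain GCC flags looks like this
(file names are hypothetical; the -O level used in the two compile steps
is exactly what is being debated here):

  gcc -O2 -fprofile-generate -o app app.c   # instrumented build
  ./app < training-input                    # run writes .gcda profile data
  gcc -O2 -fprofile-use -o app app.c        # rebuild using the profile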

> Still -Os+FDO should be somewhat faster than -Os alone, so a slowdown is a
> bug.  It is not tested very thoroughly since it is not really used in practice.
>
>>> Also do you get any warnings on profile mismatches? Perhaps something
>>> is wrong to the degree that the relevant part of profile gets
>>> misapplied.
>>
>> I don't get any warning on profile mismatches. I only get a "few"
>> missing gcda files warning, but that's expected.
>
> Perhaps you could compile one of the less trivial files you are sure are
> covered by the train run and send me the -fdump-tree-all-blocks -fdump-ipa-all dumps
> of the compilation so I can double-check that the profile seems sane. This could
> be a good start to rule out something stupid.
>
> Honza
>>
>> Cheers,
>>
>> Mike
>>


Re: [named address] rejects-valid: error: initializer element is not computable at load time

2011-08-05 Thread Ulrich Weigand
Georg-Johann Lay wrote:
> Ulrich Weigand wrote:
> > This is pretty much working as expected.  "pallo" is a string literal
> > which (always) resides in the default address space.  According to the
> > named address space specification (TR 18037) there are no string literals
> > in non-default address spaces ...
> 
> The intention of TR 18037 is to supply "Extension to Support Embedded Systems"
> and these are systems where every byte counts -- and it counts *where* the
> byte will be placed.
> 
> Basically, named AS provides something like target-specific qualifiers, and
> if GCC, maybe under the umbrella of GNU C, actually implemented a feature
> like target-specific qualifiers, that would be a great gain and much more
> appreciated than -- users will perceive it that way -- being more catholic
> than the pope ;-)

The problem with all language extensions is that you really need to be careful
that the new feature you want to add is fully specified, in all its potential
interactions with the rest of the (existing) language features.  If you don't,
some of those ambiguities are certain to be causing you problems later on --
in fact, that's just what has happened time and again with GCC extensions
that were added early on ...  This is why these days, extensions usually are
accepted only if they *are* fully specified (which usually means providing
a "diff" to the C standard text that would add the feature to the standard).

This is a non-trivial task.  One of the reasons why we decided to follow the
TR 18037 spec when implementing the __ea extension for SPU is that this task
had already been done for us.  If you want to deviate from that existing spec,
you're back to doing this work yourself.

> For example, you can have any combination of qualifiers like const, restrict
> or volatile, but it is not possible for named AS.  That's clear as long as
> named AS is as strict as TR 18037.  However, users want features to write
> down their code in a comfortable, type-safe way and not as it is at the 
> moment,
> i.e. by means of dreaded inline assembler macros (for my specific case).

A named AS qualifier *can* be combined with other qualifiers like const.
It cannot be combined with *other* named AS qualifiers, because that doesn't
make sense in the semantics underlying the address space concept of TR 18037.
What would you expect a combination of two AS qualifiers to mean?

> > The assignment above would therefore need to convert a pointer to the
> > string literal in the default space to a pointer to the __pgm address
> > space.  This might be impossible (depending on whether __pgm encloses
> > the generic space), and even if it is possible, it is not guaranteed
> > that the conversion can be done as a constant expression at compile time.
> 
> The backend can tell. It likes to implement features to help users.
> It knows about the semantics and whether it's legal or not.
> 
> And even if it's all strict under TR 18037, the resulting error messages
> are *really* confusing to users because to them, a string literal's address
> is known.

It would be possible to extend the named AS implementation to allow AS pointer
conversions in initializers in those cases where the back-end knows this can
be done at load time.  (Since this is all implementation-defined anyway, it
would cause no issues with the standard.  We simply didn't do it because on
the SPU, it is not easily possible.)

Of course, that still wouldn't place the string literal into the non-generic
address space, it just would convert its address.
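To make the distinction concrete (a sketch using the __pgm qualifier from
this thread):

  const __pgm char *p = "pallo";   /* even if accepted, only the pointer is
                                      converted; the string data itself
                                      stays in the generic space */
  const __pgm char a[] = "pallo";  /* the array contents really are placed
                                      in the __pgm space */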

> > What I'd do to get a string placed into the __pgm space is to explicitly
> > initialize an *array* in __pgm space, e.g. like so:
> > 
> > const __pgm char pallo[] = "pallo";
> > 
> > This is defined to work according to TR 18037, and it does actually
> > work for me on spu-elf.
> 
> Ya, but it's different from the line above.

Sure, because it allocates only the string data, and not in addition a
pointer to it as your code did ...

> I was just starting with the work and
> it worked some time ago, so I wondered.

I think some time in the past, there was a bug where initializers like in
your original line were silently accepted but then incorrect code was
generated (i.e. the pointer would just be initialized to an address in
the generic address space, without any conversion).

> And I must admit I am not familiar
> with all the dreaded restrictions TR 18037 imposes to render it less 
> functional :-(

It's not so much a restriction; it's simply that TR 18037 does not say anything
about string literals at all, so they keep working as they do in standard C.

> Do you think a feature like "target specific qualifier" would be reasonable?
> IMO it would be greatly appreciated by users.
> Should not be too hard atop the work already being done for named addresses.

As I said, any further extension would need to be carefully specified ...
In any case, whether this would then be accepted would be up to the
front-end maintainers, of course.

Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands

2011-08-05 Thread Ulrich Weigand
Georg-Johann Lay wrote:
> Ulrich Weigand wrote:
> > I'd be happy to bring this up to date if you're willing to work with
> > me to get this tested on a target that needs this support ...
> 
> Just attached a patch to bugzilla because an AVR user wanted to play
> with the AS support and asked me to supply my changes. It's still in
> a mess but you could get a more reasonable base than on a target where
> all named addresses vanish at expand.
> 
> The patch is fresh and attached to the enhancement PR49868, just BE stuff.
> There is also some sample code.

OK, I'll have a look.

Looking at your definition of the routines avr_addr_space_subset_p and
avr_addr_space_convert, they appear to imply that any generic address
can be used without conversion as a __pgm address and vice versa.

That is: avr_addr_space_subset_p says that __pgm is both a superset
and a subset of the generic address space, so they must be co-extensive.
avr_addr_space_convert then says that the addresses can be converted
between the two without changing their value.

Is this really true?  If so, why have a distinct __pgm address space
in the first place?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands

2011-08-05 Thread Georg-Johann Lay

Ulrich Weigand schrieb:

Georg-Johann Lay wrote:


Ulrich Weigand wrote:


I'd be happy to bring this up to date if you're willing to work with
me to get this tested on a target that needs this support ...


Just attached a patch to bugzilla because an AVR user wanted to play
with the AS support and asked me to supply my changes. It's still in
a mess but you could get a more reasonable base than on a target where
all named addresses vanish at expand.

The patch is fresh and attached to the enhancement PR49868, just BE stuff.
There is also some sample code.


OK, I'll have a look.

Looking at your definition of the routines avr_addr_space_subset_p and
avr_addr_space_convert, they appear to imply that any generic address
can be used without conversion as a __pgm address and vice versa.


There is a conversion HI (Pmode) <-> PHI (__pgm).

These two modes and their insns have the same arithmetic but differ in the
instructions they emit and in reloading.


That is: avr_addr_space_subset_p says that __pgm is both a superset
and a subset of the generic address space, so they must be co-extensive.
avr_addr_space_convert then says that the addresses can be converted
between the two without changing their value.

Is this really true?  If so, why have a distinct __pgm address space
in the first place?

Bye,
Ulrich


AVR hardware has basically three address spaces:

Memory   Physical       Mode      Instruction  Holds
----------------------------------------------------------------------
RAM      0, 1, 2, ...   HI=Pmode  LD*, ST*     .data, .rodata, .bss
Flash    0, 1, 2, ...   PHI       LPM          .text, .progmem.data
EEPROM   0, 1, 2, ...   --        via SFR      .eeprom

Devices have just some KB of RAM and constants are put into
.progmem.data via attribute progmem and read via inline asm.

AVR has three address registers X, Y and Z that can access memory.
SP is fixed and can just do push/pop on RAM.

Addressing capabilities follow. Only byte access is supported by HW:

RAM:
Constant address
*X, *--X, *X++
*Y, *--Y, *Y++, *(Y+offset)  is Frame pointer
*Z, *--Z, *Z++, *(Z+offset)

Offset in [0, 63]

Flash:
*Z, *Z++

Of course, RAM and Flash are not subsets of each other when regarded as
physical memory, but they are subsets when regarded as numbers. This
led to my mistake of defining RAM and Flash as not being subsets of each other:
  http://gcc.gnu.org/ml/gcc/2010-11/msg00170.html

In a typical AVR program the user knows at compile time how to access a
variable and uses an appropriate pointer like int* or const __pgm int*.
In current programs he uses inline asm for the second case.

However, there are situations like the following where you'd like to make
the decision at runtime:

char cast_3 (char in_pgm, void * p)
{
return in_pgm ? (*((char __pgm *) p)) : (*((char *) p));
}

The numeric value of p will stay exactly the same; just the mode and
thus the access instruction change, like:


 if (in_pgm)
   r = LPM Z (PHI:Z)
 else
   r = LD Z (HI:Z or LD X+ or whatever)

Linearizing the address space at the compiler level is not wanted because
that would lead to bulky, slow code and reduce the effective address space
available for Flash, which might be up to 64K words.

An address in X, Y or Z is 16 bits wide; these regs occupy 2 hard regs each.

Johann


Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands

2011-08-05 Thread Ulrich Weigand
Georg-Johann Lay wrote:

> AVR hardware has basically three address spaces:
[snip]

OK, thanks for the information!

> Of course, RAM and Flash are not subsets of each other when regarded as
> physical memory, but they are subsets when regarded as numbers. This
> led to my mistake of defining RAM and Flash as not being subsets of each other:
>http://gcc.gnu.org/ml/gcc/2010-11/msg00170.html

Right, in your situation those are *not* subsets according to the AS rules,
so your avr_addr_space_subset_p routine needs to always return false
(which of course implies you don't need an avr_addr_space_convert routine).

Getting back to the discussion in the other thread, this also means that
pointer conversions during initialization cannot happen either, so this
discussion is basically moot.

> However, there are situations like the following where you like to take
> the decision at runtime:
> 
> char cast_3 (char in_pgm, void * p)
> {
>  return in_pgm ? (*((char __pgm *) p)) : (*((char *) p));
> }

That's really an abuse of "void *" ... if you have an address in the
Flash space, you should never assign it to a "void *".

Instead, if you just have an address and you don't know ahead of time
whether it refers to Flash or RAM space, you ought to hold that number
in an "int" (or "short" or whatever integer type is most appropriate),
and then convert from that integer type to either a "char *" or a
"char __pgm *".

> Linearizing the address space at the compiler level is not wanted because
> that would lead to bulky, slow code and reduce the effective address space
> available for Flash, which might be up to 64K words.

I guess to simplify things like the above, you might have a third
address space (e.g. "__far") that is a superset of both the default
address space and the __pgm address space.  Pointers in the __far
address space might be e.g. 3 bytes wide, with the low 2 bytes
holding the address and the high byte identifying whether the address
is in Flash or RAM.

Then a plain dereference of a __far pointer would do the equivalent
of your cast_3 routine above.
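Schematically (pure illustration -- no such __far space exists in the AVR
port yet, and the packing shown is just one possible layout):

  char far_deref (unsigned long fp)  /* 3 bytes used: flag + 16-bit address */
  {
    unsigned int addr = (unsigned int) fp;  /* low 2 bytes: the address */
    char in_flash = (char) (fp >> 16);      /* high byte: Flash or RAM? */
    return in_flash ? *(const __pgm char *) addr
                    : *(const char *) addr;
  }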

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands

2011-08-05 Thread Michael Matz
Hi,

On Fri, 5 Aug 2011, Ulrich Weigand wrote:

> > However, there are situations like the following where you like to take
> > the decision at runtime:
> > 
> > char cast_3 (char in_pgm, void * p)
> > {
> >  return in_pgm ? (*((char __pgm *) p)) : (*((char *) p));
> > }
> 
> That's really an abuse of "void *" ... if you have an address in the
> Flash space, you should never assign it to a "void *".
> 
> Instead, if you just have an address and you don't know ahead of time
> whether it refers to Flash or RAM space, you ought to hold that number
> in an "int" (or "short" or whatever integer type is most appropriate),
> and then convert from that integer type to either a "char *" or a
> "char __pgm *".

That would leave standard C.  You aren't allowed to construct pointers out 
of random integers.  I'd rather choose to abuse "void*" to be able to 
point into a yet-unspecified address space, which becomes specified once 
the void* pointer is converted into a non-void pointer (which it must be 
because you can't dereference a void pointer, hence it does no harm to 
leave its address space unspecified).

That would point to a third address space, call it "undef" :)  It would be
a superset of default and pgm; conversions from undef to {default,pgm}
are allowed freely (and value-preserving, i.e. trivial).  Conversions into
undef could be rejected.  If they are allowed too, then conversions
between default and pgm are also possible (via an intermediate step over
undef), at which point the whole exercise seems a bit pointless and one
could just as well allow conversions between default and pgm.


Ciao,
Michael.


Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands

2011-08-05 Thread Ulrich Weigand
Michael Matz wrote:
> On Fri, 5 Aug 2011, Ulrich Weigand wrote:
> > Instead, if you just have an address and you don't know ahead of time
> > whether it refers to Flash or RAM space, you ought to hold that number
> > in an "int" (or "short" or whatever integer type is most appropriate),
> > and then convert from that integer type to either a "char *" or a
> > "char __pgm *".
> 
> That would leave standard C.  You aren't allowed to construct pointers out 
> of random integers.

C leaves integer-to-pointer conversion *implementation-defined*,
not undefined, and GCC has always chosen to implement this by
(usually) keeping the value unchanged:
http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Arrays-and-pointers-implementation.html
This works both for default and non-default address spaces.

Of course, my suggested implementation would therefore rely on
implementation-defined behaviour (but by simply using the __pgm
address space, it does so anyway).

> That would point to a third address space, call it "undef" :)  It would be
> a superset of default and pgm; conversions from undef to {default,pgm}
> are allowed freely (and value-preserving, i.e. trivial).

That would probably violate the named AS specification, since two different
entities in the undef space would share the same pointer value ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: FDO and LTO on ARM

2011-08-05 Thread Jan Hubicka
Am Fri 05 Aug 2011 09:32:05 AM CEST schrieb Richard Guenther:



On Thu, Aug 4, 2011 at 8:42 PM, Jan Hubicka  wrote:

Did you try using FDO with -Os?  FDO should make hot code parts
optimized similar to -O3 but leave other pieces optimized for size.
Using FDO with -O3 gives you the opposite, cold portions optimized
for size while the rest is optimized for speed.


FDO with -Os still optimizes for size, even in hot parts.


I don't think so.  Or at least that would be a bug.  Shouldn't 'hot'
BBs/functions
be optimized for speed even at -Os?  Hm, I see predict.c indeed returns
always false for optimize_size :(


It was the outcome of a discussion held some time ago.  I think it was Mark
promoting the point that users optimize for size when they use -Os, period.


I thought we had just the parts that are neither cold nor hot optimized
according to optimize_size. I originally wanted to have attribute HOT
override -Os, so that well-annotated sources (i.e. the kernel) could
compile with -Os by default and explicitly declare the hot parts hot
and get them compiled appropriately.
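At the source level that would have looked like this (the hot attribute
exists today; having it override -Os was only the proposal):

  void __attribute__ ((hot)) inner_loop (void);  /* speed even under -Os */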


With profile feedback, however, the current logic is binary - i.e.
blocks are either hot, since their count is bigger than the threshold,
or cold. We don't really have an "I don't really know" state there.  In
some cases it would make sense - i.e. there are optimizations that we
want to do only in the hottest parts of code, but we don't have any
logic for that.


My plan is to extend ipa-profile to do better hot/cold partitioning
first: at the moment we decide on a fixed fraction of the maximal count in
the program. This is unnecessarily conservative for programs with not
terribly flat profiles.  At the IPA level we could collect a histogram of
counts of instructions (i.e. figure out how much time we spend on
instructions executed N times) and then figure out where the threshold
is so that 99% of executed instructions belong to the hot region. This
should give noticeably smaller binaries.


 So to get reasonable
speedups you need -O3+FDO.  -O3+FDO effectively defaults to -Os in cold
portions of the program.


Well, but unless your training coverage is 100% all parts with no coverage
get optimized with -O3 instead of -Os.  And I bet coverage for mozilla
isn't even close to 100%.  Thus I think recommending -O3 for FDO is
usually a bad idea.


Code with no coverage is cold in our model (as is code executed once
or so) and thus optimized for -Os even at -O3+FDO. This is a bit
aggressive on the optimizing-for-size side. We might consider changing
this policy, but so far I didn't see any complaints about this...


Honza


So - did you try FDO with -O2? ;)


Still -Os+FDO should be somewhat faster than -Os alone, so a slowdown is a
bug.  It is not tested very thoroughly since it is not really used in practice.


Also do you get any warnings on profile mismatches? Perhaps something
is wrong to the degree that the relevant part of profile gets
misapplied.


I don't get any warning on profile mismatches. I only get a "few"
missing gcda files warning, but that's expected.


Perhaps you could compile one of less trivial files you are sure that are
covered by train run and send me -fdump-tree-all-blocks -fdump-ipa-all dumps
of the compilation so I can double check the profile seems sane. This could
be good start to rule out something stupid.

Honza


Cheers,

Mike



gcc-python-plugin finds its first bug in itself

2011-08-05 Thread David Malcolm
gcc-python-plugin [1] now provides a gcc-with-cpychecker harness that
runs gcc with an additional pass that checks CPython API calls
(internally, it's using the gcc python plugin to run a python script
that does the work).

I tried rebuilding the plugin using
  make CC=../other-build/gcc-with-cpychecker
and it found a genuine bug in itself: within this code:

   350  PyObject *
   351  gcc_Pass_get_by_name(PyObject *cls, PyObject *args, PyObject *kwargs)
   352  {
   353  const char *name;
   354  char *keywords[] = {"name",
   355  NULL};
   356  struct opt_pass *result;
   357  
   358  if (!PyArg_ParseTupleAndKeywords(args, kwargs,
   359   "s|get_by_name", keywords,
   360   &name)) {
   361  return NULL;
   362  }
   363  [...snip...]

it found this problem:

gcc-python-pass.c: In function ‘gcc_Pass_get_by_name’:
gcc-python-pass.c:358:37: error: unknown format char in "s|get_by_name": 'g' 
[-fpermissive]

It turned out that I'd typo-ed the format code: I was erroneously using
"|" (signifying that optional args follow), when I meant to use
":" (signifying that the rest of the string is the name of the function,
for use in error messages) [2].
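So the corrected call reads:

    if (!PyArg_ParseTupleAndKeywords(args, kwargs,
                                     "s:get_by_name", keywords,
                                     &name)) {
        return NULL;
    }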

Fixed in git; there are a few false positives, which I'm working on
fixing now.

I'm in two minds about whether this (minor) milestone is one I should
mention in public, but I guess it's proof that having a static checker
for this kind of mistake is worthwhile :)

Dave

[1] https://fedorahosted.org/gcc-python-plugin/

[2] fwiw, the API that it's checking is here:
http://docs.python.org/c-api/arg.html



Re: FDO and LTO on ARM

2011-08-05 Thread Xinliang David Li
On Fri, Aug 5, 2011 at 7:40 AM, Jan Hubicka  wrote:
> Am Fri 05 Aug 2011 09:32:05 AM CEST schrieb Richard Guenther
> :
>
>> On Thu, Aug 4, 2011 at 8:42 PM, Jan Hubicka  wrote:
>
> Did you try using FDO with -Os?  FDO should make hot code parts
> optimized similar to -O3 but leave other pieces optimized for size.
> Using FDO with -O3 gives you the opposite, cold portions optimized
> for size while the rest is optimized for speed.
>>>
>>> FDO with -Os still optimizes for size, even in hot parts.
>>
>> I don't think so.  Or at least that would be a bug.  Shouldn't 'hot'
>> BBs/functions
>> be optimized for speed even at -Os?  Hm, I see predict.c indeed returns
>> always false for optimize_size :(
>
> It was the outcome of a discussion held some time ago.  I think it was Mark
> promoting the point that users optimize for size when they use -Os, period.
>
> I thought we had just the parts that are neither cold nor hot optimized
> according to optimize_size. I originally wanted to have attribute HOT
> override -Os, so that well-annotated sources (i.e. the kernel) could compile
> with -Os by default and explicitly declare the hot parts hot and get them
> compiled appropriately.
>
> With profile feedback however the current logic is binary - i.e. blocks are
> either hot since their count is bigger than the threshold or cold. We don't
> really have "I don't really know" state there.  In some cases it would make
> sense - i.e. there are optimizations that we want to do only in the hottest
> parts of code, but we don't have any logic for that.

For the profile summary at the function/cgraph_node level, there are three
states: hot, unlikely, and normal.  At the BB/EDGE level, there are three
states too, but the implementation turns it into 2 states (by querying
only 'maybe_hot_bb'): hot and not hot --- instead of 'hot', 'neither hot
nor cold', and 'cold'.


David
>
> My plan is to extend ipa-profile to do better hot/cold partitioning first:
> at the moment we decide on fixed fraction of maximal count in the program.
> This is unnecessarily conservative for programs with not terribly flat
> profiles.  At IPA level we could collect histogram of counts of instructions
> (i.e. figure out how much time we spend on instructions executed N times)
> and then figure out where the threshold is so that 99% of executed instructions
> belong to the hot region. This should give noticeably smaller binaries.
>>
>>>  So to get reasonable
>>> speedups you need -O3+FDO.  -O3+FDO effectively defaults to -Os in cold
>>> portions of the program.
>>
>> Well, but unless your training coverage is 100% all parts with no coverage
>> get optimized with -O3 instead of -Os.  And I bet coverage for mozilla
>> isn't even close to 100%.  Thus I think recommending -O3 for FDO is
>> usually a bad idea.
>
> Code with no coverage is cold in our model (as is code executed once or so)
> and thus optimized for -Os even at -O3+FDO. This is a bit aggressive on
> the optimizing-for-size side. We might consider changing this policy, but so far
> I didn't see any complaints about this...
>
> Honza
>>
>> So - did you try FDO with -O2? ;)
>>
>>> Still -Os+FDO should be somewhat faster than -Os alone, so a slowdown is a
>>> bug.  It is not tested very thoroughly since it is not really used in practice.
>>>
> Also do you get any warnings on profile mismatches? Perhaps something
> is wrong to the degree that the relevant part of profile gets
> misapplied.

 I don't get any warning on profile mismatches. I only get a "few"
 missing gcda files warning, but that's expected.
>>>
>>> Perhaps you could compile one of less trivial files you are sure that are
>>> covered by train run and send me -fdump-tree-all-blocks -fdump-ipa-all
>>> dumps
>>> of the compilation so I can double check the profile seems sane. This
>>> could
>>> be good start to rule out something stupid.
>>>
>>> Honza

 Cheers,

 Mike



Re: FDO and LTO on ARM

2011-08-05 Thread Xinliang David Li
On Fri, Aug 5, 2011 at 12:32 AM, Richard Guenther wrote:
> On Thu, Aug 4, 2011 at 8:42 PM, Jan Hubicka  wrote:
 Did you try using FDO with -Os?  FDO should make hot code parts
 optimized similar to -O3 but leave other pieces optimized for size.
 Using FDO with -O3 gives you the opposite, cold portions optimized
 for size while the rest is optimized for speed.
>>
>> FDO with -Os still optimizes for size, even in hot parts.
>
> I don't think so.  Or at least that would be a bug.  Shouldn't 'hot'
> BBs/functions
> be optimized for speed even at -Os?  Hm, I see predict.c indeed returns
> always false for optimize_size :(

That is a function-level query. At the BB/EDGE level, the condition is refined:

The BB (or instruction expansion) will be optimized for size if the bb
is not 'hot'. This logic here is probably not ideal. It means that
without specifying -Os, only the hot BBs are optimized for speed -->
the more correct way is 'without -Os, only cold BBs are optimized for
size' -- i.e., the lukewarm BBs are also optimized for speed. That would
match the function-level logic.

David


>
> I thought we had just the parts that are neither cold nor hot optimized according
> to optimize_size.
>
>>  So to get reasonable
>> speedups you need -O3+FDO.  -O3+FDO effectively defaults to -Os in cold
>> portions of the program.
>
> Well, but unless your training coverage is 100% all parts with no coverage
> get optimized with -O3 instead of -Os.  And I bet coverage for mozilla
> isn't even close to 100%.  Thus I think recommending -O3 for FDO is
> usually a bad idea.
>
> So - did you try FDO with -O2? ;)
>
>> Still -Os+FDO should be somewhat faster than -Os alone, so a slowdown is a
>> bug.  It is not tested very thoroughly since it is not really used in practice.
>>
 Also do you get any warnings on profile mismatches? Perhaps something
 is wrong to the degree that the relevant part of profile gets
 misapplied.
>>>
>>> I don't get any warning on profile mismatches. I only get a "few"
>>> missing gcda files warning, but that's expected.
>>
>> Perhaps you could compile one of less trivial files you are sure that are
>> covered by train run and send me -fdump-tree-all-blocks -fdump-ipa-all dumps
>> of the compilation so I can double check the profile seems sane. This could
>> be good start to rule out something stupid.
>>
>> Honza
>>>
>>> Cheers,
>>>
>>> Mike


[RFC PATCH, i386]: Allow zero_extended addresses (+ problems with reload and offsetable address, "o" constraint)

2011-08-05 Thread Uros Bizjak
Hello!

The attached patch introduces generation of addr32-prefixed addresses,
mainly intended to merge zero-extended LEA calculations into addresses.
After fixing various inconsistencies with "o" constraints, the patch
works surprisingly well (in its current form it fixes all reported
problems in the PR [1]), but one problem remains w.r.t. handling of
the "o" constraint.

Patched gcc ICEs on gcc.dg/torture/pr47744-2.c with:

$ ~/gcc-build-fast/gcc/cc1 -O2 -mx32 -std=gnu99 -quiet pr47744-2.c
pr47744-2.c: In function ‘matmul_i16’:
pr47744-2.c:40:1: error: insn does not satisfy its constraints:
(insn 116 66 67 4 (set (reg:TI 0 ax)
(mem:TI (zero_extend:DI (plus:SI (reg:SI 4 si [orig:114
ivtmp.26 ] [114])
(reg:SI 5 di [orig:101 dest_y ] [101]))) [6
MEM[base: dest_y_18, index: ivtmp.26_53, offset: 0B]+0 S16 A128]))
pr47744-2.c:34 60 {*movti_internal_rex64}
 (nil))
pr47744-2.c:40:1: internal compiler error: in
reload_cse_simplify_operands, at postreload.c:403
Please submit a full bug report,
...

... due to the fact that the address is not offsettable, and plus
((zero_extend (...)) (const_int ...)) gets rejected by
ix86_legitimate_address_p.

However, the section "16.8.1 Simple Constraints" of the documentation claims:

--quote--
   * A nonoffsettable memory reference can be reloaded by copying the
 address into a register.  So if the constraint uses the letter
 `o', all memory references are taken care of.
--/quote--

As I read this sentence, the RTX is forced into a temporary register,
and reload tries to satisfy "o" constraint with plus ((reg ...)
(const_int ...)), as said at the introduction of "o" constraint a
couple of pages earlier. Unfortunately, this does not seem to be the
case.

Is there anything wrong with my approach, or is there something wrong in reload?

2011-08-05  Uros Bizjak  

PR target/49781
* config/i386/i386.c (ix86_decompose_address): Allow zero-extended
SImode addresses.
(ix86_print_operand_address): Handle zero-extended addresses.
(memory_address_length): Add length of addr32 prefix for
zero-extended addresses.
* config/i386/predicates.md (lea_address_operand): Reject
zero-extended operands.

Patch is otherwise bootstrapped and tested on x86_64-pc-linux-gnu
{,-m32} without regressions.

[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49781

Thanks,
Uros.
Index: config/i386/predicates.md
===================================================================
--- config/i386/predicates.md   (revision 177456)
+++ config/i386/predicates.md   (working copy)
@@ -801,6 +801,10 @@
   struct ix86_address parts;
   int ok;
 
+  /*  LEA handles zero-extend by itself.  */
+  if (GET_CODE (op) == ZERO_EXTEND)
+return false;
+
   ok = ix86_decompose_address (op, &parts);
   gcc_assert (ok);
   return parts.seg == SEG_DEFAULT;
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c  (revision 177456)
+++ config/i386/i386.c  (working copy)
@@ -11146,6 +11146,14 @@ ix86_decompose_address (rtx addr, struct ix86_addr
   int retval = 1;
   enum ix86_address_seg seg = SEG_DEFAULT;
 
+  /* Allow zero-extended SImode addresses,
+ they will be emitted with addr32 prefix.  */
+  if (TARGET_64BIT
+  && GET_CODE (addr) == ZERO_EXTEND
+  && GET_MODE (addr) == DImode
+  && GET_MODE (XEXP (addr, 0)) == SImode)
+addr = XEXP (addr, 0);
+ 
   if (REG_P (addr))
 base = addr;
   else if (GET_CODE (addr) == SUBREG)
@@ -14163,9 +14171,13 @@ ix86_print_operand_address (FILE *file, rtx addr)
 }
   else
 {
-  /* Print DImode registers on 64bit targets to avoid addr32 prefixes.  */
-  int code = TARGET_64BIT ? 'q' : 0;
+  int code = 0;
 
+  /* Print SImode registers for zero-extended addresses to force
+addr32 prefix.  Otherwise print DImode registers to avoid it.  */
+  if (TARGET_64BIT)
+   code = (GET_CODE (addr) == ZERO_EXTEND) ? 'l' : 'q';
+
   if (ASSEMBLER_DIALECT == ASM_ATT)
{
  if (disp)
@@ -21776,7 +21788,8 @@ assign_386_stack_local (enum machine_mode mode, en
 }
 
 /* Calculate the length of the memory address in the instruction
-   encoding.  Does not include the one-byte modrm, opcode, or prefix.  */
+   encoding.  Includes addr32 prefix, does not include the one-byte modrm,
+   opcode, or other prefixes.  */
 
 int
 memory_address_length (rtx addr)
@@ -21803,8 +21816,10 @@ memory_address_length (rtx addr)
   base = parts.base;
   index = parts.index;
   disp = parts.disp;
-  len = 0;
 
+  /* Add length of addr32 prefix.  */
+  len = (GET_CODE (addr) == ZERO_EXTEND);
+
   /* Rule of thumb:
- esp as the base always wants an index,
- ebp as the base always wants a displacement,


Re: FDO and LTO on ARM

2011-08-05 Thread Jan Hubicka
Am Fri 05 Aug 2011 07:49:49 PM CEST schrieb Xinliang David Li:



On Fri, Aug 5, 2011 at 12:32 AM, Richard Guenther wrote:

On Thu, Aug 4, 2011 at 8:42 PM, Jan Hubicka  wrote:

Did you try using FDO with -Os?  FDO should make hot code parts
optimized similar to -O3 but leave other pieces optimized for size.
Using FDO with -O3 gives you the opposite, cold portions optimized
for size while the rest is optimized for speed.


FDO with -Os still optimizes for size, even in hot parts.


I don't think so.  Or at least that would be a bug.  Shouldn't 'hot'
BBs/functions
be optimized for speed even at -Os?  Hm, I see predict.c indeed returns
always false for optimize_size :(


That is a function-level query. At the BB/EDGE level, the condition is refined:


Well we summarize function profile to:
 1) hot
 2) normal
 3) executed once
 4) unlikely

We summarize BB profile to:
 1) maybe_hot
 2) probably_cold (equivalent to !maybe_hot)
 3) probably_never_executed

Except for "executed once", which is a special thing for functions fed by
discovery of main() and static ctors/dtors, there is a 1-1 correspondence
between BB and function predicates.  With profile feedback a function
is hot if it contains a BB that is maybe_hot (with feedback it is also
probably hot), it is normal if it contains a BB that is
!probably_never_executed, and unlikely if all BBs are
probably_never_executed. So with profile feedback the function profile
summaries are no more refined than the BB ones.


Without profile feedback things are more messy, and the names of the BB
settings were more or less invented based on what the static profile
estimate can tell you. Lacking a function-level profile estimate, we
generally consider functions "normal" unless told otherwise in a few
special cases.  We also never autodetect probably_never_executed even
though it would make a lot of sense to do so for EH/paths to exit. As I
mentioned, I think we should start doing so.


Finally, optimize_size comes into the game; it is independent of the
summaries above, and it is why I added the optimize_XXX_for_size/speed
predicates. By default -Os implies optimizing everything for size, and
-O123 optimizes for speed everything that is maybe_hot (i.e. not quite
reliably proven otherwise).
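In pass code that typically reads like the following (a sketch only; the
predicate names are the ones mentioned above, from GCC's predict.h, while
the limit variables are made up for illustration):

  if (optimize_bb_for_speed_p (bb))
    limit = speed_limit;  /* maybe_hot: not proven cold, go for speed */
  else
    limit = size_limit;   /* optimize_bb_for_size_p (bb): keep it small */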


In a way I like the current scheme since it is simple, and extending it
should IMO have some good reason. We could refine -Os behaviour
without changing the current predicates to optimize for speed in
a) functions declared as "hot" by the user, and BBs in them that are not
proved cold.
b) based on profile feedback - i.e. we could have two thresholds: BBs
with very large counts will be probably hot, BBs in between will be
maybe hot/normal, and BBs with low counts will be cold.
This would probably motivate introduction of a probably_hot predicate
that summarizes the above.


If we want to refine things, we could also re-consider how we want to
behave for BBs with 0 coverage. I.e. if we want to
 a) consider them "normal" and let the presence of -Os/-O123 decide
whether they are size/speed optimized,
 b) consider them "cold" since they are not executed at all,
 c) consider them "cold" in functions that are otherwise covered by
the test run and "normal" in case the function is not covered at all
(i.e. training an X server on a particular set of hardware may not convince
GCC to optimize for size all the other drivers not covered by the
train run).


We currently implement b) and it sort of works well, since users usually
train for what matters for them and are happy to see smaller binaries.


What I don't like about a) and c) is the bit of inconsistency with small
counts.  I.e. a count of 1 will imply optimizing for size, but a roundoff
error to 0 will cause it to be optimized for speed, which is weird.
Of course, flipping the default here would also cause significant growth
of FDO binaries, and users are already unhappy that FDO binaries are
too large.


Honza



Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands

2011-08-05 Thread DJ Delorie

Was this reproducible for m32c also?  I can test it if so...


Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands

2011-08-05 Thread Ulrich Weigand
DJ Delorie wrote:

> Was this reproducible for m32c also?  I can test it if so...

The patch simply passes the destination address space through
to MODE_CODE_BASE_REG_CLASS and REGNO_MODE_CODE_OK_FOR_BASE_P,
to allow targets to make register allocation decisions based
on address space.

As long as m32c doesn't implement those, just applying the
patch wouldn't change anything.  But if that capability
*would* be helpful on your target, it would certainly be
good if you could try it out ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands

2011-08-05 Thread DJ Delorie

Nope, I don't use those :-)


Re: FDO and LTO on ARM

2011-08-05 Thread Xinliang David Li
>
> In a way I like the current scheme since it is simple and extending it
> should IMO have some good reason. We could refine -Os behaviour without
> changing current predicates to optimize for speed in
> a) functions declared as "hot" by user and BBs in them that are not proved
> cold.
> b) based on profile feedback - i.e. we could have two thresholds, BBs with
> very large counts will be probably hot, BBs in between will be maybe
> hot/normal and BBs with low counts will be cold.
> This would probably motivate introduction of probably_hot predicate that
> summarize the above.

Introducing a new 'probably_hot' will be very confusing -- unless you
also rename 'maybe_hot', but this leads to finer-grained control:
very_hot, hot, normal, cold, unlikely, which can be hard to use.  The
three-state partition (not counting exec_once) seems ok, but

1) the unlikely state does not have a controllable parameter
2) the hot_bb_count_fraction parameter which is used to determine
maybe_hotness is shared by all FDO-related passes. It is much more
flexible (in terms of tuning) to allow each pass (such as inlining) to
define its own thresholds.


>
> If we want to refine things, we could also re-consider how we want to behave
> to BBs with 0 coverage. I.e. if we want to
>  a) consider them "normal" and let the presence of -Os/-O123 to decide
> whether they are size/speed optimized,
>  b) consider them "cold" since they are not executed at all,
>  c) consider them "cold" in functions that are otherwise covered by the test
> run and "normal" in case the function is not covered at all (i.e. training X
> server on particular set of hardware may not convince GCC to optimize for
> size all the other drivers not covered by the train run).
>
> We currently implement B and it sort of work well since users usually train
> for what matters for them and are happy to see binaries smaller.

Yes -- we assume the user will do his best to find representative training
data to avoid bad optimizations, so b) should be fine.

David


>
> What I don't like about the a&c is bit of inconsistency with small counts.
>  I.e. count 1 will imply optimizing for size, but roundoff error to 0 will
> cause it to be optimized for speed that is weird.
> Of course also flipping the default here would cause significant growth of
> FDO binaries and users are already unhappy that FDO binaries are too large.
>
> Honza


The Linux binutils 2.21.53.0.2 is released

2011-08-05 Thread H.J. Lu
This is the beta release of binutils 2.21.53.0.2 for Linux, which is
based on binutils 2011 0804 in CVS on sourceware.org plus various
changes. It is purely for Linux.

All relevant patches in the patches/ directory have been applied to the
source tree.  You can take a look at patches/README to see what has been
applied and in what order.

Starting from the 2.21.51.0.3 release, you must remove .ctors/.dtors
section sentinels when building glibc or other C run-time libraries.
Otherwise, you will run into:

http://sourceware.org/bugzilla/show_bug.cgi?id=12343

Starting from the 2.21.51.0.2 release, BFD linker has the working LTO
plugin support. It can be used with GCC 4.5 and above. For GCC 4.5, you
need to configure GCC with --enable-gold to enable LTO plugin support.

Starting from the 2.21.51.0.2 release, binutils fully supports compressed
debug sections.  However, compressed debug sections aren't turned on by
default in the assembler. I am planning to turn them on for the x86
assembler in a future release, which may lead to Linux kernel bug messages like

WARNING: lib/ts_kmp.o (.zdebug_aranges): unexpected non-allocatable section.

But the resulting kernel works fine.

Starting from the 2.20.51.0.4 release, no diffs against the previous
release will be provided.

You can enable both gold and bfd ld with --enable-gold=both.  Gold will
be installed as ld.gold and bfd ld will be installed as ld.bfd.  By
default, ld.bfd will be installed as ld.  You can use the configure
option --enable-gold=both/gold to choose gold as the default linker,
ld.  IA-32 and x86_64 binary tarballs are configured with
--enable-gold=both/ld --enable-plugins --enable-threads.

Starting from the 2.18.50.0.4 release, the x86 assembler no longer
accepts

fnstsw %eax

fnstsw stores 16 bits into %ax and the upper 16 bits of %eax are unchanged.
Please use

fnstsw %ax

Starting from the 2.17.50.0.4 release, the default output section LMA
(load memory address) has changed for allocatable sections from being
equal to VMA (virtual memory address), to keeping the difference between
LMA and VMA the same as the previous output section in the same region.

For

.data.init_task : { *(.data.init_task) }

LMA of .data.init_task section is equal to its VMA with the old linker.
With the new linker, it depends on the previous output section. You
can use

.data.init_task : AT (ADDR(.data.init_task)) { *(.data.init_task) }

to ensure that LMA of .data.init_task section is always equal to its
VMA. The linker script in the older 2.6 x86-64 kernel depends on the
old behavior.  You can add AT (ADDR(section)) to force LMA of
.data.init_task section equal to its VMA. It will work with both old
and new linkers. The x86-64 kernel linker script in kernel 2.6.13 and
above is OK.

The new x86_64 assembler no longer accepts

monitor %eax,%ecx,%edx

You should use

monitor %rax,%ecx,%edx

or
monitor

which works with both old and new x86_64 assemblers. They should
generate the same opcode.

The new i386/x86_64 assemblers no longer accept instructions for moving
between a segment register and a 32bit memory location, i.e.,

movl (%eax),%ds
movl %ds,(%eax)

To generate instructions for moving between a segment register and a
16bit memory location without the 16bit operand size prefix, 0x66,

mov (%eax),%ds
mov %ds,(%eax)

should be used. It will work with both new and old assemblers. The
assembler starting from 2.16.90.0.1 will also support

movw (%eax),%ds
movw %ds,(%eax)

without the 0x66 prefix. Patches for 2.4 and 2.6 Linux kernels are
available at

http://www.kernel.org/pub/linux/devel/binutils/linux-2.4-seg-4.patch
http://www.kernel.org/pub/linux/devel/binutils/linux-2.6-seg-5.patch

The ia64 assembler is now defaulted to tune for Itanium 2 processors.
To build a kernel for Itanium 1 processors, you will need to add

ifeq ($(CONFIG_ITANIUM),y)
CFLAGS += -Wa,-mtune=itanium1
AFLAGS += -Wa,-mtune=itanium1
endif

to arch/ia64/Makefile in your kernel source tree.

Please report any bugs related to binutils 2.21.53.0.2 to
hjl.to...@gmail.com

and

http://www.sourceware.org/bugzilla/

Changes from binutils 2.21.53.0.1:

1. Update from binutils 2011 0804.
2. Add Intel K1OM support.
3. Allow R_X86_64_64 relocation for x32 and check x32 relocation overflow.
PR ld/13048.
4. Support direct call in x86-64 assembly code.  PR gas/13046.
5. Add ia32 Google Native Client support. 
6. Add .debug_macro section support.
7. Improve gold.
8. Improve VMS support.
9. Improve arm support.
10. Improve hppa support.
11. Improve mips support.
12. Improve mmix support.
13. Improve ppc support.

Changes from binutils 2.21.52.0.2:

1. Update from binutils 2011 0716.
2. Fix LTO linker bugs.  PRs 12982/12942.
3. Fix rorx support in x86 assembler/disassembler for AVX Programming
Reference (June, 2011).
4. Fix an x86-64 ELFOSABI linker regression.
5. Update ELFOSABI_GNU support.  PR 12913.

Re: FDO and LTO on ARM

2011-08-05 Thread Jan Hubicka
> >
> > In a way I like the current scheme since it is simple and extending it
> > should IMO have some good reason. We could refine -Os behaviour without
> > changing current predicates to optimize for speed in
> > a) functions declared as "hot" by user and BBs in them that are not proved
> > cold.
> > b) based on profile feedback - i.e. we could have two thresholds, BBs with
> > very large counts will be probably hot, BBs in between will be maybe
> > hot/normal and BBs with low counts will be cold.
> > This would probably motivate introduction of probably_hot predicate that
> > summarize the above.
> 
> Introducing a new 'probably_hot' will be very confusing -- unless you
> also rename 'maybe_hot', but this leads to finer grained control:
> very_hot, hot, normal, cold, unlikely which can be hard to use.  The
> three state partition (not counting exec_once) seems ok, but

OK, I also prefer to have fewer stages than more ;)
> 
> 1) the unlikely state does not have controllable parameter

Well, it is defined as something that is not likely to be executed, so the
requirement on the count to be less than 1/(number_of_test_runs*2) is very
natural and doesn't seem to need to be tuned.

> 2) hot_bb_count_fraction parameter which is used to determine
> maybe_hotness is shared for all FDO related passes. It is much more
> flexible (in terms of tuning) to allow each pass (such as inlining) to
> define its  own thresholds.

Some people call for fewer parameters, others for more; it is always a
matter of some compromise.  So before forking the notion of hotness for
individual passes we would need to have some good reasoning on why this is
very important.
> >
> > If we want to refine things, we could also re-consider how we want to behave
> > to BBs with 0 coverage. I.e. if we want to
> >  a) consider them "normal" and let the presence of -Os/-O123 to decide
> > whether they are size/speed optimized,
> >  b) consider them "cold" since they are not executed at all,
> >  c) consider them "cold" in functions that are otherwise covered by the test
> > run and "normal" in case the function is not covered at all (i.e. training X
> > server on particular set of hardware may not convince GCC to optimize for
> > size all the other drivers not covered by the train run).
> >
> > We currently implement B and it sort of work well since users usually train
> > for what matters for them and are happy to see binaries smaller.
> 
> Yes -- we assume user will do his best to find representative training
> data to avoid bad optimizations, so b) should be fine.

I also think so; one notable exception is however the hardware drivers, where
it is inherently hard to test all possible combinations in common use.
However, I guess one should avoid FDO-compiling those for this reason.

Honza


gcc-4.6-20110805 is now available

2011-08-05 Thread gccadmin
Snapshot gcc-4.6-20110805 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.6-20110805/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.6 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_6-branch 
revision 177489

You'll find:

 gcc-4.6-20110805.tar.bz2 Complete GCC

  MD5=7b55daa94de9a1269d5fe5ea3bacff2f
  SHA1=c373614567b284dab7efb8b3d1b3ebcba4774b8d

Diffs from 4.6-20110729 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.6
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Building C++ with --enable-languages=c,fortran

2011-08-05 Thread Thomas Koenig

Hello world,

I just noticed that C++ now appears to be built by default, even when 
only C and Fortran are specified.  The configure line



../trunk/configure  --prefix=$HOME --enable-languages=c,fortran 
--with-mpc=/usr/local --with-mpfr=/usr/local


leads to the message

checking for version 0.11 (revision 0 or later) of PPL... no 


The following languages will be built: c,c++,fortran,lto

I see recent changes by Ian in this area, but nothing in the ChangeLog
suggests to me that this was intentional.

Any ideas?


Re: Building C++ with --enable-languages=c,fortran

2011-08-05 Thread Steve Kargl
On Sat, Aug 06, 2011 at 12:52:02AM +0200, Thomas Koenig wrote:
> Hello world,
> 
> I just noticed that C++ now appears to be built by default, even when 
> only the C and fortran are specified.  The configure line
> 
> 
> ../trunk/configure  --prefix=$HOME --enable-languages=c,fortran 
> --with-mpc=/usr/local --with-mpfr=/usr/local
> 
> leads to the message
> 
> checking for version 0.11 (revision 0 or later) of PPL... no 
> 
> The following languages will be built: c,c++,fortran,lto
> 
> I see recent changes by Ian in this area, but nothing in the ChangeLog
> suggests to me that this was intentional.
> 
> Any ideas?

It appears the original thread starts here 

http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01304.html



-- 
Steve


Please replace/augment buildstat entry

2011-08-05 Thread Hin-Tak Leung
Please replace or augment the
alphaev68-dec-osf5.1a Test results: 4.4.6,
in
http://gcc.gnu.org/gcc-4.4/buildstat.html

from
http://gcc.gnu.org/ml/gcc-testresults/2011-05/msg00074.html
to 
http://gcc.gnu.org/ml/gcc-testresults/2011-08/msg00586.html

The reason is explained at the top of the summary:
"This replaces my previous entry on 4.4.6 three months ago ...differs only from 
that earlier submission by the libgomp section ..."


Re: Building C++ with --enable-languages=c,fortran

2011-08-05 Thread Ian Lance Taylor
Thomas Koenig  writes:

> I just noticed that C++ now appears to be built by default, even when
> only the C and fortran are specified.  The configure line
>
>
> ../trunk/configure  --prefix=$HOME --enable-languages=c,fortran
> --with-mpc=/usr/local --with-mpfr=/usr/local
>
> leads to the message
>
> checking for version 0.11 (revision 0 or later) of PPL... no 
>
> The following languages will be built: c,c++,fortran,lto
>
> I see recent changes by Ian in this area, but nothing in the ChangeLog
> suggests to me that this was intentional.

It is intentional.  In current mainline, stages 2 and 3 are now by
default built with the C++ compiler, not the C compiler.  Therefore, the
C++ compiler must be built in stages 1 and 2, in order to be used to build
the stage 2 and 3 compilers.  And then of course we build the C++
compiler in stage 3 in order to compare it.

The ChangeLog entry says that if --enable-build-poststage1-with-cxx is
set, C++ becomes a boot language.  That is what you are seeing.  I guess
what the ChangeLog entry does not say is that
--enable-build-poststage1-with-cxx is set by default.

Ian


Re: Building C++ with --enable-languages=c,fortran

2011-08-05 Thread Steve Kargl
On Fri, Aug 05, 2011 at 06:51:12PM -0700, Ian Lance Taylor wrote:
> Thomas Koenig  writes:
> 
> > I just noticed that C++ now appears to be built by default, even when
> > only the C and fortran are specified.  The configure line
> >
> >
> > ../trunk/configure  --prefix=$HOME --enable-languages=c,fortran
> > --with-mpc=/usr/local --with-mpfr=/usr/local
> >
> > leads to the message
> >
> > checking for version 0.11 (revision 0 or later) of PPL... no 
> >
> > The following languages will be built: c,c++,fortran,lto
> >
> > I see recent changes by Ian in this area, but nothing in the ChangeLog
> > suggests to me that this was intentional.
> 
> It is intentional.  In current mainline, stages 2 and 3 are now by
> default built with the C++ compiler, not the C compiler.  Therefore, the
> C++ compiler must be built in stages 1 and 2, in order to be used to build
> the stage 2 and 3 compilers.  And then of course we build the C++
> compiler in stage 3 in order to compare it.
> 
> The ChangeLog entry says that if --enable-build-poststage1-with-cxx is
> set, C++ becomes a boot language.  That is what you are seeing.  I guess
> what the ChangeLog entry does not say is that
> --enable-build-poststage1-with-cxx is set by default.
> 

What are the additional resource requirements?  Some of
us have old hardware and limited $.

-- 
Steve


Re: Building C++ with --enable-languages=c,fortran

2011-08-05 Thread Ian Lance Taylor
Steve Kargl  writes:

>> The ChangeLog entry says that if --enable-build-poststage1-with-cxx is
>> set, C++ becomes a boot language.  That is what you are seeing.  I guess
>> what the ChangeLog entry does not say is that
>> --enable-build-poststage1-with-cxx is set by default.
>> 
>
> What are the additional resource requirements?  Some of
> us have old hardware and limited $.

The main additional resource requirement is building libstdc++ in stage
1 (and stages 2 and 3 if you were previously not building the C++
compiler at all).  The C++ compiler proper is fairly small by
comparison.

At present you can use --disable-build-poststage1-with-cxx.  However, in
the future, I would like to change gcc to always build with C++.  Yes,
this will take more resources.
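For example, Thomas's configure invocation from this thread would then be:

  ../trunk/configure --prefix=$HOME --enable-languages=c,fortran \
    --disable-build-poststage1-with-cxx \
    --with-mpc=/usr/local --with-mpfr=/usr/local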

Ian