Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

Xinliang David Li Mon, 15 Nov 2010 16:41:44 -0800

On Mon, Nov 15, 2010 at 4:25 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>> This means O3 level inlining should be turned on also for lto build by
>> default -- as -O2 lto performance is too unimpressive.
>
> I am just re-tuning the inliner and hope to get more speedups for smaller
> costs than we get right now.  However, I don't think we can reasonably enable
> it as it is at LTO with -O2. We sort of declare that -O2 is the level where the
> compiler optimizes hard without bloating code size. Automatic inlining bloats a
> lot.  Enabling it at -O2 will make developers who care about code size
> unhappy.


Looks like you want to brand LTO as a size optimization technology
more than a performance one :) -- is that the right promotion for LTO?
If more people care about performance, then the default should be
tuned toward it. For size optimization, use -Os -flto.

>
> Can you, please, try -O2 -fwhole-program, too?

Too many experiments -- but sure, I can do it.

>
> Testing Firefox, however, I noticed that enabling inlining with --param
> inline-unit-growth=5 gets most of the speedups from inlining at very little
> cost in code size (in fact code size gets smaller for Firefox because of better
> optimization).  This is sort of logical: when not doing LTO, limiting unit
> growth in each separate compilation unit loses, since the inliner has too
> little freedom (some units require a lot of unit growth to compile well, while
> most units won't need it at all).

Yes, that is what I call adaptive budget -- better with profiling.

> When doing LTO, however, the inliner can use the space constraint more reasonably.
>

Yes -- a global decision can be made.
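
A toy model of the point above, in Python. It is not GCC's inliner: the unit
sizes, the (growth, benefit) candidate pairs and the greedy selection are all
made up for illustration. The same 5% growth limit is applied once per unit
and once to the pooled program.

```python
def greedy_inline(candidates, budget):
    """Take (growth, benefit) candidates by benefit per unit of growth while they fit."""
    total_growth = 0
    total_benefit = 0
    ranked = sorted(candidates, key=lambda c: c[1] / c[0], reverse=True)
    for growth, benefit in ranked:
        if total_growth + growth <= budget:
            total_growth += growth
            total_benefit += benefit
    return total_growth, total_benefit


def per_unit(units, growth_pct):
    """Apply the growth limit to every unit separately (no LTO)."""
    results = [
        greedy_inline(u["candidates"], u["size"] * growth_pct / 100) for u in units
    ]
    return sum(g for g, _ in results), sum(b for _, b in results)


def whole_program(units, growth_pct):
    """Pool all candidates and apply one limit to the total size (LTO)."""
    pooled = [c for u in units for c in u["candidates"]]
    total_size = sum(u["size"] for u in units)
    return greedy_inline(pooled, total_size * growth_pct / 100)


# Hypothetical units: one needs a lot of growth, the others need almost none.
UNITS = [
    {"size": 1000, "candidates": [(200, 50), (150, 30)]},
    {"size": 1000, "candidates": []},
    {"size": 1000, "candidates": [(10, 1)]},
]

if __name__ == "__main__":
    print("per unit:     ", per_unit(UNITS, 5))
    print("whole program:", whole_program(UNITS, 5))
```

With these made-up numbers the per-unit limit admits only the tiny candidate,
while the pooled 5% budget can be spent on the one unit that needs it.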

> I am wondering what to do here - I just found that pushing unit growth down
> from 30% to 15% hurts some of the benchmarks (like tramp3d). I guess we will
> need to make unit growth depend on unit size somehow:

yes.

> at the moment we bypass the unit growth limit for very tiny units via the
> large-unit-insns parameter, but this is not good enough.
> For medium-sized units we need growth as big as 30%; for large units we need
> 5%.
> I guess I can either define very-large-unit-growth and very-large-unit-insns
> to jump down in growth at some point, or define the growth to be a function of
> 1/size.
> Do we know of better alternatives?
>

Mark can provide some suggestions -- he has many inliner patches
related to the performance/size trade-off.
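
For reference, the two alternatives Honza lists can be sketched roughly as
follows, in Python. This is only an illustration under stated assumptions:
very-large-unit-insns and very-large-unit-growth are the proposed (not
existing) parameters, every threshold value below is made up, and the way
tiny units are budgeted via large-unit-insns is my reading of the quoted text
rather than a copy of GCC's code.

```python
LARGE_UNIT_INSNS = 10_000          # assumed value for the large-unit-insns parameter
VERY_LARGE_UNIT_INSNS = 1_000_000  # hypothetical threshold, made up
INVERSE_PIVOT_INSNS = 100_000      # hypothetical size up to which the full 30% applies
UNIT_GROWTH = 30                   # percent, the figure quoted for medium-sized units
VERY_LARGE_UNIT_GROWTH = 5         # percent, the figure quoted for large units


def stepped_growth(size):
    """Alternative 1: jump from 30% down to 5% once the unit passes a threshold."""
    if size >= VERY_LARGE_UNIT_INSNS:
        return VERY_LARGE_UNIT_GROWTH
    return UNIT_GROWTH


def inverse_growth(size):
    """Alternative 2: growth proportional to 1/size, clamped to the 5%..30% range."""
    scaled = UNIT_GROWTH * INVERSE_PIVOT_INSNS / max(size, 1)
    return max(VERY_LARGE_UNIT_GROWTH, min(UNIT_GROWTH, scaled))


def max_unit_size(size, growth_fn):
    """Size limit after inlining; tiny units are treated as LARGE_UNIT_INSNS big."""
    base = max(size, LARGE_UNIT_INSNS)
    return base * (100 + growth_fn(size)) / 100


if __name__ == "__main__":
    for size in (1_000, 50_000, 200_000, 2_000_000):
        print(size, stepped_growth(size), inverse_growth(size))
```

Between the two clamps the 1/size variant keeps the absolute growth budget
constant (30% of 100,000 insns equals 15% of 200,000), whereas the stepped
variant changes the allowed growth abruptly at the threshold.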

> Enabling such extensively trimmed-down automatic inlining at -O2 can IMO make
> sense if we can prove it makes binaries of about the same size and brings
> noticeable speedups.
> After all, we want LTO to sell well - most people will probably repeat the
> mistake you made and try it at -O2 without -fwhole-program.  The latter I am
> hoping to fight by enabling -fuse-linker-plugin by default, as discussed at the
> summit (that has similar effects to -fwhole-program code-quality-wise even if
> the underlying implementation is different).
>

I don't think that is a mistake -- a large percentage of people will
likely not (be able to) use -fwhole-program for various reasons -- for
instance shared library builds, partially available source, option
limitations etc. It is therefore more (or at least equally) important to
sell LTO without -fwhole-program.

Thanks,

David

> Honza
>
