> This means O3 level inlining should be turned on also for lto build by
> default -- as -O2 lto performance is too unimpressive.

I am just re-tuning the inliner and hope to get more speedup for a smaller
cost than we get right now.  However, I don't think we can reasonably enable
it as it is at LTO with -O2.  We more or less declare that -O2 is the level
where the compiler optimizes hard without bloating code size.  Automatic
inlining bloats a lot.  Enabling it at -O2 will make developers who care about
code size unhappy.

Can you, please, try -O2 -fwhole-program, too?

Testing Firefox, however, I noticed that enabling inlining with --param
inline-unit-growth=5 gets most of the speedup from inlining at very little
cost in code size (in fact code size gets smaller on Firefox because of better
optimization).  This is sort of logical: when not doing LTO, limiting unit
growth in each separate compilation unit loses, since the inliner has too
little freedom (some units require a lot of unit growth to compile well, while
most units won't need it at all).
When doing LTO, however, the inliner can use the space constraint more
reasonably.
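
To make the argument above concrete, here is a small arithmetic sketch.  The
unit sizes, the "wanted" growth figures and the 10% cap are all hypothetical,
and the two functions are only an illustration of the reasoning, not how the
inliner actually accounts for growth.

```python
# Hypothetical units: (size in insns, extra insns the unit would like to
# spend on inlining).  One unit is "hungry", the others need nothing.
UNITS = [(1000, 400), (1000, 0), (1000, 0), (1000, 0)]
CAP_PERCENT = 10  # hypothetical growth limit


def per_unit_granted(units, cap_percent):
    """Separate compilation: each unit may grow only by its own cap."""
    return sum(min(wanted, size * cap_percent // 100) for size, wanted in units)


def pooled_granted(units, cap_percent):
    """LTO: one budget for the whole program, spent wherever it is wanted."""
    budget = sum(size for size, _ in units) * cap_percent // 100
    return min(budget, sum(wanted for _, wanted in units))


if __name__ == "__main__":
    print("per unit:", per_unit_granted(UNITS, CAP_PERCENT))
    print("pooled:  ", pooled_granted(UNITS, CAP_PERCENT))
```

With the same percentage cap, the per-unit limit grants the hungry unit only
100 of the 400 insns it wants, while the pooled budget covers all 400.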

I am wondering what to do here - I just tried pushing unit growth down from
30% to 15%, and it hurts some benchmarks (like tramp3d).  I guess we will need
to make unit growth depend on unit size somehow: at the moment we bypass the
unit growth limit for very tiny units via the large-unit-insns parameter, but
this is not good enough.  For medium-sized units we need growth as big as 30%;
for large units we need 5%.
I guess I can either define very-large-unit-growth and very-large-unit-insns
to jump down in growth at some point, or define the growth to be a function of
1/size.
Do we know of better alternatives?
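
The two options above could be sketched roughly as follows.  This is only an
illustration under assumptions: the threshold values, the reference size and
the clamping to the 5%..30% range are hypothetical, very-large-unit-insns and
very-large-unit-growth are the proposed (not existing) parameters, and the
handling of tiny units is a simplification of what large-unit-insns does.

```python
# All numeric values below are hypothetical and chosen only for illustration.
LARGE_UNIT_INSNS = 10_000            # smaller units are treated as this size
VERY_LARGE_UNIT_INSNS = 1_000_000    # proposed threshold (hypothetical value)
UNIT_GROWTH = 30                     # percent, for medium-sized units
VERY_LARGE_UNIT_GROWTH = 5           # percent, for very large units
REFERENCE_SIZE = 100_000             # size at which 1/size scaling starts


def growth_step(size):
    """Option 1: jump down in growth once the unit passes a size threshold."""
    if size >= VERY_LARGE_UNIT_INSNS:
        return VERY_LARGE_UNIT_GROWTH
    return UNIT_GROWTH


def growth_inverse(size):
    """Option 2: growth proportional to 1/size, clamped to [5%, 30%]."""
    raw = UNIT_GROWTH * REFERENCE_SIZE / size
    return max(VERY_LARGE_UNIT_GROWTH, min(UNIT_GROWTH, raw))


def max_unit_insns(size, growth_fn):
    """Largest size the unit may reach after inlining under growth_fn."""
    base = max(size, LARGE_UNIT_INSNS)  # tiny units bypass the limit
    return int(base * (100 + growth_fn(size)) / 100)


if __name__ == "__main__":
    for size in (1_000, 50_000, 600_000, 2_000_000):
        print(size, max_unit_insns(size, growth_step),
              max_unit_insns(size, growth_inverse))
```

One difference worth noting: with the step function, a unit just below the
threshold gets a larger absolute budget than a unit just above it, while with
1/size scaling the absolute budget stays constant between the two clamps.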

Enabling such heavily trimmed-down automatic inlining at -O2 can IMO make
sense if we can prove it produces binaries of about the same size and brings
noticeable speedups.
After all, we want LTO to sell well - most people will probably repeat the
mistake you made and try it at -O2 without -fwhole-program.  The latter I am
hoping to fight by enabling -fuse-linker-plugin by default, as discussed at
the summit (that has similar effects to -fwhole-program code-quality-wise,
even if the underlying implementation is different).

Honza
