On Mon, Nov 15, 2010 at 4:25 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>> This means O3 level inlining should be turned on also for lto build by
>> default -- as -O2 lto performance is too unimpressive.
>
> I am just re-tuning the inliner and hope to get more speedups for smaller
> costs than we get right now. I however don't think we can reasonably enable
> it as it is at LTO with -O2. We sort of declare that -O2 is the level where
> the compiler optimizes hard without bloating code size. Automatic inlining
> bloats a lot. Enabling it at -O2 will make developers who care about code
> size unhappy.

Looks like you want to brand LTO as a size optimization technology more
than a performance one :) -- is that the right promotion for LTO? If more
people care about performance, then the default should be tuned toward it.
For size optimization, use -Os -flto.

>
> Can you, please, try -O2 -fwhole-program, too?

Too many experiments -- but sure, I can do it.

>
> Testing Firefox I however noticed that enabling inlining and --param
> inline-unit-growth=5 gets most of the speedups from inlining at very little
> cost in code size (in fact code size gets smaller on Firefox because of
> better optimization). This is sort of logical: when not doing LTO, limiting
> unit growth at each separate compilation unit loses, since the inliner has
> too little freedom (some units require a lot of unit growth to compile well,
> while most units won't need it at all).

Yes, that is what I call an adaptive budget -- better with profiling.

> When doing LTO however the inliner can use the space constraint more
> reasonably.
>

Yes -- a global decision can be made.

> I am wondering what to do here - I just tried that pushing down unit growth
> from 30% to 15% hurts some of the benchmarks (like tramp3d). I guess we will
> need to make unit growth depend on unit size somehow:

Yes.

> at the moment we bypass unit growth at very tiny units via the
> large-unit-insns parameter, but this is not good enough.
> For medium sized units we need growths as big as 30%, for large units we
> need 5%.
> I guess I can either define very-large-unit-growth and
> very-large-unit-insns to jump down in growth at some point, or define the
> growth to be a function of 1/size.
> Do we know of better alternatives?
>

Mark can provide some suggestions -- he has many inliner patches related to
the performance/size trade-off.

> Enabling such extensively trimmed down automatic inlining at -O2 IMO can
> make sense if we can prove it makes binaries of about the same size and
> brings noticeable speedups.
> After all, we want to make LTO sell well - most people will probably repeat
> the mistake you made and try it at -O2 without -fwhole-program. The second I
> am hoping to fight by enabling -fuse-linker-plugin by default as discussed
> at the summit (that has similar effects to -fwhole-program code quality wise
> even if the underlying implementation is different).
>

I don't think that is a mistake -- a large percentage of people will likely
not (be able to) use -fwhole-program for various reasons -- for instance
shared library builds, partially available source, option limitations etc.
It is therefore more (at least equally) important to sell LTO without
-fwhole-program.

Thanks,

David

> Honza
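
To make the "growth as a function of 1/size" option quoted above concrete,
here is a minimal C sketch. It is not GCC's implementation: the function
names and every constant (including the pivot size) are hypothetical and
chosen only so that medium units get 30% and very large units bottom out at
5%, the two figures mentioned in the quoted text. The floor for tiny units
mimics the role the thread attributes to large-unit-insns.

```c
/* Illustrative constants only -- assumptions, not GCC defaults. */
enum {
    LARGE_UNIT_INSNS = 10000,   /* smaller units are budgeted as if this big */
    PIVOT_INSNS      = 100000,  /* hypothetical size where growth starts to shrink */
    MAX_GROWTH_PCT   = 30,      /* growth allowed for small and medium units */
    MIN_GROWTH_PCT   = 5        /* growth floor for very large units */
};

/* Allowed growth in percent: constant up to the pivot, then proportional
   to 1/size, and never below MIN_GROWTH_PCT. */
static long growth_pct(long unit_insns)
{
    long pct;

    if (unit_insns <= PIVOT_INSNS)
        return MAX_GROWTH_PCT;
    pct = (long)MAX_GROWTH_PCT * PIVOT_INSNS / unit_insns;
    return pct < MIN_GROWTH_PCT ? MIN_GROWTH_PCT : pct;
}

/* Upper bound on unit size after inlining.  Tiny units are rounded up to
   LARGE_UNIT_INSNS first, so the percentage limit does not starve them. */
static long max_unit_insns(long unit_insns)
{
    long base = unit_insns < LARGE_UNIT_INSNS ? LARGE_UNIT_INSNS : unit_insns;

    return base + base * growth_pct(unit_insns) / 100;
}
```

With these assumed constants, a 200000-insn unit is allowed 15% growth, a
1000000-insn unit is clamped to 5%, and a 1000-insn unit may grow to 13000
insns. The step-function alternative (very-large-unit-growth plus
very-large-unit-insns) would replace the division with a single threshold
test.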