> This means O3 level inlining should be turned on also for lto build by
> default -- as -O2 lto performance is too unimpressive.
I am just retuning the inliner and hope to get more speedup for a smaller cost than we get right now. However, I don't think we can reasonably enable it, as it is, at -O2 with LTO. We more or less declare that -O2 is the level where the compiler optimizes hard without bloating code size. Automatic inlining bloats a lot, so enabling it at -O2 will make developers who care about code size unhappy. Can you please try -O2 -fwhole-program, too?

Testing Firefox, however, I noticed that enabling inlining with --param inline-unit-growth=5 gets most of the speedup from inlining at very little cost in code size (in fact, code size gets smaller on Firefox because of better optimization). This is sort of logical: without LTO, limiting unit growth in each separate compilation unit loses, since the inliner has too little freedom (some units require a lot of unit growth to compile well, while most units won't need any at all). With LTO, however, the inliner can use the space constraint more reasonably.

I am wondering what to do here. I just tried pushing unit growth down from 30% to 15%, and it hurts some benchmarks (like tramp3d). I guess we will need to make unit growth depend on unit size somehow: at the moment we bypass the unit growth limit for very tiny units via the large-unit-insns parameter, but this is not good enough. For medium-sized units we need growth as big as 30%; for large units we need 5%. I guess I can either define very-large-unit-growth and very-large-unit-insns to step the growth down at some point, or define the growth as a function of 1/size. Do we know of better alternatives?

Enabling such heavily trimmed-down automatic inlining at -O2 can IMO make sense if we can prove that it produces binaries of about the same size and brings noticeable speedups. After all, we want LTO to sell well: most people will probably repeat the mistake you made and try it at -O2 without -fwhole-program.

The second part (the missing -fwhole-program) I am hoping to fight by enabling -fuse-linker-plugin by default, as discussed at the summit (it has effects similar to -fwhole-program code-quality-wise, even if the underlying implementation is different).

Honza
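
To make the two options for size-dependent unit growth concrete, here is a minimal sketch in C. It is not GCC code. Only the 30% and 5% figures and the role of large-unit-insns come from the discussion above; the thresholds, the REFERENCE_UNIT_INSNS scale, and all function names are assumptions chosen for illustration.

```c
/* Illustrative only: every numeric threshold below is an assumption. */
enum {
  LARGE_UNIT_INSNS           = 10000,   /* smaller units are sized up to this */
  VERY_LARGE_UNIT_INSNS      = 1000000, /* hypothetical new parameter */
  REFERENCE_UNIT_INSNS       = 100000,  /* hypothetical scale for the 1/size variant */
  UNIT_GROWTH_PCT            = 30,      /* growth wanted for medium units */
  VERY_LARGE_UNIT_GROWTH_PCT = 5        /* growth wanted for large units */
};

/* Option 1: step the growth down once the unit is "very large". */
int growth_pct_step(long unit_insns)
{
  return unit_insns >= VERY_LARGE_UNIT_INSNS
         ? VERY_LARGE_UNIT_GROWTH_PCT
         : UNIT_GROWTH_PCT;
}

/* Option 2: growth proportional to 1/size, clamped to [5%, 30%]. */
int growth_pct_inverse(long unit_insns)
{
  long pct;

  if (unit_insns <= REFERENCE_UNIT_INSNS)
    return UNIT_GROWTH_PCT;
  pct = (long) UNIT_GROWTH_PCT * REFERENCE_UNIT_INSNS / unit_insns;
  return pct < VERY_LARGE_UNIT_GROWTH_PCT
         ? VERY_LARGE_UNIT_GROWTH_PCT
         : (int) pct;
}

/* Size budget for the unit after inlining.  Tiny units are treated as if
   they already had LARGE_UNIT_INSNS instructions, which models how the
   growth limit is effectively bypassed for them. */
long max_unit_insns(long unit_insns, int growth_pct)
{
  long base = unit_insns > LARGE_UNIT_INSNS ? unit_insns : LARGE_UNIT_INSNS;
  return base + base * growth_pct / 100;
}
```

With these assumed numbers, a unit of 200000 instructions gets 30% growth under the step variant but 15% under the 1/size variant. The step variant is simpler to tune, while the 1/size variant avoids a sudden drop in budget at the threshold.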