Hi,
the previous patch (which enabled -finline-functions with limited parameters at -O2) had the somewhat unexpected effect of trashing tramp3d and dlv performance. This was not caught by LNT testing but is clearly visible in the legacy testers. I did know about the tramp3d problems, but since LNT was happy I thought they had magically gone away.

The issue is that the patch reduces max-inline-insns-single, which in turn leads to less inlining at -O2 for functions declared inline. The motivation for reducing this parameter is that we pay quite substantial code size/compile time costs for it. However, since the regression is quite large and not limited to a single testcase, I want to collect more data and revisit this decision.

One reason why the parameter is important is that it controls not only functions declared inline but also those functions where the heuristics say inlining them will pay off (for example, because loop bounds become constant, enabling inlining/vectorization).

Here is how tramp3d performance behaves with different max-inline-insns-single-O2 values:

param  size    time
=======================
 10    396850  0m7.736s
 20    475810  0m6.048s
 30    490194  0m7.320s (current default)
 40    491314  0m7.204s
 50    477442  0m5.448s
 60    482274  0m6.152s
 70    478226  0m5.736s
 80    473026  0m4.584s
 90    478466  0m4.508s
100    475746  0m3.036s
110    482866  0m3.008s
120    481842  0m2.596s (min needed to get full speed)
130    481474  0m2.604s
140    481714  0m2.656s
150    482962  0m2.656s
160    484290  0m2.612s
170    483826  0m2.652s
180    477442  0m2.640s
190    478242  0m2.640s
200    478242  0m2.640s (old default)

Even bigger values are needed when -fprofile-generate is used, since profiling instrumentation adds quite a lot to the estimated time/size.

It is in the nature of the tramp3d benchmark that it needs a lot of inlining to happen for things to optimize well, and in this respect it is a bit of an extreme case (as shown by no significant regressions in the spec or firefox benchmarks).

Clearly, to get the performance back I would need to go to 120, which is quite high. For -O2 we probably want to take the performance/code size ratios into account more carefully, so I plan to collect more data on this parameter alone.

In this patch, however, I untangle it from the hinting mechanism. The inliner has two limits:
 - max-inline-insns-single (currently set to 30 for -O2 and 200 for -O3)
   for functions declared inline
 - max-inline-insns-auto (currently set to 15 for -O2 and 30 for -O3)
   for other functions

Now when hints happen on an inline function, the inliner bumps the limit up 16 times (3200 instructions; basically infinity), and for a non-inline function it switches from the auto to the single limit (6 times).

It seems we want to take those auto-generated inline hints less aggressively at -O2, and bumping auto to single makes it hard to get consistent results. For this reason I added a new parameter that sets the scale applied; it is currently set to 200% for -O2 and 1600% otherwise.

Bootstrapped/regtested x86_64-linux.
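
As a rough illustration of the limits described above, here is a minimal C++ sketch of how a hint could scale the size limit. The struct, constant, and function names are made up for illustration and are not the identifiers used in GCC or in the patch. It also assumes that with the new parameter a hint scales whichever limit already applies (single for functions declared inline, auto for the others); that is an assumption about the new behavior, not a quote from the patch.

```cpp
#include <cassert>

// Hypothetical names throughout; none of these are GCC identifiers.
struct InlineLimits {
    int single;        // limit for functions declared inline
    int auto_limit;    // limit for all other functions
    int hint_percent;  // scale (in percent) applied when an inline hint fires
};

// Values quoted in the mail for -O2 and -O3.
constexpr InlineLimits kO2{30, 15, 200};
constexpr InlineLimits kO3{200, 30, 1600};

// Assumed model: pick the base limit from the declaration, then scale it
// by hint_percent only when a hint is present.
constexpr int size_limit(const InlineLimits &p, bool declared_inline, bool hinted) {
    return (declared_inline ? p.single : p.auto_limit)
           * (hinted ? p.hint_percent : 100) / 100;
}

// Compile-time checks of the arithmetic.
static_assert(size_limit(kO3, true, true) == 3200, "200 * 1600% at -O3");
static_assert(size_limit(kO2, true, true) == 60, "30 * 200% at -O2");
static_assert(size_limit(kO2, false, false) == 15, "unhinted auto limit at -O2");
```

Under these assumptions a hinted function declared inline still gets 200 * 1600 / 100 = 3200 at -O3, matching the old 16x bump, while at -O2 it gets only 30 * 200 / 100 = 60.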