-O2 inliner retuning 3/n: make inline hints more systematic

Jan Hubicka Thu, 03 Oct 2019 08:29:50 -0700

Hi,
the previous patch (which enabled -finline-functions with limited
parameters at -O2) had the somewhat unexpected effect of trashing tramp3d
and dlv performance, which was not caught by LNT testing but is clearly
visible in the legacy testers. I did know about the tramp3d problems, but
since LNT was happy I thought they had magically gone away.


The issue is that the patch reduces max-inline-insns-single, which in
turn leads to less inlining at -O2 for functions declared inline.
The motivation for reducing this parameter is that we pay quite
substantial code size/compile time costs for it.

However, since the regression is quite large and not limited to a single
testcase, I want to collect more data and revisit this decision.
One reason why the flag is important is that it controls not only
functions declared inline but also those functions where the heuristics
say inlining them will be profitable (for example, because loop bounds
become constant, enabling inlining/vectorization).

Here is how tramp3d performance behaves with different
max-inline-insns-single-O2 values:

param   size   time
=======================
 10     396850 0m7.736s
 20     475810 0m6.048s
 30     490194 0m7.320s (current default)
 40     491314 0m7.204s
 50     477442 0m5.448s
 60     482274 0m6.152s
 70     478226 0m5.736s
 80     473026 0m4.584s
 90     478466 0m4.508s
100     475746 0m3.036s
110     482866 0m3.008s
120     481842 0m2.596s (min needed to get full speed)
130     481474 0m2.604s
140     481714 0m2.656s
150     482962 0m2.656s
160     484290 0m2.612s
170     483826 0m2.652s
180     477442 0m2.640s
190     478242 0m2.640s
200     478242 0m2.640s (old default)

Even bigger values are needed when -fprofile-generate is used, since
profiling instrumentation adds quite a lot to the estimated time/size.
It is the nature of the tramp3d benchmark that it needs a lot of inlining
to happen for things to optimize well, and in this it is a bit on the
extreme side (as shown by no significant regressions in the SPEC or
Firefox benchmarks).

Clearly, to get the performance back I would need to go to 120, which is
quite high. For -O2 we probably want to take the performance/code size
ratios into account more carefully, so I plan to collect more data on
this parameter alone.
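
The 120 threshold can be read off the table mechanically. The following is
a minimal sketch, not part of the patch: the rows are transcribed from the
table above, while the function names and the 5% tolerance used to define
"full speed" are assumptions made for illustration.

```python
# Rows transcribed from the table above:
# (max-inline-insns-single-O2 value, size as reported, run time in seconds)
MEASUREMENTS = [
    (10, 396850, 7.736), (20, 475810, 6.048), (30, 490194, 7.320),
    (40, 491314, 7.204), (50, 477442, 5.448), (60, 482274, 6.152),
    (70, 478226, 5.736), (80, 473026, 4.584), (90, 478466, 4.508),
    (100, 475746, 3.036), (110, 482866, 3.008), (120, 481842, 2.596),
    (130, 481474, 2.604), (140, 481714, 2.656), (150, 482962, 2.656),
    (160, 484290, 2.612), (170, 483826, 2.652), (180, 477442, 2.640),
    (190, 478242, 2.640), (200, 478242, 2.640),
]


def min_param_for_full_speed(rows, tolerance=0.05):
    """Smallest param whose run time is within `tolerance` of the best time.

    The 5% default is an assumed definition of "full speed".
    """
    best = min(time for _, _, time in rows)
    for param, _, time in sorted(rows):
        if time <= best * (1 + tolerance):
            return param
    return None


def slowdown_vs_best(rows, param):
    """Run time at `param` divided by the best run time in the table."""
    best = min(time for _, _, time in rows)
    time_at_param = {p: t for p, _, t in rows}[param]
    return time_at_param / best


if __name__ == "__main__":
    print(min_param_for_full_speed(MEASUREMENTS))
    print(round(slowdown_vs_best(MEASUREMENTS, 30), 2))
```

With these numbers the current default of 30 runs roughly 2.8 times slower
than the best measurement, and nothing below 120 comes within 5% of it.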

In this patch, however, I untangle it from the hinting mechanism.

The inliner has two limits:
- max-inline-insns-single (currently set to 30 for -O2 and 200 for -O3)
  for functions declared inline
- max-inline-insns-auto (currently set to 15 for -O2 and 30 for -O3)
  for other functions

Now, when hints apply to an inline function, the inliner bumps the limit
up 16 times (3200 instructions with a limit of 200; basically infinity),
and for a non-inline function it switches from the auto to the single
limit (6 times).

It seems we want to take those auto-generated inline hints less
aggressively at -O2, and bumping auto to single makes it hard to get
consistent results. For this reason I added a new parameter that sets
the scale applied, currently set to 200% for -O2 and 1600% otherwise.
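
The difference between the two schemes can be sketched as follows. This is
an illustrative model of the description above, not the code in the patch:
the names old_limit, new_limit and hint_percent are hypothetical, and
applying the new scale to the auto limit as well as the single limit is an
assumed reading of "the scale applied".

```python
def old_limit(declared_inline, has_hints, single, auto):
    """Size limit before this patch, as described above.

    Hinted inline functions get 16x the single limit; hinted non-inline
    functions switch from the auto limit to the single limit.
    """
    if declared_inline:
        return single * 16 if has_hints else single
    return single if has_hints else auto


def new_limit(declared_inline, has_hints, single, auto, hint_percent):
    """Size limit with a tunable hint scale given in percent.

    Assumption: the same scale applies to whichever base limit is in use.
    """
    base = single if declared_inline else auto
    return base * hint_percent // 100 if has_hints else base


if __name__ == "__main__":
    # -O2 values quoted above: single=30, auto=15, scale=200%
    print(new_limit(True, True, 30, 15, 200))    # hinted, declared inline
    print(new_limit(False, True, 30, 15, 200))   # hinted, not declared inline
    # -O3 values quoted above: single=200, auto=30, scale=1600%
    print(new_limit(True, True, 200, 30, 1600))
    print(old_limit(False, True, 200, 30))       # old auto -> single switch
```

Under this model a hinted function at -O2 gets twice its usual limit
regardless of whether it was declared inline, so the hint bonus no longer
depends on the gap between the auto and single limits.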

Bootstrapped/regtested x86_64-linux.
