On Fri, 9 Feb 2018, Jan Hubicka wrote: > Hi, > this patch addresses the code size regression by reducing > max-inline-insns-auto 40->30 and increasing inline-min-speedup 8->15. > > The main reason why we need retuning is following > > - inline-min-speedup works in a way that if expected runtime > of caller+calee combo after inlining reduces by more than 8% > the inliner is going to bypass inline-insns-auto (because it knows the > inlining is benefical rather than just inlining in hope it will be). > The decrease either happens because callee is very lightweight at > average or because we can track it will optimize well. > > During GCC 8 development I have switched time estimates from int to sreal. > Original estimates was capping time to about 1000 instructions and thus > large function rarely saw speedup because that was based comparing caped > numbers. With sreals we can now track benefits better > > - We made quite some progress on early optimizations making function > bodies to appear smaller to inliner which in turn inlines more of them. > This is reason why we want to decrease inline-min-speedup to gain some code > size back. > > The code size estimate difference at beggining of inlning is about 6% to > gcc 6 and about 12% to gcc 4.9. > > I have benchmarked patch on Haswell SPEC2000, SPEC2006, polyhedron and our C++ > benchmarks. Here I found no off-noise changes on SPEC2000/2006. I know that > reducing inline-insns-auto to 10 still produces no regressions and even > improves facerec 6600->8000 but that seems bit of effect of good luck (it also > depends on setting of branch predictor weights and needs to be analyzed > independently). min-speedup can be increased to 30 without measurable effects > as well. > > On C++ benchmark suite I know that cray degrades with min-speedup set to 30 > (it > needs value of 22). Also there is degradation with profile-generate on > tramp3d. > > So overall I believe that for Haswell the reduction of inline limits is doing > very consistent code size improvement without perofrmance tradeoffs. > > I also tested Itanium and here things are slightly more sensitive. The > reduction of limits affects gzip 337->332 (-1.5%), vpr 1000->980 (-2%), crafty > (925->935) (+2%) and vortex (1165->1180) (+1%). So overall it is specint2000 > neutral. Reducing inline-isns-auto to 10 brings off noise overall degradation > by -1% and 20 is in-between. > > specfp2000 reacts positively by improving applu 520->525 (+1%) and mgrid > 391->397 (+1.3%). It would let me to reduct inline-isns-auto to 10 without > any other regressions. > > C++ benchmarks does not show any off-noise changes. > > I have also did some limited testing on ppc and arm. They reacted more > similarly > to Haswell also showing no important changes for reducing the inlining limits. > > Now reducing inline limits triggers failure of testsuite/g++.dg/pr83239.C > which tests that inlining happens. The reason why it does not happen is > becuae ipa-fnsplit is trying to second guess if inliner will evnetually > consider > function for inlining and the test is out of date. I decided to hack around > it for stage4 and will try to clean these things up next stage1. > > Bootstraped/regtested x86_64-linux. I know it is late in stage4, but would it > be OK to for GCC 8?
Ok. Richard. > PR middle-end/83665 > * params.def (inline-min-speedup): Increase from 8 to 15. > (max-inline-insns-auto): Decrease from 40 to 30. > * ipa-split.c (consider_split): Add some buffer for function to > be considered inlining candidate. > * invoke.texi (max-inline-insns-auto, inline-min-speedup): UPdate > default values. > Index: params.def > =================================================================== > --- params.def (revision 257520) > +++ params.def (working copy) > @@ -52,13 +52,13 @@ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCO > DEFPARAM (PARAM_INLINE_MIN_SPEEDUP, > "inline-min-speedup", > "The minimal estimated speedup allowing inliner to ignore > inline-insns-single and inline-insns-auto.", > - 8, 0, 0) > + 15, 0, 0) > > /* The single function inlining limit. This is the maximum size > of a function counted in internal gcc instructions (not in > real machine instructions) that is eligible for inlining > by the tree inliner. > - The default value is 450. > + The default value is 400. > Only functions marked inline (or methods defined in the class > definition for C++) are affected by this. > There are more restrictions to inlining: If inlined functions > @@ -77,11 +77,11 @@ DEFPARAM (PARAM_MAX_INLINE_INSNS_SINGLE, > that is applied to functions marked inlined (or defined in the > class declaration in C++) given by the "max-inline-insns-single" > parameter. > - The default value is 40. */ > + The default value is 30. */ > DEFPARAM (PARAM_MAX_INLINE_INSNS_AUTO, > "max-inline-insns-auto", > "The maximum number of instructions when automatically inlining.", > - 40, 0, 0) > + 30, 0, 0) > > DEFPARAM (PARAM_MAX_INLINE_INSNS_RECURSIVE, > "max-inline-insns-recursive", > Index: ipa-split.c > =================================================================== > --- ipa-split.c (revision 257520) > +++ ipa-split.c (working copy) > @@ -558,10 +558,13 @@ consider_split (struct split_point *curr > " Refused: split size is smaller than call overhead\n"); > return; > } > + /* FIXME: The logic here is not very precise, because inliner does use > + inline predicates to reduce function body size. We add 10 to anticipate > + that. Next stage1 we should try to be more meaningful here. */ > if (current->header_size + call_overhead > >= (unsigned int)(DECL_DECLARED_INLINE_P (current_function_decl) > ? MAX_INLINE_INSNS_SINGLE > - : MAX_INLINE_INSNS_AUTO)) > + : MAX_INLINE_INSNS_AUTO) + 10) > { > if (dump_file && (dump_flags & TDF_DETAILS)) > fprintf (dump_file, > @@ -574,7 +577,7 @@ consider_split (struct split_point *curr > Limit this duplication. This is consistent with limit in tree-sra.c > FIXME: with LTO we ought to be able to do better! */ > if (DECL_ONE_ONLY (current_function_decl) > - && current->split_size >= (unsigned int) MAX_INLINE_INSNS_AUTO) > + && current->split_size >= (unsigned int) MAX_INLINE_INSNS_AUTO + 10) > { > if (dump_file && (dump_flags & TDF_DETAILS)) > fprintf (dump_file, > Index: doc/invoke.texi > =================================================================== > --- doc/invoke.texi (revision 257520) > +++ doc/invoke.texi (working copy) > @@ -10131,13 +10131,14 @@ a lot of functions that would otherwise > by the compiler are investigated. To those functions, a different > (more restrictive) limit compared to functions declared inline can > be applied. > -The default value is 40. > +The default value is 30. > > @item inline-min-speedup > When estimated performance improvement of caller + callee runtime exceeds > this > threshold (in percent), the function can be inlined regardless of the limit > on > @option{--param max-inline-insns-single} and @option{--param > max-inline-insns-auto}. > +The default value is 15. > > @item large-function-insns > The limit specifying really large functions. For functions larger than this > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)