Hi,
this patch enables -finline-functions at -O2 and throttles it down by means
of new parameters:

  --param max-inline-insns-auto-O2 (set to 15 instead of 30)
  --param max-inline-insns-single-O2 (set to 30 instead of 200)
  --param inline-min-speedup-O2 (set to 30 instead of 15)

The overall effect of both patches is described in
https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00952.html

The first patch just reduced early inlining, so it mostly has positive
effects on code size and few negative performance impacts.  Those are quite
small, as benchmarked by LNT tonight.

For the record, tonight's changes in SPEC scores are (this includes patch 1
throttling down the early inliner for -O2; non -O2 changes are clearly
unrelated):
https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?younger_in_days=14&older_in_days=0&min_percentage_change=0.02&revisions=ed76597323f2005730596f3a85583691621aa616%2Cb2902434a2c7dd89c5b808ff6db4d3f0f0839603

CPP tests:
https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?younger_in_days=14&older_in_days=0&min_percentage_change=0.02&revisions=ed76597323f2005730596f3a85583691621aa616%2Cb2902434a2c7dd89c5b808ff6db4d3f0f0839603

So the off-noise regressions are 2.8% for povray in SPEC and relatively
large regressions for botan and nbench/FP EMULATION.  The latter two should
be fixed by this patch.

Bootstrapped/regtested x86_64-linux.  The patch triggers an abi_check
failure, which is due to overactive checking.  I will wait for Jonathan to
commit a fix for it before committing the patch to mainline.

Honza

	* cif-code.def (MAX_INLINE_INSNS_SINGLE_O2_LIMIT,
	MAX_INLINE_INSNS_AUTO_O2_LIMIT): New.
	* ipa-inline.c (inline_insns_single, inline_insns_auto): New functions.
	(can_inline_edge_by_limits_p): Use them.
	(big_speedup_p): Use PARAM_INLINE_MIN_SPEEDUP_O2.
	(want_inline_small_function_p): Use O2 bounds.
	(edge_badness): Likewise.
	* opts.c (default_options): Add OPT_finline_functions.
	* params.def (PARAM_INLINE_MIN_SPEEDUP_O2,
	PARAM_MAX_INLINE_INSNS_SINGLE_O2, PARAM_MAX_INLINE_INSNS_AUTO_O2): New
	parameters.

	* g++.dg/tree-ssa/pr53844.C: Add -fno-inline-functions --param
	max-inline-insns-single-O2=200.
	* gcc.c-torture/execute/builtins/builtins.exp: Add
	-fno-inline-functions to additional_flags.
	* gcc.dg/ipa/inline-7.c: Add -fno-inline-functions.
	* gcc.dg/optimize-bswapsi-5.c: Add -fno-inline-functions.
	* gcc.dg/tree-ssa/ssa-thread-12.c: Add --param
	early-inlining-insns-O2=14 -fno-inline-functions; revert previous
	change.
	* gcc.dg/winline-3.c: Use --param max-inline-insns-single-O2=1
	--param inline-min-speedup-O2=100 instead of
	--param max-inline-insns-single=1 --param inline-min-speedup=100

	* invoke.texi (-finline-functions): Update documentation.
	(max-inline-insns-single-O2, max-inline-insns-auto-O2,
	inline-min-speedup-O2): Document.
	(early-inlining-insns-O2): Simplify docs.

Index: cif-code.def
===================================================================
--- cif-code.def	(revision 276441)
+++ cif-code.def	(working copy)
@@ -70,8 +70,12 @@ DEFCIFCODE(LARGE_STACK_FRAME_GROWTH_LIMI
 	  N_("--param large-stack-frame-growth limit reached"))
 DEFCIFCODE(MAX_INLINE_INSNS_SINGLE_LIMIT, CIF_FINAL_NORMAL,
 	  N_("--param max-inline-insns-single limit reached"))
+DEFCIFCODE(MAX_INLINE_INSNS_SINGLE_O2_LIMIT, CIF_FINAL_NORMAL,
+	  N_("--param max-inline-insns-single-O2 limit reached"))
 DEFCIFCODE(MAX_INLINE_INSNS_AUTO_LIMIT, CIF_FINAL_NORMAL,
 	  N_("--param max-inline-insns-auto limit reached"))
+DEFCIFCODE(MAX_INLINE_INSNS_AUTO_O2_LIMIT, CIF_FINAL_NORMAL,
+	  N_("--param max-inline-insns-auto-O2 limit reached"))
 DEFCIFCODE(INLINE_UNIT_GROWTH_LIMIT, CIF_FINAL_NORMAL,
 	  N_("--param inline-unit-growth limit reached"))
 
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 276441)
+++ doc/invoke.texi	(working copy)
@@ -8346,6 +8346,7 @@ also turns on the following optimization
 -ffinite-loops @gol
 -fgcse -fgcse-lm @gol
 -fhoist-adjacent-loads @gol
+-finline-functions @gol
 -finline-small-functions @gol
 -findirect-inlining @gol
 -fipa-bit-cp -fipa-cp -fipa-icf @gol
@@ -8379,7 +8380,6 @@ by @option{-O2} and also turns on the fo
 
 @c Please keep the following list alphabetized!
 @gccoptlist{-fgcse-after-reload @gol
--finline-functions @gol
 -fipa-cp-clone
 -floop-interchange @gol
 -floop-unroll-and-jam @gol
@@ -8559,7 +8559,7 @@ If all calls to a given function are int
 declared @code{static}, then the function is normally not output as
 assembler code in its own right.
 
-Enabled at levels @option{-O3}, @option{-Os}.  Also enabled
+Enabled at levels @option{-O2}, @option{-O3}, @option{-Os}.  Also enabled
 by @option{-fprofile-use} and @option{-fauto-profile}.
 
 @item -finline-functions-called-once
@@ -11175,19 +11175,30 @@ when modulo scheduling a loop.  Larger v
 compilation time.
 
 @item max-inline-insns-single
-Several parameters control the tree inliner used in GCC@.
-This number sets the maximum number of instructions (counted in GCC's
-internal representation) in a single function that the tree inliner
-considers for inlining.  This only affects functions declared
-inline and methods implemented in a class declaration (C++).
+@item max-inline-insns-single-O2
+Several parameters control the tree inliner used in GCC@.  This number sets the
+maximum number of instructions (counted in GCC's internal representation) in a
+single function that the tree inliner considers for inlining.  This only
+affects functions declared inline and methods implemented in a class
+declaration (C++).
+
+For functions compiled with optimization levels
+@option{-O3} and @option{-Ofast} parameter @option{max-inline-insns-single} is
+applied.  In other cases @option{max-inline-insns-single-O2} is applied.
+
 
 @item max-inline-insns-auto
+@item max-inline-insns-auto-O2
 When you use @option{-finline-functions} (included in @option{-O3}),
 a lot of functions that would otherwise not be considered for inlining
 by the compiler are investigated.  To those functions, a different
 (more restrictive) limit compared to functions declared inline can
 be applied.
 
+For functions compiled with optimization levels
+@option{-O3} and @option{-Ofast} parameter @option{max-inline-insns-auto} is
+applied.  In other cases @option{max-inline-insns-auto-O2} is applied.
+
 @item max-inline-insns-small
 This is bound applied to calls which are considered relevant with
 @option{-finline-small-functions}.
@@ -11210,11 +11221,16 @@ Same as @option{--param uninlined-functi
 @option{--param uninlined-function-time} but applied to function thunks
 
 @item inline-min-speedup
+@item inline-min-speedup-O2
 When estimated performance improvement of caller + callee runtime exceeds this
 threshold (in percent), the function can be inlined regardless of the limit on
 @option{--param max-inline-insns-single} and @option{--param
 max-inline-insns-auto}.
 
+For functions compiled with optimization levels
+@option{-O3} and @option{-Ofast} parameter @option{inline-min-speedup} is
+applied.  In other cases @option{inline-min-speedup-O2} is applied.
+
 @item large-function-insns
 The limit specifying really large functions.  For functions larger than this
 limit after inlining, inlining is constrained by
@@ -11291,17 +11307,14 @@ recursion depth can be guessed from the
 via a given call expression.  This parameter limits inlining only to call
 expressions whose probability exceeds the given threshold (in percents).
 
+@item early-inlining-insns
 @item early-inlining-insns-O2
 Specify growth that the early inliner can make.  In effect it increases
 the amount of inlining for code having a large abstraction penalty.
-This is applied to functions compiled with @option{-O1} or @option{-O2}
-optimization levels.
 
-@item early-inlining-insns
-Specify growth that the early inliner can make.  In effect it increases
-the amount of inlining for code having a large abstraction penalty.
-This is applied to functions compiled with @option{-O3} or @option{-Ofast}
-optimization levels.
+For functions compiled with optimization levels
+@option{-O3} and @option{-Ofast} parameter @option{early-inlining-insns} is
+applied.  In other cases @option{early-inlining-insns-O2} is applied.
 
 @item max-early-inliner-iterations
 Limit of iterations of the early inliner.  This basically bounds
Index: ipa-inline.c
===================================================================
--- ipa-inline.c	(revision 276441)
+++ ipa-inline.c	(working copy)
@@ -390,6 +390,28 @@ can_inline_edge_p (struct cgraph_edge *e
   return inlinable;
 }
 
+/* Return inlining_insns_single limit for function N */
+
+static int
+inline_insns_single (cgraph_node *n)
+{
+  if (opt_for_fn (n->decl, optimize >= 3))
+    return PARAM_VALUE (PARAM_MAX_INLINE_INSNS_SINGLE);
+  else
+    return PARAM_VALUE (PARAM_MAX_INLINE_INSNS_SINGLE_O2);
+}
+
+/* Return inlining_insns_auto limit for function N */
+
+static int
+inline_insns_auto (cgraph_node *n)
+{
+  if (opt_for_fn (n->decl, optimize >= 3))
+    return PARAM_VALUE (PARAM_MAX_INLINE_INSNS_AUTO);
+  else
+    return PARAM_VALUE (PARAM_MAX_INLINE_INSNS_AUTO_O2);
+}
+
 /* Decide if we can inline the edge and possibly update
    inline_failed reason.
    We check whether inlining is possible at all and whether
@@ -532,8 +554,8 @@ can_inline_edge_by_limits_p (struct cgra
 	  int growth = estimate_edge_growth (e);
 	  if (growth > PARAM_VALUE (PARAM_MAX_INLINE_INSNS_SIZE)
 	      && (!DECL_DECLARED_INLINE_P (callee->decl)
-		  && growth >= MAX (MAX_INLINE_INSNS_SINGLE,
-				    MAX_INLINE_INSNS_AUTO)))
+		  && growth >= MAX (inline_insns_single (caller),
+				    inline_insns_auto (caller))))
 	    {
 	      e->inline_failed = CIF_OPTIMIZATION_MISMATCH;
 	      inlinable = false;
@@ -745,9 +767,14 @@ big_speedup_p (struct cgraph_edge *e)
   sreal spec_time = estimate_edge_time (e, &unspec_time);
   sreal time = compute_uninlined_call_time (e, unspec_time);
   sreal inlined_time = compute_inlined_call_time (e, spec_time);
+  cgraph_node *caller = e->caller->global.inlined_to
+			? e->caller->global.inlined_to
+			: e->caller;
+  int limit = opt_for_fn (caller->decl, optimize) >= 3
+	      ? PARAM_VALUE (PARAM_INLINE_MIN_SPEEDUP)
+	      : PARAM_VALUE (PARAM_INLINE_MIN_SPEEDUP_O2);
 
-  if ((time - inlined_time) * 100
-      > (sreal) (time * PARAM_VALUE (PARAM_INLINE_MIN_SPEEDUP)))
+  if ((time - inlined_time) * 100 > time * limit)
     return true;
   return false;
 }
@@ -781,20 +808,29 @@ want_inline_small_function_p (struct cgr
 	       && (!e->count.ipa ().initialized_p () || !e->maybe_hot_p ()))
 	   && ipa_fn_summaries->get (callee)->min_size
 		- ipa_call_summaries->get (e)->call_stmt_size
-	      > MAX (MAX_INLINE_INSNS_SINGLE, MAX_INLINE_INSNS_AUTO))
+	      > MAX (inline_insns_single (e->caller),
+		     inline_insns_auto (e->caller)))
     {
-      e->inline_failed = CIF_MAX_INLINE_INSNS_AUTO_LIMIT;
+      if (opt_for_fn (e->caller->decl, optimize) >= 3)
+	e->inline_failed = CIF_MAX_INLINE_INSNS_AUTO_LIMIT;
+      else
+	e->inline_failed = CIF_MAX_INLINE_INSNS_AUTO_O2_LIMIT;
       want_inline = false;
     }
   else if ((DECL_DECLARED_INLINE_P (callee->decl)
 	    || e->count.ipa ().nonzero_p ())
 	   && ipa_fn_summaries->get (callee)->min_size
 		- ipa_call_summaries->get (e)->call_stmt_size
-	      > 16 * MAX_INLINE_INSNS_SINGLE)
+	      > 16 * inline_insns_single (e->caller))
     {
-      e->inline_failed = (DECL_DECLARED_INLINE_P (callee->decl)
-			  ? CIF_MAX_INLINE_INSNS_SINGLE_LIMIT
-			  : CIF_MAX_INLINE_INSNS_AUTO_LIMIT);
+      if (opt_for_fn (e->caller->decl, optimize) >= 3)
+	e->inline_failed = (DECL_DECLARED_INLINE_P (callee->decl)
+			    ? CIF_MAX_INLINE_INSNS_SINGLE_LIMIT
+			    : CIF_MAX_INLINE_INSNS_AUTO_LIMIT);
+      else
+	e->inline_failed = (DECL_DECLARED_INLINE_P (callee->decl)
+			    ? CIF_MAX_INLINE_INSNS_SINGLE_O2_LIMIT
+			    : CIF_MAX_INLINE_INSNS_AUTO_O2_LIMIT);
       want_inline = false;
     }
   else
@@ -808,15 +844,18 @@ want_inline_small_function_p (struct cgr
       /* Apply MAX_INLINE_INSNS_SINGLE limit.  Do not do so when
 	 hints suggests that inlining given function is very profitable.  */
       else if (DECL_DECLARED_INLINE_P (callee->decl)
-	       && growth >= MAX_INLINE_INSNS_SINGLE
-	       && (growth >= MAX_INLINE_INSNS_SINGLE * 16
+	       && growth >= inline_insns_single (e->caller)
+	       && (growth >= inline_insns_single (e->caller) * 16
 		   || (!(hints & (INLINE_HINT_indirect_call
 				  | INLINE_HINT_known_hot
 				  | INLINE_HINT_loop_iterations
 				  | INLINE_HINT_loop_stride))
 		       && !(big_speedup = big_speedup_p (e)))))
 	{
-	  e->inline_failed = CIF_MAX_INLINE_INSNS_SINGLE_LIMIT;
+	  if (opt_for_fn (e->caller->decl, optimize) >= 3)
+	    e->inline_failed = CIF_MAX_INLINE_INSNS_SINGLE_LIMIT;
+	  else
+	    e->inline_failed = CIF_MAX_INLINE_INSNS_SINGLE_O2_LIMIT;
 	  want_inline = false;
 	}
       else if (!DECL_DECLARED_INLINE_P (callee->decl)
@@ -824,12 +863,12 @@ want_inline_small_function_p (struct cgr
 	       && growth >= PARAM_VALUE (PARAM_MAX_INLINE_INSNS_SMALL))
 	{
 	  /* growth_likely_positive is expensive, always test it last.  */
-          if (growth >= MAX_INLINE_INSNS_SINGLE
+          if (growth >= inline_insns_single (e->caller)
 	      || growth_likely_positive (callee, growth))
 	    {
-              e->inline_failed = CIF_NOT_DECLARED_INLINED;
+	      e->inline_failed = CIF_NOT_DECLARED_INLINED;
 	      want_inline = false;
- 	    }
+	    }
 	}
       /* Apply MAX_INLINE_INSNS_AUTO limit for functions not declared inline
 	 Upgrade it to MAX_INLINE_INSNS_SINGLE when hints suggests that
@@ -839,28 +878,28 @@ want_inline_small_function_p (struct cgr
 	       && growth >= ((hints & (INLINE_HINT_indirect_call
 				       | INLINE_HINT_loop_iterations
 				       | INLINE_HINT_loop_stride))
-			     ? MAX (MAX_INLINE_INSNS_AUTO,
-				    MAX_INLINE_INSNS_SINGLE)
-			     : MAX_INLINE_INSNS_AUTO)
+			     ? MAX (inline_insns_auto (e->caller),
+				    inline_insns_single (e->caller))
+			     : inline_insns_auto (e->caller))
 	       && !(big_speedup == -1 ? big_speedup_p (e) : big_speedup))
 	{
 	  /* growth_likely_positive is expensive, always test it last.  */
-          if (growth >= MAX_INLINE_INSNS_SINGLE
+          if (growth >= inline_insns_single (e->caller)
 	      || growth_likely_positive (callee, growth))
 	    {
-	      e->inline_failed = CIF_MAX_INLINE_INSNS_AUTO_LIMIT;
+	      if (opt_for_fn (e->caller->decl, optimize) >= 3)
+		e->inline_failed = CIF_MAX_INLINE_INSNS_AUTO_LIMIT;
+	      else
+		e->inline_failed = CIF_MAX_INLINE_INSNS_AUTO_O2_LIMIT;
 	      want_inline = false;
- 	    }
+	    }
 	}
       /* If call is cold, do not inline when function body would grow. */
       else if (!e->maybe_hot_p ()
-	       && (growth >= MAX_INLINE_INSNS_SINGLE
+	       && (growth >= inline_insns_single (e->caller)
 		   || growth_likely_positive (callee, growth)))
 	{
-          if (e->count.ipa () == profile_count::zero ())
-            e->inline_failed = CIF_NEVER_CALL;
-          else
-            e->inline_failed = CIF_UNLIKELY_CALL;
+          e->inline_failed = CIF_UNLIKELY_CALL;
 	  want_inline = false;
 	}
     }
@@ -1166,7 +1205,7 @@ edge_badness (struct cgraph_edge *edge,
 	  && caller_info->inlinable
 	  && caller_info->size
 	     < (DECL_DECLARED_INLINE_P (caller->decl)
-		? MAX_INLINE_INSNS_SINGLE : MAX_INLINE_INSNS_AUTO))
+		? inline_insns_single (caller) : inline_insns_auto (caller)))
 	{
 	  if (dump)
 	    fprintf (dump_file,
Index: opts.c
===================================================================
--- opts.c	(revision 276441)
+++ opts.c	(working copy)
@@ -527,6 +527,7 @@ static const struct default_options defa
     { OPT_LEVELS_2_PLUS, OPT_ftree_tail_merge, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_ftree_vrp, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_CHEAP },
+    { OPT_LEVELS_2_PLUS, OPT_finline_functions, NULL, 1 },
 
     /* -O2 and -Os optimizations.  */
     { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_falign_functions, NULL, 1 },
@@ -542,9 +543,6 @@ static const struct default_options defa
 #endif
 
     /* -O3 and -Os optimizations.  */
-    /* Inlining of functions reducing size is a good idea with -Os
-       regardless of them being declared inline.  */
-    { OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 },
 
     /* -O3 optimizations.  */
     { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
Index: params.def
===================================================================
--- params.def	(revision 276441)
+++ params.def	(working copy)
@@ -51,8 +51,13 @@ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCO
 
 DEFPARAM (PARAM_INLINE_MIN_SPEEDUP,
 	  "inline-min-speedup",
+	  "The minimal estimated speedup allowing inliner to ignore inline-insns-single and inline-insns-auto with -O3 and -Ofast.",
+	  15, 0, 100)
+
+DEFPARAM (PARAM_INLINE_MIN_SPEEDUP_O2,
+	  "inline-min-speedup-O2",
 	  "The minimal estimated speedup allowing inliner to ignore inline-insns-single and inline-insns-auto.",
-	  15, 0, 0)
+	  30, 0, 100)
 
 /* The single function inlining limit.  This is the maximum size
    of a function counted in internal gcc instructions (not in
@@ -67,9 +72,14 @@ DEFPARAM (PARAM_INLINE_MIN_SPEEDUP,
    gets decreased.  */
 DEFPARAM (PARAM_MAX_INLINE_INSNS_SINGLE,
 	  "max-inline-insns-single",
-	  "The maximum number of instructions in a single function eligible for inlining.",
+	  "The maximum number of instructions in a single function eligible for inlining with -O3 and -Ofast.",
 	  200, 0, 0)
 
+DEFPARAM (PARAM_MAX_INLINE_INSNS_SINGLE_O2,
+	  "max-inline-insns-single-O2",
+	  "The maximum number of instructions in a single function eligible for inlining.",
+	  30, 0, 0)
+
 /* The single function inlining limit for functions that are
    inlined by virtue of -finline-functions (-O3).
    This limit should be chosen to be below or equal to the limit
@@ -79,9 +89,14 @@ DEFPARAM (PARAM_MAX_INLINE_INSNS_SINGLE,
    The default value is 30.  */
 DEFPARAM (PARAM_MAX_INLINE_INSNS_AUTO,
 	  "max-inline-insns-auto",
-	  "The maximum number of instructions when automatically inlining.",
+	  "The maximum number of instructions when automatically inlining with -O3 and -Ofast.",
 	  30, 0, 0)
 
+DEFPARAM (PARAM_MAX_INLINE_INSNS_AUTO_O2,
+	  "max-inline-insns-auto-O2",
+	  "The maximum number of instructions when automatically inlining.",
+	  15, 0, 0)
+
 DEFPARAM (PARAM_MAX_INLINE_INSNS_SMALL,
 	  "max-inline-insns-small",
 	  "The maximum number of instructions when automatically inlining small functions.",
Index: testsuite/g++.dg/tree-ssa/pr53844.C
===================================================================
--- testsuite/g++.dg/tree-ssa/pr53844.C	(revision 276441)
+++ testsuite/g++.dg/tree-ssa/pr53844.C	(working copy)
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-O2 -fdump-tree-optimized-vops" }
+// { dg-options "-O2 -fdump-tree-optimized-vops -fno-inline-functions --param max-inline-insns-single-O2=200" }
 
 struct VBase;
 
Index: testsuite/gcc.c-torture/execute/builtins/builtins.exp
===================================================================
--- testsuite/gcc.c-torture/execute/builtins/builtins.exp	(revision 276441)
+++ testsuite/gcc.c-torture/execute/builtins/builtins.exp	(working copy)
@@ -37,7 +37,7 @@ load_lib c-torture.exp
 torture-init
 set-torture-options $C_TORTURE_OPTIONS {{}} $LTO_TORTURE_OPTIONS
 
-set additional_flags "-fno-tree-dse -fno-tree-loop-distribute-patterns -fno-tracer -fno-ipa-ra"
+set additional_flags "-fno-tree-dse -fno-tree-loop-distribute-patterns -fno-tracer -fno-ipa-ra -fno-inline-functions"
 if [istarget "powerpc-*-darwin*"] {
    lappend additional_flags "-Wl,-multiply_defined,suppress"
 }
Index: testsuite/gcc.dg/ipa/inline-7.c
===================================================================
--- testsuite/gcc.dg/ipa/inline-7.c	(revision 276441)
+++ testsuite/gcc.dg/ipa/inline-7.c	(working copy)
@@ -1,6 +1,6 @@
 /* Check that early inliner works out that a is empty of parameter 0.  */
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-einline-optimized -fopt-info-inline -fno-partial-inlining" } */
+/* { dg-options "-O2 -fdump-tree-einline-optimized -fopt-info-inline -fno-partial-inlining -fno-inline-functions" } */
 void t(void);
 int a (int b)
 {
Index: testsuite/gcc.dg/optimize-bswapsi-5.c
===================================================================
--- testsuite/gcc.dg/optimize-bswapsi-5.c	(revision 276441)
+++ testsuite/gcc.dg/optimize-bswapsi-5.c	(working copy)
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target bswap } */
-/* { dg-options "-O2 -fdump-tree-bswap" } */
+/* { dg-options "-O2 -fdump-tree-bswap -fno-inline-functions" } */
 /* { dg-additional-options "-march=z900" { target s390-*-* } } */
 
 struct L { unsigned int l[2]; };
Index: testsuite/gcc.dg/tree-ssa/ssa-thread-12.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ssa-thread-12.c	(revision 276441)
+++ testsuite/gcc.dg/tree-ssa/ssa-thread-12.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread2-details -fdump-tree-thread3-details -fdump-tree-thread4-details -fno-finite-loops" } */
+/* { dg-options "-O2 -fdump-tree-thread2-details -fdump-tree-thread3-details -fdump-tree-thread4-details -fno-finite-loops --param early-inlining-insns-O2=14 -fno-inline-functions" } */
 /* { dg-final { scan-tree-dump "FSM" "thread2" } } */
 /* { dg-final { scan-tree-dump "FSM" "thread3" } } */
 /* { dg-final { scan-tree-dump "FSM" "thread4" { xfail *-*-* } } } */
@@ -56,7 +56,7 @@ bmp_iter_and_compl (bitmap_iterator * bi
 }
 
 extern int VEC_int_base_length (VEC_int_base *);
-static __inline__ bitmap
+bitmap
 compute_idf (bitmap def_blocks, bitmap_head * dfs)
 {
   bitmap_iterator bi;
Index: testsuite/gcc.dg/winline-3.c
===================================================================
--- testsuite/gcc.dg/winline-3.c	(revision 276441)
+++ testsuite/gcc.dg/winline-3.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Winline -O2 --param max-inline-insns-single=1 --param inline-min-speedup=100 -fgnu89-inline" } */
+/* { dg-options "-Winline -O2 --param max-inline-insns-single-O2=1 --param inline-min-speedup-O2=100 -fgnu89-inline" } */
 
 void big (void);
 inline int q(void) /* { dg-warning "max-inline-insns-single" } */
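
For readers who want to see the effect of the new defaults in isolation, here is a small standalone model of the limit selection and of the comparison done in big_speedup_p.  It is only a sketch: the struct, the function names and the use of plain integers instead of sreal are assumptions made for illustration and are not part of GCC; only the numeric defaults and the "optimize >= 3" test come from the patch above.

```c
#include <stdbool.h>

/* Hypothetical container for the three tunables; not a GCC type.  */
struct inline_limits
{
  int max_insns_single;	/* max-inline-insns-single[-O2] */
  int max_insns_auto;	/* max-inline-insns-auto[-O2] */
  int min_speedup;	/* inline-min-speedup[-O2], in percent */
};

/* Default values as given in the patch description.  */
static const struct inline_limits limits_o3 = { 200, 30, 15 };
static const struct inline_limits limits_o2 = { 30, 15, 30 };

/* Mirror the patch's test: -O3 and above use the original parameters,
   every other level uses the -O2 variants.  */
static const struct inline_limits *
limits_for_level (int optimize)
{
  return optimize >= 3 ? &limits_o3 : &limits_o2;
}

/* Integer model of the check (time - inlined_time) * 100 > time * limit.
   The real code works on sreal estimates; long is an assumption here.  */
static bool
big_speedup_model (int optimize, long time, long inlined_time)
{
  const struct inline_limits *l = limits_for_level (optimize);
  return (time - inlined_time) * 100 > time * l->min_speedup;
}
```

With these defaults, a call whose estimated time drops from 100 to 80 (a 20% speedup) passes the check in the model at -O3, where the threshold is 15, but not at -O2, where it is 30.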