On Mon, Nov 25, 2019 at 1:44 PM Graham Markall <graham.mark...@embecosm.com> wrote: > > Hi Richard, > > Many thanks for the suggestion of an alternative implementation. I tried > implementing the suggestion, and I had a couple of observations: > > 1. As well as applying the bias in compute_fn_summary, it seemed to also > be necessary to apply it in ip_update_overall_fn_summary to avoid an ICE > resulting from size_info->size and size_info->self_size differing at the > end of compute_fn_summary. > > 2. The results that it produced were exactly the same as those using the > param uninlined-function-insns, so it would probably be redundant to add > the additional parameter implemented this way. > > The implementation I tried is at: > https://github.com/gmarkall/gcc/commit/1d22f65b8a392cffc235ecb49143a93d1720b91c > > When I benchmarked the code size again using Embench with both the > inline-growth-bias param implemented this way, and the > uninlined-function-insns param the results were, in both cases: > > Benchmark 0 1 2 3 4 > --------- ------ ------ ------ ------ ------ > aha-mont64 99.05 99.05 99.05 99.05 99.05 > crc32 100.00 100.00 100.00 100.00 100.00 > cubic 94.66 94.66 94.66 94.66 94.66 > edn 96.14 96.14 96.14 96.14 96.14 > huffbench 99.88 99.88 99.88 99.88 99.88 > matmult-int 100.00 100.00 100.00 100.00 100.00 > minver 100.56 100.56 100.56 100.56 100.56 > nbody 101.13 101.13 101.13 101.13 101.13 > nettle-aes 99.03 99.03 99.03 99.03 99.03 > nettle-sha256 99.89 99.89 99.89 99.89 99.89 > nsichneu 100.00 100.00 100.00 100.00 100.00 > picojpeg 99.35 99.35 100.10 100.10 100.10 > qrduino 101.81 101.81 101.81 101.81 101.81 > sglib-combined 100.00 100.00 100.00 100.00 100.00 > slre 99.09 99.42 99.42 100.49 100.49 > st 96.36 96.36 96.36 96.36 96.36 > statemate 100.22 100.22 100.22 100.22 100.22 > ud 100.00 100.00 100.00 100.00 100.00 > wikisort 99.48 99.48 99.48 99.48 99.48 > Mean 99.28 99.30 99.34 99.40 99.40 > > These results show that: > > 1. Setting the uninlined-function-insns value to 0 rather than 2 > generally seems to yield an improvement in code size on RISC-V, without > making any individual benchmark larger than the current default of 2. > > 2. There are two cases where the patch in the original version of the > patch (v1) makes a slightly greater improvement than using > uninlined-function-insns or the patch linked above (ufi). These are: > > Benchmark ufi v1 > --------- --- -- > qrduino 101.81 101.61 > sglib-combined 100.00 95.87 > > (note that the v1 values are slightly different to those posted in the > original patch - I rebased the original patch on a more recent version > of the master branch and the values changed slightly.) > > Additionally, for a proprietary application that we tested the flags > with, the code sizes were for using neither flag (Original) for > using uninlined-function-insns, and using the original patch, with the > parameter values that resulted in the smallest code size, were: > > Original ufi (val=0) v1 (val=-2) > 57380 57184 56756 > (100.0%) (99.66%) (98.91%) > > > Patch v2 > -------- > > In conclusion, it seems that in some cases there is some additional code > size saving that can be gained in specific cases using the original > implementation of the patch. I have tidied up the original version of > the patch and adjusted the option so that it is a param instead of a > flag as suggested by both Jeff and yourself. I chose a default param > value of 5 to allow for a reasonable scope for setting negative values - > although I haven't seen any result where setting the value beneath an > effective setting of -2 yielded an improvement, this does allow a little > more room in case there are cases where going below -2 would help. > > Is there sufficient data to indicate that the additional parameter as > implemented in the attached patch is worth adding?
There needs to be more explanation on what this really achieves and as I said IIRC we should already have a --param that accounts for this, --param uninlined-function-insns which maybe only misses the ability to become negative. Richard. > > Many thanks, > Graham. > > > > --- > gcc/ipa-inline.h | 21 ++++++++++++++++++++- > gcc/params.def | 5 +++++ > 2 files changed, 25 insertions(+), 1 deletion(-) > > diff --git a/gcc/ipa-inline.h b/gcc/ipa-inline.h > index 18c8e1eebd0..fb10677ea8f 100644 > --- a/gcc/ipa-inline.h > +++ b/gcc/ipa-inline.h > @@ -21,6 +21,8 @@ along with GCC; see the file COPYING3. If not see > #ifndef GCC_IPA_INLINE_H > #define GCC_IPA_INLINE_H > > +#include "params.h" > + > /* Data we cache about callgraph edges during inlining to avoid expensive > re-computations during the greedy algorithm. */ > class edge_growth_cache_entry > @@ -84,7 +86,24 @@ estimate_edge_growth (struct cgraph_edge *edge) > { > ipa_call_summary *s = ipa_call_summaries->get (edge); > gcc_checking_assert (s->call_stmt_size || !edge->callee->analyzed); > - return (estimate_edge_size (edge) - s->call_stmt_size); > + > + int growth = (estimate_edge_size (edge) - s->call_stmt_size); > + > + /* Bias function growth according to the bias parameter. The default is > + parameter value is 5 to allow for slight negative biases, so we > subtract 5 > + to allow an effective default value of 0. */ > + growth += PARAM_VALUE (PARAM_INLINE_GROWTH_BIAS) - 5; > + > + /* Function size cannot be negative, so if the growth is negative to the > point > + that it will reduce function size below 0, then we cap the growth such > that > + it makes the function size exactly zero. */ > + struct cgraph_node *caller = edge->caller; > + ipa_size_summary *fs = ipa_size_summaries->get (caller); > + if (fs->size + growth < 0) { > + growth = -fs->size; > + } > + > + return growth; > } > > /* Return estimated callee runtime increase after inlining > diff --git a/gcc/params.def b/gcc/params.def > index 7928f6f071e..4274d8f36e0 100644 > --- a/gcc/params.def > +++ b/gcc/params.def > @@ -112,6 +112,11 @@ DEFPARAM (PARAM_INLINE_HEURISTICS_HINT_PERCENT_O2, > "The scale (in percents) applied to inline-insns-single and auto > limits when heuristics hints that inlining is very profitable.", > 200, 100, 1000000) > > +DEFPARAM (PARAM_INLINE_GROWTH_BIAS, > + "inline-growth-bias", > + "Bias of inlining growth overhead (default 5 is equal to no bias).", > + 5, 0, 1000000) > + > DEFPARAM (PARAM_MAX_INLINE_INSNS_SIZE, > "max-inline-insns-size", > "The maximum number of instructions when inlining for size.", > -- > 2.21.0 >