On Mon, Nov 25, 2019 at 1:44 PM Graham Markall
<graham.mark...@embecosm.com> wrote:
>
> Hi Richard,
>
> Many thanks for the suggestion of an alternative implementation. I tried
> implementing the suggestion, and I had a couple of observations:
>
> 1. As well as applying the bias in compute_fn_summary, it seemed to also
> be necessary to apply it in ip_update_overall_fn_summary to avoid an ICE
> resulting from size_info->size and size_info->self_size differing at the
> end of compute_fn_summary.
>
> 2. The results that it produced were exactly the same as those using the
> param uninlined-function-insns, so it would probably be redundant to add
> the additional parameter implemented this way.
>
> The implementation I tried is at:
> https://github.com/gmarkall/gcc/commit/1d22f65b8a392cffc235ecb49143a93d1720b91c
>
> When I benchmarked the code size again using Embench with both the
> inline-growth-bias param implemented this way, and the
> uninlined-function-insns param the results were, in both cases:
>
> Benchmark       0       1       2       3       4
> ---------       ------  ------  ------  ------  ------
> aha-mont64       99.05   99.05   99.05   99.05   99.05
> crc32           100.00  100.00  100.00  100.00  100.00
> cubic            94.66   94.66   94.66   94.66   94.66
> edn              96.14   96.14   96.14   96.14   96.14
> huffbench        99.88   99.88   99.88   99.88   99.88
> matmult-int     100.00  100.00  100.00  100.00  100.00
> minver          100.56  100.56  100.56  100.56  100.56
> nbody           101.13  101.13  101.13  101.13  101.13
> nettle-aes       99.03   99.03   99.03   99.03   99.03
> nettle-sha256    99.89   99.89   99.89   99.89   99.89
> nsichneu        100.00  100.00  100.00  100.00  100.00
> picojpeg         99.35   99.35  100.10  100.10  100.10
> qrduino         101.81  101.81  101.81  101.81  101.81
> sglib-combined  100.00  100.00  100.00  100.00  100.00
> slre             99.09   99.42   99.42  100.49  100.49
> st               96.36   96.36   96.36   96.36   96.36
> statemate       100.22  100.22  100.22  100.22  100.22
> ud              100.00  100.00  100.00  100.00  100.00
> wikisort         99.48   99.48   99.48   99.48   99.48
> Mean             99.28   99.30   99.34   99.40   99.40
>
> These results show that:
>
> 1. Setting the uninlined-function-insns value to 0 rather than 2
> generally seems to yield an improvement in code size on RISC-V, without
> making any individual benchmark larger than the current default of 2.
>
> 2. There are two cases where the patch in the original version of the
> patch (v1) makes a slightly greater improvement than using
> uninlined-function-insns or the patch linked above (ufi). These are:
>
> Benchmark       ufi      v1
> ---------       ---      --
> qrduino         101.81   101.61
> sglib-combined  100.00    95.87
>
> (note that the v1 values are slightly different to those posted in the
> original patch - I rebased the original patch on a more recent version
> of the master branch and the values changed slightly.)
>
> Additionally, for a proprietary application that we tested the flags
> with, the code sizes were for using neither flag (Original) for
> using uninlined-function-insns, and using the original patch, with the
> parameter values that resulted in the smallest code size, were:
>
> Original   ufi (val=0)  v1 (val=-2)
> 57380      57184        56756
> (100.0%)   (99.66%)     (98.91%)
>
>
> Patch v2
> --------
>
> In conclusion, it seems that in some cases there is some additional code
> size saving that can be gained in specific cases using the original
> implementation of the patch. I have tidied up the original version of
> the patch and adjusted the option so that it is a param instead of a
> flag as suggested by both Jeff and yourself. I chose a default param
> value of 5 to allow for a reasonable scope for setting negative values -
> although I haven't seen any result where setting the value beneath an
> effective setting of -2 yielded an improvement, this does allow a little
> more room in case there are cases where going below -2 would help.
>
> Is there sufficient data to indicate that the additional parameter as
> implemented in the attached patch is worth adding?

There needs to be more explanation on what this really achieves
and as I said IIRC we should already have a --param that accounts
for this, --param uninlined-function-insns which maybe only misses
the ability to become negative.

Richard.

>
> Many thanks,
> Graham.
>
>
>
> ---
>  gcc/ipa-inline.h | 21 ++++++++++++++++++++-
>  gcc/params.def   |  5 +++++
>  2 files changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/ipa-inline.h b/gcc/ipa-inline.h
> index 18c8e1eebd0..fb10677ea8f 100644
> --- a/gcc/ipa-inline.h
> +++ b/gcc/ipa-inline.h
> @@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
>  #ifndef GCC_IPA_INLINE_H
>  #define GCC_IPA_INLINE_H
>
> +#include "params.h"
> +
>  /* Data we cache about callgraph edges during inlining to avoid expensive
>     re-computations during the greedy algorithm.  */
>  class edge_growth_cache_entry
> @@ -84,7 +86,24 @@ estimate_edge_growth (struct cgraph_edge *edge)
>  {
>    ipa_call_summary *s = ipa_call_summaries->get (edge);
>    gcc_checking_assert (s->call_stmt_size || !edge->callee->analyzed);
> -  return (estimate_edge_size (edge) - s->call_stmt_size);
> +
> +  int growth = (estimate_edge_size (edge) - s->call_stmt_size);
> +
> +  /* Bias function growth according to the bias parameter. The default is
> +     parameter value is 5 to allow for slight negative biases, so we 
> subtract 5
> +     to allow an effective default value of 0.  */
> +  growth += PARAM_VALUE (PARAM_INLINE_GROWTH_BIAS) - 5;
> +
> +  /* Function size cannot be negative, so if the growth is negative to the 
> point
> +     that it will reduce function size below 0, then we cap the growth such 
> that
> +     it makes the function size exactly zero.  */
> +  struct cgraph_node *caller = edge->caller;
> +  ipa_size_summary *fs = ipa_size_summaries->get (caller);
> +  if (fs->size + growth < 0) {
> +    growth = -fs->size;
> +  }
> +
> +  return growth;
>  }
>
>  /* Return estimated callee runtime increase after inlining
> diff --git a/gcc/params.def b/gcc/params.def
> index 7928f6f071e..4274d8f36e0 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -112,6 +112,11 @@ DEFPARAM (PARAM_INLINE_HEURISTICS_HINT_PERCENT_O2,
>           "The scale (in percents) applied to inline-insns-single and auto 
> limits when heuristics hints that inlining is very profitable.",
>           200, 100, 1000000)
>
> +DEFPARAM (PARAM_INLINE_GROWTH_BIAS,
> +         "inline-growth-bias",
> +         "Bias of inlining growth overhead (default 5 is equal to no bias).",
> +         5, 0, 1000000)
> +
>  DEFPARAM (PARAM_MAX_INLINE_INSNS_SIZE,
>           "max-inline-insns-size",
>           "The maximum number of instructions when inlining for size.",
> --
> 2.21.0
>

Reply via email to