On Mon, Nov 25, 2019 at 1:44 PM Graham Markall
<[email protected]> wrote:
>
> Hi Richard,
>
> Many thanks for the suggestion of an alternative implementation. I tried
> implementing the suggestion, and I had a couple of observations:
>
> 1. As well as applying the bias in compute_fn_summary, it seemed to also
> be necessary to apply it in ip_update_overall_fn_summary to avoid an ICE
> resulting from size_info->size and size_info->self_size differing at the
> end of compute_fn_summary.
>
> 2. The results that it produced were exactly the same as those using the
> param uninlined-function-insns, so it would probably be redundant to add
> the additional parameter implemented this way.
>
> The implementation I tried is at:
> https://github.com/gmarkall/gcc/commit/1d22f65b8a392cffc235ecb49143a93d1720b91c
>
> When I benchmarked the code size again using Embench with both the
> inline-growth-bias param implemented this way, and the
> uninlined-function-insns param the results were, in both cases:
>
> Benchmark 0 1 2 3 4
> --------- ------ ------ ------ ------ ------
> aha-mont64 99.05 99.05 99.05 99.05 99.05
> crc32 100.00 100.00 100.00 100.00 100.00
> cubic 94.66 94.66 94.66 94.66 94.66
> edn 96.14 96.14 96.14 96.14 96.14
> huffbench 99.88 99.88 99.88 99.88 99.88
> matmult-int 100.00 100.00 100.00 100.00 100.00
> minver 100.56 100.56 100.56 100.56 100.56
> nbody 101.13 101.13 101.13 101.13 101.13
> nettle-aes 99.03 99.03 99.03 99.03 99.03
> nettle-sha256 99.89 99.89 99.89 99.89 99.89
> nsichneu 100.00 100.00 100.00 100.00 100.00
> picojpeg 99.35 99.35 100.10 100.10 100.10
> qrduino 101.81 101.81 101.81 101.81 101.81
> sglib-combined 100.00 100.00 100.00 100.00 100.00
> slre 99.09 99.42 99.42 100.49 100.49
> st 96.36 96.36 96.36 96.36 96.36
> statemate 100.22 100.22 100.22 100.22 100.22
> ud 100.00 100.00 100.00 100.00 100.00
> wikisort 99.48 99.48 99.48 99.48 99.48
> Mean 99.28 99.30 99.34 99.40 99.40
>
> These results show that:
>
> 1. Setting the uninlined-function-insns value to 0 rather than 2
> generally seems to yield an improvement in code size on RISC-V, without
> making any individual benchmark larger than the current default of 2.
>
> 2. There are two cases where the patch in the original version of the
> patch (v1) makes a slightly greater improvement than using
> uninlined-function-insns or the patch linked above (ufi). These are:
>
> Benchmark ufi v1
> --------- --- --
> qrduino 101.81 101.61
> sglib-combined 100.00 95.87
>
> (note that the v1 values are slightly different to those posted in the
> original patch - I rebased the original patch on a more recent version
> of the master branch and the values changed slightly.)
>
> Additionally, for a proprietary application that we tested the flags
> with, the code sizes were for using neither flag (Original) for
> using uninlined-function-insns, and using the original patch, with the
> parameter values that resulted in the smallest code size, were:
>
> Original ufi (val=0) v1 (val=-2)
> 57380 57184 56756
> (100.0%) (99.66%) (98.91%)
>
>
> Patch v2
> --------
>
> In conclusion, it seems that in some cases there is some additional code
> size saving that can be gained in specific cases using the original
> implementation of the patch. I have tidied up the original version of
> the patch and adjusted the option so that it is a param instead of a
> flag as suggested by both Jeff and yourself. I chose a default param
> value of 5 to allow for a reasonable scope for setting negative values -
> although I haven't seen any result where setting the value beneath an
> effective setting of -2 yielded an improvement, this does allow a little
> more room in case there are cases where going below -2 would help.
>
> Is there sufficient data to indicate that the additional parameter as
> implemented in the attached patch is worth adding?
There needs to be more explanation on what this really achieves
and as I said IIRC we should already have a --param that accounts
for this, --param uninlined-function-insns which maybe only misses
the ability to become negative.
Richard.
>
> Many thanks,
> Graham.
>
>
>
> ---
> gcc/ipa-inline.h | 21 ++++++++++++++++++++-
> gcc/params.def | 5 +++++
> 2 files changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/ipa-inline.h b/gcc/ipa-inline.h
> index 18c8e1eebd0..fb10677ea8f 100644
> --- a/gcc/ipa-inline.h
> +++ b/gcc/ipa-inline.h
> @@ -21,6 +21,8 @@ along with GCC; see the file COPYING3. If not see
> #ifndef GCC_IPA_INLINE_H
> #define GCC_IPA_INLINE_H
>
> +#include "params.h"
> +
> /* Data we cache about callgraph edges during inlining to avoid expensive
> re-computations during the greedy algorithm. */
> class edge_growth_cache_entry
> @@ -84,7 +86,24 @@ estimate_edge_growth (struct cgraph_edge *edge)
> {
> ipa_call_summary *s = ipa_call_summaries->get (edge);
> gcc_checking_assert (s->call_stmt_size || !edge->callee->analyzed);
> - return (estimate_edge_size (edge) - s->call_stmt_size);
> +
> + int growth = (estimate_edge_size (edge) - s->call_stmt_size);
> +
> + /* Bias function growth according to the bias parameter. The default is
> + parameter value is 5 to allow for slight negative biases, so we
> subtract 5
> + to allow an effective default value of 0. */
> + growth += PARAM_VALUE (PARAM_INLINE_GROWTH_BIAS) - 5;
> +
> + /* Function size cannot be negative, so if the growth is negative to the
> point
> + that it will reduce function size below 0, then we cap the growth such
> that
> + it makes the function size exactly zero. */
> + struct cgraph_node *caller = edge->caller;
> + ipa_size_summary *fs = ipa_size_summaries->get (caller);
> + if (fs->size + growth < 0) {
> + growth = -fs->size;
> + }
> +
> + return growth;
> }
>
> /* Return estimated callee runtime increase after inlining
> diff --git a/gcc/params.def b/gcc/params.def
> index 7928f6f071e..4274d8f36e0 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -112,6 +112,11 @@ DEFPARAM (PARAM_INLINE_HEURISTICS_HINT_PERCENT_O2,
> "The scale (in percents) applied to inline-insns-single and auto
> limits when heuristics hints that inlining is very profitable.",
> 200, 100, 1000000)
>
> +DEFPARAM (PARAM_INLINE_GROWTH_BIAS,
> + "inline-growth-bias",
> + "Bias of inlining growth overhead (default 5 is equal to no bias).",
> + 5, 0, 1000000)
> +
> DEFPARAM (PARAM_MAX_INLINE_INSNS_SIZE,
> "max-inline-insns-size",
> "The maximum number of instructions when inlining for size.",
> --
> 2.21.0
>