> Recursive call graph edges, even when they are hot and important for
> the compiled program, can never have frequency bigger than one, even
> when the actual time savings in the next recursion call are not
> realized just once but depend on the depth of recursion. The current
> IPA-CP effect propagation code did not take that into account and just
> used the frequency, thus severely underestimating the effect.
>
> This patch artificially boosts values taking part in such calls. If a
> value feeds into itself through a recursive call, the frequency of the
> edge is multiplied by a parameter with default value of 6, basically
> assuming that the recursion will take place 6 times. This value can
> of course be subject to change.
>
> Moreover, values which do not feed into themselves but which were
> generated for a self-recursive call with an arithmetic
> pass-function (aka the 548.exchange "hack" which however is generally
> applicable for recursive functions which count the recursion depth in
> a parameter) have the edge frequency multiplied as many times as there
> are generated values in the chain. In essence, we will assume they
> are all useful.
>
> This patch partially fixes the current situation when we fail to
> optimize 548.exchange with PGO. In the benchmark one recursive edge
> count overwhelmingly dominates all other counts in the program and so
> we fail to perform the first cloning (for the nonrecursive entry call)
> because it looks totally insignificant.
>
> gcc/ChangeLog:
>
> 2021-07-16 Martin Jambor <[email protected]>
>
> * params.opt (ipa-cp-recursive-freq-factor): New.
> * ipa-cp.c (ipcp_value): Switch to inline initialization. New members
> scc_no, self_recursion_generated_level, same_scc and
> self_recursion_generated_p.
> (ipcp_lattice::add_value): Replaced parameter unlimited with
> same_lat_gen_level, usit it determine limit of values and store it to
> the value.
> (ipcp_lattice<valtype>::print): Dump the new fileds.
> (allocate_and_init_ipcp_value): Take same_lat_gen_level as a new
> parameter and store it to the new value.
> (self_recursively_generated_p): Removed.
> (propagate_vals_across_arith_jfunc): Use self_recursion_generated_p
> instead of self_recursively_generated_p, store self generation level
> to such values.
> (value_topo_info<valtype>::add_val): Set scc_no.
> (value_topo_info<valtype>::propagate_effects): Multiply frequencies of
> recursively feeding values and self generated values by appropriate
> new factors.
If you boost every self fed value by factor of 6, I wonder how quickly
we run into exponential explosion of the cost (since the frequency
should be close to 1 and 6^9=10077696....
I think it would be more robust to simply assume that the job will
distribute evenly across the clones. How hard is to implement that?
Honza
> ---
> gcc/ipa-cp.c | 161 ++++++++++++++++++++++++-------------------------
> gcc/params.opt | 4 ++
> 2 files changed, 84 insertions(+), 81 deletions(-)
>
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 55b9216337f..b987d975793 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -184,30 +184,52 @@ public:
> /* The actual value for the given parameter. */
> valtype value;
> /* The list of sources from which this value originates. */
> - ipcp_value_source <valtype> *sources;
> + ipcp_value_source <valtype> *sources = nullptr;
> /* Next pointers in a linked list of all values in a lattice. */
> - ipcp_value *next;
> + ipcp_value *next = nullptr;
> /* Next pointers in a linked list of values in a strongly connected
> component
> of values. */
> - ipcp_value *scc_next;
> + ipcp_value *scc_next = nullptr;
> /* Next pointers in a linked list of SCCs of values sorted topologically
> according their sources. */
> - ipcp_value *topo_next;
> + ipcp_value *topo_next = nullptr;
> /* A specialized node created for this value, NULL if none has been (so
> far)
> created. */
> - cgraph_node *spec_node;
> + cgraph_node *spec_node = nullptr;
> /* Depth first search number and low link for topological sorting of
> values. */
> - int dfs, low_link;
> + int dfs = 0;
> + int low_link = 0;
> + /* SCC number to identify values which recursively feed into each other.
> + Values in the same SCC have the same SCC number. */
> + int scc_no = 0;
> + /* Non zero if the value is generated from another value in the same
> lattice
> + for a self-recursive call, the actual number is how many times the
> + operation has been performed. In the unlikely event of the value being
> + present in two chains fo self-recursive value generation chains, it is
> the
> + maximum. */
> + unsigned self_recursion_generated_level = 0;
> /* True if this value is currently on the topo-sort stack. */
> - bool on_stack;
> -
> - ipcp_value()
> - : sources (0), next (0), scc_next (0), topo_next (0),
> - spec_node (0), dfs (0), low_link (0), on_stack (false) {}
> + bool on_stack = false;
>
> void add_source (cgraph_edge *cs, ipcp_value *src_val, int src_idx,
> HOST_WIDE_INT offset);
> +
> + /* Return true if both THIS value and O feed into each other. */
> +
> + bool same_scc (const ipcp_value<valtype> *o)
> + {
> + return o->scc_no == scc_no;
> + }
> +
> +/* Return true, if a this value has been generated for a self-recursive call
> as
> + a result of an arithmetic pass-through jump-function acting on a value in
> + the same lattice function. */
> +
> + bool self_recursion_generated_p ()
> + {
> + return self_recursion_generated_level > 0;
> + }
> };
>
> /* Lattice describing potential values of a formal parameter of a function,
> or
> @@ -239,7 +261,7 @@ public:
> ipcp_value<valtype> *src_val = NULL,
> int src_idx = 0, HOST_WIDE_INT offset = -1,
> ipcp_value<valtype> **val_p = NULL,
> - bool unlimited = false);
> + unsigned same_lat_gen_level = 0);
> void print (FILE * f, bool dump_sources, bool dump_benefits);
> };
>
> @@ -498,7 +520,11 @@ ipcp_lattice<valtype>::print (FILE * f, bool
> dump_sources, bool dump_benefits)
> {
> ipcp_value_source<valtype> *s;
>
> - fprintf (f, " [from:");
> + if (val->self_recursion_generated_p ())
> + fprintf (f, " [self_gen(%i), from:",
> + val->self_recursion_generated_level);
> + else
> + fprintf (f, " [scc: %i, from:", val->scc_no);
> for (s = val->sources; s; s = s->next)
> fprintf (f, " %i(%f)", s->cs->caller->order,
> s->cs->sreal_frequency ().to_double ());
> @@ -1837,12 +1863,13 @@ ipcp_value<valtype>::add_source (cgraph_edge *cs,
> ipcp_value *src_val,
> SOURCE and clear all other fields. */
>
> static ipcp_value<tree> *
> -allocate_and_init_ipcp_value (tree source)
> +allocate_and_init_ipcp_value (tree cst, unsigned same_lat_gen_level)
> {
> ipcp_value<tree> *val;
>
> val = new (ipcp_cst_values_pool.allocate ()) ipcp_value<tree>();
> - val->value = source;
> + val->value = cst;
> + val->self_recursion_generated_level = same_lat_gen_level;
> return val;
> }
>
> @@ -1850,14 +1877,15 @@ allocate_and_init_ipcp_value (tree source)
> value to SOURCE and clear all other fields. */
>
> static ipcp_value<ipa_polymorphic_call_context> *
> -allocate_and_init_ipcp_value (ipa_polymorphic_call_context source)
> +allocate_and_init_ipcp_value (ipa_polymorphic_call_context ctx,
> + unsigned same_lat_gen_level)
> {
> ipcp_value<ipa_polymorphic_call_context> *val;
>
> - // TODO
> val = new (ipcp_poly_ctx_values_pool.allocate ())
> ipcp_value<ipa_polymorphic_call_context>();
> - val->value = source;
> + val->value = ctx;
> + val->self_recursion_generated_level = same_lat_gen_level;
> return val;
> }
>
> @@ -1865,8 +1893,12 @@ allocate_and_init_ipcp_value
> (ipa_polymorphic_call_context source)
> SRC_VAL SRC_INDEX and OFFSET are meant for add_source and have the same
> meaning. OFFSET -1 means the source is scalar and not a part of an
> aggregate. If non-NULL, VAL_P records address of existing or newly added
> - ipcp_value. UNLIMITED means whether value count should not exceed the
> limit
> - given by PARAM_IPA_CP_VALUE_LIST_SIZE. */
> + ipcp_value.
> +
> + If the value is generated for a self-recursive call as a result of an
> + arithmetic pass-through jump-function acting on a value in the same
> lattice,
> + SAME_LAT_GEN_LEVEL must be the length of such chain, otherwise it must be
> + zero. If it is non-zero, PARAM_IPA_CP_VALUE_LIST_SIZE limit is ignored.
> */
>
> template <typename valtype>
> bool
> @@ -1874,7 +1906,7 @@ ipcp_lattice<valtype>::add_value (valtype newval,
> cgraph_edge *cs,
> ipcp_value<valtype> *src_val,
> int src_idx, HOST_WIDE_INT offset,
> ipcp_value<valtype> **val_p,
> - bool unlimited)
> + unsigned same_lat_gen_level)
> {
> ipcp_value<valtype> *val, *last_val = NULL;
>
> @@ -1890,6 +1922,9 @@ ipcp_lattice<valtype>::add_value (valtype newval,
> cgraph_edge *cs,
> if (val_p)
> *val_p = val;
>
> + if (val->self_recursion_generated_level < same_lat_gen_level)
> + val->self_recursion_generated_level = same_lat_gen_level;
> +
> if (ipa_edge_within_scc (cs))
> {
> ipcp_value_source<valtype> *s;
> @@ -1904,7 +1939,7 @@ ipcp_lattice<valtype>::add_value (valtype newval,
> cgraph_edge *cs,
> return false;
> }
>
> - if (!unlimited && values_count == opt_for_fn (cs->caller->decl,
> + if (!same_lat_gen_level && values_count == opt_for_fn (cs->caller->decl,
> param_ipa_cp_value_list_size))
> {
> /* We can only free sources, not the values themselves, because sources
> @@ -1923,7 +1958,7 @@ ipcp_lattice<valtype>::add_value (valtype newval,
> cgraph_edge *cs,
> }
>
> values_count++;
> - val = allocate_and_init_ipcp_value (newval);
> + val = allocate_and_init_ipcp_value (newval, same_lat_gen_level);
> val->add_source (cs, src_val, src_idx, offset);
> val->next = NULL;
>
> @@ -1940,60 +1975,6 @@ ipcp_lattice<valtype>::add_value (valtype newval,
> cgraph_edge *cs,
> return true;
> }
>
> -/* Return true, if a ipcp_value VAL is orginated from parameter value of
> - self-feeding recursive function via some kind of pass-through jump
> - function. */
> -
> -static bool
> -self_recursively_generated_p (ipcp_value<tree> *val)
> -{
> - class ipa_node_params *info = NULL;
> -
> - for (ipcp_value_source<tree> *src = val->sources; src; src = src->next)
> - {
> - cgraph_edge *cs = src->cs;
> -
> - if (!src->val || cs->caller != cs->callee->function_symbol ())
> - return false;
> -
> - if (src->val == val)
> - continue;
> -
> - if (!info)
> - info = ipa_node_params_sum->get (cs->caller);
> -
> - class ipcp_param_lattices *plats = ipa_get_parm_lattices (info,
> - src->index);
> - ipcp_lattice<tree> *src_lat;
> - ipcp_value<tree> *src_val;
> -
> - if (src->offset == -1)
> - src_lat = &plats->itself;
> - else
> - {
> - struct ipcp_agg_lattice *src_aglat;
> -
> - for (src_aglat = plats->aggs; src_aglat; src_aglat = src_aglat->next)
> - if (src_aglat->offset == src->offset)
> - break;
> -
> - if (!src_aglat)
> - return false;
> -
> - src_lat = src_aglat;
> - }
> -
> - for (src_val = src_lat->values; src_val; src_val = src_val->next)
> - if (src_val == val)
> - break;
> -
> - if (!src_val)
> - return false;
> - }
> -
> - return true;
> -}
> -
> /* A helper function that returns result of operation specified by OPCODE on
> the value of SRC_VAL. If non-NULL, OPND1_TYPE is expected type for the
> value of SRC_VAL. If the operation is binary, OPND2 is a constant value
> @@ -2068,7 +2049,7 @@ propagate_vals_across_arith_jfunc (cgraph_edge *cs,
> source, this is absolutely conservative, but could avoid explosion
> of lattice's value space, especially when one recursive function
> calls another recursive. */
> - if (self_recursively_generated_p (src_val))
> + if (src_val->self_recursion_generated_p ())
> {
> ipcp_value_source<tree> *s;
>
> @@ -2096,7 +2077,7 @@ propagate_vals_across_arith_jfunc (cgraph_edge *cs,
> break;
>
> ret |= dest_lat->add_value (cstval, cs, src_val, src_idx,
> - src_offset, &src_val, true);
> + src_offset, &src_val, j);
> gcc_checking_assert (src_val);
> }
> }
> @@ -2108,7 +2089,7 @@ propagate_vals_across_arith_jfunc (cgraph_edge *cs,
> /* Now we do not use self-recursively generated value as propagation
> source, otherwise it is easy to make value space of normal lattice
> overflow. */
> - if (self_recursively_generated_p (src_val))
> + if (src_val->self_recursion_generated_p ())
> {
> ret |= dest_lat->set_contains_variable ();
> continue;
> @@ -3732,6 +3713,7 @@ value_topo_info<valtype>::add_val (ipcp_value<valtype>
> *cur_val)
> v = stack;
> stack = v->topo_next;
> v->on_stack = false;
> + v->scc_no = cur_val->dfs;
>
> v->scc_next = scc_list;
> scc_list = v;
> @@ -3905,8 +3887,25 @@ value_topo_info<valtype>::propagate_effects ()
> else
> continue;
> }
> +
> + int special_factor = 1;
> + if (val->same_scc (src->val))
> + special_factor
> + = opt_for_fn(src->cs->caller->decl,
> + param_ipa_cp_recursive_freq_factor);
> + else if (val->self_recursion_generated_p ()
> + && (src->cs->callee->function_symbol ()
> + == src->cs->caller))
> + {
> + int max_recur_gen_depth
> + = opt_for_fn(src->cs->caller->decl,
> + param_ipa_cp_max_recursive_depth);
> + special_factor = max_recur_gen_depth
> + - val->self_recursion_generated_level + 1;
> + }
> +
> src->val->prop_time_benefit
> - += time * src->cs->sreal_frequency ();
> + += time * special_factor * src->cs->sreal_frequency ();
> }
>
> if (size < INT_MAX)
> diff --git a/gcc/params.opt b/gcc/params.opt
> index 92b003e38cb..8d772309407 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -266,6 +266,10 @@ Maximum depth of recursive cloning for self-recursive
> function.
> Common Joined UInteger Var(param_ipa_cp_min_recursive_probability) Init(2)
> Param Optimization
> Recursive cloning only when the probability of call being executed exceeds
> the parameter.
>
> +-param=ipa-cp-recursive-freq-factor=
> +Common Joined UInteger Var(param_ipa_cp_recursive_freq_factor) Init(6) Param
> Optimization
> +When propagating IPA-CP effect estimates, multiply frequencies of recursive
> edges that that bring back an unchanged value by this factor.
> +
> -param=ipa-cp-recursion-penalty=
> Common Joined UInteger Var(param_ipa_cp_recursion_penalty) Init(40)
> IntegerRange(0, 100) Param Optimization
> Percentage penalty the recursive functions will receive when they are
> evaluated for cloning.
> --
> 2.32.0
>