Hi,

this is an updated patch based on our conversation on IRC today.  So far
I have had a look at the effects on only tramp3d and although it makes
the heuristics more pessimistic more times than optimistic (number of
clones at -Ofast drops from 559 to 557), there are also lattices which
are massively boosted.

When looking at the testcase of PR 97816 I realized that the reason
why we were hitting overflows in size growth estimates in IPA-CP is
not because the chains of how lattices feed values to each other are
so long but mainly because we add estimates in callee lattices to
caller lattices for each value source, which roughly corresponds to a
call graph edge, and therefore if there are multiple calls between two
functions passing the same value in a parameter we end up doing it
more than once, sometimes actually quite many times.

This patch avoids it by using a has_set to remember the source values
we have already updated and not increasing their size again.
Furhtermore, to improve estimation of times we scale the propagated
time benefits with edge frequencies as we accumulate them.

This should make any overflows very unlikely but not impossible, so I
still included checks for overflows but decided to restructure the
code to only need it in the propagate_effects function and modified it
so that it does not need to perform the check before each sum.

This is because I decided to add local estimates to propagated
estimates already in propagate_effects and not at the evaluation time.
The function can then do the sums in a wide type and discard them in
the unlikely case of an overflow.  I also decided to use the
opportunity to make propagated effect stats now include stats from
other values in the same SCCs.  In the dumps I have seen this tended
to increase size cost a tiny bit more than the estimated time benefit
but both increases were small.

Bootstrapped and LTO bootstrapped on x86_64-linux.  OK for trunk?

Thanks,

Martin


gcc/ChangeLog:

2020-11-20  Martin Jambor  <mjam...@suse.cz>

        PR ipa/97816
        * ipa-cp.c (safe_add): Removed.
        (good_cloning_opportunity_p): Remove special handling of INT_MAX.
        (value_topo_info<valtype>::propagate_effects): Take care not to
        propagate from size one value to another through more sources.  Scale
        propagated times with edge frequencies.  Include local time and size
        in propagates ones here.  Take care not to overflow size.
        (decide_about_value): Do not add local and propagated effects when
        passing them to good_cloning_opportunity_p.
---
 gcc/ipa-cp.c | 68 +++++++++++++++++++++++++---------------------------
 1 file changed, 32 insertions(+), 36 deletions(-)

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index c3ee71e16e1..863b4f7d228 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -3264,13 +3264,6 @@ good_cloning_opportunity_p (struct cgraph_node *node, 
sreal time_benefit,
     return false;
 
   gcc_assert (size_cost > 0);
-  if (size_cost == INT_MAX)
-    {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-       fprintf (dump_file, "     good_cloning_opportunity_p returning "
-                "false because of size overflow.\n");
-      return false;
-    }
 
   class ipa_node_params *info = IPA_NODE_REF (node);
   int eval_threshold = opt_for_fn (node->decl, param_ipa_cp_eval_threshold);
@@ -3840,20 +3833,6 @@ propagate_constants_topo (class ipa_topo_info *topo)
     }
 }
 
-
-/* Return the sum of A and B if none of them is bigger than INT_MAX/2, return
-   INT_MAX.  */
-
-static int
-safe_add (int a, int b)
-{
-  if (a > INT_MAX/2 || b > INT_MAX/2)
-    return INT_MAX;
-  else
-    return a + b;
-}
-
-
 /* Propagate the estimated effects of individual values along the topological
    from the dependent values to those they depend on.  */
 
@@ -3862,30 +3841,51 @@ void
 value_topo_info<valtype>::propagate_effects ()
 {
   ipcp_value<valtype> *base;
+  hash_set<ipcp_value<valtype> *> processed_srcvals;
 
   for (base = values_topo; base; base = base->topo_next)
     {
       ipcp_value_source<valtype> *src;
       ipcp_value<valtype> *val;
       sreal time = 0;
-      int size = 0;
+      HOST_WIDE_INT size = 0;
 
       for (val = base; val; val = val->scc_next)
        {
          time = time + val->local_time_benefit + val->prop_time_benefit;
-         size = safe_add (size, safe_add (val->local_size_cost,
-                                          val->prop_size_cost));
+         size = size + val->local_size_cost + val->prop_size_cost;
        }
 
       for (val = base; val; val = val->scc_next)
-       for (src = val->sources; src; src = src->next)
-         if (src->val
-             && src->cs->maybe_hot_p ())
+       {
+         processed_srcvals.empty ();
+         for (src = val->sources; src; src = src->next)
+           if (src->val
+               && src->cs->maybe_hot_p ())
+             {
+               if (!processed_srcvals.add (src->val))
+                 {
+                   HOST_WIDE_INT prop_size = size + src->val->prop_size_cost;
+                   if (prop_size < INT_MAX)
+                     src->val->prop_size_cost = prop_size;
+                   else
+                     continue;
+                 }
+               src->val->prop_time_benefit
+                 += time * src->cs->sreal_frequency ();
+             }
+
+         if (size < INT_MAX)
            {
-             src->val->prop_time_benefit = time + src->val->prop_time_benefit;
-             src->val->prop_size_cost = safe_add (size,
-                                                  src->val->prop_size_cost);
+             val->prop_time_benefit = time;
+             val->prop_size_cost = size;
            }
+         else
+           {
+             val->prop_time_benefit = 0;
+             val->prop_size_cost = 0;
+           }
+       }
     }
 }
 
@@ -5500,12 +5500,8 @@ decide_about_value (struct cgraph_node *node, int index, 
HOST_WIDE_INT offset,
   if (!good_cloning_opportunity_p (node, val->local_time_benefit,
                                   freq_sum, count_sum,
                                   val->local_size_cost)
-      && !good_cloning_opportunity_p (node,
-                                     val->local_time_benefit
-                                     + val->prop_time_benefit,
-                                     freq_sum, count_sum,
-                                     safe_add (val->local_size_cost,
-                                               val->prop_size_cost)))
+      && !good_cloning_opportunity_p (node, val->prop_time_benefit,
+                                     freq_sum, count_sum, val->prop_size_cost))
     return false;
 
   if (dump_file)
-- 
2.28.0

Reply via email to