On Fri, Jul 12, 2024 at 12:09 PM Ajit Agarwal <aagar...@linux.ibm.com> wrote:
>
> Hello Richard:
>
> On 11/07/24 2:21 pm, Richard Biener wrote:
> > On Thu, Jul 11, 2024 at 10:30 AM Ajit Agarwal <aagar...@linux.ibm.com> 
> > wrote:
> >>
> >> Hello All:
> >>
> >> Unroll factor is determined with max distance across loop iterations.
> >> The logic for determining the loop unroll factor is based on
> >> data dependency across loop iterations.
> >>
> >> The max distance across loop iterations is the unrolling factor
> >> that helps in predictive commoning.
> >
> > The old comment in the code says
> >
> >> -      /* The best unroll factor for this chain is equal to the number of
> >> -        temporary variables that we create for it.  */
> >
> > why is that wrong and why is the max dependence distance more correct?
> >
> > Do you have a testcase that shows how this makes a (positive) difference?
> >
>
> There is nothing wrong in the existing implementation of unroll
> factor for predictive commoning.
>
> But with max dependence distance we get performance improvement
> with spec 2017 benchmarks (INT) of 0.01% (Geomean) with and without
> changes. Improvement in benchmarks with max dependence distance
> changes.
>
> I have used the following flags:
> -O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning 
> -fno-tree-pre
>
> With above flags I ran with and without changes.

A 0.01% geomean improvement is noise.  Why did you disable PRE?

> There is no degradation with spec 2017 (FP benchmarks).
>
> Because in predictive commoning we reuse values computed in
> earlier iterations of a loop in the later ones, max distance is the
> better choice.

The re-use distance is the same though.  So your change merely increases
the unroll factor?  Or can you explain why there is more re-use with
your change.

Richard.

> > Richard.
> >
>
> Thanks & Regards
> Ajit
>
> >> Bootstrapped and regtested on powerpc64-linux-gnu.
> >>
> >> Thanks & Regards
> >> Ajit
> >>
> >> tree-optimization, predcom: Improve unroll factor for predictive commoning
> >>
> >> Unroll factor is determined with max distance across loop iterations.
> >> The logic for determining the loop unroll factor is based on
> >> data dependency across loop iterations.
> >>
> >> The max distance across loop iterations is the unrolling factor
> >> that helps in predictive commoning.
> >>
> >> 2024-07-11  Ajit Kumar Agarwal  <aagar...@linux.ibm.com>
> >>
> >> gcc/ChangeLog:
> >>
> >>         * tree-predcom.cc: Change in determining unroll factor with
> >>         data dependence across loop iterations.
> >> ---
> >>  gcc/tree-predcom.cc | 51 ++++++++++++++++++++++++++++++++++-----------
> >>  1 file changed, 39 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/gcc/tree-predcom.cc b/gcc/tree-predcom.cc
> >> index 9844fee1e97..029b02f5990 100644
> >> --- a/gcc/tree-predcom.cc
> >> +++ b/gcc/tree-predcom.cc
> >> @@ -409,6 +409,7 @@ public:
> >>    /* Perform the predictive commoning optimization for chains, make this
> >>       public for being called in callback execute_pred_commoning_cbck.  */
> >>    void execute_pred_commoning (bitmap tmp_vars);
> >> +  unsigned determine_unroll_factor (const vec<chain_p> &chains);
> >>
> >>  private:
> >>    /* The pointer to the given loop.  */
> >> @@ -2400,13 +2401,46 @@ pcom_worker::execute_pred_commoning_chain (chain_p 
> >> chain,
> >>     copies as possible.  CHAINS is the list of chains that will be
> >>     optimized.  */
> >>
> >> -static unsigned
> >> -determine_unroll_factor (const vec<chain_p> &chains)
> >> +unsigned
> >> +pcom_worker::determine_unroll_factor (const vec<chain_p> &chains)
> >>  {
> >>    chain_p chain;
> >> -  unsigned factor = 1, af, nfactor, i;
> >> +  unsigned factor = 1, i;
> >>    unsigned max = param_max_unroll_times;
> >> +  struct data_dependence_relation *ddr;
> >> +  unsigned nfactor = 0;
> >> +  int nzfactor = 0;
> >> +
> >> +  /* Best unroll factor is the maximum distance across loop
> >> +     iterations.  */
> >> +  FOR_EACH_VEC_ELT (m_dependences, i, ddr)
> >> +    {
> >> +      for (unsigned j = 0; j < DDR_NUM_DIST_VECTS (ddr); j++)
> >> +       {
> >> +         lambda_vector vec = DDR_DIST_VECT (ddr, j);
> >> +         widest_int distance = vec[j];
> >> +         unsigned offset = distance.to_uhwi ();
> >> +         if (offset == 0)
> >> +           continue;
> >> +
> >> +         int dist = offset - nzfactor;
> >> +         if (dist  == 0)
> >> +           continue;
> >>
> >> +         if (nfactor == 0)
> >> +           {
> >> +             nfactor = offset;
> >> +             nzfactor = offset;
> >> +           }
> >> +         else if (dist <= nzfactor)
> >> +           nfactor = offset;
> >> +
> >> +         if (nfactor > 0 && nfactor <= max)
> >> +           factor = nfactor;
> >> +       }
> >> +    }
> >> +
> >> +  int max_use = 0;
> >>    FOR_EACH_VEC_ELT (chains, i, chain)
> >>      {
> >>        if (chain->type == CT_INVARIANT)
> >> @@ -2427,17 +2461,10 @@ determine_unroll_factor (const vec<chain_p> 
> >> &chains)
> >>           continue;
> >>         }
> >>
> >> -      /* The best unroll factor for this chain is equal to the number of
> >> -        temporary variables that we create for it.  */
> >> -      af = chain->length;
> >>        if (chain->has_max_use_after)
> >> -       af++;
> >> -
> >> -      nfactor = factor * af / gcd (factor, af);
> >> -      if (nfactor <= max)
> >> -       factor = nfactor;
> >> +       max_use++;
> >>      }
> >> -
> >> +  factor += max_use;
> >>    return factor;
> >>  }
> >>
> >> --
> >> 2.43.5
> >>

Reply via email to