Hello Richard:

On 11/07/24 2:21 pm, Richard Biener wrote:
> On Thu, Jul 11, 2024 at 10:30 AM Ajit Agarwal <aagar...@linux.ibm.com> wrote:
>>
>> Hello All:
>>
>> Unroll factor is determined with max distance across loop iterations.
>> The logic for determining the loop unroll factor is based on
>> data dependency across loop iterations.
>>
>> The max distance across loop iterations is the unrolling factor
>> that helps in predictive commoning.
> 
> The old comment in the code says
> 
>> -      /* The best unroll factor for this chain is equal to the number of
>> -        temporary variables that we create for it.  */
> 
> why is that wrong and why is the max dependence distance more correct?
> 
> Do you have a testcase that shows how this makes a (positive) difference?
>

There is nothing wrong in the existing implementation of unroll
factor for predictive commoning.

But with max dependence distance we get performance improvement
with spec 2017 benchmarks (INT) of 0.01% (Geomean) with and without
changes. Improvement in benchmarks with max dependence distance
changes.

I have used the following flags:
-O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning 
-fno-tree-pre

With above flags I ran with and without changes.

There is no degradation with spec 2017 (FP benchmarks).

Because in predictive commoning we reuse values computed in
earlier iterations of a loop in the later ones, max distance is the
better choice.

> Richard.
>

Thanks & Regards
Ajit
 
>> Bootstrapped and regtested on powerpc64-linux-gnu.
>>
>> Thanks & Regards
>> Ajit
>>
>> tree-optimization, predcom: Improve unroll factor for predictive commoning
>>
>> Unroll factor is determined with max distance across loop iterations.
>> The logic for determining the loop unroll factor is based on
>> data dependency across loop iterations.
>>
>> The max distance across loop iterations is the unrolling factor
>> that helps in predictive commoning.
>>
>> 2024-07-11  Ajit Kumar Agarwal  <aagar...@linux.ibm.com>
>>
>> gcc/ChangeLog:
>>
>>         * tree-predcom.cc: Change in determining unroll factor with
>>         data dependence across loop iterations.
>> ---
>>  gcc/tree-predcom.cc | 51 ++++++++++++++++++++++++++++++++++-----------
>>  1 file changed, 39 insertions(+), 12 deletions(-)
>>
>> diff --git a/gcc/tree-predcom.cc b/gcc/tree-predcom.cc
>> index 9844fee1e97..029b02f5990 100644
>> --- a/gcc/tree-predcom.cc
>> +++ b/gcc/tree-predcom.cc
>> @@ -409,6 +409,7 @@ public:
>>    /* Perform the predictive commoning optimization for chains, make this
>>       public for being called in callback execute_pred_commoning_cbck.  */
>>    void execute_pred_commoning (bitmap tmp_vars);
>> +  unsigned determine_unroll_factor (const vec<chain_p> &chains);
>>
>>  private:
>>    /* The pointer to the given loop.  */
>> @@ -2400,13 +2401,46 @@ pcom_worker::execute_pred_commoning_chain (chain_p 
>> chain,
>>     copies as possible.  CHAINS is the list of chains that will be
>>     optimized.  */
>>
>> -static unsigned
>> -determine_unroll_factor (const vec<chain_p> &chains)
>> +unsigned
>> +pcom_worker::determine_unroll_factor (const vec<chain_p> &chains)
>>  {
>>    chain_p chain;
>> -  unsigned factor = 1, af, nfactor, i;
>> +  unsigned factor = 1, i;
>>    unsigned max = param_max_unroll_times;
>> +  struct data_dependence_relation *ddr;
>> +  unsigned nfactor = 0;
>> +  int nzfactor = 0;
>> +
>> +  /* Best unroll factor is the maximum distance across loop
>> +     iterations.  */
>> +  FOR_EACH_VEC_ELT (m_dependences, i, ddr)
>> +    {
>> +      for (unsigned j = 0; j < DDR_NUM_DIST_VECTS (ddr); j++)
>> +       {
>> +         lambda_vector vec = DDR_DIST_VECT (ddr, j);
>> +         widest_int distance = vec[j];
>> +         unsigned offset = distance.to_uhwi ();
>> +         if (offset == 0)
>> +           continue;
>> +
>> +         int dist = offset - nzfactor;
>> +         if (dist  == 0)
>> +           continue;
>>
>> +         if (nfactor == 0)
>> +           {
>> +             nfactor = offset;
>> +             nzfactor = offset;
>> +           }
>> +         else if (dist <= nzfactor)
>> +           nfactor = offset;
>> +
>> +         if (nfactor > 0 && nfactor <= max)
>> +           factor = nfactor;
>> +       }
>> +    }
>> +
>> +  int max_use = 0;
>>    FOR_EACH_VEC_ELT (chains, i, chain)
>>      {
>>        if (chain->type == CT_INVARIANT)
>> @@ -2427,17 +2461,10 @@ determine_unroll_factor (const vec<chain_p> &chains)
>>           continue;
>>         }
>>
>> -      /* The best unroll factor for this chain is equal to the number of
>> -        temporary variables that we create for it.  */
>> -      af = chain->length;
>>        if (chain->has_max_use_after)
>> -       af++;
>> -
>> -      nfactor = factor * af / gcd (factor, af);
>> -      if (nfactor <= max)
>> -       factor = nfactor;
>> +       max_use++;
>>      }
>> -
>> +  factor += max_use;
>>    return factor;
>>  }
>>
>> --
>> 2.43.5
>>

Reply via email to