On Fri, Jul 12, 2024 at 12:09 PM Ajit Agarwal <aagar...@linux.ibm.com> wrote: > > Hello Richard: > > On 11/07/24 2:21 pm, Richard Biener wrote: > > On Thu, Jul 11, 2024 at 10:30 AM Ajit Agarwal <aagar...@linux.ibm.com> > > wrote: > >> > >> Hello All: > >> > >> Unroll factor is determined with max distance across loop iterations. > >> The logic for determining the loop unroll factor is based on > >> data dependency across loop iterations. > >> > >> The max distance across loop iterations is the unrolling factor > >> that helps in predictive commoning. > > > > The old comment in the code says > > > >> - /* The best unroll factor for this chain is equal to the number of > >> - temporary variables that we create for it. */ > > > > why is that wrong and why is the max dependence distance more correct? > > > > Do you have a testcase that shows how this makes a (positive) difference? > > > > There is nothing wrong in the existing implementation of unroll > factor for predictive commoning. > > But with max dependence distance we get performance improvement > with spec 2017 benchmarks (INT) of 0.01% (Geomean) with and without > changes. Improvement in benchmarks with max dependence distance > changes. > > I have used the following flags: > -O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning > -fno-tree-pre > > With above flags I ran with and without changes.
A 0.01% geomean improvement is noise. Why did you disable PRE? > There is no degradation with spec 2017 (FP benchmarks). > > Because in predictive commoning we reuse values computed in > earlier iterations of a loop in the later ones, max distance is the > better choice. The re-use distance is the same though. So your change merely increases the unroll factor? Or can you explain why there is more re-use with your change. Richard. > > Richard. > > > > Thanks & Regards > Ajit > > >> Bootstrapped and regtested on powerpc64-linux-gnu. > >> > >> Thanks & Regards > >> Ajit > >> > >> tree-optimization, predcom: Improve unroll factor for predictive commoning > >> > >> Unroll factor is determined with max distance across loop iterations. > >> The logic for determining the loop unroll factor is based on > >> data dependency across loop iterations. > >> > >> The max distance across loop iterations is the unrolling factor > >> that helps in predictive commoning. > >> > >> 2024-07-11 Ajit Kumar Agarwal <aagar...@linux.ibm.com> > >> > >> gcc/ChangeLog: > >> > >> * tree-predcom.cc: Change in determining unroll factor with > >> data dependence across loop iterations. > >> --- > >> gcc/tree-predcom.cc | 51 ++++++++++++++++++++++++++++++++++----------- > >> 1 file changed, 39 insertions(+), 12 deletions(-) > >> > >> diff --git a/gcc/tree-predcom.cc b/gcc/tree-predcom.cc > >> index 9844fee1e97..029b02f5990 100644 > >> --- a/gcc/tree-predcom.cc > >> +++ b/gcc/tree-predcom.cc > >> @@ -409,6 +409,7 @@ public: > >> /* Perform the predictive commoning optimization for chains, make this > >> public for being called in callback execute_pred_commoning_cbck. */ > >> void execute_pred_commoning (bitmap tmp_vars); > >> + unsigned determine_unroll_factor (const vec<chain_p> &chains); > >> > >> private: > >> /* The pointer to the given loop. */ > >> @@ -2400,13 +2401,46 @@ pcom_worker::execute_pred_commoning_chain (chain_p > >> chain, > >> copies as possible. CHAINS is the list of chains that will be > >> optimized. */ > >> > >> -static unsigned > >> -determine_unroll_factor (const vec<chain_p> &chains) > >> +unsigned > >> +pcom_worker::determine_unroll_factor (const vec<chain_p> &chains) > >> { > >> chain_p chain; > >> - unsigned factor = 1, af, nfactor, i; > >> + unsigned factor = 1, i; > >> unsigned max = param_max_unroll_times; > >> + struct data_dependence_relation *ddr; > >> + unsigned nfactor = 0; > >> + int nzfactor = 0; > >> + > >> + /* Best unroll factor is the maximum distance across loop > >> + iterations. */ > >> + FOR_EACH_VEC_ELT (m_dependences, i, ddr) > >> + { > >> + for (unsigned j = 0; j < DDR_NUM_DIST_VECTS (ddr); j++) > >> + { > >> + lambda_vector vec = DDR_DIST_VECT (ddr, j); > >> + widest_int distance = vec[j]; > >> + unsigned offset = distance.to_uhwi (); > >> + if (offset == 0) > >> + continue; > >> + > >> + int dist = offset - nzfactor; > >> + if (dist == 0) > >> + continue; > >> > >> + if (nfactor == 0) > >> + { > >> + nfactor = offset; > >> + nzfactor = offset; > >> + } > >> + else if (dist <= nzfactor) > >> + nfactor = offset; > >> + > >> + if (nfactor > 0 && nfactor <= max) > >> + factor = nfactor; > >> + } > >> + } > >> + > >> + int max_use = 0; > >> FOR_EACH_VEC_ELT (chains, i, chain) > >> { > >> if (chain->type == CT_INVARIANT) > >> @@ -2427,17 +2461,10 @@ determine_unroll_factor (const vec<chain_p> > >> &chains) > >> continue; > >> } > >> > >> - /* The best unroll factor for this chain is equal to the number of > >> - temporary variables that we create for it. */ > >> - af = chain->length; > >> if (chain->has_max_use_after) > >> - af++; > >> - > >> - nfactor = factor * af / gcd (factor, af); > >> - if (nfactor <= max) > >> - factor = nfactor; > >> + max_use++; > >> } > >> - > >> + factor += max_use; > >> return factor; > >> } > >> > >> -- > >> 2.43.5 > >>