Hello Richard: On 11/07/24 2:21 pm, Richard Biener wrote: > On Thu, Jul 11, 2024 at 10:30 AM Ajit Agarwal <aagar...@linux.ibm.com> wrote: >> >> Hello All: >> >> Unroll factor is determined with max distance across loop iterations. >> The logic for determining the loop unroll factor is based on >> data dependency across loop iterations. >> >> The max distance across loop iterations is the unrolling factor >> that helps in predictive commoning. > > The old comment in the code says > >> - /* The best unroll factor for this chain is equal to the number of >> - temporary variables that we create for it. */ > > why is that wrong and why is the max dependence distance more correct? > > Do you have a testcase that shows how this makes a (positive) difference? >
There is nothing wrong in the existing implementation of unroll factor for predictive commoning. But with max dependence distance we get performance improvement with spec 2017 benchmarks (INT) of 0.01% (Geomean) with and without changes. Improvement in benchmarks with max dependence distance changes. I have used the following flags: -O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fno-tree-pre With above flags I ran with and without changes. There is no degradation with spec 2017 (FP benchmarks). Because in predictive commoning we reuse values computed in earlier iterations of a loop in the later ones, max distance is the better choice. > Richard. > Thanks & Regards Ajit >> Bootstrapped and regtested on powerpc64-linux-gnu. >> >> Thanks & Regards >> Ajit >> >> tree-optimization, predcom: Improve unroll factor for predictive commoning >> >> Unroll factor is determined with max distance across loop iterations. >> The logic for determining the loop unroll factor is based on >> data dependency across loop iterations. >> >> The max distance across loop iterations is the unrolling factor >> that helps in predictive commoning. >> >> 2024-07-11 Ajit Kumar Agarwal <aagar...@linux.ibm.com> >> >> gcc/ChangeLog: >> >> * tree-predcom.cc: Change in determining unroll factor with >> data dependence across loop iterations. >> --- >> gcc/tree-predcom.cc | 51 ++++++++++++++++++++++++++++++++++----------- >> 1 file changed, 39 insertions(+), 12 deletions(-) >> >> diff --git a/gcc/tree-predcom.cc b/gcc/tree-predcom.cc >> index 9844fee1e97..029b02f5990 100644 >> --- a/gcc/tree-predcom.cc >> +++ b/gcc/tree-predcom.cc >> @@ -409,6 +409,7 @@ public: >> /* Perform the predictive commoning optimization for chains, make this >> public for being called in callback execute_pred_commoning_cbck. */ >> void execute_pred_commoning (bitmap tmp_vars); >> + unsigned determine_unroll_factor (const vec<chain_p> &chains); >> >> private: >> /* The pointer to the given loop. */ >> @@ -2400,13 +2401,46 @@ pcom_worker::execute_pred_commoning_chain (chain_p >> chain, >> copies as possible. CHAINS is the list of chains that will be >> optimized. */ >> >> -static unsigned >> -determine_unroll_factor (const vec<chain_p> &chains) >> +unsigned >> +pcom_worker::determine_unroll_factor (const vec<chain_p> &chains) >> { >> chain_p chain; >> - unsigned factor = 1, af, nfactor, i; >> + unsigned factor = 1, i; >> unsigned max = param_max_unroll_times; >> + struct data_dependence_relation *ddr; >> + unsigned nfactor = 0; >> + int nzfactor = 0; >> + >> + /* Best unroll factor is the maximum distance across loop >> + iterations. */ >> + FOR_EACH_VEC_ELT (m_dependences, i, ddr) >> + { >> + for (unsigned j = 0; j < DDR_NUM_DIST_VECTS (ddr); j++) >> + { >> + lambda_vector vec = DDR_DIST_VECT (ddr, j); >> + widest_int distance = vec[j]; >> + unsigned offset = distance.to_uhwi (); >> + if (offset == 0) >> + continue; >> + >> + int dist = offset - nzfactor; >> + if (dist == 0) >> + continue; >> >> + if (nfactor == 0) >> + { >> + nfactor = offset; >> + nzfactor = offset; >> + } >> + else if (dist <= nzfactor) >> + nfactor = offset; >> + >> + if (nfactor > 0 && nfactor <= max) >> + factor = nfactor; >> + } >> + } >> + >> + int max_use = 0; >> FOR_EACH_VEC_ELT (chains, i, chain) >> { >> if (chain->type == CT_INVARIANT) >> @@ -2427,17 +2461,10 @@ determine_unroll_factor (const vec<chain_p> &chains) >> continue; >> } >> >> - /* The best unroll factor for this chain is equal to the number of >> - temporary variables that we create for it. */ >> - af = chain->length; >> if (chain->has_max_use_after) >> - af++; >> - >> - nfactor = factor * af / gcd (factor, af); >> - if (nfactor <= max) >> - factor = nfactor; >> + max_use++; >> } >> - >> + factor += max_use; >> return factor; >> } >> >> -- >> 2.43.5 >>