On Mon, Nov 25, 2019 at 5:29 PM Li Jia He <heli...@linux.ibm.com> wrote: > > > > On 2019/11/24 2:26 PM, Bin.Cheng wrote: > > On Fri, Nov 22, 2019 at 3:23 PM Bin.Cheng <amker.ch...@gmail.com> wrote: > >> > >> On Fri, Nov 22, 2019 at 3:19 PM Richard Biener > >> <richard.guent...@gmail.com> wrote: > >>> > >>> On November 22, 2019 6:51:38 AM GMT+01:00, Li Jia He > >>> <heli...@linux.ibm.com> wrote: > >>>> > >>>> > >>>> On 2019/11/21 8:10 PM, Richard Biener wrote: > >>>>> On Thu, Nov 21, 2019 at 10:22 AM Li Jia He <heli...@linux.ibm.com> > >>>> wrote: > >>>>>> > >>>>>> Hi, > >>>>>> > >>>>>> I found for the follow code: > >>>>>> > >>>>>> #define N 256 > >>>>>> int a[N][N][N], b[N][N][N]; > >>>>>> int d[N][N], c[N][N]; > >>>>>> void __attribute__((noinline)) > >>>>>> double_reduc (int n) > >>>>>> { > >>>>>> for (int k = 0; k < n; k++) > >>>>>> { > >>>>>> for (int l = 0; l < n; l++) > >>>>>> { > >>>>>> c[k][l] = 0; > >>>>>> for (int m = 0; m < n; m++) > >>>>>> c[k][l] += a[k][m][l] * d[k][m] + b[k][m][l] * d[k][m]; > >>>>>> } > >>>>>> } > >>>>>> } > >>>>>> > >>>>>> I dumped the file after loop interchange and got the following > >>>> information: > >>>>>> > >>>>>> <bb 3> [local count: 118111600]: > >>>>>> # m_46 = PHI <0(7), m_45(11)> > >>>>>> # ivtmp_44 = PHI <_42(7), ivtmp_43(11)> > >>>>>> _39 = _49 + 1; > >>>>>> > >>>>>> <bb 4> [local count: 955630224]: > >>>>>> # l_48 = PHI <0(3), l_47(12)> > >>>>>> # ivtmp_41 = PHI <_39(3), ivtmp_40(12)> > >>>>>> c_I_I_lsm.5_18 = c[k_28][l_48]; > >>>>>> c_I_I_lsm.5_53 = m_46 != 0 ? c_I_I_lsm.5_18 : 0; > >>>>>> _2 = a[k_28][m_46][l_48]; > >>>>>> _3 = d[k_28][m_46]; > >>>>>> _4 = _2 * _3; > >>>>>> _5 = b[k_28][m_46][l_48]; > >>>>>> _6 = _3 * _5; > >>>>>> _7 = _4 + _6; > >>>>>> _8 = _7 + c_I_I_lsm.5_53; > >>>>>> c[k_28][l_48] = _8; > >>>>>> l_47 = l_48 + 1; > >>>>>> ivtmp_40 = ivtmp_41 - 1; > >>>>>> if (ivtmp_40 != 0) > >>>>>> goto <bb 12>; [89.00%] > >>>>>> else > >>>>>> goto <bb 5>; [11.00%] > >>>>>> > >>>>>> we can see '_3 = d[k_28][m_46];' is a loop invariant. > >>>>>> Do we need to add a loop invariant motion pass after the loop > >>>> interchange? > >>>>> > >>>>> There is one at the end of the loop pipeline. > >>>> > >>>> Hi, > >>>> > >>>> The one at the end of the loop pipeline may miss some optimization > >>>> opportunities. If we vectorize the above code (a.c.158t.vect), we > >>>> can get information similar to the following: > >>>> > >>>> bb 3: > >>>> # m_46 = PHI <0(7), m_45(11)> // loop m, outer loop > >>>> if (_59 <= 2) > >>>> goto bb 20; > >>>> else > >>>> goto bb 15; > >>>> > >>>> bb 15: > >>>> _89 = d[k_28][m_46]; > >>>> vect_cst__90 = {_89, _89, _89, _89}; > >>>> > >>>> bb 4: > >>>> # l_48 = PHI <l_47(12), 0(15)> // loop l, inner loop > >>>> vect__6.23_100 = vect_cst__99 * vect__5.22_98; > >>>> if (ivtmp_110 < bnd.8_1) > >>>> goto bb 12; > >>>> else > >>>> goto bb 17; > >>>> > >>>> bb 20: > >>>> bb 18: > >>>> _27 = d[k_28][m_46]; > >>>> if (ivtmp_12 != 0) > >>>> goto bb 19; > >>>> else > >>>> goto bb 21; > >>>> > >>>> Vectorization will do some conversions in this case. We can see > >>>> ‘ _89 = d[k_28][m_46];’ and ‘_27 = d[k_28][m_46];’ are loop invariant > >>>> relative to loop l. We can move ‘d[k_28][m_46]’ to the front of > >>>> ‘if (_59 <= 2)’ to get rid of loading data from memory in both > >>>> branches. > >>>> > >>>> The one at at the end of the loop pipeline can't handle this situation. > >>>> If we move d[k_28][m_46] from loop l to loop m before doing > >>>> vectorization, we can get rid of this situation. > >>> > >>> But we can't run every pass after every other. With multiple passes > >>> having ordering issues is inevitable. > >>> > >>> Now - interchange could trigger a region based invariant motion just for > >>> the nest it interchanged. But that doesn't exist right now. > >> With data reference/dependence information in the pass, I think it > >> could be quite straightforward. Didn't realize that we need it > >> before. > > FYI, attachment is a simple fix in loop interchange for the reported > > issue. It's untested, neither for GCC10. > > Hi, > > Thank you for providing a patch so quickly. I did some tests for this > patch. > I found it will cause some test cases failed such as > testsuite/gfortran.dg/vect/pr81303.f. My bad. It fails because the patch only hoists memory reference, but not variables used by it.
Thanks, bin > And I also found that this patch can’t do loop invariant motion in some > cases. > Take pr81303.f as an example > > <bb 12> [local count: 118111600]: > # m_55 = PHI <1(9), m_81(20)> // loop m, outer loop > # ivtmp_82 = PHI <_167(9), ivtmp_169(20)> > inv_temp_439 = (*x_138(D))[_94]; > inv_temp_440 = (*x_138(D))[_85]; > inv_temp_441 = (*x_138(D))[_76]; > inv_temp_442 = (*x_138(D))[_68]; > inv_temp_443 = (*x_138(D))[_58]; > inv_temp_444 = (*x_138(D))[_49]; > inv_temp_445 = (*x_138(D))[_41]; > _446 = _21 + 1; > > <bb 13> [local count: 955630224]: > # l_32 = PHI <1(12), l_54(21)> // loop l, inner loop > # ivtmp_165 = PHI <_446(12), ivtmp_155(21)> > _26 = (integer(kind=8)) l_32; > _27 = _25 + _26; > y__I_lsm.119_136 = (*y_135(D))[_27]; > y__I_lsm.119_90 = m_55 != 1 ? y__I_lsm.119_136 : 0.0; > _36 = (integer(kind=8)) m_55; > _37 = _36 * stride.88_111; > _38 = _35 + _37; > _39 = _26 + _38; > _40 = (*a_137(D))[_39]; > _41 = _25 + _36; > _42 = inv_temp_445; > _43 = _40 * _42; > _45 = (*axp_139(D))[_39]; > > We can see that variables such as inv_temp_439 moved to outer loop m. > However, loop invariants such as _36, _37 do not moved to outer loop m. > > -- > BR, > Lijia He > > > > > Thanks, > > bin > >> > >> Thanks, > >> bin > >>> > >>> Richard. > >