On Mon, Nov 25, 2019 at 5:29 PM Li Jia He <heli...@linux.ibm.com> wrote:
>
>
>
> On 2019/11/24 2:26 PM, Bin.Cheng wrote:
> > On Fri, Nov 22, 2019 at 3:23 PM Bin.Cheng <amker.ch...@gmail.com> wrote:
> >>
> >> On Fri, Nov 22, 2019 at 3:19 PM Richard Biener
> >> <richard.guent...@gmail.com> wrote:
> >>>
> >>> On November 22, 2019 6:51:38 AM GMT+01:00, Li Jia He 
> >>> <heli...@linux.ibm.com> wrote:
> >>>>
> >>>>
> >>>> On 2019/11/21 8:10 PM, Richard Biener wrote:
> >>>>> On Thu, Nov 21, 2019 at 10:22 AM Li Jia He <heli...@linux.ibm.com>
> >>>> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I found for the follow code:
> >>>>>>
> >>>>>> #define N 256
> >>>>>> int a[N][N][N], b[N][N][N];
> >>>>>> int d[N][N], c[N][N];
> >>>>>> void __attribute__((noinline))
> >>>>>> double_reduc (int n)
> >>>>>> {
> >>>>>>      for (int k = 0; k < n; k++)
> >>>>>>      {
> >>>>>>        for (int l = 0; l < n; l++)
> >>>>>>         {
> >>>>>>           c[k][l] = 0;
> >>>>>>            for (int m = 0; m < n; m++)
> >>>>>>              c[k][l] += a[k][m][l] * d[k][m] + b[k][m][l] * d[k][m];
> >>>>>>         }
> >>>>>>      }
> >>>>>> }
> >>>>>>
> >>>>>> I dumped the file after loop interchange and got the following
> >>>> information:
> >>>>>>
> >>>>>> <bb 3> [local count: 118111600]:
> >>>>>>      # m_46 = PHI <0(7), m_45(11)>
> >>>>>>      # ivtmp_44 = PHI <_42(7), ivtmp_43(11)>
> >>>>>>      _39 = _49 + 1;
> >>>>>>
> >>>>>>      <bb 4> [local count: 955630224]:
> >>>>>>      # l_48 = PHI <0(3), l_47(12)>
> >>>>>>      # ivtmp_41 = PHI <_39(3), ivtmp_40(12)>
> >>>>>>      c_I_I_lsm.5_18 = c[k_28][l_48];
> >>>>>>      c_I_I_lsm.5_53 = m_46 != 0 ? c_I_I_lsm.5_18 : 0;
> >>>>>>      _2 = a[k_28][m_46][l_48];
> >>>>>>      _3 = d[k_28][m_46];
> >>>>>>      _4 = _2 * _3;
> >>>>>>      _5 = b[k_28][m_46][l_48];
> >>>>>>      _6 = _3 * _5;
> >>>>>>      _7 = _4 + _6;
> >>>>>>      _8 = _7 + c_I_I_lsm.5_53;
> >>>>>>      c[k_28][l_48] = _8;
> >>>>>>      l_47 = l_48 + 1;
> >>>>>>      ivtmp_40 = ivtmp_41 - 1;
> >>>>>>      if (ivtmp_40 != 0)
> >>>>>>        goto <bb 12>; [89.00%]
> >>>>>>      else
> >>>>>>        goto <bb 5>; [11.00%]
> >>>>>>
> >>>>>> we can see '_3 = d[k_28][m_46];'  is a loop invariant.
> >>>>>> Do we need to add a loop invariant motion pass after the loop
> >>>> interchange?
> >>>>>
> >>>>> There is one at the end of the loop pipeline.
> >>>>
> >>>> Hi,
> >>>>
> >>>> The one at the end of the loop pipeline may miss some optimization
> >>>> opportunities.  If we vectorize the above code (a.c.158t.vect), we
> >>>> can get information similar to the following:
> >>>>
> >>>> bb 3:
> >>>>   # m_46 = PHI <0(7), m_45(11)>  // loop m, outer loop
> >>>>    if (_59 <= 2)
> >>>>      goto bb 20;
> >>>>    else
> >>>>      goto bb 15;
> >>>>
> >>>> bb 15:
> >>>>    _89 = d[k_28][m_46];
> >>>>    vect_cst__90 = {_89, _89, _89, _89};
> >>>>
> >>>> bb 4:
> >>>>     # l_48 = PHI <l_47(12), 0(15)> // loop l, inner loop
> >>>>    vect__6.23_100 = vect_cst__99 * vect__5.22_98;
> >>>>     if (ivtmp_110 < bnd.8_1)
> >>>>      goto bb 12;
> >>>>    else
> >>>>      goto bb 17;
> >>>>
> >>>> bb 20:
> >>>> bb 18:
> >>>>     _27 = d[k_28][m_46];
> >>>> if (ivtmp_12 != 0)
> >>>>      goto bb 19;
> >>>>    else
> >>>>      goto bb 21;
> >>>>
> >>>> Vectorization will do some conversions in this case.  We can see
> >>>> ‘ _89 = d[k_28][m_46];’ and ‘_27 = d[k_28][m_46];’ are loop invariant
> >>>> relative to loop l.  We can move ‘d[k_28][m_46]’ to the front of
> >>>> ‘if (_59 <= 2)’ to get rid of loading data from memory in both
> >>>> branches.
> >>>>
> >>>> The one at at the end of the loop pipeline can't handle this situation.
> >>>> If we move d[k_28][m_46] from loop l to loop m before doing
> >>>> vectorization, we can get rid of this situation.
> >>>
> >>> But we can't run every pass after every other. With multiple passes 
> >>> having ordering issues is inevitable.
> >>>
> >>> Now - interchange could trigger a region based invariant motion just for 
> >>> the nest it interchanged. But that doesn't exist right now.
> >> With data reference/dependence information in the pass, I think it
> >> could be quite straightforward.  Didn't realize that we need it
> >> before.
> > FYI, attachment is a simple fix in loop interchange for the reported
> > issue. It's untested, neither for GCC10.
>
> Hi,
>
> Thank you for providing a patch so quickly. I did some tests for this
> patch.
> I found it will cause some test cases failed such as
> testsuite/gfortran.dg/vect/pr81303.f.
My bad. It fails because the patch only hoists memory reference, but
not variables used by it.

Thanks,
bin
> And I also found that this patch can’t do loop invariant motion in some
> cases.
> Take pr81303.f as an example
>
> <bb 12> [local count: 118111600]:
>    # m_55 = PHI <1(9), m_81(20)> // loop m, outer loop
>    # ivtmp_82 = PHI <_167(9), ivtmp_169(20)>
>    inv_temp_439 = (*x_138(D))[_94];
>    inv_temp_440 = (*x_138(D))[_85];
>    inv_temp_441 = (*x_138(D))[_76];
>    inv_temp_442 = (*x_138(D))[_68];
>    inv_temp_443 = (*x_138(D))[_58];
>    inv_temp_444 = (*x_138(D))[_49];
>    inv_temp_445 = (*x_138(D))[_41];
>    _446 = _21 + 1;
>
>    <bb 13> [local count: 955630224]:
>    # l_32 = PHI <1(12), l_54(21)>  // loop l, inner loop
>    # ivtmp_165 = PHI <_446(12), ivtmp_155(21)>
>    _26 = (integer(kind=8)) l_32;
>    _27 = _25 + _26;
>    y__I_lsm.119_136 = (*y_135(D))[_27];
>    y__I_lsm.119_90 = m_55 != 1 ? y__I_lsm.119_136 : 0.0;
>    _36 = (integer(kind=8)) m_55;
>    _37 = _36 * stride.88_111;
>    _38 = _35 + _37;
>    _39 = _26 + _38;
>    _40 = (*a_137(D))[_39];
>    _41 = _25 + _36;
>    _42 = inv_temp_445;
>    _43 = _40 * _42;
>    _45 = (*axp_139(D))[_39];
>
> We can see that variables such as inv_temp_439 moved to outer loop m.
> However, loop invariants such as _36, _37 do not moved to outer loop m.
>
> --
> BR,
> Lijia He
>
> >
> > Thanks,
> > bin
> >>
> >> Thanks,
> >> bin
> >>>
> >>> Richard.
>
>

Reply via email to