On 2019/11/24 2:26 PM, Bin.Cheng wrote:
On Fri, Nov 22, 2019 at 3:23 PM Bin.Cheng <amker.ch...@gmail.com> wrote:

On Fri, Nov 22, 2019 at 3:19 PM Richard Biener
<richard.guent...@gmail.com> wrote:

On November 22, 2019 6:51:38 AM GMT+01:00, Li Jia He <heli...@linux.ibm.com> 
wrote:


On 2019/11/21 8:10 PM, Richard Biener wrote:
On Thu, Nov 21, 2019 at 10:22 AM Li Jia He <heli...@linux.ibm.com>
wrote:

Hi,

I found for the follow code:

#define N 256
int a[N][N][N], b[N][N][N];
int d[N][N], c[N][N];
void __attribute__((noinline))
double_reduc (int n)
{
     for (int k = 0; k < n; k++)
     {
       for (int l = 0; l < n; l++)
        {
          c[k][l] = 0;
           for (int m = 0; m < n; m++)
             c[k][l] += a[k][m][l] * d[k][m] + b[k][m][l] * d[k][m];
        }
     }
}

I dumped the file after loop interchange and got the following
information:

<bb 3> [local count: 118111600]:
     # m_46 = PHI <0(7), m_45(11)>
     # ivtmp_44 = PHI <_42(7), ivtmp_43(11)>
     _39 = _49 + 1;

     <bb 4> [local count: 955630224]:
     # l_48 = PHI <0(3), l_47(12)>
     # ivtmp_41 = PHI <_39(3), ivtmp_40(12)>
     c_I_I_lsm.5_18 = c[k_28][l_48];
     c_I_I_lsm.5_53 = m_46 != 0 ? c_I_I_lsm.5_18 : 0;
     _2 = a[k_28][m_46][l_48];
     _3 = d[k_28][m_46];
     _4 = _2 * _3;
     _5 = b[k_28][m_46][l_48];
     _6 = _3 * _5;
     _7 = _4 + _6;
     _8 = _7 + c_I_I_lsm.5_53;
     c[k_28][l_48] = _8;
     l_47 = l_48 + 1;
     ivtmp_40 = ivtmp_41 - 1;
     if (ivtmp_40 != 0)
       goto <bb 12>; [89.00%]
     else
       goto <bb 5>; [11.00%]

we can see '_3 = d[k_28][m_46];'  is a loop invariant.
Do we need to add a loop invariant motion pass after the loop
interchange?

There is one at the end of the loop pipeline.

Hi,

The one at the end of the loop pipeline may miss some optimization
opportunities.  If we vectorize the above code (a.c.158t.vect), we
can get information similar to the following:

bb 3:
  # m_46 = PHI <0(7), m_45(11)>  // loop m, outer loop
   if (_59 <= 2)
     goto bb 20;
   else
     goto bb 15;

bb 15:
   _89 = d[k_28][m_46];
   vect_cst__90 = {_89, _89, _89, _89};

bb 4:
    # l_48 = PHI <l_47(12), 0(15)> // loop l, inner loop
   vect__6.23_100 = vect_cst__99 * vect__5.22_98;
    if (ivtmp_110 < bnd.8_1)
     goto bb 12;
   else
     goto bb 17;

bb 20:
bb 18:
    _27 = d[k_28][m_46];
if (ivtmp_12 != 0)
     goto bb 19;
   else
     goto bb 21;

Vectorization will do some conversions in this case.  We can see
‘ _89 = d[k_28][m_46];’ and ‘_27 = d[k_28][m_46];’ are loop invariant
relative to loop l.  We can move ‘d[k_28][m_46]’ to the front of
‘if (_59 <= 2)’ to get rid of loading data from memory in both
branches.

The one at at the end of the loop pipeline can't handle this situation.
If we move d[k_28][m_46] from loop l to loop m before doing
vectorization, we can get rid of this situation.

But we can't run every pass after every other. With multiple passes having 
ordering issues is inevitable.

Now - interchange could trigger a region based invariant motion just for the 
nest it interchanged. But that doesn't exist right now.
With data reference/dependence information in the pass, I think it
could be quite straightforward.  Didn't realize that we need it
before.
FYI, attachment is a simple fix in loop interchange for the reported
issue. It's untested, neither for GCC10.

Hi,

Thank you for providing a patch so quickly. I did some tests for this patch. I found it will cause some test cases failed such as testsuite/gfortran.dg/vect/pr81303.f. And I also found that this patch can’t do loop invariant motion in some cases.
Take pr81303.f as an example

<bb 12> [local count: 118111600]:
  # m_55 = PHI <1(9), m_81(20)> // loop m, outer loop
  # ivtmp_82 = PHI <_167(9), ivtmp_169(20)>
  inv_temp_439 = (*x_138(D))[_94];
  inv_temp_440 = (*x_138(D))[_85];
  inv_temp_441 = (*x_138(D))[_76];
  inv_temp_442 = (*x_138(D))[_68];
  inv_temp_443 = (*x_138(D))[_58];
  inv_temp_444 = (*x_138(D))[_49];
  inv_temp_445 = (*x_138(D))[_41];
  _446 = _21 + 1;

  <bb 13> [local count: 955630224]:
  # l_32 = PHI <1(12), l_54(21)>  // loop l, inner loop
  # ivtmp_165 = PHI <_446(12), ivtmp_155(21)>
  _26 = (integer(kind=8)) l_32;
  _27 = _25 + _26;
  y__I_lsm.119_136 = (*y_135(D))[_27];
  y__I_lsm.119_90 = m_55 != 1 ? y__I_lsm.119_136 : 0.0;
  _36 = (integer(kind=8)) m_55;
  _37 = _36 * stride.88_111;
  _38 = _35 + _37;
  _39 = _26 + _38;
  _40 = (*a_137(D))[_39];
  _41 = _25 + _36;
  _42 = inv_temp_445;
  _43 = _40 * _42;
  _45 = (*axp_139(D))[_39];

We can see that variables such as inv_temp_439 moved to outer loop m.
However, loop invariants such as _36, _37 do not moved to outer loop m.

--
BR,
Lijia He


Thanks,
bin

Thanks,
bin

Richard.


Reply via email to