https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037

--- Comment #27 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to amker from comment #26)
> (In reply to amker from comment #25)
> > I tend to believe this is a register-pressure-based strength-reduction +
> > lim problem rather than an ivopts one.
> > 
> > So given a class of memory references like:
> > 
> >   reg = ...
> > Loop:
> >   MEM[iv_base + reg * 0];
> >   MEM[iv_base + reg * 1];
> >   MEM[iv_base + reg * 2];
> >   MEM[iv_base + reg * 3];
> >   MEM[iv_base + reg * 4];
> >   MEM[iv_base + reg * 5];
> >   MEM[iv_base + reg * 6];
> >   MEM[iv_base + reg * 7];
> > 
> > The best arrangement probably would be:
> > 
> >   reg = ...
> >   regX = reg * 3;
> > Loop:
> >   MEM[iv_base + reg * 0];
> >   MEM[iv_base + reg * 1];
> >   MEM[iv_base + reg * 2];
> >   t = iv_base + regX;
> >   MEM[t];
> >   MEM[iv_base + reg * 4];
> >   MEM[t + reg * 2];
> >   MEM[iv_base + regX * 2];
> >   MEM[t + reg * 4];
> > 
> > Depending on the register pressure, regX should be re-materialized in the
> > loop or hoisted as an invariant.
> 
> Of course the scales supported in addressing modes should be considered, but I
> guess most targets only support [base + reg * 2^x], if at all?

Yes.

Note that the input to IVOPTs is the strength-reduced variant and IVOPTs
generates the variant with multiple hoisted invariants.
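
As a sanity check on the quoted arrangement, here is a minimal C sketch
showing that the form using only scales 1, 2 and 4 plus a single invariant
regX = reg * 3 yields the same eight addresses as iv_base + reg * k. The
function names (addrs_plain, addrs_scaled, count_mismatches,
arrangement_self_test) are made up for illustration and are not GCC code;
addresses are modelled as plain uintptr_t values, so nothing is dereferenced.

```c
#include <assert.h>
#include <stdint.h>

/* Reference form: iv_base + reg * k for k = 0..7. */
static void addrs_plain(uintptr_t iv_base, uintptr_t reg, uintptr_t out[8])
{
  for (unsigned k = 0; k < 8; k++)
    out[k] = iv_base + reg * k;
}

/* Quoted arrangement: only scales 1, 2 and 4, plus regX = reg * 3. */
static void addrs_scaled(uintptr_t iv_base, uintptr_t reg, uintptr_t out[8])
{
  uintptr_t regX = reg * 3;      /* loop invariant (hoisted or rematerialized) */
  uintptr_t t = iv_base + regX;  /* computed once per iteration */

  out[0] = iv_base;              /* reg * 0 */
  out[1] = iv_base + reg * 1;
  out[2] = iv_base + reg * 2;
  out[3] = t;                    /* reg * 3 */
  out[4] = iv_base + reg * 4;
  out[5] = t + reg * 2;          /* reg * 5 */
  out[6] = iv_base + regX * 2;   /* reg * 6 */
  out[7] = t + reg * 4;          /* reg * 7 */
}

/* Number of offsets k at which the two forms disagree. */
int count_mismatches(uintptr_t iv_base, uintptr_t reg)
{
  uintptr_t plain[8], scaled[8];
  int bad = 0;

  addrs_plain(iv_base, reg, plain);
  addrs_scaled(iv_base, reg, scaled);
  for (unsigned k = 0; k < 8; k++)
    bad += plain[k] != scaled[k];
  return bad;
}

/* Usage example with arbitrary values. */
void arrangement_self_test(void)
{
  assert(count_mismatches(0x1000, 12) == 0);
}
```

This only checks the arithmetic. Whether regX is better hoisted or
rematerialized still depends on register pressure, as discussed above.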

The IVOPTs issue I mentioned shows up when you compare the following two
functions: for foo, IVOPTs ends up with two IVs (good).

float A[1024];
float B[1024];

void foo(int s, int r)
{
  for (int i = 0; i < 128; i+=4)
    {
      B[(i+0)*r + 0] = A[(i+0)*s + 0];
      B[(i+0)*r + 1] = A[(i+0)*s + 1];

      B[(i+1)*r + 0] = A[(i+1)*s + 0];
      B[(i+1)*r + 1] = A[(i+1)*s + 1];

      B[(i+2)*r + 0] = A[(i+2)*s + 0];
      B[(i+2)*r + 1] = A[(i+2)*s + 1];

      B[(i+3)*r + 0] = A[(i+3)*s + 0];
      B[(i+3)*r + 1] = A[(i+3)*s + 1];
    }
}

void bar(int s, int r)
{
  for (int i = 0; i < 128; i+=4)
    {
      float *b = &B[i*r];
      float *a = &A[i*s];
      b[0] = a[0];
      b[1] = a[1];

      b += r;
      a += s;
      b[0] = a[0];
      b[1] = a[1];

      b += r;
      a += s;
      b[0] = a[0];
      b[1] = a[1];

      b += r;
      a += s;
      b[0] = a[0];
      b[1] = a[1];
    }
}

The vectorizer actually emits something like the following, but it behaves like bar.

void baz(int s, int r)
{
  float *IVb = &B[0];
  float *IVa = &A[0];
  for (int i = 0; i < 128; i+=4)
    {
      float *b = IVb;
      float *a = IVa;
      b[0] = a[0];
      b[1] = a[1];

      b += r;
      a += s;
      b[0] = a[0];
      b[1] = a[1];

      b += r;
      a += s;
      b[0] = a[0];
      b[1] = a[1];

      b += r;
      a += s;
      b[0] = a[0];
      b[1] = a[1];

      IVb += 4*r;
      IVa += 4*s;
    }
}
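
To compare the variants outside the compiler, a small harness along the
following lines can confirm that the foo-style indexing and the baz-style
pointer IVs perform the same stores. This is only a sketch: the names
(src, dst_indexed, dst_walked, copy_indexed, copy_walked, stores_agree) are
hypothetical, the four unrolled copies are folded into an inner loop for
brevity, and the strides are assumed to lie in 0..8 so that every access
stays inside the 1024-element arrays. It says nothing about which IVs IVOPTs
picks; it only checks that the source forms are equivalent.

```c
#include <assert.h>
#include <string.h>

#define N 1024

static float src[N];
static float dst_indexed[N];
static float dst_walked[N];

/* foo-style: every access recomputes (i + k) * stride. */
static void copy_indexed(float *out, const float *in, int s, int r)
{
  for (int i = 0; i < 128; i += 4)
    for (int k = 0; k < 4; k++)
      {
        out[(i + k) * r + 0] = in[(i + k) * s + 0];
        out[(i + k) * r + 1] = in[(i + k) * s + 1];
      }
}

/* baz-style: two pointer IVs advanced by 4 * stride per outer iteration. */
static void copy_walked(float *out, const float *in, int s, int r)
{
  float *ivb = out;
  const float *iva = in;

  for (int i = 0; i < 128; i += 4)
    {
      float *b = ivb;
      const float *a = iva;
      for (int k = 0; k < 4; k++)
        {
          b[0] = a[0];
          b[1] = a[1];
          b += r;
          a += s;
        }
      ivb += 4 * r;
      iva += 4 * s;
    }
}

/* Returns 1 if both variants leave identical destination arrays. */
int stores_agree(int s, int r)
{
  /* Assumed stride range: 127 * 8 + 1 is the largest index touched. */
  assert(s >= 0 && s <= 8 && r >= 0 && r <= 8);

  for (int j = 0; j < N; j++)
    src[j] = (float) j;
  memset(dst_indexed, 0, sizeof dst_indexed);
  memset(dst_walked, 0, sizeof dst_walked);

  copy_indexed(dst_indexed, src, s, r);
  copy_walked(dst_walked, src, s, r);
  return memcmp(dst_indexed, dst_walked, sizeof dst_indexed) == 0;
}
```

Calling stores_agree(s, r) for a few stride pairs, for example (1, 1) and
(8, 8), is enough to exercise both the overlapping and the widest-stride
cases.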
