https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #27 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to amker from comment #26)
> (In reply to amker from comment #25)
> > I tend to believe this is an register pressure based strength-reduction +
> > lim problem than ivopts.
> >
> > So given class of memory references like:
> >
> > reg = ...
> > Loop:
> > MEM[iv_base + reg * 0];
> > MEM[iv_base + reg * 1];
> > MEM[iv_base + reg * 2];
> > MEM[iv_base + reg * 3];
> > MEM[iv_base + reg * 4];
> > MEM[iv_base + reg * 5];
> > MEM[iv_base + reg * 6];
> > MEM[iv_base + reg * 7];
> >
> > The best arrangement probably would be:
> >
> > reg = ...
> > regX = reg * 3;
> > Loop:
> > MEM[iv_base + reg * 0];
> > MEM[iv_base + reg * 1];
> > MEM[iv_base + reg * 2];
> > t = iv_base + regX;
> > MEM[t];
> > MEM[iv_base + reg * 4];
> > MEM[t + reg * 2];
> > MEM[iv_base + regX * 2];
> > MEM[t + reg * 4];
> >
> > Depending on the register pressure, regX should be re-materialized in loop
> > or hoisted as invariant.
>
> Of course supported scales in addressing modes should be considered, but I
> guess most targets only (if) support [base + reg * 2^x]?
Yes.
Note that the input to IVOPTs is the strength-reduced variant and IVOPTs
generates the variant with multiple hoisted invariants.
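To make the quoted arrangement concrete, here is a minimal sketch that checks the address arithmetic only: every offset reg * k for k = 0..7 can be formed from scales 1, 2 and 4 plus the single invariant regX = reg * 3. The function names are hypothetical and iv_base is taken as 0, so this says nothing about what IVOPTs actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of the k-th reference computed directly: reg * k. */
static size_t naive_offset(size_t reg, unsigned k)
{
    return reg * k;
}

/* Same offset using only scales 1, 2, 4 and one invariant regX = reg * 3,
   following the arrangement quoted from comment #25 (iv_base assumed 0). */
static size_t reduced_offset(size_t reg, unsigned k)
{
    size_t regX = reg * 3;   /* invariant: hoisted or rematerialized */
    size_t t = regX;         /* t = iv_base + regX */

    switch (k) {
    case 0: return 0;
    case 1: return reg * 1;
    case 2: return reg * 2;
    case 3: return t;
    case 4: return reg * 4;
    case 5: return t + reg * 2;
    case 6: return regX * 2;
    case 7: return t + reg * 4;
    default: return naive_offset(reg, k);
    }
}

/* Asserts that both forms agree for all eight references. */
static void check_offsets(size_t reg)
{
    for (unsigned k = 0; k < 8; k++)
        assert(naive_offset(reg, k) == reduced_offset(reg, k));
}
```

Calling check_offsets with any reg value (for example 1 or 17) should pass silently; whether regX is worth keeping live across the loop is the register pressure question raised above.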
The IVOPTs issue I mentioned shows up when you compare the following two
functions: for foo it ends up with two IVs (good).
float A[1024];
float B[1024];

void foo(int s, int r)
{
  for (int i = 0; i < 128; i+=4)
    {
      B[(i+0)*r + 0] = A[(i+0)*s + 0];
      B[(i+0)*r + 1] = A[(i+0)*s + 1];
      B[(i+1)*r + 0] = A[(i+1)*s + 0];
      B[(i+1)*r + 1] = A[(i+1)*s + 1];
      B[(i+2)*r + 0] = A[(i+2)*s + 0];
      B[(i+2)*r + 1] = A[(i+2)*s + 1];
      B[(i+3)*r + 0] = A[(i+3)*s + 0];
      B[(i+3)*r + 1] = A[(i+3)*s + 1];
    }
}
void bar(int s, int r)
{
  for (int i = 0; i < 128; i+=4)
    {
      float *b = &B[i*r];
      float *a = &A[i*s];
      b[0] = a[0];
      b[1] = a[1];
      b += r;
      a += s;
      b[0] = a[0];
      b[1] = a[1];
      b += r;
      a += s;
      b[0] = a[0];
      b[1] = a[1];
      b += r;
      a += s;
      b[0] = a[0];
      b[1] = a[1];
    }
}
The vectorizer actually emits something like the following, but it behaves like bar.
void baz(int s, int r)
{
  float *IVb = &B[0];
  float *IVa = &A[0];
  for (int i = 0; i < 128; i+=4)
    {
      float *b = IVb;
      float *a = IVa;
      b[0] = a[0];
      b[1] = a[1];
      b += r;
      a += s;
      b[0] = a[0];
      b[1] = a[1];
      b += r;
      a += s;
      b[0] = a[0];
      b[1] = a[1];
      b += r;
      a += s;
      b[0] = a[0];
      b[1] = a[1];
      IVb += 4*r;
      IVa += 4*s;
    }
}
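
As a sanity check that the three forms are semantically the same copy and differ only in how the addresses are expressed, here is a rough sketch that rerolls the four unrolled steps into an inner loop and compares the results. The names copy_indexed, copy_bumped, copy_iv and forms_agree are hypothetical, B is passed as a parameter so the outputs can be compared, and strides are assumed to lie in 1..8 so that every access stays within the 1024-element arrays. It shows nothing about the IV choices IVOPTs makes.

```c
#include <assert.h>
#include <string.h>

enum { N = 1024 };
static float A[N];

/* foo-style: every address is recomputed from i, s and r. */
static void copy_indexed(float *B, int s, int r)
{
    for (int i = 0; i < 128; i += 4)
        for (int j = 0; j < 4; j++) {
            B[(i + j) * r + 0] = A[(i + j) * s + 0];
            B[(i + j) * r + 1] = A[(i + j) * s + 1];
        }
}

/* bar-style: base pointers derived from i, then bumped by the stride. */
static void copy_bumped(float *B, int s, int r)
{
    for (int i = 0; i < 128; i += 4) {
        float *b = &B[i * r];
        const float *a = &A[i * s];
        for (int j = 0; j < 4; j++, b += r, a += s) {
            b[0] = a[0];
            b[1] = a[1];
        }
    }
}

/* baz-style: explicit pointer IVs advanced by 4 strides per iteration. */
static void copy_iv(float *B, int s, int r)
{
    float *IVb = &B[0];
    const float *IVa = &A[0];
    for (int i = 0; i < 128; i += 4) {
        float *b = IVb;
        const float *a = IVa;
        for (int j = 0; j < 4; j++, b += r, a += s) {
            b[0] = a[0];
            b[1] = a[1];
        }
        IVb += 4 * r;
        IVa += 4 * s;
    }
}

/* Returns 1 if all three forms write identical contents. */
static int forms_agree(int s, int r)
{
    static float B1[N], B2[N], B3[N];

    assert(s >= 1 && s <= 8 && r >= 1 && r <= 8);  /* keeps accesses in bounds */
    for (int k = 0; k < N; k++)
        A[k] = (float)k;
    memset(B1, 0, sizeof B1);
    memset(B2, 0, sizeof B2);
    memset(B3, 0, sizeof B3);
    copy_indexed(B1, s, r);
    copy_bumped(B2, s, r);
    copy_iv(B3, s, r);
    return memcmp(B1, B2, sizeof B1) == 0 && memcmp(B1, B3, sizeof B1) == 0;
}
```

For example, forms_agree(3, 5) is expected to return 1; any difference between the variants is therefore purely a code generation matter.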