https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #27 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to amker from comment #26)
> (In reply to amker from comment #25)
> > I tend to believe this is an register pressure based strength-reduction +
> > lim problem than ivopts.
> >
> > So given class of memory references like:
> >
> >   reg = ...
> >   Loop:
> >     MEM[iv_base + reg * 0];
> >     MEM[iv_base + reg * 1];
> >     MEM[iv_base + reg * 2];
> >     MEM[iv_base + reg * 3];
> >     MEM[iv_base + reg * 4];
> >     MEM[iv_base + reg * 5];
> >     MEM[iv_base + reg * 6];
> >     MEM[iv_base + reg * 7];
> >
> > The best arrangement probably would be:
> >
> >   reg = ...
> >   regX = reg * 3;
> >   Loop:
> >     MEM[iv_base + reg * 0];
> >     MEM[iv_base + reg * 1];
> >     MEM[iv_base + reg * 2];
> >     t = iv_base + regX;
> >     MEM[t];
> >     MEM[iv_base + reg * 4];
> >     MEM[t + reg * 2];
> >     MEM[iv_base + regX * 2];
> >     MEM[t + reg * 4];
> >
> > Depending on the register pressure, regX should be re-materialized in loop
> > or hoisted as invariant.
>
> Of course supported scales in addressing modes should be considered, but I
> guess most targets only (if) support [base + reg * 2^x]?

Yes.  Note that the input to IVOPTs is the strength-reduced variant and
IVOPTs generates the variant with multiple hoisted invariants.

The IVOPTs issue I mentioned happens when you compare the following
two functions - for foo it ends up with two IVs (good).

float A[1024];
float B[1024];

void foo(int s, int r)
{
  for (int i = 0; i < 128; i+=4)
    {
      B[(i+0)*r + 0] = A[(i+0)*s + 0];
      B[(i+0)*r + 1] = A[(i+0)*s + 1];
      B[(i+1)*r + 0] = A[(i+1)*s + 0];
      B[(i+1)*r + 1] = A[(i+1)*s + 1];
      B[(i+2)*r + 0] = A[(i+2)*s + 0];
      B[(i+2)*r + 1] = A[(i+2)*s + 1];
      B[(i+3)*r + 0] = A[(i+3)*s + 0];
      B[(i+3)*r + 1] = A[(i+3)*s + 1];
    }
}

void bar(int s, int r)
{
  for (int i = 0; i < 128; i+=4)
    {
      float *b = &B[i*r];
      float *a = &A[i*s];
      b[0] = a[0];
      b[1] = a[1];
      b += r;
      a += s;
      b[0] = a[0];
      b[1] = a[1];
      b += r;
      a += s;
      b[0] = a[0];
      b[1] = a[1];
      b += r;
      a += s;
      b[0] = a[0];
      b[1] = a[1];
    }
}

The vectorizer actually emits something like the following, but it
behaves like bar.

void baz(int s, int r)
{
  float *IVb = &B[0];
  float *IVa = &A[0];
  for (int i = 0; i < 128; i+=4)
    {
      float *b = IVb;
      float *a = IVa;
      b[0] = a[0];
      b[1] = a[1];
      b += r;
      a += s;
      b[0] = a[0];
      b[1] = a[1];
      b += r;
      a += s;
      b[0] = a[0];
      b[1] = a[1];
      b += r;
      a += s;
      b[0] = a[0];
      b[1] = a[1];
      IVb += 4*r;
      IVa += 4*s;
    }
}
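
Since foo, bar and baz are meant to differ only in how the addresses are
formed, it can help to confirm that they store the same values before
comparing the IVs that IVOPTs picks for each. The following is a minimal
sketch of such a check, not part of the original testcase. The helper
matches_foo and the bound MAX_STRIDE are hypothetical names introduced here,
and the unrolled bodies are rolled into inner loops for brevity, so this
checks semantics only and is not expected to reproduce the IVOPTs behavior.

```c
#include <assert.h>
#include <string.h>

float A[1024];
float B[1024];

/* Largest stride that keeps every access in bounds: 127 * 8 + 1 = 1017. */
#define MAX_STRIDE 8

/* foo: every address is recomputed from i. */
void foo(int s, int r)
{
    for (int i = 0; i < 128; i += 4)
        for (int k = 0; k < 4; k++) {
            B[(i + k) * r + 0] = A[(i + k) * s + 0];
            B[(i + k) * r + 1] = A[(i + k) * s + 1];
        }
}

/* bar: pointers are set up per iteration and bumped by r and s. */
void bar(int s, int r)
{
    for (int i = 0; i < 128; i += 4) {
        float *b = &B[i * r];
        const float *a = &A[i * s];
        for (int k = 0; k < 4; k++) {
            b[0] = a[0];
            b[1] = a[1];
            if (k < 3) { b += r; a += s; }  /* no bump after the last pair */
        }
    }
}

/* baz: like bar, but the per-iteration base pointers are explicit IVs. */
void baz(int s, int r)
{
    float *IVb = &B[0];
    const float *IVa = &A[0];
    for (int i = 0; i < 128; i += 4) {
        float *b = IVb;
        const float *a = IVa;
        for (int k = 0; k < 4; k++) {
            b[0] = a[0];
            b[1] = a[1];
            if (k < 3) { b += r; a += s; }
        }
        IVb += 4 * r;
        IVa += 4 * s;
    }
}

/* Hypothetical helper: returns 1 if fn stores exactly what foo stores. */
int matches_foo(void (*fn)(int, int), int s, int r)
{
    static float ref[1024];

    /* Strides outside this range would index past the arrays. */
    assert(s >= 0 && s <= MAX_STRIDE && r >= 0 && r <= MAX_STRIDE);

    for (int j = 0; j < 1024; j++)
        A[j] = (float)j;

    memset(B, 0, sizeof B);
    foo(s, r);
    memcpy(ref, B, sizeof B);

    memset(B, 0, sizeof B);
    fn(s, r);
    return memcmp(ref, B, sizeof B) == 0;
}
```

Calling matches_foo(bar, s, r) or matches_foo(baz, s, r) with strides between
0 and 8 should return 1; larger strides would run off the end of A and B,
which is why the helper asserts on the range.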