Richard Biener <richard.guent...@gmail.com> writes:
> On Tue, Mar 26, 2019 at 1:56 AM Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>> Based on the "complete unrolling" view, if we number statements as
>> (i, n), where i is the outer loop iteration and n is a statement number
>> in the (completely unrolled) loop body, then the original scalar code
>> executes in lexicographical order while for the vector loop:
>>
>> (1) (i,n) executes before (i+ix,n+nx) for all ix>=0, nx>=1, regardless of VF
>> (2) (i,n) executes before (i+ix,n-nx) for all ix>=VF, nx>=0
>>     (well, nx unrestricted, but only nx>=0 is useful given (1))
>>
>> So for any kind of dependence between (i,n) and (i+ix,n-nx), ix>=1, nx>=0
>> we need to restrict VF to ix so that (2) ensures the right order.
>> This means that the unnormalised distances of interest are:
>>
>> - (ix, -nx), ix>=1, nx>=0
>> - (-ix, nx), ix>=1, nx>=0
>>
>> But the second gets normalised to the first, which is actually useful
>> in this case :-).
>>
>> In terms of the existing code, I think that means we want to change
>> the handling of nested statements (only) to:
>>
>> - ignore DDR_REVERSED_P (ddr)
>> - restrict the main dist > 0 case to when the inner distance is <= 0.
>>
>> This should have the side effect of allowing outer-loop vectorisation for:
>>
>> void __attribute__ ((noipa))
>> f (int a[][N], int b[restrict])
>> {
>>   for (int i = N - 1; i-- > 0; )
>>     for (int j = 0; j < N - 1; ++j)
>>       a[j + 1][i] = a[j][i + 1] + b[i];
>> }
>>
>> At the moment we reject this, but AFAICT it should be OK.
>> (We do allow it for s/i + 1/i/, since then the outer distance is 0.)
>
> Can you file an enhancement request so we don't forget?

OK, for the record it's https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89908
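As a rough illustration of rules (1) and (2) from the quoted analysis, here is a minimal sketch in C of how a VF limit could be derived from (outer, inner) distance pairs. It is not GCC's dependence-analysis code: `struct dist_vector` and `max_safe_vf` are hypothetical names, and it assumes the outer distance is counted in outer-loop iterations in execution order (not by index value) and the inner distance in statement numbers of the completely unrolled body.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for illustration only (not a GCC data structure).
   OUTER is the distance in outer-loop iterations, counted in execution
   order rather than by index value; INNER is the distance in statement
   numbers within the completely unrolled inner loop body.  */
struct dist_vector
{
  int outer;
  int inner;
};

/* Hypothetical helper: return the largest VF, capped at MAX_VF, for which
   every dependence in DISTS[0..N-1] keeps its scalar execution order
   under rules (1) and (2).  */
int
max_safe_vf (const struct dist_vector *dists, size_t n, int max_vf)
{
  assert (max_vf >= 1);
  int vf = max_vf;
  for (size_t k = 0; k < n; ++k)
    {
      int ix = dists[k].outer;
      int inner = dists[k].inner;

      /* Normalise (-ix, nx) to (ix, -nx) so that the outer distance
         is never negative.  */
      if (ix < 0)
        {
          ix = -ix;
          inner = -inner;
        }

      /* An outer distance of 0, or a positive inner distance, is covered
         by rule (1) for any VF.  Otherwise we have (ix, -nx) with ix >= 1
         and nx >= 0, and rule (2) only guarantees the order when
         VF <= ix, so clamp VF to ix.  */
      if (ix >= 1 && inner <= 0 && ix < vf)
        vf = ix;
    }
  return vf;
}
```

For f above, the write a[j + 1][i] is read as a[j][i + 1] one outer iteration later (i counts down) at inner position j + 1, so under these assumptions the distance is (1, 1) and max_safe_vf returns MAX_VF unchanged. Measuring the outer distance by index value instead gives (-1, 1), which normalises to (1, -1) and caps VF at 1.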