https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115663
Bug ID: 115663
Summary: outer loop vectorization with inner loop grouped access and SLP should be possible
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

We do not support interleaving of accesses in the inner loop but SLP should be
possible if the group is contiguous with respect to the outer loop evolution.

  void foo (double * __restrict a, double *b, int n)
  {
    for (int i = 0; i < 1024; ++i)
      {
        double res = a[i];
        for (int j = 0; j < 8; ++j)
          res += b[j * 16 + 2*i];
        a[i] = res;
      }
  }

or

  void foo (double * __restrict a, double *b, int n)
  {
    for (int i = 0; i < 1024; ++i)
      {
        double res = a[i];
        for (int j = 0; j < 8; ++j)
          res += b[j * 16 + 2*i] + b[j * 16 + 2*i + 1];
        a[i] = res;
      }
  }

should be possible to vectorize (the former is with a gap, the latter not).

In practice this is likely relevant for both image (pixel, w/ and w/o gap)
and complex numbers.
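To illustrate what "contiguous with respect to the outer loop evolution" means for the two test cases, here is a rough hand-written sketch. It is not what GCC emits and not a proposed implementation. It assumes a vectorization factor of 2 for the outer loop and emulates the two vector lanes with scalars; all function names (`nogap_scalar`, `nogap_vf2`, `gap_scalar`, `gap_vf2`, `check_equivalence`) are hypothetical, and the unused `n` parameter is dropped. For outer iterations i and i+1 the inner loop reads the four adjacent doubles starting at b[j*16 + 2*i], so one contiguous load per inner iteration would cover both lanes; the gap variant uses elements 0 and 2 of that load only.

```c
#include <assert.h>

#define N_OUTER 1024
#define N_INNER 8
/* Highest index read is 7*16 + 2*1023 + 1 = 2159. */
#define B_LEN 2160

/* Scalar reference, second test case (group of two, no gap). */
void nogap_scalar(double *__restrict a, const double *b)
{
  for (int i = 0; i < N_OUTER; ++i) {
    double res = a[i];
    for (int j = 0; j < N_INNER; ++j)
      res += b[j * 16 + 2 * i] + b[j * 16 + 2 * i + 1];
    a[i] = res;
  }
}

/* Emulated outer-loop vectorization, VF = 2 (assumed), no gap.
   p[0..3] stands in for one contiguous 4-element vector load. */
void nogap_vf2(double *__restrict a, const double *b)
{
  for (int i = 0; i < N_OUTER; i += 2) {
    double res0 = a[i], res1 = a[i + 1];
    for (int j = 0; j < N_INNER; ++j) {
      const double *p = &b[j * 16 + 2 * i];
      res0 += p[0] + p[1];   /* lane 0: outer iteration i     */
      res1 += p[2] + p[3];   /* lane 1: outer iteration i + 1 */
    }
    a[i] = res0;
    a[i + 1] = res1;
  }
}

/* Scalar reference, first test case (group of two with a gap). */
void gap_scalar(double *__restrict a, const double *b)
{
  for (int i = 0; i < N_OUTER; ++i) {
    double res = a[i];
    for (int j = 0; j < N_INNER; ++j)
      res += b[j * 16 + 2 * i];
    a[i] = res;
  }
}

/* Emulated outer-loop vectorization, VF = 2 (assumed), with gap:
   only the even elements of the contiguous span are used. */
void gap_vf2(double *__restrict a, const double *b)
{
  for (int i = 0; i < N_OUTER; i += 2) {
    double res0 = a[i], res1 = a[i + 1];
    for (int j = 0; j < N_INNER; ++j) {
      const double *p = &b[j * 16 + 2 * i];
      res0 += p[0];          /* p[1] is the gap */
      res1 += p[2];
    }
    a[i] = res0;
    a[i + 1] = res1;
  }
}

/* Returns 1 if both emulated variants match their scalar references. */
int check_equivalence(void)
{
  static double b[B_LEN], ref[N_OUTER], vec[N_OUTER];

  for (int k = 0; k < B_LEN; ++k)
    b[k] = (double)(k % 7);

  for (int variant = 0; variant < 2; ++variant) {
    for (int i = 0; i < N_OUTER; ++i)
      ref[i] = vec[i] = (double)i;
    if (variant == 0) {
      nogap_scalar(ref, b);
      nogap_vf2(vec, b);
    } else {
      gap_scalar(ref, b);
      gap_vf2(vec, b);
    }
    for (int i = 0; i < N_OUTER; ++i)
      if (ref[i] != vec[i])
        return 0;
  }
  return 1;
}
```

The emulated versions keep the per-lane order of floating-point additions identical to the scalar loops, so no reassociation is assumed. Whether a given GCC build vectorizes the original loops can be checked with `-O3 -fopt-info-vec`.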