https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117605
            Bug ID: 117605
           Summary: SLP vectorization fails for negative stride interleaving
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

Testing with --param vect-force-slp=1 reveals (via
gcc.target/aarch64/sve/strided_load_4.c and a few others) that we do not
SLP vectorize

void foo (unsigned *restrict dest, unsigned *src, int n)
{
  for (int i = 0; i < n; ++i)
    dest[i] += src[i * -100];
}

note:   Detected single element interleaving *_8 step -400
missed:  permutation requires at least three vectors _9 = *_8;

The non-SLP path classifies this access as VMAT_ELEMENTWISE, the SLP path
as VMAT_CONTIGUOUS_REVERSE; the non-SLP path never considers the latter
for grouped accesses.  The easiest solution is to extend the existing
demotion from VMAT_CONTIGUOUS to VMAT_ELEMENTWISE for large groups so
that it also covers VMAT_CONTIGUOUS_REVERSE.
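For reference, a hand-written sketch of what the element-wise strategy
(VMAT_ELEMENTWISE) means for this loop: each vector is assembled from
separate scalar loads with the -400 byte step, while the contiguous
destination is updated a vector at a time.  This is an illustration only,
not the code GCC emits; the vectorization factor of 4 and the function
name foo_elementwise are assumptions made for the example.

void
foo_elementwise (unsigned *restrict dest, unsigned *src, int n)
{
  int i = 0;
  for (; i + 4 <= n; i += 4)
    {
      unsigned tmp[4];
      /* "Gather" the negatively strided elements one by one
         (step -100 elements, i.e. -400 bytes).  */
      for (int j = 0; j < 4; ++j)
        tmp[j] = src[(i + j) * -100];
      /* Add and store the contiguous destination 4 elements at a time.  */
      for (int j = 0; j < 4; ++j)
        dest[i + j] += tmp[j];
    }
  /* Scalar epilogue for the remaining iterations.  */
  for (; i < n; ++i)
    dest[i] += src[i * -100];
}

Since every source element is loaded individually, the large negative step
poses no problem for this strategy, which is why the non-SLP path handles
the testcase today.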