https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117605

            Bug ID: 117605
           Summary: SLP vectorization fails for negative stride
                    interleaving
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

Testing with --param vect-force-slp=1 reveals (via
gcc.target/aarch64/sve/strided_load_4.c and a few other tests) that we fail to
SLP-vectorize

void foo (unsigned *restrict dest, unsigned *src, int n)
{
  for (int i = 0; i < n; ++i)
    dest[i] += src[i * -100];
}

note:   Detected single element interleaving *_8 step -400

missed:   permutation requires at least three vectors _9 = *_8;
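The numbers in the two notes can be reproduced with a little arithmetic. This is a minimal sketch, not GCC code. It assumes 32-bit `unsigned` and a 4-lane (128-bit) vector, and the helper names `byte_step` and `input_vectors_needed` are hypothetical. The byte step is the index stride times the element size (-100 * 4 = -400). If the access is loaded as a contiguous group of 100 elements per scalar iteration, lane k of one output vector needs group element k * 100. That element lives in loaded vector k * 100 / 4, so every lane comes from a different input vector. A single VEC_PERM_EXPR only takes two input vectors, which is presumably why the permutation is rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte distance between the accesses of consecutive
   scalar iterations of src[i * stride].  */
static long byte_step (long stride, size_t elt_size)
{
  return stride * (long) elt_size;
}

/* Hypothetical helper: if lane k of an NUNITS-lane vector needs element
   k * |stride| of a contiguously loaded span, count how many distinct
   loaded vectors the lanes are drawn from.  The vector index grows
   monotonically with k, so counting changes is enough.  */
static int input_vectors_needed (long stride, int nunits)
{
  assert (nunits > 0);
  long step = stride < 0 ? -stride : stride;
  int count = 0;
  long prev = -1;
  for (int k = 0; k < nunits; ++k)
    {
      long vec = (k * step) / nunits;  /* loaded vector holding lane k */
      if (vec != prev)
        {
          ++count;
          prev = vec;
        }
    }
  return count;
}
```

For stride -100 and 4 lanes this gives 4 input vectors. For stride -1 (a plain reversed loop) all lanes come from a single vector, which is the shape VMAT_CONTIGUOUS_REVERSE handles with one reversing permute.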

The non-SLP path classifies this access as VMAT_ELEMENTWISE, the SLP path as
VMAT_CONTIGUOUS_REVERSE.  The non-SLP path never considers
VMAT_CONTIGUOUS_REVERSE for grouped accesses.
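Roughly, an elementwise access fills each vector lane with its own scalar load and then operates on the assembled vector. The plain-C emulation below illustrates that idea for the loop above. It is a sketch under stated assumptions: the vectorization factor `VF` of 4 is assumed, the "vector" is modeled as a small array, and `foo_scalar` and `foo_elementwise` are hypothetical names, not what GCC emits.

```c
#include <stddef.h>

enum { VF = 4 };  /* assumed vectorization factor (e.g. 4 x 32-bit lanes) */

/* Reference: the loop from the report.  */
static void foo_scalar (unsigned *restrict dest, unsigned *src, int n)
{
  for (int i = 0; i < n; ++i)
    dest[i] += src[i * -100];
}

/* Emulated elementwise strategy: VF separate scalar loads fill the lanes
   of a "vector" (a plain array here), then one lane-wise add is done
   against the contiguous dest elements.  */
static void foo_elementwise (unsigned *restrict dest, unsigned *src, int n)
{
  int i = 0;
  for (; i + VF <= n; i += VF)
    {
      unsigned lanes[VF];
      for (int k = 0; k < VF; ++k)   /* one strided scalar load per lane */
        lanes[k] = src[(i + k) * -100];
      for (int k = 0; k < VF; ++k)   /* contiguous "vector" add */
        dest[i + k] += lanes[k];
    }
  for (; i < n; ++i)                 /* scalar epilogue for the remainder */
    dest[i] += src[i * -100];
}
```

Both functions compute the same result. The elementwise form never needs a cross-vector permute, which is why it sidesteps the failure in the SLP path.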

The easiest solution is to extend the existing VMAT_CONTIGUOUS demotion
to VMAT_ELEMENTWISE for large groups to also cover VMAT_CONTIGUOUS_REVERSE.
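The proposed fix can be modeled as a small decision function. This toy model is not GCC's code. The enum values are hypothetical stand-ins for the VMAT_* access types (the real set has more members), and the "large group" condition used here (a single-element group whose size exceeds the number of vector lanes) is an assumption, since the report does not state the exact threshold. In this loop the group size is 100 (a 400-byte step divided by 4-byte elements).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a subset of GCC's VMAT_* access types.  */
enum access_type { ACC_CONTIGUOUS, ACC_CONTIGUOUS_REVERSE, ACC_ELEMENTWISE };

/* Sketch of the proposed rule: demote to elementwise for a large
   single-element group, whether the contiguous access runs forwards
   (ACC_CONTIGUOUS) or backwards (ACC_CONTIGUOUS_REVERSE, the new case).
   The condition is an assumed approximation, not GCC's exact test.  */
static enum access_type
maybe_demote (enum access_type type, bool single_element_p,
              long group_size, long nunits)
{
  if ((type == ACC_CONTIGUOUS || type == ACC_CONTIGUOUS_REVERSE)
      && single_element_p
      && group_size > nunits)
    return ACC_ELEMENTWISE;
  return type;
}
```

With group size 100 and 4 lanes, the reverse case now takes the same elementwise path as the non-SLP vectorizer.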
