https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117606

            Bug ID: 117606
           Summary: single element interleaving behavior for SLP does not
                    exactly match non-SLP behavior
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

For the following loop, derived from gcc.target/aarch64/sve/strided_load_5.c:

void foo (unsigned *restrict dest, unsigned *src, int n)
{
  for (int i = 0; i < n; ++i)
    dest[i] += src[i * 5];
}

when not using SLP we classify the load from src[] as VMAT_ELEMENTWISE and
thus consider using gathers.  This is because the non-SLP path has no general
permute support, only load-lanes and the static set of "group loads", and
both vect_load_lanes_supported and vect_grouped_load_supported fail on
aarch64 (riscv supports ld5).
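
To make the element-wise form concrete, here is a minimal scalar sketch of
what such an access computes.  LANES is a hypothetical fixed lane count chosen
only for illustration (SVE vectors are length-agnostic), and foo_ref and
foo_elementwise are illustrative names, not anything GCC emits.

```c
#include <assert.h>

/* Hypothetical lane count, for illustration only.  */
#define LANES 4

/* Scalar reference: the loop from the report.  */
void foo_ref (unsigned *restrict dest, unsigned *src, int n)
{
  for (int i = 0; i < n; ++i)
    dest[i] += src[i * 5];
}

/* Element-wise (gather-like) form: every lane loads its own strided
   element, so no permute of loaded vectors is needed.  */
void foo_elementwise (unsigned *restrict dest, unsigned *src, int n)
{
  assert (n >= 0);
  int i = 0;
  for (; i + LANES <= n; i += LANES)
    {
      unsigned v[LANES];
      for (int l = 0; l < LANES; ++l)
        v[l] = src[(i + l) * 5];   /* one scalar load per lane */
      for (int l = 0; l < LANES; ++l)
        dest[i + l] += v[l];
    }
  for (; i < n; ++i)               /* scalar epilogue for leftover elements */
    dest[i] += src[i * 5];
}
```

Both functions produce the same dest[] for any n >= 0; the second only makes
the per-lane loads explicit.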

With SLP this gets classified as (permuted) VMAT_CONTIGUOUS since load-lanes
isn't supported.  We do not attempt to lower the permutation to something
supported and thus run into the three-vector limit.  The fallback to
VMAT_ELEMENTWISE is only for groups larger than the vector size, so it doesn't
apply here.
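
A small calculation shows why the permute is a problem.  The sketch below
assumes a hypothetical vector of 4 elements and counts how many contiguously
loaded vectors feed a single output vector when only the first element of
each group of 5 is used.  LANES, GROUP and input_vectors_needed are
illustrative names, not GCC internals.

```c
#include <assert.h>

#define LANES 4   /* hypothetical vector length in elements */
#define GROUP 5   /* group size: src[i * 5] uses one element out of five */

/* Count the distinct contiguous input vectors that feed output vector
   'out', where lane l takes element (out * LANES + l) * GROUP of the
   contiguous stream.  Element indices only grow with l, so comparing
   against the previous vector index is enough to count distinct ones.  */
int input_vectors_needed (int out)
{
  assert (out >= 0);
  int count = 0, prev = -1;
  for (int l = 0; l < LANES; ++l)
    {
      int elem = (out * LANES + l) * GROUP;  /* index in the stream */
      int vec = elem / LANES;                /* loaded vector holding it */
      if (vec != prev)
        {
          ++count;
          prev = vec;
        }
    }
  return count;
}
```

Under these assumptions every output vector draws from four different input
vectors (for output 0, elements 0, 5, 10 and 15 live in vectors 0, 1, 2 and
3), which is past the three-vector limit mentioned above.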

One possible approach would be to check whether the permute is supported
during VMAT classification rather than after the fact in
vectorizable_load/store.  That matches how we verify
vect_grouped_load_supported.
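
As a rough illustration of that ordering, the toy decision function below
settles permute support while the access kind is being chosen.  Everything in
it is hypothetical: the enum, the function name and the boolean inputs are
stand-ins rather than GCC's real types or predicates, and the fallback to an
element-wise access is an assumption about what the classification would do
to mirror the non-SLP behavior.

```c
#include <assert.h>   /* only needed for checks like the ones in the note below */
#include <stdbool.h>

/* Toy stand-ins, not GCC's real types or functions.  */
enum toy_access { TOY_LOAD_LANES, TOY_CONTIGUOUS_PERMUTE, TOY_ELEMENTWISE };

/* Pick an access kind up front, so that an unsupported permute is never
   chosen first and rejected later.  */
enum toy_access toy_classify (bool load_lanes_ok, bool permute_ok)
{
  if (load_lanes_ok)
    return TOY_LOAD_LANES;
  if (permute_ok)
    return TOY_CONTIGUOUS_PERMUTE;
  return TOY_ELEMENTWISE;   /* assumed fallback, mirroring non-SLP */
}
```

For the aarch64 case in this report both inputs would be false, so
toy_classify (false, false) yields TOY_ELEMENTWISE.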
