https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115663

            Bug ID: 115663
           Summary: outer loop vectorization with inner loop grouped
                    access and SLP should be possible
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

We do not support interleaving of accesses in the inner loop but SLP should
be possible if the group is contiguous with respect to the outer loop
evolution.

void foo (double * __restrict a, double *b, int n)
{
  for (int i = 0; i < 1024; ++i)
    {
      double res = a[i];
      for (int j = 0; j < 8; ++j)
        res += b[j * 16 + 2*i];
      a[i] = res;
    }
}

or

void foo (double * __restrict a, double *b, int n)
{
  for (int i = 0; i < 1024; ++i)
    {
      double res = a[i];
      for (int j = 0; j < 8; ++j)
        res += b[j * 16 + 2*i] + b[j * 16 + 2*i + 1];
      a[i] = res;
    }
}

Both should be vectorizable (the former accesses the group with a gap, the
latter without).
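
To illustrate why the group is contiguous with respect to the outer loop
evolution, here is a hand-written scalar sketch of what outer-loop
vectorization could look like for the two testcases. It is not what GCC emits
today. The vectorization factor of 2, the function names, the named constants
and the local arrays standing in for vector registers are all assumptions made
for illustration. Two consecutive outer iterations i and i+1 read
b[j*16 + 2*i .. j*16 + 2*i + 3], which is one contiguous 4-element load per
inner iteration.

```c
enum { N_OUTER = 1024, N_INNER = 8, ROW_STRIDE = 16,
       /* Number of elements of b the loops touch (assumed layout).  */
       B_LEN = (N_INNER - 1) * ROW_STRIDE + 2 * N_OUTER };

/* Scalar reference for the first testcase (group of 2 with a gap).  */
void foo_gap_ref (double * __restrict a, const double *b)
{
  for (int i = 0; i < N_OUTER; ++i)
    {
      double res = a[i];
      for (int j = 0; j < N_INNER; ++j)
        res += b[j * ROW_STRIDE + 2*i];
      a[i] = res;
    }
}

/* Scalar reference for the second testcase (group of 2, no gap).  */
void foo_nogap_ref (double * __restrict a, const double *b)
{
  for (int i = 0; i < N_OUTER; ++i)
    {
      double res = a[i];
      for (int j = 0; j < N_INNER; ++j)
        res += b[j * ROW_STRIDE + 2*i] + b[j * ROW_STRIDE + 2*i + 1];
      a[i] = res;
    }
}

/* Sketch: two outer iterations per step, gap variant.  v[] models one
   contiguous 4-element load; lanes 1 and 3 are the gap and stay unused.  */
void foo_gap_vf2 (double * __restrict a, const double *b)
{
  for (int i = 0; i < N_OUTER; i += 2)
    {
      double res[2] = { a[i], a[i + 1] };
      for (int j = 0; j < N_INNER; ++j)
        {
          const double *p = &b[j * ROW_STRIDE + 2*i];
          double v[4] = { p[0], p[1], p[2], p[3] };
          res[0] += v[0];
          res[1] += v[2];
        }
      a[i] = res[0];
      a[i + 1] = res[1];
    }
}

/* Sketch: two outer iterations per step, no-gap variant.  Each result lane
   consumes one adjacent pair of the 4 loaded elements.  */
void foo_nogap_vf2 (double * __restrict a, const double *b)
{
  for (int i = 0; i < N_OUTER; i += 2)
    {
      double res[2] = { a[i], a[i + 1] };
      for (int j = 0; j < N_INNER; ++j)
        {
          const double *p = &b[j * ROW_STRIDE + 2*i];
          double v[4] = { p[0], p[1], p[2], p[3] };
          res[0] += v[0] + v[1];
          res[1] += v[2] + v[3];
        }
      a[i] = res[0];
      a[i + 1] = res[1];
    }
}
```

The sketches keep the per-lane operation order of the scalar loops, so they
should produce identical results without any reassociation.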

In practice this is likely relevant for both image processing (pixels, with
and without a gap) and complex numbers.
