https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104368

            Bug ID: 104368
           Summary: [12 Regression] Failure to vectorise conditional
                    grouped accesses after PR102659
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
  Target Milestone: ---

The following test regressed with PR102659, compiled with
-O3 -march=armv8.2-a+sve:

void f(int *restrict x, int *restrict y, int n)
{
  for (int i = 0; i < n; ++i)
    if (x[i] > 0)
      x[i] = y[i * 2] + y[i * 2 + 1];
}

Previously we treated the y[] accesses as a linear group
and so could use LD2W.  Now we treat them as individual
gather loads instead:

.L3:
        ld1w    z1.s, p0/z, [x0, x3, lsl 2]
        lsl     z0.s, z2.s, #1
        cmpgt   p0.s, p0/z, z1.s, #0
        ld1w    z1.s, p0/z, [x1, z0.s, sxtw 2]   // Gather
        ld1w    z0.s, p0/z, [x5, z0.s, sxtw 2]   // Gather
        add     z0.s, z1.s, z0.s
        st1w    z0.s, p0, [x0, x3, lsl 2]
        incw    z2.s
        add     x3, x3, x4
        whilelo p0.s, w3, w2
        b.any   .L3
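As a rough illustration of the difference described above, the C sketch below models the vectorised loop one predicated vector at a time. It assumes a fixed 4-lane vector (`VL`); real SVE vector lengths vary. It is a scalar model, not code GCC emits, and the names `f_model`, `compare_models`, `VL` and `MAXN` are hypothetical. Both load strategies compute the same values. The difference is in how addresses are formed. The grouped path derives every address from one contiguous base, `y + 2 * base`, and de-interleaves even and odd elements, which is what LD2W does. The gather path builds a per-lane index vector `i * 2`, matching the `lsl` and the two `sxtw`-indexed `ld1w` gathers in the loop above.

```c
#include <assert.h>
#include <string.h>

#define VL   4    /* assumed lanes per vector; real SVE vector length varies */
#define MAXN 64   /* hypothetical cap on n for the demo arrays */

/* Scalar reference: the test case from this report. */
static void f_ref(int *restrict x, const int *restrict y, int n)
{
  for (int i = 0; i < n; ++i)
    if (x[i] > 0)
      x[i] = y[i * 2] + y[i * 2 + 1];
}

/* One-vector-at-a-time model of the vectorised loop. use_gather selects how
   the two y[] operands are loaded; inactive lanes read no memory (zeroing). */
static void f_model(int *restrict x, const int *restrict y, int n, int use_gather)
{
  for (int base = 0; base < n; base += VL) {
    int pg[VL], a[VL], b[VL];
    for (int l = 0; l < VL; ++l)   /* whilelo bound check, then cmpgt on x[] */
      pg[l] = base + l < n && x[base + l] > 0;

    if (use_gather) {
      /* Two gathers: every lane computes its own index i * 2. */
      for (int l = 0; l < VL; ++l) {
        int idx = (base + l) * 2;
        a[l] = pg[l] ? y[idx] : 0;
        b[l] = pg[l] ? y[idx + 1] : 0;
      }
    } else {
      /* Grouped (LD2W-style) load: one contiguous block starting at
         y + 2 * base, de-interleaved into even and odd elements. */
      const int *p = y + 2 * base;
      for (int l = 0; l < VL; ++l) {
        a[l] = pg[l] ? p[2 * l] : 0;
        b[l] = pg[l] ? p[2 * l + 1] : 0;
      }
    }

    for (int l = 0; l < VL; ++l)   /* add, then predicated store (st1w) */
      if (pg[l])
        x[base + l] = a[l] + b[l];
  }
}

/* Returns 1 if both load strategies reproduce the scalar result for n. */
int compare_models(int n)
{
  assert(n >= 0 && n <= MAXN);
  int y[2 * MAXN], ref[MAXN], grp[MAXN], gat[MAXN];
  for (int i = 0; i < 2 * MAXN; ++i)
    y[i] = i;
  for (int i = 0; i < MAXN; ++i)
    ref[i] = (i % 3 == 0) ? -1 : 1;   /* mix of active and inactive lanes */
  memcpy(grp, ref, sizeof ref);
  memcpy(gat, ref, sizeof ref);

  f_ref(ref, y, n);
  f_model(grp, y, n, 0);   /* grouped access */
  f_model(gat, y, n, 1);   /* two gathers */
  return memcmp(ref, grp, sizeof ref) == 0 && memcmp(ref, gat, sizeof ref) == 0;
}
```

For example, `compare_models(10)` returns 1: both strategies produce the scalar result, including the partial final vector. The sketch shows that the regression concerns access cost, not correctness. Gathers with per-lane indices are generally more expensive than a single contiguous structure load such as LD2W.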
