https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104368
            Bug ID: 104368
           Summary: [12 Regression] Failure to vectorise conditional grouped
                    accesses after PR102659
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
  Target Milestone: ---

The following test regressed with PR102659, compiled with
-O3 -march=armv8.2-a+sve:

void f(int *restrict x, int *restrict y, int n)
{
  for (int i = 0; i < n; ++i)
    if (x[i] > 0)
      x[i] = y[i * 2] + y[i * 2 + 1];
}

Previously we treated the y[] accesses as a linear group and so could
use LD2W.  Now we treat them as individual gather loads instead:

.L3:
        ld1w    z1.s, p0/z, [x0, x3, lsl 2]
        lsl     z0.s, z2.s, #1
        cmpgt   p0.s, p0/z, z1.s, #0
        ld1w    z1.s, p0/z, [x1, z0.s, sxtw 2]  // Gather
        ld1w    z0.s, p0/z, [x5, z0.s, sxtw 2]  // Gather
        add     z0.s, z1.s, z0.s
        st1w    z0.s, p0, [x0, x3, lsl 2]
        incw    z2.s
        add     x3, x3, x4
        whilelo p0.s, w3, w2
        b.any   .L3
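
For illustration only, here is a rough scalar sketch of what treating the y[] accesses as a linear group amounts to. It is not taken from GCC and does not describe the vectoriser's internals. The name f_grouped_model and the fixed VL of 4 are assumptions made for the example (real SVE code is vector-length agnostic). Each block reads the y[] elements of its active lanes as consecutive pairs and splits them into even and odd lanes, which is roughly the effect of a predicated LD2W, instead of indexing y[] per lane as the two gathers do.

```c
#define VL 4  /* assumed lane count for the sketch; SVE does not fix this */

/* The loop from the report, unchanged. */
void f(int *restrict x, int *restrict y, int n)
{
  for (int i = 0; i < n; ++i)
    if (x[i] > 0)
      x[i] = y[i * 2] + y[i * 2 + 1];
}

/* Hypothetical scalar model of the grouped (LD2W-style) form.  */
void f_grouped_model(int *restrict x, int *restrict y, int n)
{
  for (int i = 0; i < n; i += VL)
    {
      int pred[VL], even[VL], odd[VL];

      /* Loop bound (whilelo) combined with the x[i] > 0 test (cmpgt).  */
      for (int k = 0; k < VL; ++k)
        pred[k] = i + k < n && x[i + k] > 0;

      /* One predicated structure load: active lanes read a consecutive
         pair of y[] elements, split into even and odd lanes; inactive
         lanes are zeroed and y[] is not touched for them.  */
      for (int k = 0; k < VL; ++k)
        {
          even[k] = pred[k] ? y[2 * (i + k)] : 0;
          odd[k] = pred[k] ? y[2 * (i + k) + 1] : 0;
        }

      /* Predicated add and store back to x[].  */
      for (int k = 0; k < VL; ++k)
        if (pred[k])
          x[i + k] = even[k] + odd[k];
    }
}
```

Both functions should leave x[] in the same state for any n >= 0, which makes the model usable as a quick reference when checking the output of either code sequence.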