https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120396
Bug ID: 120396
Summary: unprofitable SLP vectorization, leaves scalar parts live
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---

A variant of PR 109892.

static double muladd(double x, double y, double z)
{
    return x * y + z;
}

double g(double x[], long n)
{
    double r0 = 0, r1 = 0;
    for (; n; x += 2, n--) {
        r0 = muladd(x[0], x[0], r0);
        r1 = muladd(x[1], x[1], r1);
        x[0] = r0;
        x[1] = r1;
    }
    return r0 + r1;
}

The SLP-vectorized loop at -O2 -mfma (or plain -O2 on AArch64) does strictly more work than a scalar loop.
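For anyone comparing builds of the testcase with and without SLP vectorization, a small correctness check may be handy alongside inspecting the assembly. The sketch below repeats the testcase and adds a helper, check_g, which is hypothetical and not part of the report; the four-element input is likewise an assumption chosen so that every intermediate value is a small integer and the result is exact whether or not the multiply-add is contracted into an FMA.

```c
#include <assert.h>

/* Testcase from the report, unchanged. */
static double muladd(double x, double y, double z)
{
    return x * y + z;
}

double g(double x[], long n)
{
    double r0 = 0, r1 = 0;
    for (; n; x += 2, n--) {
        r0 = muladd(x[0], x[0], r0);
        r1 = muladd(x[1], x[1], r1);
        x[0] = r0;
        x[1] = r1;
    }
    return r0 + r1;
}

/* Hypothetical helper (not in the report): runs g on a fixed input
   and checks the stored values against a hand computation.
   Iteration 1: r0 = 1*1 + 0 = 1,  r1 = 2*2 + 0 = 4
   Iteration 2: r0 = 3*3 + 1 = 10, r1 = 4*4 + 4 = 20 */
double check_g(void)
{
    double x[4] = { 1.0, 2.0, 3.0, 4.0 };
    double sum = g(x, 2);

    /* Each pair is overwritten with the running sums r0 and r1. */
    assert(x[0] == 1.0 && x[1] == 4.0);
    assert(x[2] == 10.0 && x[3] == 20.0);
    return sum; /* r0 + r1 after the last iteration */
}
```

Calling check_g() from a test driver should return 30 at any optimization level. Written as scalar code, each iteration is two loads, two multiply-adds and two stores, with r0 and r1 forming two independent dependency chains, so that is the baseline the vectorized loop needs to beat.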