https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120398
Bug ID: 120398
Summary: vectorization emits shuffles followed by scalar adds
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---
Target: x86_64-*-*

Another variant of PR 109892. GCC manages to emit vector multiplications at -O2, but corresponding additions are all scalar, and there's tons of shuffles in between. At the same time, on AArch64 this loop is vectorized properly.

static float muladd(float x, float y, float z)
{
    return x * y + z;
}

float g(float x[], long n)
{
    float r0 = 0, r1 = 0;
    for (; n; x += 2, n--) {
        r0 = muladd(x[0], x[0], r0);
        r1 = muladd(x[1], x[1], r1);
    }
    return r0 + r1;
}
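
For illustration, here is a hand-written sketch of the shape one might hope the vectorizer reaches for this loop: r0 and r1 are independent accumulation chains, so they can live in the two lanes of one vector with no reassociation of the floating-point adds. It assumes the GCC/Clang generic vector extension; the names v2sf and g_vec2 are made up for this example and do not come from the report or from GCC's output.

```c
#include <string.h>

/* Two-lane float vector via the GCC/Clang vector_size extension.
   The type name v2sf is hypothetical, chosen for this sketch. */
typedef float v2sf __attribute__((vector_size(8)));

/* Hypothetical hand-vectorized counterpart of g():
   lane 0 plays the role of r0, lane 1 the role of r1. */
float g_vec2(const float x[], long n)
{
    v2sf acc = { 0.0f, 0.0f };
    for (; n; x += 2, n--) {
        v2sf v;
        memcpy(&v, x, sizeof v);  /* unaligned load of x[0] and x[1] */
        acc = v * v + acc;        /* one vector multiply, one vector add */
    }
    return acc[0] + acc[1];       /* same final scalar add as r0 + r1 */
}
```

Each lane performs the same multiply and add, in the same order, as the corresponding scalar chain, so a transformation of this form should not need -ffast-math or -fassociative-math.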