https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120398

            Bug ID: 120398
           Summary: vectorization emits shuffles followed by scalar adds
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64-*-*

Another variant of PR 109892. At -O2, GCC manages to emit vector
multiplications, but the corresponding additions are all scalar, with lots of
shuffles in between. On AArch64, by contrast, this loop is vectorized properly.

static float muladd(float x, float y, float z)
{
    return x * y + z;
}
float g(float x[], long n)
{
    float r0 = 0, r1 = 0;
    for (; n; x += 2, n--) {
        r0 = muladd(x[0], x[0], r0);
        r1 = muladd(x[1], x[1], r1);
    }
    return r0 + r1;
}
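For comparison, below is a hand-written SSE sketch of roughly the loop shape one might hope for on x86_64. It is an illustration under stated assumptions, not output from GCC or any other compiler, and the name g_sse_sketch is hypothetical. r0 and r1 sit in lanes 0 and 1 of one XMM register, so each iteration is one two-float load, one MULPS and one ADDPS, with no shuffles inside the loop. Because r0 and r1 stay independent accumulators, this two-lane form keeps each accumulator's operation order and needs no reassociation (no -ffast-math).

```c
#include <immintrin.h>

/* Hypothetical hand-written SSE version of g() from the testcase above.
   Assumes x86_64 with baseline SSE. acc holds {r0, r1, 0, 0}. */
float g_sse_sketch(const float x[], long n)
{
    __m128 acc = _mm_setzero_ps();
    for (; n; x += 2, n--) {
        /* _mm_set_ps takes the highest lane first, so lane 0 = x[0] and
           lane 1 = x[1]; the upper two lanes stay zero. */
        __m128 v = _mm_set_ps(0.0f, 0.0f, x[1], x[0]);
        /* Both muladd() calls at once: r_i = x_i * x_i + r_i. */
        acc = _mm_add_ps(_mm_mul_ps(v, v), acc);
    }
    /* Single horizontal step after the loop: broadcast lane 1 (r1),
       add it to lane 0 (r0), and return lane 0. */
    __m128 hi = _mm_shuffle_ps(acc, acc, _MM_SHUFFLE(1, 1, 1, 1));
    return _mm_cvtss_f32(_mm_add_ss(acc, hi));
}
```

The only shuffle is the one after the loop that combines r0 + r1, which matches the return statement of g().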
