https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117344

            Bug ID: 117344
           Summary: Suboptimal use of movprfx in SVE intrinsics code
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: aarch64-sve, missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

I'm not sure how much this matters in real code, but I spotted it in this testcase:
#include <arm_sve.h>

svint32_t foo(svbool_t pg, svint32_t a, svint32_t b)
{
  b = svadd_m (pg, b, a);
  return b;
}

With e.g. -O2 -march=armv9-a this generates:
foo:
        mov     z31.d, z0.d
        movprfx z0, z1
        add     z0.s, p0/m, z0.s, z31.s
        ret

but LLVM can do:
foo:
        add     z1.s, p0/m, z1.s, z0.s
        mov     z0.d, z1.d
        ret
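i.e. two instructions instead of three: since the predicated add is destructive on its first source operand, the result can be computed in place in b's register (z1) and then moved to the return register, with no movprfx or extra copy needed. As an aside, if the caller does not need b's inactive lanes preserved, the ACLE "don't care" _x form gives the register allocator more freedom than the merging _m form and tends to avoid such copies altogether. A sketch (foo_x is a hypothetical variant, not part of the original testcase):

```c
#include <arm_sve.h>

/* _x ("don't care") predication: inactive lanes of the result are
   unspecified, so the compiler may overwrite either source operand
   instead of merging into a copy of b.  Only equivalent to foo above
   when the inactive lanes are not observed by the caller. */
svint32_t foo_x(svbool_t pg, svint32_t a, svint32_t b)
{
  return svadd_x (pg, b, a);
}
```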
