https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116075
Bug ID: 116075
Summary: Inefficient SVE INSR codegen
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: aarch64-sve, missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: ktkachov at gcc dot gnu.org
Target Milestone: ---
Target: aarch64

I'm using the testcase:

#include <stdint.h>
#define N 32000
uint8_t in[N];
uint8_t in2[N];

uint32_t
foo (void)
{
  uint32_t res = 0;
  for (int i = 0; i < N; i++)
    res += in[i];
  return res;
}

compiling with -Ofast -mcpu=neoverse-v2

Ignoring the vector loop for now, in the preamble I see generated code:

        mov     z31.b, #0
        movprfx z30, z31
        insr    z30.s, wzr

which seems inefficient, as the sequence just zeroes out z31 and z30.