https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116075

            Bug ID: 116075
           Summary: Inefficient SVE INSR codegen
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: aarch64-sve, missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

I'm using the testcase:
#include <stdint.h>
#define N 32000
uint8_t in[N];
uint8_t in2[N];

uint32_t
foo (void)
{
  uint32_t res = 0;
  for (int i = 0; i < N; i++)
    res += in[i];
  return res;
}

compiled with -Ofast -mcpu=neoverse-v2.
Ignoring the vector loop for now, in the loop preamble I see the generated code:
        mov     z31.b, #0
        movprfx z30, z31
        insr    z30.s, wzr

which seems inefficient, as all three instructions do is zero out z31 and z30.
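Since INSR shifts the source vector up by one element and inserts the scalar into lane 0, inserting wzr (zero) into an already-zero vector still yields an all-zero vector. Assuming z31 has no other uses in the preamble, the whole sequence could presumably be collapsed to a single move:

        mov     z30.b, #0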
