https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118974
            Bug ID: 118974
           Summary: Use SVE cbranch sequence for Neon modes when TARGET_SVE
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
                CC: tnfchris at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

For example, the testcase from the testsuite:

#define N 640
int a[N] = {0};
int b[N] = {0};

void f1 ()
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] > 0)
        break;
    }
}

When compiled with, say, -mcpu=neoverse-v2, the compiler chooses to vectorise
with Neon modes and emits:

        cmgt    v31.4s, v30.4s, #0
        umaxp   v31.4s, v31.4s, v31.4s
        fmov    x3, d31
        cbz     x3, .L2

for the early break check.
But since this target supports SVE it could use the SVE sequence:

        cmpgt   p14.s, p7/z, z28.s, #0
        ptest   p15, p14.b
        b.none  .L3

which is a bit shorter and, if I read the Neoverse V2 optimisation guide
correctly, should take one less cycle.

In this particular case the compiler would know that the operand to the
compare came from a Neon load, so the bits above 128 are zero even for VLA
code. But if it can't prove that in general, it could still make this codegen
decision with -msve-vector-bits=128.
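As a rough illustration of the intended shape (not a proposed implementation),
something like the following shows the compare/test at source level via the
ACLE Neon-SVE bridge intrinsics; the function name any_positive is just an
example, and the actual change would of course live in the aarch64 cbranch
expansion for Advanced SIMD modes rather than in user code:

/* Illustration only, assuming SVE is enabled (e.g. -mcpu=neoverse-v2):
   the Neon-SVE bridge places the 128-bit Neon value in the low bits of an
   SVE register and a 4-lane predicate restricts the compare/test to those
   lanes, giving the cmpgt + ptest + b.none style sequence above.  */
#include <arm_neon.h>
#include <arm_sve.h>
#include <arm_neon_sve_bridge.h>

int
any_positive (int32x4_t v)   /* example name, not from the testcase */
{
  /* Low 128 bits hold the Neon vector; the rest is don't-care.  */
  svint32_t z = svset_neonq_s32 (svundef_s32 (), v);
  /* Predicate covering only the first four 32-bit lanes.  */
  svbool_t pg = svptrue_pat_b32 (SV_VL4);
  /* Predicated compare followed by a predicate test for the branch.  */
  return svptest_any (pg, svcmpgt_n_s32 (pg, z, 0));
}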