https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118974

            Bug ID: 118974
           Summary: Use SVE cbranch sequence for Neon modes when
                    TARGET_SVE
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
                CC: tnfchris at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

For example, the testcase from the testsuite:

#define N 640
int a[N] = {0};
int b[N] = {0};

void f1 ()
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] > 0)
        break;
    }
}

When compiled with, say, -mcpu=neoverse-v2, the compiler chooses to vectorise with
Neon modes and emits:
        cmgt    v31.4s, v30.4s, #0
        umaxp   v31.4s, v31.4s, v31.4s
        fmov    x3, d31
        cbz     x3, .L2

for the early-break check. But since this target supports SVE, it could use the
SVE sequence:
        cmpgt   p14.s, p7/z, z28.s, #0
        ptest   p15, p14.b
        b.none  .L3
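
For illustration only, the two checks rendered as ACLE intrinsics (a sketch; the
vectoriser of course emits these patterns directly rather than going through
intrinsics, and the function names below are made up):

#include <arm_neon.h>
#include <arm_sve.h>

/* Neon early-break check: CMGT + UMAXP + FMOV + CBZ.  */
static inline int
any_gt_zero_neon (int32x4_t v)
{
  uint32x4_t cmp = vcgtq_s32 (v, vdupq_n_s32 (0));              /* cmgt  */
  uint32x4_t red = vpmaxq_u32 (cmp, cmp);                       /* umaxp */
  return vgetq_lane_u64 (vreinterpretq_u64_u32 (red), 0) != 0;  /* fmov + cbz */
}

/* SVE early-break check: CMPGT producing a predicate, then PTEST + B.NONE.  */
static inline int
any_gt_zero_sve (svbool_t pg, svint32_t v)
{
  svbool_t cmp = svcmpgt_n_s32 (pg, v, 0);  /* cmpgt */
  return svptest_any (pg, cmp);             /* ptest + b.none */
}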

which is a bit shorter and, if I read the Neoverse V2 optimisation guide
correctly, should take one cycle less.
In this particular case the compiler would know that the operand to the compare
came from a Neon load, so the bits above 128 are zero even for VLA code. But if
it can't prove that in general, it could still make this codegen decision with
-msve-vector-bits=128.
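
For what it's worth, the combination described above (Neon load feeding an SVE
compare) looks roughly like this when written with the Neon/SVE bridge
intrinsics; again just a sketch of the shape of the sequence, not a proposed
implementation:

#include <arm_neon.h>
#include <arm_sve.h>
#include <arm_neon_sve_bridge.h>

/* Data produced by a Neon load; the bits above 128 of the Z register are
   explicitly zero here, so an all-true predicate gives the same answer as
   the 128-bit Neon test.  */
static inline int
any_gt_zero_mixed (int32x4_t v)
{
  svint32_t z = svset_neonq_s32 (svdup_n_s32 (0), v);   /* low 128 bits = v, rest zero */
  svbool_t cmp = svcmpgt_n_s32 (svptrue_b32 (), z, 0);  /* cmpgt */
  return svptest_any (svptrue_b32 (), cmp);             /* ptest + b.none */
}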
