https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115258

            Bug ID: 115258
           Summary: [14 Regression][aarch64] Additional XORs generated
                    after
                    r14-6290-g9f0f7d802482a8958d6cdc72f1fe0c8549db2182
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: manolis.tsamis at vrull dot eu
  Target Milestone: ---

For the following testcase:

  typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
  void fun (veci *a, veci *b, veci *c) {
    *c = __builtin_shufflevector (*a, *b, 0, 5, 2, 7);
  }
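
For reference, the lane selection requested by the testcase can be sketched in
plain C. This is only a scalar model, assuming the usual
__builtin_shufflevector convention that indices 0-3 select from the first
operand and 4-7 from the second; the names shuffle_model and
shuffle_model_check are made up for illustration and are not part of the
testcase.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of __builtin_shufflevector (a, b, 0, 5, 2, 7) for 4-lane
   int vectors.  Hypothetical helper, not taken from the testcase.  */
static void shuffle_model (const int a[4], const int b[4], int c[4])
{
  static const int idx[4] = { 0, 5, 2, 7 };
  for (size_t i = 0; i < 4; i++)
    /* Indices below 4 pick a lane of a, the rest pick a lane of b.  */
    c[i] = idx[i] < 4 ? a[idx[i]] : b[idx[i] - 4];
}

/* Even lanes come from a, odd lanes come from b.  */
static void shuffle_model_check (void)
{
  const int a[4] = { 10, 11, 12, 13 };
  const int b[4] = { 20, 21, 22, 23 };
  int c[4];
  shuffle_model (a, b, c);
  assert (c[0] == a[0] && c[1] == b[1] && c[2] == a[2] && c[3] == b[3]);
}
```

So the result takes lanes 0 and 2 from *a and lanes 1 and 3 from *b, which is
why a single two-register tbl is enough.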

After this commit we generate (at -O2):

        adrp    x3, .LC0
        ldr     q30, [x1]
        ldr     q31, [x0]
        ldr     q29, [x3, #:lo12:.LC0]
        eor     v31.16b, v31.16b, v30.16b
        eor     v30.16b, v31.16b, v30.16b
        eor     v31.16b, v31.16b, v30.16b
        tbl     v30.16b, {v30.16b - v31.16b}, v29.16b
        str     q30, [x2]
        ret

Instead of:

        adrp    x3, .LC0
        ldr     q30, [x0]
        ldr     q31, [x1]
        ldr     q29, [x3, #:lo12:.LC0]
        tbl     v30.16b, {v30.16b - v31.16b}, v29.16b
        str     q30, [x2]
        ret

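The three newly introduced eor instructions are an XOR swap of v30 and v31,
which are now loaded in the reverse order compared to the old code. After the
swap, v30 holds the value loaded from [x0] and v31 the value loaded from [x1],
so tbl sees the same operands as before.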
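
To make the swap explicit, here is a minimal C sketch that mirrors the three
eor instructions on a single 64-bit lane. The function name xor_swap_lane and
the use of uint64_t as a stand-in for a vector register are assumptions made
for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the eor sequence above on one lane.  On entry v31 holds A (the
   value loaded from [x0]) and v30 holds B (the value loaded from [x1]).
   Hypothetical helper; the two pointers must not alias.  */
static void xor_swap_lane (uint64_t *v30, uint64_t *v31)
{
  *v31 = *v31 ^ *v30;   /* eor v31, v31, v30:  v31 = A ^ B        */
  *v30 = *v31 ^ *v30;   /* eor v30, v31, v30:  v30 = A ^ B ^ B = A */
  *v31 = *v31 ^ *v30;   /* eor v31, v31, v30:  v31 = A ^ B ^ A = B */
}

/* After the sequence the two registers have exchanged their contents.  */
static void xor_swap_lane_check (void)
{
  uint64_t v30 = 0x2222, v31 = 0x1111;
  xor_swap_lane (&v30, &v31);
  assert (v30 == 0x1111 && v31 == 0x2222);
}
```

In other words, the sequence does no useful work: loading [x0] into v30 and
[x1] into v31 directly, as the old code did, gives the same register contents
without the three extra instructions.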
