https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115258
Bug ID: 115258
Summary: [14 Regression][aarch64] Additional XORs generated after r14-6290-g9f0f7d802482a8958d6cdc72f1fe0c8549db2182
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: manolis.tsamis at vrull dot eu
Target Milestone: ---

For the following testcase:

typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));

void fun (veci *a, veci *b, veci *c) {
  *c = __builtin_shufflevector (*a, *b, 0, 5, 2, 7);
}

After this commit we generate (at -O2):

        adrp    x3, .LC0
        ldr     q30, [x1]
        ldr     q31, [x0]
        ldr     q29, [x3, #:lo12:.LC0]
        eor     v31.16b, v31.16b, v30.16b
        eor     v30.16b, v31.16b, v30.16b
        eor     v31.16b, v31.16b, v30.16b
        tbl     v30.16b, {v30.16b - v31.16b}, v29.16b
        str     q30, [x2]
        ret

instead of:

        adrp    x3, .LC0
        ldr     q30, [x0]
        ldr     q31, [x1]
        ldr     q29, [x3, #:lo12:.LC0]
        tbl     v30.16b, {v30.16b - v31.16b}, v29.16b
        str     q30, [x2]
        ret

The 3 newly introduced eor instructions just swap the values of v30 and v31,
which are loaded in the reverse order compared to the old code.
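
To make the last point concrete, here is a rough scalar model in portable C of what the regressed sequence computes. It is only a sketch: the function names (xor_swap, shuffle_0_5_2_7, demo_regressed_sequence) and the sample lane values are invented for illustration, and the arrays merely stand in for the q30/q31 registers. It shows that loading *b and *a in reversed order and then applying three XORs yields the same operands for the shuffle as loading them in the original order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 4

/* Swap two 4-lane vectors using three XORs per lane, mirroring the
   three eor instructions (x plays the role of v31, y of v30). */
void xor_swap(uint32_t x[LANES], uint32_t y[LANES])
{
    for (int i = 0; i < LANES; i++) {
        x[i] ^= y[i];   /* eor v31, v31, v30 : x = x0 ^ y0 */
        y[i] ^= x[i];   /* eor v30, v31, v30 : y = x0      */
        x[i] ^= y[i];   /* eor v31, v31, v30 : x = y0      */
    }
}

/* Scalar model of __builtin_shufflevector (a, b, 0, 5, 2, 7):
   indices 0..3 select lanes of a, indices 4..7 select lanes of b. */
void shuffle_0_5_2_7(const uint32_t a[LANES], const uint32_t b[LANES],
                     uint32_t out[LANES])
{
    out[0] = a[0];
    out[1] = b[1];
    out[2] = a[2];
    out[3] = b[3];
}

/* Hypothetical walk-through of the regressed sequence.
   Returns 1 if the result matches the expected shuffle of a and b. */
int demo_regressed_sequence(void)
{
    const uint32_t a[LANES] = {10, 11, 12, 13};   /* sample values for *a */
    const uint32_t b[LANES] = {20, 21, 22, 23};   /* sample values for *b */
    uint32_t v30[LANES], v31[LANES], out[LANES];

    memcpy(v30, b, sizeof v30);   /* ldr q30, [x1] : reversed load order */
    memcpy(v31, a, sizeof v31);   /* ldr q31, [x0] */

    xor_swap(v31, v30);           /* the three eor instructions */

    /* After the swap the registers hold what the old code loaded directly. */
    assert(memcmp(v30, a, sizeof v30) == 0);
    assert(memcmp(v31, b, sizeof v31) == 0);

    shuffle_0_5_2_7(v30, v31, out);   /* stands in for the tbl */

    return out[0] == a[0] && out[1] == b[1]
        && out[2] == a[2] && out[3] == b[3];
}
```

Calling demo_regressed_sequence() from a test driver should return 1; the swap adds work without changing the result, which is why the three eor instructions are redundant compared with loading q30 from x0 and q31 from x1 in the first place.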