https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115258
Bug ID: 115258
Summary: [14 Regression][aarch64] Additional XORs generated after r14-6290-g9f0f7d802482a8958d6cdc72f1fe0c8549db2182
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: manolis.tsamis at vrull dot eu
Target Milestone: ---
For the following testcase:
typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
void fun (veci *a, veci *b, veci *c) {
  *c = __builtin_shufflevector (*a, *b, 0, 5, 2, 7);
}
After this commit we generate (at -O2):
adrp x3, .LC0
ldr q30, [x1]
ldr q31, [x0]
ldr q29, [x3, #:lo12:.LC0]
eor v31.16b, v31.16b, v30.16b
eor v30.16b, v31.16b, v30.16b
eor v31.16b, v31.16b, v30.16b
tbl v30.16b, {v30.16b - v31.16b}, v29.16b
str q30, [x2]
ret
Instead of:
adrp x3, .LC0
ldr q30, [x0]
ldr q31, [x1]
ldr q29, [x3, #:lo12:.LC0]
tbl v30.16b, {v30.16b - v31.16b}, v29.16b
str q30, [x2]
ret
The three newly introduced eor instructions just swap the values of v30 and
v31, which are now loaded in the reverse order compared to the old code.
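To make the swap explicit, here is a minimal scalar sketch of what the three eor instructions compute. It assumes each 128-bit register can be modelled as a 16-byte array. The helper names eor16 and xor_swap16 are hypothetical, introduced only for illustration, and are not part of GCC or the testcase.

```c
#include <stddef.h>
#include <stdint.h>

/* Models `eor vD.16b, vN.16b, vM.16b`: bytewise dst = x ^ y.
   dst may alias x or y, since each byte is read before it is written. */
static void eor16(uint8_t dst[16], const uint8_t x[16], const uint8_t y[16])
{
    for (size_t i = 0; i < 16; i++)
        dst[i] = x[i] ^ y[i];
}

/* Hypothetical helper mirroring the three eor instructions in the new code.
   On entry v31 holds A (loaded from x0) and v30 holds B (loaded from x1). */
static void xor_swap16(uint8_t v31[16], uint8_t v30[16])
{
    eor16(v31, v31, v30);  /* v31 = A ^ B            */
    eor16(v30, v31, v30);  /* v30 = (A ^ B) ^ B = A  */
    eor16(v31, v31, v30);  /* v31 = (A ^ B) ^ A = B  */
}
```

After the sequence v30 holds the vector loaded from x0 and v31 the one loaded from x1, which is the register assignment the old code got directly from its two ldr instructions.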