https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98532
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|pinskia at gcc dot gnu.org  |unassigned at gcc dot gnu.org
      Known to work|                            |12.1.0
             Status|ASSIGNED                    |NEW
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Starting in GCC 12 we produce the following GIMPLE and aarch64 assembly:
vect__1.5_10 = *a_4(D);
vect__2.6_11 = VEC_PERM_EXPR <vect__1.5_10, vect__1.5_10, { 1, 0 }>;
*b_6(D) = vect__2.6_11;
ldr q0, [x0]
ext v0.16b, v0.16b, v0.16b, #8
str q0, [x1]
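The test case itself is not quoted in this comment. As a rough sketch only, a source function of the following shape would be expected to vectorize into the load / VEC_PERM_EXPR { 1, 0 } / store sequence above. The name swap_pair and its signature are assumptions made for illustration, not taken from the bug report.

```c
#include <stdint.h>

/* Hypothetical reproducer (name and signature are assumptions):
   copy two 64-bit elements from a to b in swapped order.
   Both loads happen before either store, so the result is the
   same even if a and b alias. */
void swap_pair(const int64_t *a, int64_t *b)
{
    int64_t lo = a[0];   /* element 0 of the V2DI load */
    int64_t hi = a[1];   /* element 1 of the V2DI load */
    b[0] = hi;           /* permute index 1 goes to lane 0 */
    b[1] = lo;           /* permute index 0 goes to lane 1 */
}
```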
At the RTL level, combine tries:
Trying 8 -> 9:
8: r96:V2DI=unspec[r92:V2DI,r92:V2DI,0x1] 237
REG_DEAD r92:V2DI
9: [r98:DI]=r96:V2DI
REG_DEAD r98:DI
REG_DEAD r96:V2DI
Failed to match this instruction:
(set (mem:V2DI (reg:DI 98) [1 *b_6(D)+0 S16 A128])
(unspec:V2DI [
(reg:V2DI 92 [ vect__1.5 ]) repeated x2
(const_int 1 [0x1])
] UNSPEC_EXT))
Trying 7, 8 -> 9:
7: r92:V2DI=[r97:DI]
REG_DEAD r97:DI
8: r96:V2DI=unspec[r92:V2DI,r92:V2DI,0x1] 237
REG_DEAD r92:V2DI
9: [r98:DI]=r96:V2DI
REG_DEAD r98:DI
REG_DEAD r96:V2DI
Failed to match this instruction:
(set (mem:V2DI (reg:DI 98) [1 *b_6(D)+0 S16 A128])
(unspec:V2DI [
(mem:V2DI (reg:DI 97) [1 *a_4(D)+0 S16 A128]) repeated x2
(const_int 1 [0x1])
] UNSPEC_EXT))
Maybe the aarch64 backend could have a pattern that matches the last combined
RTL (7, 8 -> 9) and then expands it into a load pair/store pair with the
registers reversed.
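To illustrate why a reversed load pair/store pair would be a valid replacement, here is a small portable C model, not backend code. It assumes the usual EXT semantics (take 16 bytes starting at byte #imm of the concatenation of the two sources, first source lowest). The helper names and the register names x2/x3 in the comments are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Model of "ext vd.16b, vn.16b, vm.16b, #imm" for imm in 0..15:
   result is 16 bytes starting at byte imm of the 32-byte value
   whose low half is vn and whose high half is vm. */
static void ext16(uint8_t dst[16], const uint8_t n[16],
                  const uint8_t m[16], unsigned imm)
{
    uint8_t cat[32];
    memcpy(cat, n, 16);
    memcpy(cat + 16, m, 16);
    memcpy(dst, cat + imm, 16);
}

/* Current sequence: ldr q0, [x0]; ext v0.16b, v0.16b, v0.16b, #8;
   str q0, [x1]. */
void copy_via_ext(const void *a, void *b)
{
    uint8_t q0[16], r[16];
    memcpy(q0, a, 16);
    ext16(r, q0, q0, 8);
    memcpy(b, r, 16);
}

/* Sketch of the suggested form, e.g. ldp x2, x3, [x0];
   stp x3, x2, [x1] (register choice is illustrative). */
void copy_via_reversed_pair(const void *a, void *b)
{
    uint64_t x2, x3;
    memcpy(&x2, a, 8);
    memcpy(&x3, (const uint8_t *)a + 8, 8);
    memcpy(b, &x3, 8);
    memcpy((uint8_t *)b + 8, &x2, 8);
}
```

Both functions read all 16 bytes before writing any, so they leave the same bytes at the destination: the two 64-bit halves of the source, swapped.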