https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98532
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|pinskia at gcc dot gnu.org  |unassigned at gcc dot gnu.org
      Known to work|                            |12.1.0
             Status|ASSIGNED                    |NEW

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Starting in GCC 12 we produce:
  vect__1.5_10 = *a_4(D);
  vect__2.6_11 = VEC_PERM_EXPR <vect__1.5_10, vect__1.5_10, { 1, 0 }>;
  *b_6(D) = vect__2.6_11;

        ldr     q0, [x0]
        ext     v0.16b, v0.16b, v0.16b, #8
        str     q0, [x1]

RTL level wise:
Trying 8 -> 9:
    8: r96:V2DI=unspec[r92:V2DI,r92:V2DI,0x1] 237
      REG_DEAD r92:V2DI
    9: [r98:DI]=r96:V2DI
      REG_DEAD r98:DI
      REG_DEAD r96:V2DI
Failed to match this instruction:
(set (mem:V2DI (reg:DI 98) [1 *b_6(D)+0 S16 A128])
    (unspec:V2DI [
            (reg:V2DI 92 [ vect__1.5 ]) repeated x2
            (const_int 1 [0x1])
        ] UNSPEC_EXT))

Trying 7, 8 -> 9:
    7: r92:V2DI=[r97:DI]
      REG_DEAD r97:DI
    8: r96:V2DI=unspec[r92:V2DI,r92:V2DI,0x1] 237
      REG_DEAD r92:V2DI
    9: [r98:DI]=r96:V2DI
      REG_DEAD r98:DI
      REG_DEAD r96:V2DI
Failed to match this instruction:
(set (mem:V2DI (reg:DI 98) [1 *b_6(D)+0 S16 A128])
    (unspec:V2DI [
            (mem:V2DI (reg:DI 97) [1 *a_4(D)+0 S16 A128]) repeated x2
            (const_int 1 [0x1])
        ] UNSPEC_EXT))

Maybe the aarch64 backend could have a pattern that matches the last 7,8 -> 9
combined rtl that then expands into a load pair/store pair with reversed
registers.
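
For reference, the effect of the VEC_PERM_EXPR with selector { 1, 0 } above can be written out in scalar C. This is only a sketch: the PR's actual testcase is not quoted in this comment, and the function name swap_halves is made up for illustration. It reads both 64-bit elements of *a and stores them to *b in swapped order, which is the operation a load pair/store pair with reversed registers would perform.

```c
#include <stdint.h>

/* Hypothetical scalar equivalent of the GIMPLE in this comment:
     tmp = *a;  tmp = VEC_PERM_EXPR <tmp, tmp, { 1, 0 }>;  *b = tmp;
   Both elements are read before anything is written, matching the
   single vector load followed by a single vector store.  */
void swap_halves(const uint64_t *a, uint64_t *b)
{
    uint64_t lo = a[0];   /* element 0 of the V2DI value */
    uint64_t hi = a[1];   /* element 1 of the V2DI value */

    /* Selector { 1, 0 }: result[0] = src[1], result[1] = src[0].
       With the suggested pattern this could become something like
         ldp x2, x3, [x0]
         stp x3, x2, [x1]
       (register choice here is illustrative, not compiler output).  */
    b[0] = hi;
    b[1] = lo;
}
```

Whether GCC turns this exact source into the VEC_PERM_EXPR shown above depends on the vectorizer and the options used; the sketch only illustrates the data movement.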