https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117093
--- Comment #3 from ktkachov at gcc dot gnu.org ---
I think it's the VIEW_CONVERT_EXPRs that are hurting us (more complete dump before expand):

  _1 = VIEW_CONVERT_EXPR<uint32x4_t>(r_3(D));
  t_4 = BIT_FIELD_REF <r_3(D), 32, 0>;
  a_5 = VEC_PERM_EXPR <_1, _1, { 1, 1, 2, 3 }>;
  a_6 = BIT_INSERT_EXPR <a_5, t_4, 32 (32 bits)>;
  t_7 = BIT_FIELD_REF <r_3(D), 32, 64>;
  _2 = BIT_FIELD_REF <r_3(D), 32, 96>;
  a_8 = BIT_INSERT_EXPR <a_6, _2, 64 (32 bits)>;
  a_9 = BIT_INSERT_EXPR <a_8, t_7, 96 (32 bits)>;
  _10 = VIEW_CONVERT_EXPR<uint64x2_t>(a_9);
  return _10;

If we remove the casts:

uint32x4_t
ror32_neon_tgt_gcc_bad (uint32x4_t r)
{
  uint32x4_t a = r;
  uint32_t t;
  t = a[0]; a[0] = a[1]; a[1] = t;
  t = a[2]; a[2] = a[3]; a[3] = t;
  return a;
}

then this is successfully recognised as:

  a_2 = VEC_PERM_EXPR <r_1(D), r_1(D), { 1, 0, 3, 2 }>;