https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117093
--- Comment #4 from ktkachov at gcc dot gnu.org ---
(In reply to ktkachov from comment #3)
> If we remove the casts:
> uint32x4_t ror32_neon_tgt_gcc_bad(uint32x4_t r) {
>     uint32x4_t a = r;
>     uint32_t t;
>     t = a[0]; a[0] = a[1]; a[1] = t;
>     t = a[2]; a[2] = a[3]; a[3] = t;
>     return a;
> }
> Then this is successfully recognised as:
> a_2 = VEC_PERM_EXPR <r_1(D), r_1(D), { 1, 0, 3, 2 }>;

In this case it's forwprop1 that optimises it.
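
For reference, a minimal sketch of why the selector { 1, 0, 3, 2 } describes the same operation as the two lane swaps. It is illustrative only and is not GCC's implementation. Assumptions: the `v4u32` typedef is a GCC/Clang generic vector standing in for `uint32x4_t` so that it builds without arm_neon.h, and `swap_pairs`, `vec_perm_model` and `check_swap_is_perm` are hypothetical names. `vec_perm_model` is a scalar model of the VEC_PERM_EXPR semantics, where each selector element indexes into the concatenation of the two input vectors.

```c
#include <assert.h>
#include <stdint.h>

/* Generic vector type standing in for uint32x4_t (assumption: used so the
   sketch builds on any target with GCC or Clang vector extensions). */
typedef uint32_t v4u32 __attribute__((vector_size(16)));

/* The cast-free function from the quoted comment: swap lanes 0<->1 and 2<->3. */
static v4u32 swap_pairs(v4u32 r)
{
    v4u32 a = r;
    uint32_t t;
    t = a[0]; a[0] = a[1]; a[1] = t;
    t = a[2]; a[2] = a[3]; a[3] = t;
    return a;
}

/* Scalar model of VEC_PERM_EXPR <v0, v1, sel>: lane i of the result is
   element sel[i] of the 8-element concatenation of v0 and v1. */
static v4u32 vec_perm_model(v4u32 v0, v4u32 v1, const unsigned sel[4])
{
    v4u32 out = { 0, 0, 0, 0 };
    for (int i = 0; i < 4; i++) {
        unsigned idx = sel[i] % 8;
        out[i] = idx < 4 ? v0[idx] : v1[idx - 4];
    }
    return out;
}

/* Check that the lane swaps and the permute with { 1, 0, 3, 2 } agree. */
static void check_swap_is_perm(void)
{
    static const unsigned sel[4] = { 1, 0, 3, 2 };
    v4u32 r = { 10, 20, 30, 40 };
    v4u32 a = swap_pairs(r);
    v4u32 p = vec_perm_model(r, r, sel);
    for (int i = 0; i < 4; i++)
        assert(a[i] == p[i]);
}
```

Since both operands of the permute are r_1(D), only selector values 0 to 3 matter here; the model handles values 4 to 7 only for completeness.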