https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117093

--- Comment #4 from ktkachov at gcc dot gnu.org ---
(In reply to ktkachov from comment #3)
> If we remove the casts:
> uint32x4_t ror32_neon_tgt_gcc_bad(uint32x4_t r) {
>     uint32x4_t a = r;
>     uint32_t t;
>     t = a[0]; a[0] = a[1]; a[1] = t;
>     t = a[2]; a[2] = a[3]; a[3] = t;
>     return a;
> }
> Then this is successfully recognised as:
>   a_2 = VEC_PERM_EXPR <r_1(D), r_1(D), { 1, 0, 3, 2 }>;

In this case it's forwprop1 that optimises it.
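
A minimal way to check this locally is sketched below. It assumes a generic
GCC vector type (named u32x4 here, a hypothetical stand-in for arm_neon.h's
uint32x4_t so that it builds on any target) and a hypothetical function name,
swap_pairs. Compiling with -O2 -fdump-tree-forwprop1 should show whether the
VEC_PERM_EXPR appears in the forwprop1 dump; the exact dump contents may vary
with GCC version and target.

```c
#include <stdint.h>

/* Assumption: generic GCC vector extension type used in place of the
   NEON uint32x4_t from the testcase, so no arm_neon.h is needed.  */
typedef uint32_t u32x4 __attribute__((vector_size(16)));

/* Same element-wise swap as the cast-free testcase quoted above:
   lanes 0<->1 and lanes 2<->3, i.e. the permutation { 1, 0, 3, 2 }.  */
u32x4 swap_pairs(u32x4 r) {
    u32x4 a = r;
    uint32_t t;
    t = a[0]; a[0] = a[1]; a[1] = t;
    t = a[2]; a[2] = a[3]; a[3] = t;
    return a;
}
```

For example: gcc -O2 -fdump-tree-forwprop1 -c test.c, then look for
VEC_PERM_EXPR in the generated forwprop1 dump file.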
