https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96135
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note that on the trunk, for f and g at -O3 -msse4 (and at -O3 on aarch64), GCC produces:

  _21 = VIEW_CONVERT_EXPR<vector(8) char>(i_2(D));
  _22 = VEC_PERM_EXPR <_21, _21, { 7, 6, 5, 4, 3, 2, 1, 0 }>;
  _18 = VIEW_CONVERT_EXPR<long long int>(_22);

But that VEC_PERM_EXPR is a bswap :).
So to fix this at -O3 -msse4, maybe we could just do:

(simplify
 (view_convert (vec_perm @0 @0 vector_cst_byteswap_p@1))
 (if (INTEGRAL_TYPE_P (type))
  (convert (bswap (view_convert @0)))))

Note I don't think we want to do the byteswap in the integer registers if we are going back to the floating point registers.
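
For illustration only, here is a minimal C sketch of why a byte permutation with selector { 7, 6, 5, 4, 3, 2, 1, 0 } on an 8-byte value is the same operation as a 64-bit byte swap. The function names reverse_bytes_u64 and check_reverse_is_bswap are hypothetical and are not the f and g from this report; __builtin_bswap64 is the GCC/Clang builtin, so this assumes one of those compilers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reverse the 8 bytes of x through a byte array,
   mirroring the { 7, 6, 5, 4, 3, 2, 1, 0 } permutation selector. */
uint64_t reverse_bytes_u64(uint64_t x)
{
    unsigned char in[8], out[8];
    uint64_t r;

    memcpy(in, &x, sizeof in);      /* plays the role of the first VIEW_CONVERT_EXPR */
    for (int i = 0; i < 8; i++)
        out[i] = in[7 - i];         /* plays the role of the VEC_PERM_EXPR */
    memcpy(&r, out, sizeof r);      /* plays the role of the final VIEW_CONVERT_EXPR */
    return r;
}

/* Compare the byte reversal against the builtin on a few sample values. */
void check_reverse_is_bswap(void)
{
    static const uint64_t samples[] = {
        0, 1, 0x0102030405060708ULL, 0xfedcba9876543210ULL, UINT64_MAX
    };

    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(reverse_bytes_u64(samples[i]) == __builtin_bswap64(samples[i]));
}
```

Calling check_reverse_is_bswap() from a test driver should pass on either endianness, since reversing the in-memory bytes and swapping the value's bytes are the same transformation. It says nothing about which register file the swap should be done in, which is the concern raised at the end of the comment.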