On Wednesday, 1 August 2018 18:51:41 CEST Marc Glisse wrote:
> On Wed, 1 Aug 2018, Allan Sandfeld Jensen wrote:
> > extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__,
> > __artificial__))
> > _mm_move_sd (__m128d __A, __m128d __B)
> > {
> > -  return (__m128d) __builtin_ia32_movsd ((__v2df)__A, (__v2df)__B);
> > +  return __extension__ (__m128d)(__v2df){__B[0],__A[1]};
> > }
>
> If the goal is to have it represented as a VEC_PERM_EXPR internally, I
> wonder if we should be explicit and use __builtin_shuffle instead of
> relying on some forwprop pass to transform it. Maybe not, just asking. And
> the answer need not even be the same for _mm_move_sd and _mm_move_ss.

I forgot: one of the things that makes using __builtin_shuffle ugly is that __v4si, which _mm_move_ss would need as the shuffle argument, is declared in emmintrin.h, while _mm_move_ss itself is in xmmintrin.h.

In general, the gcc __builtin_shuffle syntax, where the mask argument is a vector, is kind of awkward. At least for declaring intrinsics, the clang style, where the permutation indices are passed as extra arguments, is easier to deal with:

  __builtin_shuffle(a, b, (__v4si){4, 0, 1, 2})
vs
  __builtin_shufflevector(a, b, 4, 0, 1, 2)
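
To make the comparison concrete, here is a minimal sketch of both spellings. It uses plain vector extensions instead of the real headers, so the typedefs (v4sf, v4si, v2df, v2di) and the function names are made up for illustration and are not what xmmintrin.h or emmintrin.h declare. The masks {4, 1, 2, 3} and {2, 1} assume the usual semantics of _mm_move_ss and _mm_move_sd: element 0 comes from the second operand, the rest from the first.

```c
/* Hypothetical local typedefs; the real headers use __v4sf, __v4si, etc. */
typedef float     v4sf __attribute__((vector_size(16)));
typedef int       v4si __attribute__((vector_size(16)));
typedef double    v2df __attribute__((vector_size(16)));
typedef long long v2di __attribute__((vector_size(16)));

/* Sketch of _mm_move_ss: result is {b[0], a[1], a[2], a[3]}. */
static v4sf move_ss_shuffle(v4sf a, v4sf b)
{
#if defined(__clang__)
    /* clang style: indices are extra constant arguments, no mask type needed */
    return __builtin_shufflevector(a, b, 4, 1, 2, 3);
#else
    /* gcc style: the mask is an integer vector, so v4si must be visible here */
    const v4si mask = {4, 1, 2, 3};
    return __builtin_shuffle(a, b, mask);
#endif
}

/* Sketch of _mm_move_sd: result is {b[0], a[1]}. */
static v2df move_sd_shuffle(v2df a, v2df b)
{
#if defined(__clang__)
    return __builtin_shufflevector(a, b, 2, 1);
#else
    /* mask elements must have the same width as double, hence long long */
    const v2di mask = {2, 1};
    return __builtin_shuffle(a, b, mask);
#endif
}

/* The vector-literal form from the patch, which leaves it to the
   compiler to recognize the permutation. */
static v2df move_sd_literal(v2df a, v2df b)
{
    return (v2df){b[0], a[1]};
}
```

The gcc branch is where the header problem shows up: the float variant cannot be written without an integer vector type for the mask, whereas the clang branch needs no extra type at all.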