https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756
--- Comment #13 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 19 May 2016, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756
>
> --- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #11)
>
> > Index: gcc/config/i386/i386.c
> > ===================================================================
> > --- gcc/config/i386/i386.c (revision 236441)
> > +++ gcc/config/i386/i386.c (working copy)
> ...
> > given the plethora of shuffling intrinsics this might be quite tedious
> > work...
>
> The builtins aren't guaranteed to be usable directly, only the intrinsics
> are, so if we want to do the above, we should just kill those builtins
> instead and use __builtin_shuffle directly in the headers (plus of course
> each time verify that we get the corresponding or better insn sequence).

Yes, but that will result in something like

extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__,
                                      __artificial__))
_mm_shuffle_ps (__m128 __A, __m128 __B, int const __mask)
{
  return (__m128) __builtin_shuffle ((__v4sf)__A, (__v4sf)__B,
                                     (__v4si) { __mask & 3,
                                                (__mask >> 2) & 3,
                                                ((__mask >> 4) & 3) + 4,
                                                ((__mask >> 6) & 3) + 4 });
}

(Not sure if we still need the !__OPTIMIZE__ path, or what we should do
for that in general in the above context - say, once !__OPTIMIZE__ would
no longer constant-fold.)

But if this would be the preferred way of addressing this, that's clearly
better than "folding" the stuff back.
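
As a self-contained sketch of the lane selection (the wrapper name
my_shuffle_ps and the test values are hypothetical, not part of the
proposed header change): the low two result elements are picked out of
the first operand and the high two out of the second, each by a 2-bit
field of the mask, matching the shufps semantics.

typedef float __v4sf __attribute__ ((__vector_size__ (16)));
typedef int __v4si __attribute__ ((__vector_size__ (16)));

static inline __v4sf
my_shuffle_ps (__v4sf __A, __v4sf __B, int const __mask)
{
  /* For the two-input form of __builtin_shuffle, indices 0-3 select
     elements of __A and indices 4-7 select elements of __B.  */
  return __builtin_shuffle (__A, __B,
                            (__v4si) { __mask & 3,
                                       (__mask >> 2) & 3,
                                       ((__mask >> 4) & 3) + 4,
                                       ((__mask >> 6) & 3) + 4 });
}

int
main (void)
{
  __v4sf a = { 0.0f, 1.0f, 2.0f, 3.0f };
  __v4sf b = { 4.0f, 5.0f, 6.0f, 7.0f };
  /* 0x1b == 0b00011011: selects a[3], a[2], b[1], b[0].  */
  __v4sf r = my_shuffle_ps (a, b, 0x1b);
  return !(r[0] == 3.0f && r[1] == 2.0f
           && r[2] == 5.0f && r[3] == 4.0f);
}

With a constant mask this should collapse to a single shuffle insn
(shufps on x86) at -O2, which is exactly the "verify that we get the
corresponding or better insn sequence" part of the proposal.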