https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756

--- Comment #13 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 19 May 2016, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756
> 
> --- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #11)
> 
> > Index: gcc/config/i386/i386.c
> > ===================================================================
> > --- gcc/config/i386/i386.c      (revision 236441)
> > +++ gcc/config/i386/i386.c      (working copy)
> ...
> > given the plethora of shuffling intrinsics this might be quite tedious
> > work...
> 
> The builtins aren't guaranteed to be usable directly, only the intrinsics are,
> so if we want to do the above, we should just kill those builtins instead and
> use __builtin_shuffle directly in the headers (plus of course each time verify
> that we get the corresponding or better insn sequence).

Yes, but that will result in something like

extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
_mm_shuffle_ps (__m128 __A, __m128 __B, int const __mask)
{
  return (__m128) __builtin_shuffle2 ((__v4sf)__A, (__v4sf)__B,
                (__v4si) { __mask & 3, (__mask >> 2) & 3,
                           ((__mask >> 4) & 3) + 4, ((__mask >> 6) & 3) + 4 });
}
}

(not sure whether we still need the !__OPTIMIZE__ path, or what we should do
for that in general in the above context - without __OPTIMIZE__ this would no
longer constant-fold)

But if this would be the preferred way of addressing this, that's clearly
better than "folding" the stuff back.