"David Mathog" <mat...@caltech.edu> writes:

> In my software SSE2 emulation I am currently using this sort of approach
> to extract data fields out of __m128i and __m128d vectors:
>
> #define EMM_SINT4(a)   ((int *)&(a))
>
> static __inline __m128i __attribute__((__always_inline__))
> _mm_slli_epi32 (__m128i __A, int __B)
> {
>   __v4si __tmp = {
>     EMM_SINT4(__A)[0] << __B,
>     EMM_SINT4(__A)[1] << __B,
>     EMM_SINT4(__A)[2] << __B,
>     EMM_SINT4(__A)[3] << __B};
>   return (__m128i)__tmp;
> }

You are accessing a value of type __m128i using a pointer of type int*.
That could be an aliasing violation.  Are you using the definition of
__m128i from emmintrin.h, which gives the type the __may_alias__
attribute?
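
To illustrate the attribute in question, here is a minimal sketch,
assuming GCC or a compatible compiler.  The names emu_v4si,
emu_int_alias and emu_lane_via_alias are hypothetical and do not come
from emmintrin.h.  Giving the int type the __may_alias__ attribute tells
the compiler that accesses through it may alias objects of any other
type, so reading the vector's lanes through such a pointer is not
subject to type-based alias analysis.

```c
#include <assert.h>

/* Hypothetical 128-bit vector type built with GCC's vector extension.
   It is deliberately declared WITHOUT __may_alias__, as a software
   emulation header might do. */
typedef int emu_v4si __attribute__((__vector_size__(16)));

/* An int type that is allowed to alias any other type (GCC extension). */
typedef int __attribute__((__may_alias__)) emu_int_alias;

/* Read lane i of *v through the may_alias int type instead of plain int*. */
static inline int emu_lane_via_alias(const emu_v4si *v, int i)
{
    const emu_int_alias *p = (const emu_int_alias *)v;

    assert(i >= 0 && i < 4);
    return p[i];
}
```

With this, the macro from the quoted code could cast to emu_int_alias *
rather than int *.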


> This works fine when testing one _mm function at a time, but does not
> work reliably in real programs unless -O0 is used.  I think at least
> part of the problem is that once the function is inlined the parameter
> __A is in some cases a register variable, and the pointer method is not
> valid there.  To get around that I'm thinking of introducing an explicit
> local variable, like this:
>
> static __inline __m128i __attribute__((__always_inline__))
> _mm_slli_epi32 (__m128i __A, int __B)
> {
>   __m128i A = __A;
>   __v4si __tmp = {
>     EMM_SINT4(A)[0] << __B,
>     EMM_SINT4(A)[1] << __B,
>     EMM_SINT4(A)[2] << __B,
>     EMM_SINT4(A)[3] << __B};
>   return (__m128i)__tmp;
> }

I can't see why that would make any difference.  You are permitted to
take the address of a parameter.  The compiler will do the right thing.

> I'm not sure that will work all the time either.  The only other
> approach I am aware of would be something like this:
>
> typedef union {
>   __m128i           vi;
>   __m128d           vd;
>   int               s[4];
>   unsigned int      us[4];
>   /* etc. for other types */
> } emm_universal ;
>
> #define EMM_SINT4(a)   (a).s
>
> static __inline __m128i __attribute__((__always_inline__))
> _mm_slli_epi32 (__m128i __A, int __B)
> {
>   emm_universal A;
>   A.vi = __A;
>   __v4si __tmp = {
>     EMM_SINT4(A)[0] << __B,
>     EMM_SINT4(A)[1] << __B,
>     EMM_SINT4(A)[2] << __B,
>     EMM_SINT4(A)[3] << __B};
>   return (__m128i)__tmp;
> }
>
> The union approach seems to be just a different way to spin the
> pointer operations.  For gcc in particular, is one approach or the other
> to be preferred and why?

The advantage of using a union is that there is no aliasing violation.
gcc explicitly permits using a different type to access a value when
using a union.  I would use a union for that reason alone.
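
A minimal sketch of the union version, assuming GCC or a compatible
compiler.  The names emu_v4si, emu_universal, emu_slli_epi32 and
emu_lane are hypothetical stand-ins, not the definitions from
emmintrin.h.  Two choices here are my own assumptions rather than
anything from the quoted code: the shift is done on the unsigned view,
because left-shifting a negative int is undefined in C, and a count
outside 0..31 is made to yield zero.

```c
#include <assert.h>

/* Hypothetical 128-bit vector type (GCC vector extension). */
typedef int emu_v4si __attribute__((__vector_size__(16)));

/* One object viewed either as a whole vector or as four 32-bit lanes. */
typedef union {
    emu_v4si     v;
    int          s[4];
    unsigned int us[4];
} emu_universal;

/* Shift each 32-bit lane left by count, going through the union so that
   no pointer of a different type is ever dereferenced. */
static inline emu_v4si emu_slli_epi32(emu_v4si a, int count)
{
    emu_universal in, out;
    int i;

    in.v = a;
    for (i = 0; i < 4; i++) {
        /* Assumption: counts outside 0..31 produce zero. */
        out.us[i] = (count < 0 || count > 31) ? 0u : in.us[i] << count;
    }
    return out.v;
}

/* Extract lane i as a signed int, again via the union. */
static inline int emu_lane(emu_v4si a, int i)
{
    emu_universal u;

    assert(i >= 0 && i < 4);
    u.v = a;
    return u.s[i];
}
```

Writing one union member and reading another is the access pattern gcc
documents as permitted, which is why no cast through int * is needed.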

Ian
