Hi!

On Wed, Jun 02, 2021 at 05:13:15PM -0500, Paul A. Clarke wrote:
> Add a naive implementation of the subject x86 intrinsic to
> ease porting.

> +/* Return horizontal packed word minimum and its index in bits [15:0]
> +   and bits [18:16] respectively.  */
> +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_minpos_epu16 (__m128i __A)
> +{
> +  union __u
> +    {
> +      __m128i __m;
> +      __v8hu __uh;
> +    };
> +  union __u __u = { .__m = __A }, __r = { .__m = {0} };
> +  unsigned short __ridx = 0;
> +  unsigned short __rmin = __u.__uh[__ridx];
> +  for (unsigned long __i = __ridx+1;

(spaces around the "+"?)

> +       __i < sizeof (__u.__uh) / sizeof (__u.__uh[0]);

You should either use a macro for that, or just write "8" :-)

> +       __i++)
> +    {
> +      if (__u.__uh[__i] < __rmin)
> +        {
> +          __rmin = __u.__uh[__i];
> +          __ridx = __i;
> +        }
> +    }
> +  __r.__uh[0] = __rmin;
> +  __r.__uh[1] = __ridx;
> +  return __r.__m;
> +}

This does not compute the index correctly for big endian (it needs to
walk from right to left for that).  The construction of the return value
looks wrong as well.
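
For checking a fixed version on both endiannesses, a scalar reference model
can help.  The sketch below is not part of the patch: the name
minpos_epu16_ref and the plain-array interface are made up for illustration.
It assumes lanes[0] holds x86 element 0 (bits [15:0] of the 128-bit value)
and packs the result the way the quoted comment describes, with the minimum
in bits [15:0] and its index in bits [18:16].

```c
#include <stdint.h>

/* Hypothetical scalar reference model, for illustration only.
   lanes[0] is assumed to be x86 element 0, i.e. bits [15:0] of the
   128-bit input.  Returns the low 32 bits of the expected result:
   minimum in bits [15:0], its index in bits [18:16].  */
static uint32_t
minpos_epu16_ref (const uint16_t lanes[8])
{
  unsigned int idx = 0;
  uint16_t min = lanes[0];

  for (unsigned int i = 1; i < 8; i++)
    {
      /* Strict comparison: on ties the lowest index is kept.  */
      if (lanes[i] < min)
        {
          min = lanes[i];
          idx = i;
        }
    }

  return (uint32_t) min | ((uint32_t) idx << 16);
}
```

A test could feed the same eight values to this model and to the vector
implementation and compare the low 32 bits of the result; how the values
get loaded into the vector in x86 element order on a big-endian target is
left out here.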

Okay for trunk with that fixed.  Thanks!


Segher
