On Mon, Apr 23, 2007 at 09:05:05PM +0300, Dorit Nuzman wrote:
> "H. J. Lu" <[EMAIL PROTECTED]> wrote on 23/04/2007 01:34:39:
> 
> > On Mon, Apr 23, 2007 at 12:55:26AM +0300, Dorit Nuzman wrote:
> > > "H. J. Lu" <[EMAIL PROTECTED]> wrote on 23/04/2007 00:29:16:
> > >
> > > > On Sun, Apr 22, 2007 at 11:14:20PM +0300, Dorit Nuzman wrote:
> > > > > "H. J. Lu" <[EMAIL PROTECTED]> wrote on 20/04/2007 18:02:09:
> > > > >
> > > > > > Hi Dorit,
> > > > > >
> > > > > > SSE4 has vector zero/sign-extensions like:
> > > > > >
> > > > > > (define_insn "sse4_1_zero_extendv2siv2di2"
> > > > > >   [(set (match_operand:V2DI 0 "register_operand" "=x")
> > > > > >         (zero_extend:V2DI
> > > > > >            (vec_select:V2SI
> > > > > >               (match_operand:V4SI 1 "nonimmediate_operand" "xm")
> > > > > >               (parallel [(const_int 0)
> > > > > >                          (const_int 1)]))))]
> > > > > >   "TARGET_SSE4_1"
> > > > > >   "pmovzxdq\t{%1, %0|%0, %1}"
> > > > > >   [(set_attr "type" "ssemov")
> > > > > >    (set_attr "mode" "TI")])
> > > > > >
> > > > > > Does vectorizer support them?
> > > > > >
> > > > >
> > > > > (sorry, I was away from email during Friday-Saturday) -
> > > > >
> > > > > > so this looks like a vec_unpacku_hi_v4si (or _lo?), i.e. what is now
> > > > > > modeled as follows in sse.md:
> > > > >
> > > > > (define_expand "vec_unpacku_hi_v4si"
> > > > >   [(match_operand:V2DI 0 "register_operand" "")
> > > > >    (match_operand:V4SI 1 "register_operand" "")]
> > > > >   "TARGET_SSE2"
> > > > > {
> > > > >   ix86_expand_sse_unpack (operands, true, true);
> > > > >   DONE;
> > > > > })
> > > > >
> > > >
> > > > I am not sure if they are the same since SSE4.1 instructions
> > > > extend the first 2 elements in the vector, not the high/low
> > > > parts.
> > > >
> > >
> > > unpack high/low means the high/low elements of the vector
> > >
> >
> > SSE4.1 has
> >
> > 1.  The first 8 elements of V16QI zero/sign extend to V8HI.
> 
> This one is equivalent to vec_unpacku/s_hi_v16qi.

Did you mean vec_unpacku/s_lo_v16qi?

> 
> > 2.  The first 4 elements of V16QI/V8HI zero/sign extend to V4SI.
> 
> The second of these two - "first 4 elements of V8HI zero/sign extend to
> V4SI" - is equivalent to vec_unpacku/s_hi_v8hi.

Did you mean vec_unpacku/s_lo_v8hi?

> 
> > 3.  The first 2 elements of V16QI/V8HI/V4SI zero/sign extend to V2DI.
> 
> The last of these three - "first 2 elements of V4SI zero/sign extend to
> V2DI" - is equivalent to vec_unpacku/s_hi_v4si.

Did you mean vec_unpacku/s_lo_v4si?
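
To spell out the lo/hi distinction I am asking about, here is a rough
scalar model of the V16QI -> V8HI signed case on little-endian x86. The
function names are made up for illustration and are not GCC code; the
only claim is which half of the input each form reads.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model only; names are hypothetical, not taken from sse.md.  */

/* "lo": sign-extend elements 0..7, the half that pmovsxbw reads.  */
void
model_unpacks_lo_v16qi (const int8_t in[16], int16_t out[8])
{
  assert (in && out);
  for (int i = 0; i < 8; i++)
    out[i] = in[i];
}

/* "hi": sign-extend elements 8..15, the half that punpckhbw reads.  */
void
model_unpacks_hi_v16qi (const int8_t in[16], int16_t out[8])
{
  assert (in && out);
  for (int i = 0; i < 8; i++)
    out[i] = in[i + 8];
}
```

Under this model the SSE4.1 "first N elements" forms line up with the
_lo_ patterns, not the _hi_ ones.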

> 
> We currently don't have idioms that represent the other forms.
> 
> By the way, the vectorizer will not be able to make use of these
> vec_unpacku/s_hi_* insns if you don't define the corresponding
> vec_unpacku/s_lo_* patterns (although I think these are already defined in
> sse.md, though maybe less efficiently than the way sse4 can support them?).

With my SSE4.1 patch applied, for

typedef char vec_t;
typedef short vecx_t;

extern __attribute__((aligned(16))) vec_t x [64];
extern __attribute__((aligned(16))) vecx_t y [64];

void
foo ()
{
  int i;

  for (i = 0; i < 64; i++)
    y [i]  = x [i];
}

I got

        movdqa  x(%rip), %xmm0
        movl    $16, %eax
        pxor    %xmm2, %xmm2
        pmovsxbw        %xmm0, %xmm1
        movdqa  %xmm1, y(%rip)
        movdqa  %xmm2, %xmm1
        pcmpgtb %xmm0, %xmm1
        punpckhbw       %xmm1, %xmm0
        movdqa  %xmm0, y+16(%rip)

When the extension is a single instruction, it is better to extend one low
element at a time:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31667
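
As a sketch of the shape I have in mind (my own scalar model, not code
from the bug report): since pmovsxbw can take its source from memory,
each group of 8 chars can be widened with one low extension, with no
pxor/pcmpgtb/punpckhbw sequence for the high half. The helper names and
the multiple-of-8 restriction are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar stand-in for one pmovsxbw with a memory source:
   sign-extend 8 consecutive chars to 8 shorts.  */
void
model_pmovsxbw (const int8_t *src, int16_t *dst)
{
  for (int i = 0; i < 8; i++)
    dst[i] = src[i];
}

/* Widen n chars using only the low-extension form, 8 elements per step.
   Assumes n is a multiple of 8 (hypothetical restriction for brevity).  */
void
widen_by_low_extension (const int8_t *x, int16_t *y, size_t n)
{
  assert (n % 8 == 0);
  for (size_t i = 0; i < n; i += 8)
    model_pmovsxbw (x + i, y + i);
}
```

For the 64-element loop above this would be 8 extensions, one per
V8HI result, in place of the mixed sequence.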


H.J.
