On Mon, Apr 23, 2007 at 09:05:05PM +0300, Dorit Nuzman wrote:
> "H. J. Lu" <[EMAIL PROTECTED]> wrote on 23/04/2007 01:34:39:
>
> > On Mon, Apr 23, 2007 at 12:55:26AM +0300, Dorit Nuzman wrote:
> > > "H. J. Lu" <[EMAIL PROTECTED]> wrote on 23/04/2007 00:29:16:
> > >
> > > > On Sun, Apr 22, 2007 at 11:14:20PM +0300, Dorit Nuzman wrote:
> > > > > "H. J. Lu" <[EMAIL PROTECTED]> wrote on 20/04/2007 18:02:09:
> > > > >
> > > > > > Hi Dorit,
> > > > > >
> > > > > > SSE4 has vector zero/sign-extensions like:
> > > > > >
> > > > > > (define_insn "sse4_1_zero_extendv2siv2di2"
> > > > > >   [(set (match_operand:V2DI 0 "register_operand" "=x")
> > > > > >         (zero_extend:V2DI
> > > > > >           (vec_select:V2SI
> > > > > >             (match_operand:V4SI 1 "nonimmediate_operand" "xm")
> > > > > >             (parallel [(const_int 0)
> > > > > >                        (const_int 1)]))))]
> > > > > >   "TARGET_SSE4_1"
> > > > > >   "pmovzxdq\t{%1, %0|%0, %1}"
> > > > > >   [(set_attr "type" "ssemov")
> > > > > >    (set_attr "mode" "TI")])
> > > > > >
> > > > > > Does vectorizer support them?
> > > > > >
> > > > >
> > > > > (sorry, I was away from email during Friday-Saturday) -
> > > > >
> > > > > so this looks like a vec_unpacku_hi_v4si (or _lo?), i.e. what is now
> > > > > modeled as follows in sse.md:
> > > > >
> > > > > (define_expand "vec_unpacku_hi_v4si"
> > > > >   [(match_operand:V2DI 0 "register_operand" "")
> > > > >    (match_operand:V4SI 1 "register_operand" "")]
> > > > >   "TARGET_SSE2"
> > > > > {
> > > > >   ix86_expand_sse_unpack (operands, true, true);
> > > > >   DONE;
> > > > > })
> > > > >
> > > >
> > > > I am not sure if they are the same since SSE4.1 instructions
> > > > extend the first 2 elements in the vector, not the high/low
> > > > parts.
> > > >
> > >
> > > unpack high/low means the high/low elements of the vector
> > >
> >
> > SSE4.1 has
> >
> > 1. The first 8 elements of V16QI zero/sign extend to V8HI.
>
> This one is equivalent to vec_unpacku/s_hi_v16qi.
Did you mean vec_unpacku/s_lo_v16qi?

>
> > 2. The first 4 elements of V16QI/V8HI zero/sign extend to V4SI.
>
> The second of these two - "first 4 elements of V8HI zero/sign extend to
> V4SI" - is equivalent to vec_unpacku/s_hi_v8hi.

Did you mean vec_unpacku/s_lo_v8hi?

>
> > 3. The first 2 elements of V16QI/V8HI/V4SI zero/sign extend to V2DI.
>
> The last of these three - "first 2 elements of V4SI zero/sign extend to
> V2DI" - is equivalent to vec_unpacku/s_hi_v4si.

Did you mean vec_unpacku/s_lo_v4si?

>
> We currently don't have idioms that represent the other forms.
>
> By the way, the vectorizer will not be able to make use of these
> vec_unpacku/s_hi_* insns if you don't define the corresponding
> vec_unpacku/s_lo_* patterns (although I think these are already defined in
> sse.md, though maybe less efficiently than the way sse4 can support them?).

With my SSE4.1 patch applied, for

typedef char vec_t;
typedef short vecx_t;

extern __attribute__((aligned(16))) vec_t x [64];
extern __attribute__((aligned(16))) vecx_t y [64];

void
foo ()
{
  int i;

  for (i = 0; i < 64; i++)
    y [i] = x [i];
}

I got

        movdqa  x(%rip), %xmm0
        movl    $16, %eax
        pxor    %xmm2, %xmm2
        pmovsxbw        %xmm0, %xmm1
        movdqa  %xmm1, y(%rip)
        movdqa  %xmm2, %xmm1
        pcmpgtb %xmm0, %xmm1
        punpckhbw       %xmm1, %xmm0
        movdqa  %xmm0, y+16(%rip)

When the extension is a single instruction, it is better to extend one
low element at a time:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31667

H.J.
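To make the two halves of the sequence above concrete, here is a
rough scalar model in C. It is only a sketch of the semantics, not
what the compiler or the sse.md patterns do internally. The function
names (model_pmovsxbw, model_unpack_high, widening_matches) are made
up for this illustration, and element 0 is assumed to be the
lowest-addressed byte of the vector.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of pmovsxbw: sign-extend the 8 lowest-numbered bytes
   of a 16-byte vector into 8 shorts.  */
static void model_pmovsxbw(const int8_t src[16], int16_t dst[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = (int16_t) src[i];
}

/* Scalar model of the pxor/pcmpgtb/punpckhbw sequence: build a sign
   mask (0xff where the byte is negative, else 0x00), then interleave
   each of the upper 8 bytes with its mask byte.  */
static void model_unpack_high(const int8_t src[16], int16_t dst[8])
{
    for (int i = 0; i < 8; i++) {
        uint8_t lo = (uint8_t) src[8 + i];
        uint8_t hi = (0 > src[8 + i]) ? 0xff : 0x00;   /* pcmpgtb */
        uint16_t bits = (uint16_t) (lo | (hi << 8));   /* punpckhbw */
        memcpy(&dst[i], &bits, sizeof bits);
    }
}

/* Returns 1 if both models agree with plain element-wise widening
   (the y[i] = x[i] loop), 0 otherwise.  */
int widening_matches(void)
{
    int8_t x[16];
    int16_t lo[8], hi[8];

    /* Test values span -128 .. 127 in steps of 17.  */
    for (int i = 0; i < 16; i++)
        x[i] = (int8_t) (i * 17 - 128);

    model_pmovsxbw(x, lo);
    model_unpack_high(x, hi);

    for (int i = 0; i < 8; i++)
        if (lo[i] != x[i] || hi[i] != x[8 + i])
            return 0;
    return 1;
}
```

Under these assumptions the low half needs one step per vector while
the high half needs a mask plus an interleave, which is the asymmetry
visible in the generated code.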