------- Additional Comments From guardia at sympatico dot ca  2005-01-29 19:21 -------
Hmm, OK, we can do a "movd %mm0, %eax", so that's why it gets combined...

Well, I give up. The V8QI (and whatever) -> V2SI conversion seems to be causing
all the trouble here if we look at the RTL of something like:
__m64 moo(__v8qi mmx1)
{
   mmx1 = __builtin_ia32_punpcklbw (mmx1, mmx1);
   return mmx1;
}

It explicitly asks for a conversion to V2SI (__m64) that gets assigned to an xmm
register afterwards:
(insn 15 14 17 1 (set (reg:V8QI 58 [ D.2201 ])
        (reg:V8QI 62)) -1 (nil)
    (nil))

(insn 17 15 18 1 (set (reg:V2SI 63)
        (subreg:V2SI (reg:V8QI 58 [ D.2201 ]) 0)) -1 (nil)
    (nil))

(insn 18 17 19 1 (set (mem/i:V2SI (reg/f:SI 60 [ D.2206 ]) [0 <result>+0 S8 A64])
        (reg:V2SI 63)) -1 (nil)
    (nil))
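
To make the conversion in insn 17 concrete, here is a minimal sketch. It assumes GCC's generic vector extension instead of the MMX builtins, and the names v8qi_t, v2si_t, unpack_low_bytes and moo_sketch are made up for illustration. It only shows that the V8QI -> V2SI cast reinterprets the same 8 bytes (which is what the subreg expresses); it says nothing about which register class the allocator ends up picking.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the V8QI and V2SI types, built with
   GCC's generic vector extension rather than the MMX builtins. */
typedef signed char v8qi_t __attribute__ ((vector_size (8)));
typedef int v2si_t __attribute__ ((vector_size (8)));

/* Portable emulation of what punpcklbw computes: interleave the low
   four bytes of a with the low four bytes of b. */
static v8qi_t
unpack_low_bytes (v8qi_t a, v8qi_t b)
{
  v8qi_t r = { 0 };
  for (int i = 0; i < 4; i++)
    {
      r[2 * i] = a[i];
      r[2 * i + 1] = b[i];
    }
  return r;
}

/* Same shape as moo() above: byte-wise operation, then a cast to the
   two-int vector type.  The cast changes the type only; the 8 bytes
   are left untouched. */
static v2si_t
moo_sketch (v8qi_t x)
{
  v8qi_t r = unpack_low_bytes (x, x);
  assert (sizeof (v8qi_t) == sizeof (v2si_t));
  return (v2si_t) r;
}
```

Copying the result of moo_sketch back out with memcpy gives the interleaved bytes unchanged, so the mode change by itself carries no computation.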

So... the only way to fix this would be either to make the register allocator
more intelligent (bug 19161), or to provide intrinsics like the Intel compiler
does, with a one-to-one mapping to instructions. Right? That wouldn't be
such a bad idea, I think... Instead of using the current __builtins for the stuff
in *mmintrin.h, we could use a different set of builtins that only supports V2SI
and nothing else..? Well, that's going to be for another time ;)

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530
