------- Additional Comments From guardia at sympatico dot ca 2005-01-29 04:47 -------
Hmm, there seems to be a problem in the optimization stages. I cooked up another snippet:

void moo(__m64 i, unsigned int *r)
{
  unsigned int tmp = __builtin_ia32_vec_ext_v2si (i, 0);
  *r = tmp;
}

With -O0 -mmmx we get:

        movd    %mm0, -4(%ebp)
        movl    8(%ebp), %edx
        movl    -4(%ebp), %eax
        movl    %eax, (%edx)

Which with -O3 gets reduced to:

        movl    8(%ebp), %eax
        movd    %mm0, (%eax)

Now, clearly it understands that "movd" is the same as "movl", except that they work on different registers on an MMX-only machine. With "movlps" and "movq" it should do the same, I think? If the optimization stages can work this out, maybe we wouldn't need to rewrite the MMX/SSE1 support...

(BTW, a correction: when I said 200+ instructions to schedule, I meant per function. I have a dozen such functions with 200+ instructions each, and they aren't going to get any smaller.)

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530
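
For anyone who wants to poke at the same store without the MMX builtin, here is a rough sketch using the generic GCC/Clang vector extension. It is not part of the original test case: the `v2si` typedef and the names `moo_portable` and `moo_memcpy` are made up for illustration, and it assumes a 32-bit `unsigned int`. The point is only that extracting lane 0 and storing it is the same operation as a 32-bit copy of the low half of the vector, which is what the -O3 output above boils down to.

```c
#include <string.h>

/* Hypothetical stand-in for __m64: two 32-bit lanes (GCC/Clang vector extension). */
typedef int v2si __attribute__((vector_size(8)));

/* Same shape as moo(), with plain lane subscripting instead of
   __builtin_ia32_vec_ext_v2si. */
void moo_portable(v2si i, unsigned int *r)
{
    unsigned int tmp = (unsigned int)i[0];
    *r = tmp;
}

/* The equivalent 32-bit copy: lane 0 sits at the lowest address of the
   vector, so copying the first sizeof(unsigned int) bytes yields the same value. */
void moo_memcpy(v2si i, unsigned int *r)
{
    memcpy(r, &i, sizeof *r);
}
```

Comparing the assembly of the two functions at -O0 and -O3 (with and without -mmmx) should show whether the compiler treats both forms as the same 32-bit move; which instructions it actually picks depends on the target and GCC version.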