------- Additional Comments From guardia at sympatico dot ca 2005-01-27 02:30 -------
Ok ok, SSE is not enabled by default on Athlon...
So, is there some sort of "pragma" that could be used to disable SSE registers (force -mmmx, sort of) for only part of the code?

The way I see it, the problem is that gcc treats __m64 and __m128 as the same kind of variable, when they are not. __m64 should always live in MMX registers, and __m128 should always live in XMM registers. Intel actually created a new type, __m128d, instead of trying to guess whether integer or float instructions should be used for things like MOVDQA.

We can easily see in moo2.i that gcc is trying to put an __m64 variable in XMM registers. I can also prevent it from using an XMM register by using only __v8qi variables (which are invalid, i.e. too small, for XMM registers):

    __v8qi moo(__v8qi mmx1)
    {
        mmx1 = __builtin_ia32_punpcklbw (mmx1, mmx1);
        return mmx1;
    }

Tadam! No movss or movlps...

Shouldn't gcc avoid placing __m64 variables in XMM registers? If one wants to use an XMM register, one should use __m128 or __m128d (or at least a cast from an __m64 pointer). Even on the Pentium 4 I think this makes sense, because moving data from MMX registers to XMM registers is not cheap either.

If one wants to move a 32-bit integer into an MMX register, that should be the job of a specialized intrinsic (_mm_cvtsi32_si64), which maps to a MOVD instruction. And if one wants to load a single float into an XMM register, that should be the job of _mm_load_ss (and other such functions). At the moment, these intrinsics (_mm_cvtsi32_si64, _mm_load_ss) do NOT generate a mov instruction by themselves. From what I can understand of i386.c, they go through a "vector initialization" process which generates mov instructions from the MMX, SSE or SSE2 sets without discrimination. In my mind, _mm_cvtsi32_si64 should generate a MOVD and _mm_load_ss a MOVSS, period, just like __builtin_ia32_punpcklbw generates a PUNPCKLBW.

Does this make sense? Is this what you mean by a complete rewrite, or were you thinking of something else?
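To make the expectation concrete, here is a minimal sketch of the two intrinsics in use. The function names dup_low_bytes and first_lane are hypothetical, made up for this illustration. The comments name the instruction I would expect from each intrinsic, not what any particular gcc version is guaranteed to emit; that has to be checked in the assembly output (for example with gcc -S).

```c
#include <mmintrin.h>   /* MMX: __m64, _mm_cvtsi32_si64, _mm_unpacklo_pi8, _mm_empty */
#include <xmmintrin.h>  /* SSE: __m128, _mm_load_ss, _mm_cvtss_f32 */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: put a 32-bit integer into an __m64 and
 * interleave its four low bytes with themselves. */
static uint64_t dup_low_bytes(int x)
{
    __m64 v = _mm_cvtsi32_si64(x);   /* expected instruction: MOVD */
    v = _mm_unpacklo_pi8(v, v);      /* expected instruction: PUNPCKLBW */

    uint64_t out;
    memcpy(&out, &v, sizeof out);    /* copy the 8 result bytes out */
    _mm_empty();                     /* leave MMX state before any x87 code runs */
    return out;
}

/* Hypothetical helper: load one float into the low lane of an
 * __m128 and read it back. */
static float first_lane(const float *p)
{
    __m128 v = _mm_load_ss(p);       /* expected instruction: MOVSS */
    return _mm_cvtss_f32(v);         /* return the low lane */
}
```

Calling dup_low_bytes(0x04030201) should give 0x0404030302020101, since each of the four low bytes is doubled. The point of the sketch is only that nothing here asks for an XMM register in the first function, or for an MMX register in the second.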
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530