------- Additional Comments From drab at kepler dot fjfi dot cvut dot cz  2005-01-04 13:51 -------
(In reply to comment #10)
> Looking at the Intel reference documentation available from
> ftp://download.intel.com/design/Pentium4/manuals/25366614.pdf
> MOVQ has the following opcodes:
>
> 0F 6F /r    MOVQ mm, mm/m64       Move quadword from mm/m64 to mm.
> 0F 7F /r    MOVQ mm/m64, mm       Move quadword from mm to mm/m64.
> F3 0F 7E    MOVQ xmm1, xmm2/m64   Move quadword from xmm2/mem64 to xmm1.
> 66 0F D6    MOVQ xmm2/m64, xmm1   Move quadword from xmm1 to xmm2/mem64.
>
> and since the two latter instructions are unsupported on AMD and Pentium III
> you would need some other way to move data between the xmm registers and
> memory.

Those 0F 6F and 0F 7F forms are, however, standard MMX instructions. So when
you use, for instance, -msse -mfpmath=sse -mno-mmx, they shouldn't be used
either (I don't know why anybody would want to do that, but...).

However, when MOVQ is used only for copying (as in the example that I
proposed), there are other ways, such as using the following instructions:

0F 12 /r    MOVLPS xmm, mem64
0F 13 /r    MOVLPS mem64, xmm

and, even more:

0F 16 /r    MOVHPS xmm, mem64
0F 17 /r    MOVHPS mem64, xmm

It's true that those are meant for moving two single-precision floats (into
the lower or upper half of the xmm register), but since it is only a move,
that doesn't matter: it just copies those 64 bits into either the lower or the
upper 64 bits of the xmm register.

These could come in quite handy, since this leaves the mmx/st registers
available for other use, and when we consider only 64-bit memory accesses, it
effectively provides twice the number of xmm registers as additional 64-bit
registers.

I think that might be worth considering, wouldn't it? And it is SSE only, so
AthlonXP, PIII and others might benefit from it.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19235
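
For illustration, here is a minimal sketch of the MOVLPS/MOVHPS copy described above, written in C with the SSE intrinsics from <xmmintrin.h> rather than as a GCC code-generation change. The function names (copy64_low, copy64_high, copy_pair) are made up for this example. The intrinsics _mm_loadl_pi/_mm_storel_pi and _mm_loadh_pi/_mm_storeh_pi correspond to MOVLPS and MOVHPS, but the compiler is free to pick other instructions when optimizing, so this only shows the intended data movement.

```c
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics, no SSE2 required */

/* Copy 64 bits through the low half of an xmm register
   (MOVLPS xmm, mem64 followed by MOVLPS mem64, xmm). */
static void copy64_low(uint64_t *dst, const uint64_t *src)
{
    __m128 r = _mm_setzero_ps();
    r = _mm_loadl_pi(r, (const __m64 *)src);
    _mm_storel_pi((__m64 *)dst, r);
}

/* Same copy through the high half
   (MOVHPS xmm, mem64 followed by MOVHPS mem64, xmm). */
static void copy64_high(uint64_t *dst, const uint64_t *src)
{
    __m128 r = _mm_setzero_ps();
    r = _mm_loadh_pi(r, (const __m64 *)src);
    _mm_storeh_pi((__m64 *)dst, r);
}

/* Hold two independent 64-bit values in one xmm register at once:
   one in the low half, one in the high half. */
static void copy_pair(uint64_t *dst_a, uint64_t *dst_b,
                      const uint64_t *src_a, const uint64_t *src_b)
{
    __m128 r = _mm_setzero_ps();
    r = _mm_loadl_pi(r, (const __m64 *)src_a);  /* low 64 bits  <- *src_a */
    r = _mm_loadh_pi(r, (const __m64 *)src_b);  /* high 64 bits <- *src_b */
    _mm_storel_pi((__m64 *)dst_a, r);
    _mm_storeh_pi((__m64 *)dst_b, r);
}
```

On a 32-bit x86 target this would need something like gcc -msse; on x86-64, SSE is enabled by default. Since the moves do not interpret the data as floats, arbitrary 64-bit patterns (including ones that look like NaNs) are copied unchanged.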