------- Additional Comments From guardia at sympatico dot ca 2005-06-21 13:26 -------

Hmm, it will be interesting to test this (it will have to wait a couple of weeks), but the problem here is that the original SSE has no instruction that moves data directly between MMX and XMM registers (the MMX MOVQ can't do it). SSE2 adds such instructions (MOVQ2DQ and MOVDQ2Q), but the original SSE does not. So the compiler generates MOVLPS instructions to move values between memory and SSE registers in the middle of MMX calculations, and in the original SSE case it ends up unable to do any better than MMx->memory->XMMx->memory->MMx for data that should have stayed in MMX registers all along. It does not realize up front how expensive it is to mix XMM registers with MMX instructions on "SSE1".
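To make the two paths concrete, here is a minimal sketch using the standard x86 intrinsics headers. It assumes an x86 target with MMX, SSE and SSE2 enabled (as on x86-64). The names roundtrip_sse1, roundtrip_sse2 and bits64 are hypothetical helpers for illustration only. The first function spells out the SSE-only path through memory (MOVLPS via _mm_loadl_pi/_mm_storel_pi); the second uses the SSE2 register-to-register moves (MOVQ2DQ via _mm_movpi64_epi64, MOVDQ2Q via _mm_movepi64_pi64). The instructions actually emitted still depend on compiler flags and the optimizer, which is the point of this report.

```c
#include <assert.h>     /* for callers checking the round trips */
#include <stdint.h>
#include <string.h>
#include <mmintrin.h>   /* MMX:  __m64, _mm_set_pi32, _mm_empty */
#include <xmmintrin.h>  /* SSE:  __m128, _mm_loadl_pi, _mm_storel_pi (MOVLPS) */
#include <emmintrin.h>  /* SSE2: _mm_movpi64_epi64 (MOVQ2DQ), _mm_movepi64_pi64 (MOVDQ2Q) */

/* Hypothetical helper: reinterpret the 64 bits of an MMX value. */
static uint64_t bits64(__m64 v)
{
    uint64_t u;
    memcpy(&u, &v, sizeof u);
    return u;
}

/* SSE1-style path: no MMX<->XMM register move exists, so the value is
   written out to memory, loaded into an XMM register with MOVLPS,
   stored back with MOVLPS, and reloaded as an MMX value. */
static __m64 roundtrip_sse1(__m64 v)
{
    __m64 tmp = v;                       /* MMX -> memory          */
    __m128 x = _mm_setzero_ps();
    x = _mm_loadl_pi(x, &tmp);           /* memory -> XMM (MOVLPS) */
    /* ... SSE work on x would go here ... */
    _mm_storel_pi(&tmp, x);              /* XMM -> memory (MOVLPS) */
    return tmp;                          /* memory -> MMX          */
}

/* SSE2 path: direct register-to-register moves, no memory round trip. */
static __m64 roundtrip_sse2(__m64 v)
{
    __m128i x = _mm_movpi64_epi64(v);    /* MMX -> XMM (MOVQ2DQ)   */
    /* ... SSE2 work on x would go here ... */
    return _mm_movepi64_pi64(x);         /* XMM -> MMX (MOVDQ2Q)   */
}
```

Both functions return the input bits unchanged; for example, with v = _mm_set_pi32(0x01234567, 0x76543210), bits64(roundtrip_sse1(v)) and bits64(roundtrip_sse2(v)) both equal bits64(v). Call _mm_empty() after the MMX work is done, before any x87 floating-point code runs.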
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530