On Wed, 23 Nov 2011, Leith Bade wrote:
I have been hand-optimising a loop that GCC 4.6 was not able to vectorise. I have been keeping an eye on the assembly output of this loop and have noticed GCC inserting unnecessary MOVAPS instructions.
Yes, there are several bugzilla entries showing that the register allocator is doing a fairly poor job on SSE/AVX registers...
-- Marc Glisse
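
For readers who want to look at this kind of output themselves, here is a minimal sketch of a hand-vectorised loop written with SSE intrinsics. It is not the loop from the original post, which was not shown; the function name madd4 and the multiply-add kernel are purely illustrative assumptions. A scalar fallback is included so the file still builds on targets without SSE.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical kernel (not from the original post):
 * out[i] = a[i] * b[i] + c[i], with n required to be a multiple of 4. */
static void madd4(float *out, const float *a, const float *b,
                  const float *c, size_t n)
{
    assert(n % 4 == 0);
#if defined(__SSE__)
    for (size_t i = 0; i < n; i += 4) {
        /* Unaligned loads, so callers need no special alignment. */
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vc = _mm_loadu_ps(c + i);
        /* Four multiplies and four adds per iteration. */
        __m128 r = _mm_add_ps(_mm_mul_ps(va, vb), vc);
        _mm_storeu_ps(out + i, r);
    }
#else
    /* Scalar fallback for targets without SSE. */
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i] + c[i];
#endif
}
```

Compiling a file like this with something along the lines of `gcc -O2 -S` writes the generated assembly to a `.s` file, where any register-to-register MOVAPS copies can be spotted by eye. Whether such copies appear at all depends on the GCC version, the flags, and the loop in question.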