http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349
--- Comment #4 from Ondrej Bilka <neleai at seznam dot cz> 2013-04-27 01:06:45 UTC ---

I found that the AMD Bulldozer optimization guide states that moves from an XMM register to a GPR should be done directly:

"10.4 Moving Data Between General-Purpose and XMM/YMM Registers

When moving data from a GPR to an XMM register, use separate store and load instructions to move the data first from the source register to a temporary location in memory and then from memory into the destination register, taking the memory latency into account when scheduling both stages of the load-store sequence. When moving data from an XMM register to a general-purpose register, use the VMOVD instruction. Whenever possible, use loads and stores of the same data length. (See 6.3, "Store-to-Load Forwarding Restrictions" on page 98 for more information.)"
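
To make the two directions in the quoted section concrete, here is a minimal C sketch using SSE2 intrinsics. The function names gpr_to_xmm_via_memory and xmm_to_gpr_direct are hypothetical and exist only for illustration. The code expresses the intent at source level; which instructions the compiler actually emits depends on its version and on the -mtune/-march setting, so check the generated assembly (for example with -S) rather than assuming it matches the comments.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* GPR -> XMM through a temporary memory location (hypothetical helper).
 * The volatile slot forces a real store followed by a real load of the
 * same 32-bit width, which mirrors the store/load sequence the guide
 * recommends for this direction. Whether the load lands directly in an
 * XMM register is still the compiler's decision. */
static __m128i gpr_to_xmm_via_memory(int32_t value)
{
    volatile int32_t slot = value;   /* store from the GPR to memory */
    return _mm_cvtsi32_si128(slot);  /* load back, low lane of the XMM */
}

/* XMM -> GPR directly (hypothetical helper).
 * _mm_cvtsi128_si32 extracts the low 32 bits; compilers typically
 * implement it with MOVD, or VMOVD when AVX is enabled. */
static int32_t xmm_to_gpr_direct(__m128i v)
{
    return _mm_cvtsi128_si32(v);
}
```

Compiling this with something like -O2 -mtune=bdver1 -S and comparing against another -mtune value is one way to see which move strategy the compiler picks for each direction.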