In the provided testcase, gcc spills an xmm register onto the stack even though only one xmm register is in use. This does not occur with similar code using general-purpose registers.
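For comparison, the general-purpose-register case mentioned above might look roughly like the following. This is only a sketch and is not code from the report: the function name `run_int`, the opcode numbering, and the byte-sized operands are all assumptions made for illustration, and the assembly gcc generates for it has not been checked here. It uses the same labels-as-values extension as the testcase, with an integer accumulator where the testcase uses a double.

```c
/* Hypothetical integer counterpart of the floating-point testcase.
   All names and the opcode encoding are assumptions for illustration.
   Requires GCC's labels-as-values extension (&&label, goto *ptr). */
long run_int(const unsigned char *code, const long *regs)
{
    /* Dispatch table indexed by opcode: 0 = add regs[operand], 1 = halt. */
    static void *const dispatch[] = { &&op_add, &&op_halt };
    const unsigned char *ip = code;
    register long acc = 0;      /* accumulator, expected to stay in one register */

    goto *dispatch[*ip++];      /* jump to the handler of the first opcode */

op_add:
    acc += regs[*ip++];         /* the operand byte selects the slot to add */
    goto *dispatch[*ip++];      /* dispatch the next opcode */

op_halt:
    return acc;
}
```

Compiling this with the same flags as the testcase and inspecting the `op_add` block would show whether the accumulator is reloaded and stored around each add, which is the behaviour being reported for the xmm case.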

<BEGIN TESTCASE>
const void* test(int action, void* ptr)
{
    static void * const addrs[] = {&&L1, &&L2};
    if (action == 0) {
        return addrs;
    } else {
        char* ip = ptr;
        register double reg_f_a;
        double reg_f[1];
        reg_f_a = 0.0;
        reg_f[0] = 0.0;
        goto *ip;
L1:
        {
            int t1 = *(int*)(++ip);
            reg_f_a = reg_f_a + reg_f[t1];
            goto *(++ip);
        }
L2:
        *(double*)ptr = reg_f_a;
    }
    return 0;
}
<END TESTCASE>

The above code compiled with -O3 -march=i686 -msse2 -mfpmath=sse produces the following bit of assembly:

<BEGIN OUTPUT>
        movl    1(%ebx), %eax
        addl    $2, %ebx
        movsd   -32(%ebp), %xmm0
        addsd   -16(%ebp,%eax,8), %xmm0
        movl    %ebx, %eax
        movsd   %xmm0, -32(%ebp)
        jmp     *%eax
<END OUTPUT>

The xmm0 register should remain the home register for reg_f_a, so there should be no need for the store/load. Other usages of xmm0 should be placed in xmm1. So the output should read:

<BEGIN MODIFIED 1>
        movl    1(%ebx), %eax
        addl    $2, %ebx
        addsd   -16(%ebp,%eax,8), %xmm0
        movl    %ebx, %eax
        jmp     *%eax
<END MODIFIED 1>

As a possibly related issue, there is also no reason why a copy of %ebx is made prior to performing the jump. This could just as easily be:

<BEGIN MODIFIED 2>
        movl    1(%ebx), %eax
        addl    $2, %ebx
        addsd   -16(%ebp,%eax,8), %xmm0
        jmp     *%ebx
<END MODIFIED 2>

So, as you can see, three out of the seven instructions can be removed, as well as two of four memory references.

The version of gcc is 4.2.3 (Ubuntu 4.2.3-2ubuntu7)

--
Summary: register allocation spills floats needlessly
Product: gcc
Version: 4.2.3
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: jstrother9109 at gmail dot com
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37488