------- Comment #6 from vmakarov at redhat dot com 2010-02-09 19:56 ------- The patch which I'll send in a few minutes solves the problem. The patch avoids creating shuffle copies when an involved operand should be bound to some other operand in the current insn. The test code generated with the patch looks like:
.L2:
	movapd	%xmm0, %xmm8
	subsd	%xmm3, %xmm8
	movsd	a(%rax), %xmm6
	mulsd	%xmm8, %xmm8
	movsd	b(%rax), %xmm7
	subsd	%xmm8, %xmm7
	movsd	%xmm7, b(%rax)
	leaq	8(%rax), %r10
	movapd	%xmm0, %xmm5
	subsd	%xmm6, %xmm5
	movsd	a(%r10), %xmm3
	mulsd	%xmm5, %xmm5
	movsd	b(%r10), %xmm4
	subsd	%xmm5, %xmm4
	movsd	%xmm4, b(%r10)
	leaq	16(%rax), %r9
	movapd	%xmm0, %xmm1
	subsd	%xmm3, %xmm1
	movsd	a(%r9), %xmm15
	mulsd	%xmm1, %xmm1
	movsd	b(%r9), %xmm2
	subsd	%xmm1, %xmm2
	movsd	%xmm2, b(%r9)
	leaq	24(%rax), %r8

SPEC2000 benchmarking on x86/x86_64 (Core i7) shows that using the patch results in slightly better code.

x86: The code is different on gzip, vpr, gcc, crafty, perlbmk, gap, vortex, bzip2, twolf and mesa. The patch never produces bigger code (on average about 0.02% smaller). The rate is slightly better with the patch but practically the same (the biggest improvements are on crafty and perlbmk, about 1%).

x86_64: The code is different on gzip, vpr, gcc, crafty, parser, perlbmk, gap, vortex, bzip2, twolf, mesa, art and ammp. The patch produces on average about 0.01% smaller code. The rate is slightly better with the patch but practically the same (the biggest improvements are on vortex, 1.3%, and on crafty and bzip2, 0.7%).

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42973