------- Comment #6 from vmakarov at redhat dot com  2010-02-09 19:56 -------
  The patch, which I'll send in a few minutes, solves the problem.  It
avoids creating shuffle copies when an involved operand should be bound to
another operand in the current insn.  The code generated for the test with
the patch looks like

.L2:
        movapd  %xmm0, %xmm8
        subsd   %xmm3, %xmm8
        movsd   a(%rax), %xmm6
        mulsd   %xmm8, %xmm8
        movsd   b(%rax), %xmm7
        subsd   %xmm8, %xmm7
        movsd   %xmm7, b(%rax)
        leaq    8(%rax), %r10
        movapd  %xmm0, %xmm5
        subsd   %xmm6, %xmm5
        movsd   a(%r10), %xmm3
        mulsd   %xmm5, %xmm5
        movsd   b(%r10), %xmm4
        subsd   %xmm5, %xmm4
        movsd   %xmm4, b(%r10)
        leaq    16(%rax), %r9
        movapd  %xmm0, %xmm1
        subsd   %xmm3, %xmm1
        movsd   a(%r9), %xmm15
        mulsd   %xmm1, %xmm1
        movsd   b(%r9), %xmm2
        subsd   %xmm1, %xmm2
        movsd   %xmm2, b(%r9)
        leaq    24(%rax), %r8


SPEC2000 benchmarking on x86/x86_64 (Core i7) shows that the patch
results in slightly better code.

 x86: The code differs on gzip, vpr, gcc, crafty, perlbmk, gap,
vortex, bzip2, twolf and mesa.  The patch never produces bigger code
(on average about 0.02% smaller).  The SPEC rate is slightly better
with the patch but practically the same (the biggest improvement,
about 1%, is on crafty and perlbmk).

  x86_64: The code differs on gzip, vpr, gcc, crafty, parser, perlbmk,
gap, vortex, bzip2, twolf, mesa, art and ammp.  The patch produces
code that is on average about 0.01% smaller.  The SPEC rate is
slightly better with the patch but practically the same (the biggest
improvement is 1.3% on vortex, and 0.7% on crafty and bzip2).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42973