------- Comment #58 from bonzini at gnu dot org  2009-05-06 09:56 -------
Uhm, it's better to run unpatched 4.5 with -O1 -fforward-propagate to get a
fair comparison.  Also, my earlier counts included the loop headers, which are
not part of the hot code.

                   4.2 -O1   4.5 -O1 -fforward-propagate   4.5 + patch -O1
LOOP 1             181       201                           180
INNER LOOP 1.1     117       118                           113
LOOP 2              27        27                            26

This shows that you should compare running the code (you can use direct.i)
built with 4.2 at -O1 and with 4.5 at -O1 -fforward-propagate.  This is very
important; otherwise you're comparing apples to oranges.
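
A minimal sketch of that comparison as shell functions.  The compiler names
gcc-4.2 and gcc-4.5 and the output file names are assumptions; substitute your
own builds.  count_insns is a hypothetical helper that counts instruction
lines in a file, so to get per-loop numbers like the ones above you would
first cut the loop body out of the .s file by hand.

```shell
#!/bin/sh
# Assumed compiler names; override with the paths of your own 4.2 and 4.5 builds.
GCC42=${GCC42:-gcc-4.2}
GCC45=${GCC45:-gcc-4.5}

# Emit assembly for the two configurations being compared.
compile_both() {
    "$GCC42" -O1 -S direct.i -o direct-4.2.s
    "$GCC45" -O1 -fforward-propagate -S direct.i -o direct-4.5.s
}

# Count instruction lines: indented lines whose first token starts with a
# lowercase letter.  Labels (column 0) and directives (leading '.') are skipped.
count_insns() {
    grep -cE '^[[:space:]]+[a-z]' "$1"
}
```

After compile_both, "count_insns direct-4.2.s" and "count_insns direct-4.5.s"
give whole-file totals; run it on an extracted loop body for a per-loop count.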

fwprop is creating too much register pressure by computing address offsets
like these in the loop header:

        leaq    -8(%r12), %rsi
        leaq    8(%r12), %r10
        leaq    -16(%r12), %r9
        leaq    -24(%r12), %rbx
        leaq    -32(%r12), %rbp
        leaq    -40(%r12), %rdi
        leaq    -48(%r12), %r11
        leaq    40(%r12), %rdx

Then, the additional register pressure is causing the bad scheduling we have in
the fast assembly outputs:

        movq    (%rdx), %rax
        movsd   (%rax,%r15,2), %xmm7
        movq    (%rdi), %r15
        movsd   (%rax,%r15,2), %xmm10
        movq    (%rbp), %r15
        movsd   (%rax,%r15,2), %xmm5
        movq    (%rbx), %r15
        movsd   (%rax,%r15,2), %xmm6
        movq    (%r9), %r15
        movsd   (%rax,%r15,2), %xmm15
        movq    (%rsi), %r15
        movsd   (%rax,%r15,2), %xmm11


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
