http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54000

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
            Summary|[4.6/4.7/4.8                |[4.6/4.7/4.8 Regression]
                   |Regression][IVOPTS]         |Performance breakdown for
                   |Performance breakdown for   |gcc-4.{6,7} vs. gcc-4.5
                   |gcc-4.{6,7} vs. gcc-4.5     |using std::vector in matrix
                   |using std::vector in matrix |vector multiplication
                   |vector multiplication       |(IVopts / inliner)

--- Comment #9 from Steven Bosscher <steven at gcc dot gnu.org> 2013-02-22 23:40:35 UTC ---
(In reply to comment #8)
> Thanks for the reduced testcase.  The innermost loops compare as follows:
> 
> 4.5:
> 
> .L7:
>         movsd   (%rbx,%rcx), %xmm0
>         addq    $8, %rcx
>         mulsd   0(%rbp,%rdx), %xmm0
>         addq    $8, %rdx
>         cmpq    $24, %rdx
>         addsd   %xmm0, %xmm1
>         movsd   %xmm1, (%rsi)
>         jne     .L7

4.8 r196182 with "--param early-inlining-insns=2" (2 x the default value):

.L13:
        movsd   (%rdx), %xmm0
        addq    $8, %rdx
        mulsd   (%rsi,%rax), %xmm0
        addq    $8, %rax
        cmpq    $24, %rax
        addsd   %xmm0, %xmm1
        movsd   %xmm1, 8(%rdi,%rcx)
        jne     .L13

> 4.7:
> 
> .L13:
>         movq    64(%rsp), %rdi
>         movq    80(%rsp), %rdx
>         addq    %rcx, %rdi
>         addq    %r8, %rdx
>         movsd   -8(%rax,%rdi), %xmm0
>         mulsd   (%rsi,%rax), %xmm0
>         addq    $8, %rax
>         cmpq    $24, %rax
>         addsd   (%rdx), %xmm0
>         movsd   %xmm0, (%rdx)
>         jne     .L13

This is similar to what 4.8 r196182 produces without inliner tweaks:

.L18:
        movq    %rcx, %rdi
        addq    64(%rsp), %rdi
        movq    %r8, %rdx
        addq    80(%rsp), %rdx
        movsd   -8(%rax,%rdi), %xmm0
        mulsd   (%rsi,%rax), %xmm0
        addq    $8, %rax
        cmpq    $24, %rax
        addsd   (%rdx), %xmm0
        movsd   %xmm0, (%rdx)
        jne     .L18

> so we seem to have a register allocation / spilling issue here as well
> as a bad induction variable choice.  GCC 4.8 is not any better here.

All true, but in the end it looks like an inliner heuristics issue first
(as also suggested by comment #3).
