[Bug tree-optimization/23305] [4.0/4.1/4.2/4.3 Regression] Inlining related regression for gcc-4.x

jakub at gcc dot gnu dot org Thu, 22 Nov 2007 08:41:23 -0800


------- Comment #9 from jakub at gcc dot gnu dot org  2007-11-22 16:41 -------
On x86_64-linux with -m64 -O2, gcc doesn't hoist the movabsq insns out of the
loops. Hoisting them gives some performance back:


time ./pr23305-slow

real    0m4.028s
user    0m4.023s
sys     0m0.003s
time ./pr23305-slow2

real    0m3.436s
user    0m3.434s
sys     0m0.001s

when I hoist it by hand in assembly:

--- pr23305-slow.s      2007-11-22 17:14:09.000000000 +0100
+++ pr23305-slow2.s     2007-11-22 17:31:31.000000000 +0100
@@ -222,16 +222,16 @@ _Z13s000005a_testv:
 .LVL2:
 .LBB329:
 .LBB330:
        .loc 1 28697 0
        cmpq    %rax, %rdx
        je      .L13
+       movabsq $4613937818241073152, %r8
        .p2align 4,,10
        .p2align 3
 .L14:
-       movabsq $4613937818241073152, %r8
        movq    %r8, (%rax)
        addq    $8, %rax
        cmpq    %rax, %rdx
        jne     .L14
 .L13:
 .LBE330:
@@ -242,17 +242,17 @@ _Z13s000005a_testv:
 .LVL3:
 .LBB326:
 .LBB327:
        .loc 1 28697 0
        cmpq    %rax, %rdx
        je      .L15
+       movabsq $4613937818241073152, %rdi
        .p2align 4,,10
        .p2align 3
 .L16:
 .LBE327:
-       movabsq $4613937818241073152, %rdi
        movq    %rdi, (%rax)
 .LBB328:
        addq    $8, %rax
        cmpq    %rax, %rdx
        jne     .L16
 .L15:

but still the -O2 -fno-inline-small-functions version is much faster:

time ./pr23305-fast

real    0m1.591s
user    0m1.588s
sys     0m0.001s
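
For reference, the movabsq immediate 4613937818241073152 is 0x4008000000000000,
which is the IEEE-754 bit pattern of the double 3.0, so the loops at .L14 and
.L16 are storing 3.0 into consecutive 8-byte slots. A minimal C++ sketch of
that loop shape follows. The names decode_immediate and fill are hypothetical
and are not taken from the testcase; this only illustrates the loop-invariant
constant, not the actual source at line 28697.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a 64-bit movabsq immediate as an IEEE-754 double.
// Assumes double is the usual 64-bit binary64 format (true on x86_64).
double decode_immediate(std::uint64_t bits) {
    double d;
    static_assert(sizeof d == sizeof bits, "double must be 64 bits");
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Hypothetical stand-in for the inlined fill loop: store the same value
// into each element of [first, last). The value is loop invariant, so its
// materialization (the movabsq) can be done once before the loop.
void fill(double *first, double *last, double value) {
    for (; first != last; ++first)
        *first = value;
}
```

Calling fill(p, p + n, decode_immediate(4613937818241073152ULL)) would write
3.0 to every element, matching the movq %r8, (%rax) / addq $8, %rax pattern in
the diff above.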


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23305

[Bug tree-optimization/23305] [4.0/4.1/4.2/4.3 Regression] Inlining related regression for gcc-4.x
