Hello all,
In the work I'm doing on my new book, I'm trying to show how modern
compiler optimizations can eliminate a good deal of the overhead
introduced by a modular/unit-testable design. In verifying some of my
text, I found that GCC 4.4 and 4.5 (20091018, Ubuntu 9.10 package) aren't
doing an optimization that I expected them to do:
#include <stdio.h>

class Calculable
{
public:
    virtual unsigned char calculate() = 0;
};

class X : public Calculable
{
public:
    unsigned char calculate() { return 1; }
};

class Y : public Calculable
{
public:
    unsigned char calculate() { return 2; }
};

static void print(Calculable& c)
{
    printf("%d\n", c.calculate());
    printf("+1: %d\n", c.calculate() + 1);
}

int main()
{
    X x;
    Y y;
    print(x);
    print(y);
    return 0;
}
GCC 4.5 (and 4.4.1) generates this approximate code:
~/src$ /usr/lib/gcc-snapshot/bin/g++ -O3 -ftree-loop-ivcanon -fivopts \
    -ftree-loop-im -fwhole-program -fipa-struct-reorg -fipa-matrix-reorg \
    -fgcse-sm -fgcse-las -fgcse-after-reload \
    --param max-gcse-memory=100000000 \
    --param max-pending-list-length=100000 folding-test-interface.cpp \
    -o folding-test-interface_gcc450_20091018-O3-kitchen-sink
~/src$ objdump -Mintel -S \
    folding-test-interface_gcc450_20091018-O3-kitchen-sink | less -p \<main
0000000000400310 <main>:
400310: 53 push rbx
400311: 48 83 ec 20 sub rsp,0x20
400315: 48 8d 5c 24 10 lea rbx,[rsp+0x10]
40031a: 48 c7 44 24 10 c0 04 mov QWORD PTR [rsp+0x10],0x4004c0
400321: 40 00
400323: 48 c7 04 24 00 05 40 mov QWORD PTR [rsp],0x400500
40032a: 00
40032b: 48 89 df mov rdi,rbx
40032e: ff 15 8c 01 00 00 call QWORD PTR [rip+0x18c] # 4004c0 <_ZTV1X+0x10>
400334: bf ac 04 40 00 mov edi,0x4004ac
400339: 0f b6 f0 movzx esi,al
40033c: 31 c0 xor eax,eax
40033e: e8 a5 03 00 00 call 4006e8 <pri...@plt>
400343: 48 8b 44 24 10 mov rax,QWORD PTR [rsp+0x10]
400348: 48 89 df mov rdi,rbx
40034b: ff 10 call QWORD PTR [rax]
40034d: 0f b6 f0 movzx esi,al
400350: bf a4 04 40 00 mov edi,0x4004a4
400355: 31 c0 xor eax,eax
400357: 83 c6 01 add esi,0x1
40035a: e8 89 03 00 00 call 4006e8 <pri...@plt>
[...]
As seen here, GCC isn't folding/inlining the constants returned across the
virtual function boundary, even though they are visible in the compilation
unit and -O3 -fwhole-program is being used. (Note that I started with just
those two options, and added the others in an attempt to induce the
optimization I was hoping for.)
I was able to induce the optimization by removing a level of indirection
in two ways: 1) by having two print() functions, one overload accepting
X& and a second overload accepting Y&; and 2) by replacing the classes
with single-level-indirection function pointers:
--
#include <stdio.h>

typedef unsigned char (*Calculable)(void);

unsigned char one() { return 1; }
unsigned char two() { return 2; }

static void print(Calculable calculate)
{
    printf("%d\n", calculate());
    printf("+1: %d\n", calculate() + 1);
}

int main()
{
    print(one);
    print(two);
    return 0;
}
--
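For reference, variant 1) could look roughly like the sketch below. It
assumes the class definitions from the first listing are unchanged and
only print() is split into per-type overloads. The driver name
run_overload_example() is hypothetical and simply stands in for main().

```cpp
#include <stdio.h>

class Calculable
{
public:
    virtual unsigned char calculate() = 0;
};

class X : public Calculable
{
public:
    unsigned char calculate() { return 1; }
};

class Y : public Calculable
{
public:
    unsigned char calculate() { return 2; }
};

// One print() overload per concrete type: the static type of the
// argument now names the class directly instead of the Calculable base.
static void print(X& x)
{
    printf("%d\n", x.calculate());
    printf("+1: %d\n", x.calculate() + 1);
}

static void print(Y& y)
{
    printf("%d\n", y.calculate());
    printf("+1: %d\n", y.calculate() + 1);
}

// Hypothetical driver standing in for main() in the original listing.
int run_overload_example()
{
    X x;
    Y y;
    print(x);   // overload resolution picks print(X&)
    print(y);   // overload resolution picks print(Y&)
    return 0;
}
```

Calling run_overload_example() from main() gives a program equivalent to
the original test case, with the overloads replacing the single
print(Calculable&).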
For completeness, the code generated from the function-pointer example is
optimized in the way I expect:
0000000000400390 <main>:
400390: 48 83 ec 08 sub rsp,0x8
400394: ba 01 00 00 00 mov edx,0x1
400399: be e4 04 40 00 mov esi,0x4004e4
40039e: bf 01 00 00 00 mov edi,0x1
4003a3: 31 c0 xor eax,eax
4003a5: e8 c6 02 00 00 call 400670 <__printf_...@plt>
4003aa: ba 02 00 00 00 mov edx,0x2
4003af: be dc 04 40 00 mov esi,0x4004dc
4003b4: bf 01 00 00 00 mov edi,0x1
4003b9: 31 c0 xor eax,eax
4003bb: e8 b0 02 00 00 call 400670 <__printf_...@plt>
Modifying this last example to include two function pointer indirections
once again causes the optimization to be missed.
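The exact two-level variant is not shown here; one plausible reading,
offered only as a sketch, is to pass a pointer to the function pointer so
that each call needs two loads before the target is known. The pointer
parameter and the driver name run_double_indirection_example() are
assumptions made for illustration.

```cpp
#include <stdio.h>

typedef unsigned char (*Calculable)(void);

unsigned char one() { return 1; }
unsigned char two() { return 2; }

// Assumption: the second level of indirection is a pointer to the
// function pointer, loosely mirroring object -> vtable -> function.
static void print(Calculable* calculate)
{
    printf("%d\n", (*calculate)());
    printf("+1: %d\n", (*calculate)() + 1);
}

// Hypothetical driver standing in for main() in the listing above.
int run_double_indirection_example()
{
    Calculable first = one;
    Calculable second = two;
    print(&first);
    print(&second);
    return 0;
}
```

Whether a given compiler folds the constants in this form would need to
be checked against its generated code, in the same way as the listings
above.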
So, my questions are:
0) Am I missing some existing command-line option that would induce the
optimization? (e.g. a bad connection between my chair and keyboard)
1) Is this a missed optimization bug, or is this a missing feature?
2) Either way, what are the steps to correct the issue?
Thanks in advance for insights and/or help!
PS: I would test with a newer 4.5.0 build, but I'm having trouble
bootstrapping. Any help is appreciated on that email (sent yesterday), as
well.
--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt