This is the gcc-4.0 reincarnation of PR 15242. Since about gcc-3.2,
gcc has tended to compile "goto *" into direct jumps to a shared
indirect jump. Gcc-4.0 tries to undo this in a later stage, but
apparently it is not completely successful.
This is a fragment from the engine1.i file I will
attach (with some newlines removed):
H_IALOAD: __asm__(""); I_IALOAD:
{
java_arrayheader * aArray;
s4 iIndex;
Cell vResult;
;
((aArray) = (java_arrayheader * )(sp[1]));
((iIndex)=(s4)(Cell)(spTOS));
sp += 1;
{
# 188 "./java.vmg"
{
{ if ((aArray) == ((void *)0)) { goto *throw_nullpointerexception; } };
{ if (( ((java_arrayheader*)(aArray))->size ) <= (u4) (iIndex)) {
arrayindexoutofbounds_index = (iIndex); goto
*throw_arrayindexoutofboundsexception; } };
;
vResult = ((((java_intarray*)(aArray))->data)[iIndex]);
}
# 332 "java-vm.i"
}
;
((spTOS) = (Cell)(vResult));
J_IALOAD: __asm__("");
do {ca=*(ip++);} while(0);
K_IALOAD: __asm__("");
goto before_goto;
}
After compiling this with "gcc-4.0.0 -fno-reorder-blocks -O2 -g3 -S
engine1.i", the assembly output for this fragment is:
.L995:
jmp *%rdx
...
.L9:
.LBB46:
.loc 115 314 0
addq $8, %r15
.loc 115 315 0
movl -168(%rbp), %eax
.loc 114 189 0
movq -136(%rbp), %rdx
.loc 115 314 0
movq (%r15), %r9
.loc 114 189 0
testq %r9, %r9
je .L995
.loc 114 190 0
cmpl %eax, 16(%r9)
ja .L764
movq -152(%rbp), %rdx
movl %eax, -116(%rbp)
.LBE46:
.loc 2 231 0
jmp *%rdx
.L764:
.LBB47:
.loc 114 192 0
cltq
movslq 24(%r9,%rax,4),%r9
movq %r9, -168(%rbp)
.L195:
.loc 115 342 0
.loc 115 343 0
movq (%r14), %r13
addq $8, %r14
.L381:
.loc 115 344 0
jmp .L560
So while gcc managed to reconstruct the second indirect jump, it did
not succeed for the first "goto *", which is pessimised into a
conditional branch to a shared indirect jump.
Code coming from "gcc version 4.0.2 (Debian 4.0.2-2)" or gcc-4.0.0
without -fno-reorder-blocks is similar.
The impact of this pessimisation is that we cannot use "selective
inlining" for JVM instructions that can throw exceptions, like
"getfield"; a rough guess at the resulting slowdown for the Cacao JVM
interpreter is a factor of 1.2.
Hmm, I guess that the intermediate direct unconditional jump is
optimised away, and that leads to the inability to reconstruct the
indirect jump. Maybe I can work around this problem by putting an
__asm__("") or a label between the if and the "goto *" to prevent the
optimisation, but:
1) Will the workaround still work with the next gcc?
2) The workarounds start to accumulate.
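For illustration, the workaround could look roughly like the sketch
below. It is not taken from engine1.i: the function checked_load, the
handler variables and the labels are hypothetical names, and the
handler addresses are kept in volatile variables only so that the
compiler cannot fold the "goto *" into a direct jump. Whether the empty
asm really makes gcc emit a separate indirect jump per "goto *" is an
assumption that has to be checked in the -S output of the gcc version
in use.

```c
/* Sketch only: all names here are hypothetical, not from engine1.i.
   Requires GNU C (labels as values), e.g. gcc -O2 -S sketch.c */
#include <stddef.h>

/* Returns array[index], or -1 for a null array, or -2 for a bad index. */
int checked_load(const int *array, unsigned size, unsigned index)
{
    /* Handler addresses live in variables, as in the interpreter;
       volatile keeps the compiler from turning the jumps into
       direct ones. */
    static void *volatile throw_null = &&null_handler;
    static void *volatile throw_bounds = &&bounds_handler;

    if (array == NULL) {
        __asm__("");            /* empty asm between the if and the goto * */
        goto *throw_null;
    }
    if (size <= index) {
        __asm__("");            /* same barrier for the second check */
        goto *throw_bounds;
    }
    return array[index];

null_handler:
    return -1;
bounds_handler:
    return -2;
}
```

Comparing the assembly of this function with and without the two
__asm__("") lines should show whether the first check still ends in a
conditional branch to a shared "jmp *%reg".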
--
Summary: pessimization of goto * ("computed goto")
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: anton at mips dot complang dot tuwien dot ac dot at
GCC build triplet: x86_64-unknown-linux-gnu
GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25285