This is the gcc-4.0 reincarnation of PR 15242. Since about gcc-3.2, gcc has tended to compile "goto *" into a direct jump to a shared indirect jump. Gcc-4.0 tries to undo this in a later stage, but apparently it does not completely succeed:
This is a fragment from the engine1.i file I will attach (with some newlines removed):

    H_IALOAD: __asm__("");
    I_IALOAD: {
    java_arrayheader * aArray;
    s4 iIndex;
    Cell vResult;
    ;
    ((aArray) = (java_arrayheader * )(sp[1]));
    ((iIndex)=(s4)(Cell)(spTOS));
    sp += 1;
    {
    # 188 "./java.vmg"
    {
    { if ((aArray) == ((void *)0)) { goto *throw_nullpointerexception; } };
    { if (( ((java_arrayheader*)(aArray))->size ) <= (u4) (iIndex)) { arrayindexoutofbounds_index = (iIndex); goto *throw_arrayindexoutofboundsexception; } };
    ;
    vResult = ((((java_intarray*)(aArray))->data)[iIndex]);
    }
    # 332 "java-vm.i"
    }
    ;
    ((spTOS) = (Cell)(vResult));
    J_IALOAD: __asm__("");
    do {ca=*(ip++);} while(0);
    K_IALOAD: __asm__("");
    goto before_goto;
    }

After compiling this with "gcc-4.0.0 -fno-reorder-blocks -O2 -g3 -S engine1.i", the assembly output for this fragment is:

    .L995:
            jmp     *%rdx
    ...
    .L9:
    .LBB46:
            .loc 115 314 0
            addq    $8, %r15
            .loc 115 315 0
            movl    -168(%rbp), %eax
            .loc 114 189 0
            movq    -136(%rbp), %rdx
            .loc 115 314 0
            movq    (%r15), %r9
            .loc 114 189 0
            testq   %r9, %r9
            je      .L995
            .loc 114 190 0
            cmpl    %eax, 16(%r9)
            ja      .L764
            movq    -152(%rbp), %rdx
            movl    %eax, -116(%rbp)
    .LBE46:
            .loc 2 231 0
            jmp     *%rdx
    .L764:
    .LBB47:
            .loc 114 192 0
            cltq
            movslq  24(%r9,%rax,4),%r9
            movq    %r9, -168(%rbp)
    .L195:
            .loc 115 342 0
            .loc 115 343 0
            movq    (%r14), %r13
            addq    $8, %r14
    .L381:
            .loc 115 344 0
            jmp     .L560

So while gcc managed to reconstruct the second indirect jump, it did not succeed for the first "goto *", which is pessimised into a conditional branch to a shared indirect jump.

Code coming from "gcc version 4.0.2 (Debian 4.0.2-2)" or gcc-4.0.0 without -fno-reorder-blocks is similar.

The impact of this pessimisation is that we cannot use "selective inlining" for JVM instructions that can throw exceptions, like "getfield"; a rough guess at the resulting slowdown for the Cacao JVM interpreter is a factor 1.2.
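For readers without engine1.i at hand, the shape of the problem can be sketched in a few lines of GNU C. This is only an illustration under assumptions: run_iaload, the opcode enum and the dispatch table are made-up names, not Cacao code, and the error paths simply return -1 or -2 instead of raising exceptions. Because the handler addresses are local constants here, gcc may well simplify this version differently from the real engine, where they appear to be loaded from variables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the IALOAD pattern; names are illustrative,
   not taken from Cacao. Needs GNU C (labels as values). */
enum { OP_IALOAD, OP_HALT };

/* Returns array[index], -1 for a null array, -2 for a bad index. */
int run_iaload(const int *array, unsigned size, int index)
{
    /* Handler addresses kept in variables and reached via "goto *". */
    void *throw_nullpointerexception = &&null_pointer;
    void *throw_arrayindexoutofboundsexception = &&out_of_bounds;
    static const unsigned char code[] = { OP_IALOAD, OP_HALT };
    void *dispatch[] = { &&do_iaload, &&do_halt };
    const unsigned char *ip = code;
    int result = 0;

    goto *dispatch[*ip++];

do_iaload:
    if (array == NULL)
        goto *throw_nullpointerexception;            /* first "goto *" */
    if (size <= (unsigned)index)
        goto *throw_arrayindexoutofboundsexception;  /* second "goto *" */
    result = array[index];
    goto *dispatch[*ip++];                           /* threaded dispatch */

do_halt:
    assert(ip == code + sizeof code);  /* both opcodes were consumed */
    return result;
null_pointer:
    return -1;
out_of_bounds:
    return -2;
}
```

Compiling something of this shape with -O2 -S and checking whether each "goto *" gets its own indirect jmp, or only a conditional branch to a shared one, is the comparison made above.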
Hmm, I guess that the intermediate direct unconditional jump is optimised away, and that leads to the inability to reconstruct the indirect jump. Maybe I can work around this problem by putting an __asm__("") or a label between the if and the "goto *" to prevent the optimisation, but:

1) Will the workaround still work with the next gcc?
2) The workarounds start to accumulate.

-- 
           Summary: pessimization of goto * ("computed goto")
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: anton at mips dot complang dot tuwien dot ac dot at
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25285
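The workaround floated above, an empty __asm__("") between the if and the "goto *", would look roughly like the sketch below. The function checked_load and its demo are hypothetical names used only for illustration, and the error paths return -1 or -2 as placeholders. Whether the empty asm actually stops gcc from merging the indirect jumps is untested here and is exactly the open question; the code only shows where the barrier would go.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns array[index], -1 for a null array,
   -2 for an out-of-range index. Needs GNU C (labels as values). */
int checked_load(const int *array, unsigned size, int index)
{
    void *throw_nullpointerexception = &&null_pointer;
    void *throw_arrayindexoutofboundsexception = &&out_of_bounds;

    if (array == NULL) {
        __asm__("");  /* empty asm placed between the if and the goto * */
        goto *throw_nullpointerexception;
    }
    if (size <= (unsigned)index) {
        __asm__("");  /* same placement for the second check */
        goto *throw_arrayindexoutofboundsexception;
    }
    return array[index];

null_pointer:
    return -1;
out_of_bounds:
    return -2;
}

/* Small usage check: the barrier must not change behaviour. */
void checked_load_demo(void)
{
    int data[2] = { 7, 9 };
    assert(checked_load(data, 2, 1) == 9);
    assert(checked_load(NULL, 2, 0) == -1);
    assert(checked_load(data, 2, 2) == -2);
}
```

The same placement would apply to a label instead of the asm; in both cases the generated assembly has to be inspected for each gcc version, which is the maintenance cost raised in points 1) and 2).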