https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71785
Aleksey <rndfax at yandex dot ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rndfax at yandex dot ru

--- Comment #12 from Aleksey <rndfax at yandex dot ru> ---
The committed test case gcc/testsuite/gcc.target/powerpc/pr71785.c still fails on the latest GCC version. It seems that some flags affect the result.

For example, this command:

    gcc -S -O3 gcc/testsuite/gcc.target/powerpc/pr71785.c

produces a good result, where all computed gotos are of the "jump reg" type:

    $ fgrep jmp pr71785.s
    jmp *%rax
    jmp *%rax
    jmp *%rax
    jmp *%rax
    $

But after adding the two flags "-fno-reorder-blocks-and-partition -fno-reorder-blocks", the command becomes:

    gcc -S -O3 -fno-reorder-blocks-and-partition -fno-reorder-blocks gcc/testsuite/gcc.target/powerpc/pr71785.c

and one goto is no longer optimized:

    $ fgrep jmp pr71785.s
    jmp .L2
    jmp *%rax
    jmp *%rax
    jmp *%rax
    jmp *%rax
    $

I investigated this a little in the GCC source code. In gcc/bb-reorder.c, one basic block is not optimized by pass_duplicate_computed_gotos because the condition "single_pred_p (bb)" becomes true in the if statement of this for loop:

    for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); )
      {
        basic_block pred = e->src;

        /* Do not duplicate BB into PRED if that is the last predecessor, or if
           we cannot merge a copy of BB with PRED.  */
        if (single_pred_p (bb)
            || !single_succ_p (pred)
            || e->flags & EDGE_COMPLEX
            || pred->index < NUM_FIXED_BLOCKS
            || (JUMP_P (BB_END (pred)) && !simplejump_p (BB_END (pred)))
            || (JUMP_P (BB_END (pred)) && CROSSING_JUMP_P (BB_END (pred))))
          {
            ei_next (&ei);
            continue;
          }

If I understand it correctly, this is what happens: bb, which ends in "jmp reg", has several predecessors, each of which ends in a "jmp rel" to bb. Each time bb is merged into such a predecessor, bb has one predecessor fewer. When bb is left with a single predecessor, it hits this condition and the pass stops duplicating it.

One may say that this is not a big deal, since only the first jump is not optimized.
But with the additional flag "-mcmodel=large" and this patch applied to the committed test case file gcc/testsuite/gcc.target/powerpc/pr71785.c:

```
diff --git a/gcc/testsuite/gcc.target/powerpc/pr71785.c b/gcc/testsuite/gcc.target/powerpc/pr71785.c
index c667ad8..6c8dde6 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr71785.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr71785.c
@@ -28,6 +28,7 @@ extern void do_stuff_b(int arg);
 extern void do_stuff_c(int arg);
 
 extern int someglobal;
+extern void *jump;
 
 void
 eval(op *op)
@@ -43,10 +44,14 @@ eval(op *op)
 CASE_OP_A:
   someglobal++;
   op++;
+  if (op->opcode == OP_END)
+    goto *jump;
   goto *dispatch_table[op->opcode];
 CASE_OP_B:
   do_stuff_b(op->arg);
   op++;
+  if (op->opcode == OP_END)
+    goto *jump;
   goto *dispatch_table[op->opcode];
 CASE_OP_C:
   do_stuff_c(op->arg);
```

the results get worse:

    gcc -S -O3 -fno-reorder-blocks-and-partition -fno-reorder-blocks -mcmodel=large gcc/testsuite/gcc.target/powerpc/pr71785_patched.c

    $ fgrep jmp pr71785.s
    jmp .L2
    jmp .L14
    jmp *%rax
    jmp *%rax
    jmp .L6
    jmp *%rax
    jmp *%rax
    $

A more real-world example, from a project I am working on, uses these flags:

    gcc -S -O3 -fno-reorder-blocks-and-partition -fno-reorder-blocks -mcmodel=large -fno-crossjumping -fno-gcse -fno-PIE gcc/testsuite/gcc.target/powerpc/pr71785_patched.c

which gives this:

    $ fgrep jmp pr71785.s
    jmp .L2
    jmp .L4
    jmp *%rax
    jmp *%rax
    jmp *%rax
    jmp .L6
    jmp *%rax
    jmp *%rax
    jmp *%rax
    $

Even if the first jump is not that crucial (it can be "jmp rel + jmp reg"), all other jumps must be "jmp reg" only. Moreover, basic blocks are not traversed in the order in which they appear in the source file. So one day the first jump becomes "jmp rel + jmp reg", and another day (depending on the source file and the basic block traversal order) some crucial goto becomes "jmp rel + jmp reg" instead of just "jmp reg". And this breaks everything.

Is it possible to fix that?
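
To illustrate the reading of the loop in gcc/bb-reorder.c given above, here is a toy model in plain C. It is not GCC code: the function name duplicate_into_preds and the parameter n_preds are made up for this sketch, and it assumes that every predecessor passes all the other checks in the if statement, so that only "single_pred_p (bb)" can stop a duplication.

```c
/* Toy model, not GCC code.  bb starts with n_preds predecessors.  Each
   predecessor that receives its own copy of bb stops being a predecessor
   of the original bb.  Returns how many predecessors received a copy.  */
static int
duplicate_into_preds (int n_preds)
{
  int remaining = n_preds;  /* predecessors still jumping to the original bb */
  int copies = 0;           /* predecessors that now end in their own "jmp reg" */

  for (int i = 0; i < n_preds; i++)
    {
      /* Stands in for single_pred_p (bb): skip the last predecessor.  */
      if (remaining == 1)
        continue;

      copies++;
      remaining--;
    }

  return copies;
}
```

In this model a block with four predecessors gets three copies, and the last predecessor keeps its "jmp rel" to the original block. That is consistent with one leftover "jmp .L2" next to the "jmp *%rax" lines, but it is only a sketch of the suspected mechanism, not a trace of what the pass does.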