[Bug rtl-optimization/71785] Computed gotos are mostly optimized away

rndfax at yandex dot ru Wed, 20 Nov 2019 04:33:54 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71785


Aleksey <rndfax at yandex dot ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rndfax at yandex dot ru

--- Comment #12 from Aleksey <rndfax at yandex dot ru> ---
The committed test case gcc/testsuite/gcc.target/powerpc/pr71785.c still fails
with the latest GCC version. It seems that some flags affect the result.

For example, this command:

gcc -S -O3 gcc/testsuite/gcc.target/powerpc/pr71785.c

produces a good result: all computed gotos are of type "jump reg":

$ fgrep jmp pr71785.s 
        jmp     *%rax
        jmp     *%rax
        jmp     *%rax
        jmp     *%rax
$

But after adding these two flags, "-fno-reorder-blocks-and-partition
-fno-reorder-blocks", the command becomes:

gcc -S -O3 -fno-reorder-blocks-and-partition -fno-reorder-blocks
gcc/testsuite/gcc.target/powerpc/pr71785.c

and we see that it does not optimize one goto:

$ fgrep jmp pr71785.s 
        jmp     .L2
        jmp     *%rax
        jmp     *%rax
        jmp     *%rax
        jmp     *%rax
$ 

I investigated this a little in the GCC source code and found that in
gcc/bb-reorder.c (pass_duplicate_computed_gotos) one basic block is not
duplicated because the condition "single_pred_p (bb)" is true in the if
statement of this for loop:

  for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); )
    {
      basic_block pred = e->src;

      /* Do not duplicate BB into PRED if that is the last predecessor, or if
         we cannot merge a copy of BB with PRED.  */
      if (single_pred_p (bb)
          || !single_succ_p (pred)
          || e->flags & EDGE_COMPLEX
          || pred->index < NUM_FIXED_BLOCKS
          || (JUMP_P (BB_END (pred)) && !simplejump_p (BB_END (pred)))
          || (JUMP_P (BB_END (pred)) && CROSSING_JUMP_P (BB_END (pred))))
        {
          ei_next (&ei);
          continue;
        }

If I understand it correctly, this is what happens: bb, which ends in
"jmp reg", has several predecessors, each of which ends in a "jmp rel" to bb.
Each time bb is merged into such a predecessor, bb has one predecessor fewer.
Once bb is left with a single predecessor, the if condition above is true and
the pass stops duplicating bb, so that last predecessor keeps its "jmp rel".
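To make that reasoning concrete, here is a toy model of the loop. It is only a sketch and not GCC code: the names toy_result and toy_duplicate_into_preds are made up, and it assumes that every predecessor passes all the other checks in the if statement, so only the "single_pred_p (bb)" test matters.

```c
#include <assert.h>

/* Toy model, NOT GCC code.  BB ends in "jmp reg" and has N_PREDS
   predecessors, each ending in a plain "jmp rel" to BB.  All names here
   are hypothetical and exist only for this illustration.  */
struct toy_result
{
  int merged;     /* predecessors that received a copy of BB */
  int left_over;  /* predecessors still reaching BB through "jmp rel" */
};

static struct toy_result
toy_duplicate_into_preds (int n_preds)
{
  struct toy_result r = { 0, 0 };
  int remaining = n_preds;  /* current number of predecessors of BB */

  assert (n_preds >= 0);
  for (int i = 0; i < n_preds; i++)
    {
      /* Stands in for the "single_pred_p (bb)" test: the last remaining
         predecessor is skipped and keeps its jump to BB.  */
      if (remaining == 1)
        {
          r.left_over++;
          continue;
        }

      /* A copy of BB is merged into this predecessor, so the edge from
         it into BB disappears.  */
      r.merged++;
      remaining--;
    }
  return r;
}
```

With four predecessors this model merges three copies and leaves one predecessor with "jmp rel", which matches the single "jmp .L2" in the second listing above.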

One may say that it is not a big deal, since only the first jump is not
optimized. But with the additional flag "-mcmodel=large" and this patch applied
to the committed test case file gcc/testsuite/gcc.target/powerpc/pr71785.c:
```
diff --git a/gcc/testsuite/gcc.target/powerpc/pr71785.c b/gcc/testsuite/gcc.target/powerpc/pr71785.c
index c667ad8..6c8dde6 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr71785.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr71785.c
@@ -28,6 +28,7 @@ extern void do_stuff_b(int arg);
 extern void do_stuff_c(int arg);

 extern int someglobal;
+extern void *jump;

 void
 eval(op *op)
@@ -43,10 +44,14 @@ eval(op *op)
 CASE_OP_A:
        someglobal++;
        op++;
+       if (op->opcode == OP_END)
+               goto *jump;
        goto *dispatch_table[op->opcode];
 CASE_OP_B:
        do_stuff_b(op->arg);
        op++;
+       if (op->opcode == OP_END)
+               goto *jump;
        goto *dispatch_table[op->opcode];
 CASE_OP_C:
        do_stuff_c(op->arg);
```

the results get worse:

gcc -S -O3 -fno-reorder-blocks-and-partition -fno-reorder-blocks -mcmodel=large
gcc/testsuite/gcc.target/powerpc/pr71785_patched.c

$ fgrep jmp pr71785.s 
        jmp     .L2
        jmp     .L14
        jmp     *%rax
        jmp     *%rax
        jmp     .L6
        jmp     *%rax
        jmp     *%rax
$

A more real-world example, from a project that I'm working on, uses these
flags:

gcc -S -O3 -fno-reorder-blocks-and-partition -fno-reorder-blocks -mcmodel=large
-fno-crossjumping -fno-gcse -fno-PIE
gcc/testsuite/gcc.target/powerpc/pr71785_patched.c

which gives this:

$ fgrep jmp pr71785.s 
        jmp     .L2
        jmp     .L4
        jmp     *%rax
        jmp     *%rax
        jmp     *%rax
        jmp     .L6
        jmp     *%rax
        jmp     *%rax
        jmp     *%rax
$

Even if the first jump is not that crucial (it can be "jmp rel + jmp reg"),
all other jumps must be "jmp reg" only. Moreover, basic blocks are not
traversed in the order they appear in the source file, so one day it is the
first jump that becomes "jmp rel + jmp reg", and another day (depending on the
source file and the basic block traversal order) some crucial goto becomes
"jmp rel + jmp reg" instead of just "jmp reg". And this breaks everything.
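For reference, the dispatch pattern I am talking about looks roughly like the sketch below. It is a minimal illustration and not the committed test case: the opcodes OP_INC and OP_DEC, the function run and the label names are made up. It relies on the GNU C "labels as values" extension (&&label and goto *expr). The point is that every handler ends with its own indirect goto, and each of those should compile to a single "jmp reg".

```c
#include <assert.h>

/* Hypothetical opcodes for this sketch; not taken from pr71785.c.  */
enum { OP_INC, OP_DEC, OP_END };

static int
run (const unsigned char *code)
{
  /* GNU C "labels as values": && yields the address of a label.  */
  static void *const dispatch_table[] = { &&do_inc, &&do_dec, &&do_end };
  int acc = 0;

  assert (code != 0);

  /* Each handler dispatches to the next one by itself, so there is one
     indirect jump per handler instead of one shared dispatch block.  */
  goto *dispatch_table[*code];
do_inc:
  acc++;
  code++;
  goto *dispatch_table[*code];
do_dec:
  acc--;
  code++;
  goto *dispatch_table[*code];
do_end:
  return acc;
}
```

If the compiler funnels these gotos through one shared block, each handler pays for a "jmp rel" plus a "jmp reg" instead of just a "jmp reg".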

Is it possible to fix that?
