https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71785

--- Comment #16 from Aleksey <rndfax at yandex dot ru> ---
> > It would be helpful if you give the explanation how these options affect
> > "un-factoring".
> 
> What options?  -fno-reorder-blocks?  Those doo the same to this code as
> they do anywhere else: the compiler does not run the reorder-blocks
> pass, so you get worse code.

This is not an explanation. Just image someone asks "why with -O0 dead code is
not eliminated?". And you just answer "It's because you are using -O0". Well
thank you, captain :)

"-fno-reorder-blocks" option affects bbro step - this is not an "un-factoring"
process. compgoto is "un-factoring" step, which takes place earlier than bbro.
compgoto promises to "un-factor" computed gotos, but this does not happen.

Anyway, I investigated my patched test case and got the following results and
found the following.

There are two labels in RTL which forms "bb 4" and "bb 5". "bb 5" is located
right after "bb 4". "bb 4" sets registers, and "bb 5" just an indirect jump.
There are 3 explicit jumps to "bb 4" from OP_A, OP_B, OP_C.
There is 1 explicit jump to "bb 5" from START (very first goto *xxx).

"bb 4" has 3 predecessors from jumps.
"bb 5" has 2 predecessors: one from jump and one from "bb 4".

What we see from rtl in compgoto:

  Duplicating computed goto bb 5 into bb 4 (bb 5 edge count: 2)
...
  After duplicating computed goto bb 5 into bb 4 (bb 5 edge count: 1)

After merging, "bb 4" has 4 predecessors. "bb 5" now has only 1 predecessor.
During the duplicate_block the edge from "bb 4" to "bb 5" eliminated. "bb 5"
has only one edge from START. Then we continue to optimize "bb 4" stepping
recursively from "bb 5":

  Duplicating computed goto bb 4 into bb 12 (bb 4 edge count: 4)

"bb 12" is OP_C.

  Duplicating computed goto bb 4 into bb 8 (bb 4 edge count: 3)

"bb 8" is OP_B.

  Duplicating computed goto bb 4 into bb 6 (bb 4 edge count: 2)

"bb 6" is OP_A.

After finishing with "bb 4" we return to "bb 5" and see this:

  Ignoring "Duplicating computed goto bb 5 into bb 2".

"bb 2" is START.

START - the very first "goto *xxx" - was not optimized, since "bb 5" has only 1
predecessor.
But it's optimized in later step "bbro". That's exactly why option
"-fno-reorder-blocks" breaks first jump optimization.

Can someone explain why there are such conditions:
      if (single_pred_p (bb))
          return false;

      if (single_pred_p (bb))
          continue;
in maybe_duplicate_computed_goto function?

Why it's necessary that bb has more than one predecessor?



> If you want to have the machine code be structured in some exact order
> or way, you will have to use assembler or something similar, not an
> optimising compiler.

Labels start blocks ("There are various reasons why control flow may transfer
from one block to another. One possibility is that some instruction, for
example a CODE_LABEL, in a linearized instruction stream just always starts a
new basic block"). One can ask compiler to not reorder them with option
"-fno-reorder-blocks" - that's enough to fulfil the requirements. This option
was used in 2004 in QEMU for similar purposes. Also it is used in gforth:

  ENGINE_FLAGS =  -fno-gcse -fcaller-saves -fno-defer-pop -fno-inline -fwrapv
-fno-strict-aliasing -fno-cse-follow-jumps -fno-reorder-blocks
-fno-reorder-blocks-and-partition -fno-toplevel-reorder -falign-labels=1
-falign-loops=1 -falign-jumps=1 -fno-delete-null-pointer-checks -pthread

So GCC has enough options to structure the machine code in C (and C++) language
in desired way, there is no need to go into assembler.

Reply via email to