https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71785
--- Comment #16 from Aleksey <rndfax at yandex dot ru> --- > > It would be helpful if you give the explanation how these options affect > > "un-factoring". > > What options? -fno-reorder-blocks? Those doo the same to this code as > they do anywhere else: the compiler does not run the reorder-blocks > pass, so you get worse code. This is not an explanation. Just image someone asks "why with -O0 dead code is not eliminated?". And you just answer "It's because you are using -O0". Well thank you, captain :) "-fno-reorder-blocks" option affects bbro step - this is not an "un-factoring" process. compgoto is "un-factoring" step, which takes place earlier than bbro. compgoto promises to "un-factor" computed gotos, but this does not happen. Anyway, I investigated my patched test case and got the following results and found the following. There are two labels in RTL which forms "bb 4" and "bb 5". "bb 5" is located right after "bb 4". "bb 4" sets registers, and "bb 5" just an indirect jump. There are 3 explicit jumps to "bb 4" from OP_A, OP_B, OP_C. There is 1 explicit jump to "bb 5" from START (very first goto *xxx). "bb 4" has 3 predecessors from jumps. "bb 5" has 2 predecessors: one from jump and one from "bb 4". What we see from rtl in compgoto: Duplicating computed goto bb 5 into bb 4 (bb 5 edge count: 2) ... After duplicating computed goto bb 5 into bb 4 (bb 5 edge count: 1) After merging, "bb 4" has 4 predecessors. "bb 5" now has only 1 predecessor. During the duplicate_block the edge from "bb 4" to "bb 5" eliminated. "bb 5" has only one edge from START. Then we continue to optimize "bb 4" stepping recursively from "bb 5": Duplicating computed goto bb 4 into bb 12 (bb 4 edge count: 4) "bb 12" is OP_C. Duplicating computed goto bb 4 into bb 8 (bb 4 edge count: 3) "bb 8" is OP_B. Duplicating computed goto bb 4 into bb 6 (bb 4 edge count: 2) "bb 6" is OP_A. After finishing with "bb 4" we return to "bb 5" and see this: Ignoring "Duplicating computed goto bb 5 into bb 2". "bb 2" is START. START - the very first "goto *xxx" - was not optimized, since "bb 5" has only 1 predecessor. But it's optimized in later step "bbro". That's exactly why option "-fno-reorder-blocks" breaks first jump optimization. Can someone explain why there are such conditions: if (single_pred_p (bb)) return false; if (single_pred_p (bb)) continue; in maybe_duplicate_computed_goto function? Why it's necessary that bb has more than one predecessor? > If you want to have the machine code be structured in some exact order > or way, you will have to use assembler or something similar, not an > optimising compiler. Labels start blocks ("There are various reasons why control flow may transfer from one block to another. One possibility is that some instruction, for example a CODE_LABEL, in a linearized instruction stream just always starts a new basic block"). One can ask compiler to not reorder them with option "-fno-reorder-blocks" - that's enough to fulfil the requirements. This option was used in 2004 in QEMU for similar purposes. Also it is used in gforth: ENGINE_FLAGS = -fno-gcse -fcaller-saves -fno-defer-pop -fno-inline -fwrapv -fno-strict-aliasing -fno-cse-follow-jumps -fno-reorder-blocks -fno-reorder-blocks-and-partition -fno-toplevel-reorder -falign-labels=1 -falign-loops=1 -falign-jumps=1 -fno-delete-null-pointer-checks -pthread So GCC has enough options to structure the machine code in C (and C++) language in desired way, there is no need to go into assembler.