https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108352
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
  Checking profitability of path (backwards):  bb:3 (6 insns) bb:9 (0 insns) (latch) bb:5
  Control statement insns: 2
  Overall: 4 insns
  [4] Registering jump thread: (5, 9) incoming edge;  (9, 3) normal (back) (3, 4) nocopy;
path: 5->9->3->4 SUCCESS

but

  Checking profitability of path (backwards):  bb:3 (6 insns) bb:9 (latch)
  Control statement insns: 2
  Overall: 4 insns
  FAIL: Would create irreducible loop without threading multiway branch.
path: 9->3->xx REJECTED

After the patch we are no longer considering the first path, which just adds
an unrelated jump to the path.  That's the

  /* We avoid creating irreducible inner loops unless we thread through
     a multiway branch, in which case we have deemed it worth losing
     other loop optimizations later.

     We also consider it worth creating an irreducible inner loop if
     the number of copied statement is low relative to the length of
     the path -- in that case there's little the traditional loop
     optimizer would have done anyway, so an irreducible loop is not
     so bad.  */
  if (!threaded_multiway_branch
      && creates_irreducible_loop
      && *creates_irreducible_loop
      && (n_insns * (unsigned) param_fsm_scale_path_stmts
          > (m_path.length () *
             (unsigned) param_fsm_scale_path_blocks)))
    {
      if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file,
                 "  FAIL: Would create irreducible loop without threading "
                 "multiway branch.\n");
      return false;

heuristic, which with 9 -> 3 is 4 * 2 > 2 * 3 (true, so the path is rejected)
but with 5 -> 9 -> 3 we get 4 * 2 > 3 * 3 (false, so the path is accepted).

It's also worth noting that neither of the two threads creates an irreducible
loop in the end for this particular case, since e is also constant on entry
and thus the jump is resolved and the extra loop entry is removed (but that's
out of scope of the threader's analysis here).
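To make the arithmetic concrete, here is a minimal standalone sketch of the size comparison in the quoted condition. It is not the GCC code: the function names are hypothetical, and the scale values 2 (statements) and 3 (blocks) are simply the ones appearing in the arithmetic above.

```cpp
#include <cassert>

// Hypothetical model of the size test in the quoted heuristic.
// Returns true when the thread would be REJECTED, assuming the path
// creates an irreducible loop and does not thread a multiway branch.
bool old_heuristic_rejects (unsigned n_insns, unsigned path_length,
                            unsigned scale_path_stmts,
                            unsigned scale_path_blocks)
{
  return n_insns * scale_path_stmts > path_length * scale_path_blocks;
}

// Replays the two paths from the dump with the scale values 2 and 3
// taken from the arithmetic in the comment.
void check_dump_paths ()
{
  // 9 -> 3: two blocks, 4 insns: 8 > 6, rejected.
  assert (old_heuristic_rejects (4, 2, 2, 3));
  // 5 -> 9 -> 3: three blocks, 4 insns: 8 > 9 is false, accepted.
  assert (!old_heuristic_rejects (4, 3, 2, 3));
}
```

Under this model, adding a block that copies no statements can only make a path more likely to pass. With a block scale of 4 the shorter path gives 8 > 8, which is false, so it would be accepted as well.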
It IMHO still makes no sense to reject the shorter path over the longer one,
so the above "heuristic" makes absolutely no sense to me.  Raising
--param fsm-scale-path-blocks to 4 "fixes" the testcase on trunk.

The heuristic was added in r6-6600-g2b572b3c213b51 by Jeff in the attempt
to address a coremark regression (PR68398).  I guess Jeff remembers nothing
about this.

Note this is not about adding inner irreducible loops but about making the
loop itself irreducible.  The length of the path itself also says nothing
about the length of a path through the irreducible loop ...

Reverting the heuristic will reject all non-multi-way branch irreducible
loop creation.  We have another heuristic that rejects threading through
the latch early:

  /* Threading through an empty latch would cause code to be added to
     the latch.  This could alter the loop form sufficiently to cause
     loop optimizations to fail.  Disable these threads until after
     loop optimizations have run.  */
  if ((threaded_through_latch
       || (taken_edge && taken_edge->dest == loop->latch))
      && !(cfun->curr_properties & PROP_loop_opts_done)
      && empty_block_p (loop->latch))

so we could reject irreducible loops before loop opts (w/o just covering the
empty latch case) and otherwise generally allow them even for non-multi-way
branches.

That said, I fear I'm going to replace one bogus heuristic with another ;)

I'm still going to test replacing the heuristic with the following (which
allows removing the fsm-scale-path-blocks param).

      /* We avoid creating irreducible inner loops unless we thread through
         a multiway branch, in which case we have deemed it worth losing
         other loop optimizations later.

         We also consider it worth creating an irreducible inner loop after
         loop optimizations if the number of copied statement is low.  */
      if (!m_threaded_multiway_branch
          && *creates_irreducible_loop
          && (!(cfun->curr_properties & PROP_loop_opts_done)
              || (m_n_insns * param_fsm_scale_path_stmts
                  >= param_max_jump_thread_duplication_stmts)))
        {
          if (dump_file && (dump_flags & TDF_DETAILS))
            fprintf (dump_file,
                     "  FAIL: Would create irreducible loop early without "
                     "threading multiway branch.\n");
          /* We compute creates_irreducible_loop only late.  */
          return false;
        }
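For comparison, the proposed condition can be modelled the same way. This is only a sketch under assumptions: the function name and the plain bool/int parameters are hypothetical stand-ins for the members and params used in the snippet above, and the value 15 for max-jump-thread-duplication-stmts in the checks is an assumed setting, not something stated in this comment.

```cpp
#include <cassert>

// Hypothetical model of the proposed replacement condition.
// Returns true when the thread would be REJECTED.
bool new_heuristic_rejects (bool threaded_multiway_branch,
                            bool creates_irreducible_loop,
                            bool loop_opts_done,
                            int n_insns,
                            int scale_path_stmts,
                            int max_dup_stmts)
{
  return !threaded_multiway_branch
         && creates_irreducible_loop
         && (!loop_opts_done
             || n_insns * scale_path_stmts >= max_dup_stmts);
}

// Replays a 4-insn path with a statement scale of 2 and an ASSUMED
// duplication limit of 15.
void check_proposed_heuristic ()
{
  // Before loop optimizations: always rejected.
  assert (new_heuristic_rejects (false, true, false, 4, 2, 15));
  // After loop optimizations: 8 >= 15 is false, accepted.
  assert (!new_heuristic_rejects (false, true, true, 4, 2, 15));
  // Multiway-branch threads are never rejected by this test.
  assert (!new_heuristic_rejects (true, true, false, 4, 2, 15));
}
```

Note that the path length no longer appears in the model, so the 9 -> 3 and 5 -> 9 -> 3 paths, which both copy 4 insns, would be treated identically.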