https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116166
--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Mark Wielaard from comment #12)
> (In reply to Andreas Schwab from comment #11)
> > You can add target-specific flags like this:
> >
> > $(INSNEMIT_SEQ_O): ALL_COMPILERFLAGS += -fno-tree-dominator-opts
>
> Thanks. With "$(GIMPLE_MATCH_PD_SEQ_O) $(INSNEMIT_SEQ_O) insn-opinit.o
> insn-recog.o: ALL_COMPILERFLAGS += -O1 -fno-tree-dominator-opts" a make -j64
> drops from 8 hours to 3.5 hours:
>
> real 202m25.031s
> user 2209m7.176s
> sys 108m49.102s
>
> Now insn-recog.cc (even though it is included in the workaround) takes the
> longest time (~1 hour) to compile.
Compiling insn-recog.cc for a cross-compiler to riscv on x86_64 with trunk
and -O2 takes 90s with a quite flat profile.
Are those worst timings using a stage1 compiler built with default flags (-O0)?
Seeing the profile in the description I'll note the backwards threader has
a search depth for jump thread paths (--param max-jump-thread-paths) but
thread_around_empty_blocks search space is unlimited - with
EDGE_NO_COPY_SRC_BLOCK we do not account any stmts towards the stmt limit.
We're also doing a lot of redundant stmt simplifications by likely
quadratically exploring jump threading paths. And each
hybrid_jt_simplifier::simplify call resets the path query path, which we
know is a very expensive operation; it also shares the issues the backwards
threader originally had, starting with too big imports. Doing that up to
2^4 times for each block is wasteful - simplify_control_stmt_condition_1
ends up calling hybrid_jt_simplifier::simplify through
dom_jt_simplifier::simplify, and while simplify_control_stmt_condition_1
has a recursion limit, when processing & and | it recurses to both arms,
something ranger can do itself(?).
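To make the "up to 2^4 times" figure concrete, here is a toy model of recursing into both arms of an & / | condition under a depth limit of 4. It is only a sketch: `Cond`, `make_full_tree` and `count_callback_calls` are hypothetical names, not GCC code, and the model only counts leaf-level callback invocations.

```cpp
#include <memory>

// Hypothetical stand-in for a condition built from & and |.
// A node with no children models a plain comparison.
struct Cond {
  std::unique_ptr<Cond> lhs;
  std::unique_ptr<Cond> rhs;
};

// Build a complete binary condition tree of the given depth.
std::unique_ptr<Cond> make_full_tree(unsigned depth) {
  auto node = std::make_unique<Cond>();
  if (depth > 0) {
    node->lhs = make_full_tree(depth - 1);
    node->rhs = make_full_tree(depth - 1);
  }
  return node;
}

// Count how often the (expensive) simplifier callback would run if it is
// invoked whenever the recursion bottoms out, either at a leaf or because
// the recursion limit is exhausted.
unsigned count_callback_calls(const Cond &c, unsigned limit) {
  if (limit == 0 || !c.lhs)
    return 1;  // Falls back to the callback here.
  return count_callback_calls(*c.lhs, limit - 1)
         + count_callback_calls(*c.rhs, limit - 1);
}
```

With a limit of 4 and a sufficiently deep condition, `count_callback_calls` reaches 16 in this model, and each of those calls would pay for a path query reset.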
The threader JT simplifier is over-abstracted - only DOM seems to use
hybrid_jt_simplifier. The following should cut compile-time down
significantly (I'm not sure if the "old" DOM equiv lookup done by
dom-simplify is even necessary). IMO "gimping" the old forward threader
with ranger was misguided as it was supposed to vanish anyway.
diff --git a/gcc/tree-ssa-threadedge.cc b/gcc/tree-ssa-threadedge.cc
index 7f82639b8ec..cac290175d4 100644
--- a/gcc/tree-ssa-threadedge.cc
+++ b/gcc/tree-ssa-threadedge.cc
@@ -634,7 +634,8 @@ jump_threader::simplify_control_stmt_condition_1
      then use the pass specific callback to simplify the condition.  */
   if (!res
       || !is_gimple_min_invariant (res))
-    res = m_simplifier->simplify (dummy_cond, stmt, e->src, m_state);
+    res = m_simplifier->simplify (dummy_cond, stmt, e->src,
+                                  limit == 4 ? m_state : NULL);
 
   return res;
 }
Note it doesn't help we're trying normal/empty thread stuff over and over.
Possibly RISC-V has "bad" LOGICAL_OP_NON_SHORT_CIRCUIT, it defines it to zero
which means all && and || conditions are CFG branches initially.
Can someone try adding --param logical-op-non-short-circuit=1 to that
FLAGS workaround?
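For readers unfamiliar with the setting, the two shapes in question look roughly like this at the source level. This is an illustrative sketch with made-up function names, not the GIMPLE the compiler actually produces: the first form is what "all && and || conditions are CFG branches" means, the second is the single-condition form.

```cpp
// Short-circuit shape: each operand of && becomes its own conditional
// branch, so the CFG gains a block per operand and the jump threader has
// more paths to explore.
bool both_positive_branchy(int a, int b) {
  if (a > 0) {
    if (b > 0)
      return true;
  }
  return false;
}

// Non-short-circuit shape: both comparisons are evaluated and combined
// with a bitwise AND, leaving a single condition. This is only valid
// because neither operand has side effects or can trap.
bool both_positive_flat(int a, int b) {
  return (a > 0) & (b > 0);
}
```

Both functions return the same result for every input; they differ only in how many branches the initial CFG contains.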