On Wed, 5 Feb 2025, Christoph Müllner wrote: > On Wed, Feb 5, 2025 at 1:29 PM Richard Biener <rguent...@suse.de> wrote: > > > > The PR shows fold-mem-offsets taking ages and a lot of memory computing > > DU/UD chains as that requires the RD problem. The issue is not so much > > the memory required for the pruned sets but the high CFG connectivity > > (and that the CFG is cyclic) which makes solving the dataflow problem > > expensive. > > > > The following adds the same limit as the one imposed by GCSE and CPROP. > > > > Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu, this > > reduces the compile-time of the PR26854 testcase from 480s to 150s. > > > > OK? > > Thank you for sending this. > I was working on something like this as well, but I could not > find a reasonable threshold expression to disable the pass.
Yeah, it's quite arbitrary of course - but luckily others "solved" the magic for us (gcse_or_cprop_is_too_expensive). I couldn't convince myself the pass suffers from n_basic_blocks * n_regs memory complexity, so didn't add that. Richard. > > > > Thanks, > > Richard. > > > > PR rtl-optimization/117922 > > * fold-mem-offsets.cc (pass_fold_mem_offsets::execute): > > Do nothing for a highly connected CFG. > > --- > > gcc/fold-mem-offsets.cc | 18 ++++++++++++++++++ > > 1 file changed, 18 insertions(+) > > > > diff --git a/gcc/fold-mem-offsets.cc b/gcc/fold-mem-offsets.cc > > index a816006e207..c1c94472a07 100644 > > --- a/gcc/fold-mem-offsets.cc > > +++ b/gcc/fold-mem-offsets.cc > > @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3. If not see > > #include "df.h" > > #include "tree-pass.h" > > #include "cfgrtl.h" > > +#include "diagnostic-core.h" > > > > /* This pass tries to optimize memory offset calculations by moving > > constants > > from add instructions to the memory instructions (loads / stores). > > @@ -841,6 +842,23 @@ do_commit_insn (rtx_insn *insn) > > unsigned int > > pass_fold_mem_offsets::execute (function *fn) > > { > > + /* Computing UD/DU chains for flow graphs which have a high connectivity > > + will take a long time and is unlikely to be particularly useful. > > + > > + In normal circumstances a cfg should have about twice as many > > + edges as blocks. But we do not want to punish small functions > > + which have a couple switch statements. Rather than simply > > + threshold the number of blocks, uses something with a more > > + graceful degradation. */ > > + if (n_edges_for_fn (fn) > 20000 + n_basic_blocks_for_fn (fn) * 4) > > + { > > + warning (OPT_Wdisabled_optimization, > > + "fold-mem-offsets: %d basic blocks and %d edges/basic block", > > + n_basic_blocks_for_fn (cfun), > > + n_edges_for_fn (cfun) / n_basic_blocks_for_fn (cfun)); > > + return 0; > > + } > > + > > df_set_flags (DF_EQ_NOTES + DF_RD_PRUNE_DEAD_DEFS + > > DF_DEFER_INSN_RESCAN); > > df_chain_add_problem (DF_UD_CHAIN + DF_DU_CHAIN); > > df_analyze (); > > -- > > 2.43.0 > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)