On Wed, 5 Feb 2025, Christoph Müllner wrote:

> On Wed, Feb 5, 2025 at 1:29 PM Richard Biener <rguent...@suse.de> wrote:
> >
> > The PR shows fold-mem-offsets taking ages and a lot of memory computing
> > DU/UD chains as that requires the RD problem.  The issue is not so much
> > the memory required for the pruned sets but the high CFG connectivity
> > (and that the CFG is cyclic) which makes solving the dataflow problem
> > expensive.
> >
> > The following adds the same limit as the one imposed by GCSE and CPROP.
> >
> > Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu, this
> > reduces the compile-time of the PR26854 testcase from 480s to 150s.
> >
> > OK?
> 
> Thank you for sending this.
> I was working on something like this as well, but I could not
> find a reasonable threshold expression to disable the pass.

Yeah, it's quite arbitrary of course - but luckily others "solved"
the magic for us (gcse_or_cprop_is_too_expensive).  I couldn't
convince myself the pass suffers from n_basic_blocks * n_regs
memory complexity, so didn't add that.

Richard.

> >
> > Thanks,
> > Richard.
> >
> >         PR rtl-optimization/117922
> >         * fold-mem-offsets.cc (pass_fold_mem_offsets::execute):
> >         Do nothing for a highly connected CFG.
> > ---
> >  gcc/fold-mem-offsets.cc | 18 ++++++++++++++++++
> >  1 file changed, 18 insertions(+)
> >
> > diff --git a/gcc/fold-mem-offsets.cc b/gcc/fold-mem-offsets.cc
> > index a816006e207..c1c94472a07 100644
> > --- a/gcc/fold-mem-offsets.cc
> > +++ b/gcc/fold-mem-offsets.cc
> > @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "df.h"
> >  #include "tree-pass.h"
> >  #include "cfgrtl.h"
> > +#include "diagnostic-core.h"
> >
> >  /* This pass tries to optimize memory offset calculations by moving 
> > constants
> >     from add instructions to the memory instructions (loads / stores).
> > @@ -841,6 +842,23 @@ do_commit_insn (rtx_insn *insn)
> >  unsigned int
> >  pass_fold_mem_offsets::execute (function *fn)
> >  {
> > +  /* Computing UD/DU chains for flow graphs which have a high connectivity
> > +     will take a long time and is unlikely to be particularly useful.
> > +
> > +     In normal circumstances a cfg should have about twice as many
> > +     edges as blocks.  But we do not want to punish small functions
> > +     which have a couple switch statements.  Rather than simply
> > +     threshold the number of blocks, uses something with a more
> > +     graceful degradation.  */
> > +  if (n_edges_for_fn (fn) > 20000 + n_basic_blocks_for_fn (fn) * 4)
> > +    {
> > +      warning (OPT_Wdisabled_optimization,
> > +              "fold-mem-offsets: %d basic blocks and %d edges/basic block",
> > +              n_basic_blocks_for_fn (cfun),
> > +              n_edges_for_fn (cfun) / n_basic_blocks_for_fn (cfun));
> > +      return 0;
> > +    }
> > +
> >    df_set_flags (DF_EQ_NOTES + DF_RD_PRUNE_DEAD_DEFS + 
> > DF_DEFER_INSN_RESCAN);
> >    df_chain_add_problem (DF_UD_CHAIN + DF_DU_CHAIN);
> >    df_analyze ();
> > --
> > 2.43.0
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to