Richard Biener <rguent...@suse.de> writes: >> I suppose that's better than the first version when a block has a >> large number of dominance frontiers. But I can't remember whether >> that was the case in PR98863. I have a feeling that I tried the above >> as part of the PR, since it's the obvious way of applying liveness >> once per block rather than once per edge. But I think it was more >> just sheer weight of numbers. > > Note at least this (see below for the actual patch for trunk) is > in the linear part, so it shouldn't trigger any quadraticness. > It speeds up the testcase the same as the original one.
But some of the "quadracticness" is from O(nblocks*nregisters) or O(nedges*nregisters), rather than through iteration. >> I wonder whether we could create a new DEF bitmap on-the-fly while >> creating the defs and uses, restricting it to potential PHI registers. >> The test for potential PHI registers is constant time, and we could >> build the bitmaps as tree views before converting to list views. >> Can give it a go if that sounds OK. > > Do that within DF itself? Not sure about that. No, in rtl-ssa. It would be convenient to have a version of bitmap_ior_and_into in which the final bitmap is an sbitmap though. I.e., schematically: EXECUTE_IF_SET_IN_BITMAP (&frontiers[b1], 0, b2, bmi) if (bitmap_ior_and_into (&unfiltered[b2], b1_def, bi.potential_phi_regs) && !bitmap_empty_p (&frontiers[b2])) // Propagate the (potential) new phi node definitions in B2. bitmap_set_bit (worklist, b2); with a new overload to make that possible. > Anyway, I think the below is small enough that it would also qualify > for backporting. Unless there's a problem with it, of course. > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > OK? I still think this is likely to reintroduce the problem in PR98863. I won't object further though, so yeah, please go ahead. Richard > Thanks, > Richard. > > From d6f57f72fbbdb832b33e4ba5ccbf60a8d90ea264 Mon Sep 17 00:00:00 2001 > From: Richard Biener <rguent...@suse.de> > Date: Mon, 19 Feb 2024 11:10:50 +0100 > Subject: [PATCH] rtl-optimization/54052 - RTL SSA PHI insertion compile-time > hog > To: gcc-patches@gcc.gnu.org > > The following tries to address the PHI insertion compile-time hog in > RTL fwprop observed with the PR54052 testcase where the loop computing > the "unfiltered" set of variables possibly needing PHI nodes for each > block exhibits quadratic compile-time and memory-use. > > It does so by pruning the local DEFs with LR_OUT of the block, removing > regs that can never be LR_IN (defined by this block) in the dominance > frontier. > > PR rtl-optimization/54052 > * rtl-ssa/blocks.cc (function_info::place_phis): Filter > local defs by LR_OUT. > --- > gcc/rtl-ssa/blocks.cc | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/gcc/rtl-ssa/blocks.cc b/gcc/rtl-ssa/blocks.cc > index 8996443e8d5..cf4224e61ec 100644 > --- a/gcc/rtl-ssa/blocks.cc > +++ b/gcc/rtl-ssa/blocks.cc > @@ -645,7 +645,12 @@ function_info::place_phis (build_info &bi) > if (bitmap_empty_p (&frontiers[b1])) > continue; > > - bitmap b1_def = &DF_LR_BB_INFO (BASIC_BLOCK_FOR_FN (m_fn, b1))->def; > + // Defs in B1 that are possibly in LR_IN in the dominance frontier > + // blocks. > + auto_bitmap b1_def; > + bitmap_and (b1_def, &DF_LR_BB_INFO (BASIC_BLOCK_FOR_FN (m_fn, > b1))->def, > + DF_LR_OUT (BASIC_BLOCK_FOR_FN (m_fn, b1))); > + > bitmap_iterator bmi; > unsigned int b2; > EXECUTE_IF_SET_IN_BITMAP (&frontiers[b1], 0, b2, bmi)