Richard Biener <[email protected]> writes:
>> I suppose that's better than the first version when a block has a
>> large number of dominance frontiers. But I can't remember whether
>> that was the case in PR98863. I have a feeling that I tried the above
>> as part of the PR, since it's the obvious way of applying liveness
>> once per block rather than once per edge. But I think it was more
>> just sheer weight of numbers.
>
> Note at least this (see below for the actual patch for trunk) is
> in the linear part, so it shouldn't trigger any quadraticness.
> It speeds up the testcase the same as the original one.
But some of the "quadracticness" is from O(nblocks*nregisters) or
O(nedges*nregisters), rather than through iteration.
>> I wonder whether we could create a new DEF bitmap on-the-fly while
>> creating the defs and uses, restricting it to potential PHI registers.
>> The test for potential PHI registers is constant time, and we could
>> build the bitmaps as tree views before converting to list views.
>> Can give it a go if that sounds OK.
>
> Do that within DF itself? Not sure about that.
No, in rtl-ssa. It would be convenient to have a version of
bitmap_ior_and_into in which the final bitmap is an sbitmap though.
I.e., schematically:
EXECUTE_IF_SET_IN_BITMAP (&frontiers[b1], 0, b2, bmi)
if (bitmap_ior_and_into (&unfiltered[b2], b1_def,
bi.potential_phi_regs)
&& !bitmap_empty_p (&frontiers[b2]))
// Propagate the (potential) new phi node definitions in B2.
bitmap_set_bit (worklist, b2);
with a new overload to make that possible.
> Anyway, I think the below is small enough that it would also qualify
> for backporting. Unless there's a problem with it, of course.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK?
I still think this is likely to reintroduce the problem in PR98863.
I won't object further though, so yeah, please go ahead.
Richard
> Thanks,
> Richard.
>
> From d6f57f72fbbdb832b33e4ba5ccbf60a8d90ea264 Mon Sep 17 00:00:00 2001
> From: Richard Biener <[email protected]>
> Date: Mon, 19 Feb 2024 11:10:50 +0100
> Subject: [PATCH] rtl-optimization/54052 - RTL SSA PHI insertion compile-time
> hog
> To: [email protected]
>
> The following tries to address the PHI insertion compile-time hog in
> RTL fwprop observed with the PR54052 testcase where the loop computing
> the "unfiltered" set of variables possibly needing PHI nodes for each
> block exhibits quadratic compile-time and memory-use.
>
> It does so by pruning the local DEFs with LR_OUT of the block, removing
> regs that can never be LR_IN (defined by this block) in the dominance
> frontier.
>
> PR rtl-optimization/54052
> * rtl-ssa/blocks.cc (function_info::place_phis): Filter
> local defs by LR_OUT.
> ---
> gcc/rtl-ssa/blocks.cc | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/rtl-ssa/blocks.cc b/gcc/rtl-ssa/blocks.cc
> index 8996443e8d5..cf4224e61ec 100644
> --- a/gcc/rtl-ssa/blocks.cc
> +++ b/gcc/rtl-ssa/blocks.cc
> @@ -645,7 +645,12 @@ function_info::place_phis (build_info &bi)
> if (bitmap_empty_p (&frontiers[b1]))
> continue;
>
> - bitmap b1_def = &DF_LR_BB_INFO (BASIC_BLOCK_FOR_FN (m_fn, b1))->def;
> + // Defs in B1 that are possibly in LR_IN in the dominance frontier
> + // blocks.
> + auto_bitmap b1_def;
> + bitmap_and (b1_def, &DF_LR_BB_INFO (BASIC_BLOCK_FOR_FN (m_fn,
> b1))->def,
> + DF_LR_OUT (BASIC_BLOCK_FOR_FN (m_fn, b1)));
> +
> bitmap_iterator bmi;
> unsigned int b2;
> EXECUTE_IF_SET_IN_BITMAP (&frontiers[b1], 0, b2, bmi)