Richard Biener <rguent...@suse.de> writes:
>> I suppose that's better than the first version when a block has a
>> large number of dominance frontiers.  But I can't remember whether
>> that was the case in PR98863.  I have a feeling that I tried the above
>> as part of the PR, since it's the obvious way of applying liveness
>> once per block rather than once per edge.  But I think it was more
>> just sheer weight of numbers.
>
> Note at least this (see below for the actual patch for trunk) is
> in the linear part, so it shouldn't trigger any quadraticness.
> It speeds up the testcase the same as the original one.

But some of the "quadracticness" is from O(nblocks*nregisters) or
O(nedges*nregisters), rather than through iteration.

>> I wonder whether we could create a new DEF bitmap on-the-fly while
>> creating the defs and uses, restricting it to potential PHI registers.
>> The test for potential PHI registers is constant time, and we could
>> build the bitmaps as tree views before converting to list views.
>> Can give it a go if that sounds OK.
>
> Do that within DF itself?  Not sure about that.

No, in rtl-ssa.  It would be convenient to have a version of
bitmap_ior_and_into in which the final bitmap is an sbitmap though.

I.e., schematically:

      EXECUTE_IF_SET_IN_BITMAP (&frontiers[b1], 0, b2, bmi)
        if (bitmap_ior_and_into (&unfiltered[b2], b1_def,
                                 bi.potential_phi_regs)
            && !bitmap_empty_p (&frontiers[b2]))
          // Propagate the (potential) new phi node definitions in B2.
          bitmap_set_bit (worklist, b2);

with a new overload to make that possible.

> Anyway, I think the below is small enough that it would also qualify
> for backporting.  Unless there's a problem with it, of course.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK?

I still think this is likely to reintroduce the problem in PR98863.
I won't object further though, so yeah, please go ahead.

Richard

> Thanks,
> Richard.
>
> From d6f57f72fbbdb832b33e4ba5ccbf60a8d90ea264 Mon Sep 17 00:00:00 2001
> From: Richard Biener <rguent...@suse.de>
> Date: Mon, 19 Feb 2024 11:10:50 +0100
> Subject: [PATCH] rtl-optimization/54052 - RTL SSA PHI insertion compile-time
>  hog
> To: gcc-patches@gcc.gnu.org
>
> The following tries to address the PHI insertion compile-time hog in
> RTL fwprop observed with the PR54052 testcase where the loop computing
> the "unfiltered" set of variables possibly needing PHI nodes for each
> block exhibits quadratic compile-time and memory-use.
>
> It does so by pruning the local DEFs with LR_OUT of the block, removing
> regs that can never be LR_IN (defined by this block) in the dominance
> frontier.
>
>       PR rtl-optimization/54052
>       * rtl-ssa/blocks.cc (function_info::place_phis): Filter
>       local defs by LR_OUT.
> ---
>  gcc/rtl-ssa/blocks.cc | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/rtl-ssa/blocks.cc b/gcc/rtl-ssa/blocks.cc
> index 8996443e8d5..cf4224e61ec 100644
> --- a/gcc/rtl-ssa/blocks.cc
> +++ b/gcc/rtl-ssa/blocks.cc
> @@ -645,7 +645,12 @@ function_info::place_phis (build_info &bi)
>        if (bitmap_empty_p (&frontiers[b1]))
>       continue;
>  
> -      bitmap b1_def = &DF_LR_BB_INFO (BASIC_BLOCK_FOR_FN (m_fn, b1))->def;
> +      // Defs in B1 that are possibly in LR_IN in the dominance frontier
> +      // blocks.
> +      auto_bitmap b1_def;
> +      bitmap_and (b1_def, &DF_LR_BB_INFO (BASIC_BLOCK_FOR_FN (m_fn, 
> b1))->def,
> +               DF_LR_OUT (BASIC_BLOCK_FOR_FN (m_fn, b1)));
> +
>        bitmap_iterator bmi;
>        unsigned int b2;
>        EXECUTE_IF_SET_IN_BITMAP (&frontiers[b1], 0, b2, bmi)

Reply via email to