On Fri, 4 Apr 2025, Richard Sandiford wrote:

> As noted in the previous patch, combine still takes >30% of
> compile time in the original testcase for PR101523.  The problem
> is that try_combine uses linear insn searches for some dataflow
> queries, so in the worst case, an unlimited number of 2->2
> combinations for the same i2 can lead to quadratic behaviour.
> 
> This patch limits distribute_links to a certain number
> of instructions when i2 is unchanged.  As Segher said in the PR trail,
> it would make more conceptual sense to apply the limit unconditionally,
> but I thought it would be better to change as little as possible at
> this development stage.  Logically, in stage 1, the --param should
> be applied directly by distribute_links with no input from callers.

I agree with applying this unconditionally, but fine with stage1 for
that as well.

> As I mentioned in:
> 
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116398#c28
> 
> I think it's safe to drop log links even if a use exists.  All
> processing of log links seems to handle the absence of a link
> for a particular register in a conservative way.
> 
> The initial set-up errs on the side of dropping links, since for example
> create_log_links has:
> 
>              /* flow.c claimed:
> 
>                  We don't build a LOG_LINK for hard registers contained
>                  in ASM_OPERANDs.  If these registers get replaced,
>                  we might wind up changing the semantics of the insn,
>                  even if reload can make what appear to be valid
>                  assignments later.  */
>               if (regno < FIRST_PSEUDO_REGISTER
>                   && asm_noperands (PATTERN (use_insn)) >= 0)
>                 continue;
> 
> which excludes combinations by dropping log links, rather than during
> try_combine.  And:
> 
>       /* If this register is being initialized using itself, and the
>          register is uninitialized in this basic block, and there are
>          no LOG_LINKS which set the register, then part of the
>          register is uninitialized.  In that case we can't assume
>          anything about the number of nonzero bits.
> 
>          ??? We could do better if we checked this in
>          reg_{nonzero_bits,num_sign_bit_copies}_for_combine.  Then we
>          could avoid making assumptions about the insn which initially
>          sets the register, while still using the information in other
>          insns.  We would have to be careful to check every insn
>          involved in the combination.  */
> 
>       if (insn
>           && reg_referenced_p (x, PATTERN (insn))
>           && !REGNO_REG_SET_P (DF_LR_IN (BLOCK_FOR_INSN (insn)),
>                                REGNO (x)))
>         {
>           struct insn_link *link;
> 
>           FOR_EACH_LOG_LINK (link, insn)
>             if (dead_or_set_p (link->insn, x))
>               break;
>           if (!link)
>             {
>               rsp->nonzero_bits = GET_MODE_MASK (mode);
>               rsp->sign_bit_copies = 1;
>               return;
>             }
>         }
> 
> treats the lack of a log link as a possible sign of uninitialised data,
> but that would be a missed optimisation rather than a correctness issue.
> 
> One question is what the default --param value should be.  I went with
> Jakub's suggestion of 3000 from:
> 
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116398#c25
> 
> Also, to answer Jakub's question in that comment, I tried bisecting:
> 
>   int limit = atoi (getenv ("BISECT"));
> 
> (so applying the limit for all calls from try_combine) with an
> abort in distribute_links if the limit caused a link to be skipped.
> The minimum BISECT value that allowed an aarch64-linux-gnu bootstrap
> to succeed with --enable-languages=all --enable-checking=yes,rtl,extra
> was 142, so much lower than the parameter value.  I realised too late
> that --enable-checking=release would probably have been a more
> interesting test.
> 
> The previous patch meant that distribute_links itself is now linear
> for a given i2 definition, since each search starts at the previous
> last use, rather than at i2 itself.

Possibly this can be changed.

>  This means that the limit has
> to be applied cumulatively across all searches for the same link.

Which I think is reasonable to do anyway, this effectively limits
the distribute_link search per definition, right?

> The patch does that by storing a counter in the insn_link structure.
> There was a 32-bit hole there on LP64 hosts.

OK.

Thanks,
Richard.

> gcc/
>       PR testsuite/116398
>       * params.opt (-param=max-combine-search-insns=): New param.
>       * doc/invoke.texi: Document it.
>       * combine.cc (insn_link::insn_count): New field.
>       (alloc_insn_link): Initialize it.
>       (distribute_links): Add a limit parameter.
>       (try_combine): Use the new param to limit distribute_links
>       when only i3 has changed.
> ---
>  gcc/combine.cc      | 21 +++++++++++++++++----
>  gcc/doc/invoke.texi |  9 +++++++++
>  gcc/params.opt      |  4 ++++
>  3 files changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index 9eae1a0111e..197c19f6bdf 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -309,6 +309,7 @@ static int *uid_insn_cost;
>  struct insn_link {
>    rtx_insn *insn;
>    unsigned int regno;
> +  int insn_count;
>    struct insn_link *next;
>  };
>  
> @@ -342,6 +343,7 @@ alloc_insn_link (rtx_insn *insn, unsigned int regno, 
> struct insn_link *next)
>                                         sizeof (struct insn_link));
>    l->insn = insn;
>    l->regno = regno;
> +  l->insn_count = 0;
>    l->next = next;
>    return l;
>  }
> @@ -472,7 +474,8 @@ static void move_deaths (rtx, rtx, int, rtx_insn *, rtx 
> *);
>  static bool reg_bitfield_target_p (rtx, rtx);
>  static void distribute_notes (rtx, rtx_insn *, rtx_insn *, rtx_insn *,
>                             rtx, rtx, rtx);
> -static void distribute_links (struct insn_link *, rtx_insn * = nullptr);
> +static void distribute_links (struct insn_link *, rtx_insn * = nullptr,
> +                           int limit = INT_MAX);
>  static void mark_used_regs_combine (rtx);
>  static void record_promoted_value (rtx_insn *, rtx);
>  static bool unmentioned_reg_p (rtx, rtx);
> @@ -4591,7 +4594,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
> rtx_insn *i0,
>        }
>  
>      if (only_i3_changed)
> -      distribute_links (i3links, i3);
> +      distribute_links (i3links, i3, param_max_combine_search_insns);
>      else
>        {
>       distribute_links (i3links);
> @@ -14992,10 +14995,12 @@ distribute_notes (rtx notes, rtx_insn *from_insn, 
> rtx_insn *i3, rtx_insn *i2,
>     pointing at I3 when I3's destination is changed.
>  
>     If START is nonnull and an insn, we know that the next location for each
> -   link is no earlier than START.  */
> +   link is no earlier than START.  LIMIT is the maximum number of nondebug
> +   instructions that can be scanned when looking for the next use of a
> +   definition.  */
>  
>  static void
> -distribute_links (struct insn_link *links, rtx_insn *start)
> +distribute_links (struct insn_link *links, rtx_insn *start, int limit)
>  {
>    struct insn_link *link, *next_link;
>  
> @@ -15061,9 +15066,12 @@ distribute_links (struct insn_link *links, rtx_insn 
> *start)
>        I3 to I2.  Also note that not much searching is typically done here
>        since most links don't point very far away.  */
>  
> +      int count = 0;
>        insn = start;
>        if (!insn || NOTE_P (insn))
>       insn = NEXT_INSN (link->insn);
> +      else
> +     count = link->insn_count;
>        for (;
>          (insn && (this_basic_block->next_bb == EXIT_BLOCK_PTR_FOR_FN (cfun)
>                    || BB_HEAD (this_basic_block->next_bb) != insn));
> @@ -15084,6 +15092,11 @@ distribute_links (struct insn_link *links, rtx_insn 
> *start)
>         }
>       else if (INSN_P (insn) && reg_set_p (reg, insn))
>         break;
> +     else if (count >= limit)
> +       break;
> +     else
> +       count += 1;
> +      link->insn_count = count;
>  
>        /* If we found a place to put the link, place it there unless there
>        is already a link to the same insn as LINK at that point.  */
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 0da3f296beb..4c1f903d370 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -16467,6 +16467,15 @@ in combiner for a pseudo register as last known 
> value of that register.
>  @item max-combine-insns
>  The maximum number of instructions the RTL combiner tries to combine.
>  
> +@item max-combine-search-insns
> +The maximum number of instructions that the RTL combiner searches in order
> +to find the next use of a given register definition.  If this limit is 
> reached
> +without finding such a use, the combiner will stop trying to optimize the
> +definition.
> +
> +Currently this limit only applies after certain successful combination
> +attempts, but it could be extended to other cases in future.
> +
>  @item integer-share-limit
>  Small integer constants can use a shared data structure, reducing the
>  compiler's memory usage and increasing its speed.  This sets the maximum
> diff --git a/gcc/params.opt b/gcc/params.opt
> index 4f4eb4d7a2a..422d082b355 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -477,6 +477,10 @@ The maximum number of instructions to consider to unroll 
> in a loop on average.
>  Common Joined UInteger Var(param_max_combine_insns) Init(4) IntegerRange(2, 
> 4) Param Optimization
>  The maximum number of insns combine tries to combine.
>  
> +-param=max-combine-search-insns=
> +Common Joined UInteger Var(param_max_combine_search_insns) Init(3000) Param 
> Optimization
> +The maximum number of instructions that combine searches in order to find 
> the next use of a particular register.
> +
>  -param=max-completely-peel-loop-nest-depth=
>  Common Joined UInteger Var(param_max_unroll_iterations) Init(8) Param 
> Optimization
>  The maximum depth of a loop nest we completely peel.
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to