On Thu, 3 Apr 2025, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > On Wed, 2 Apr 2025, Richard Sandiford wrote:
> >
> >> Richard Biener <rguent...@suse.de> writes:
> >> > On Tue, 1 Apr 2025, Richard Sandiford wrote:
> >> >
> >> >> The problem in PR101523 was that, after each successful 2->2
> >> >> combination attempt, distribute_links would search further and
> >> >> further for the next combinable use of the i2 destination.
> >> >> The original patch for the PR dealt with that by disallowing such
> >> >> combinations.  However, that led to various optimisation regressions,
> >> >> so there was interest in allowing the combinations again, at least
> >> >> until an alternative way of getting the same results is in place.
> >> >> 
> >> >> This patch does that, but limits distribute_links to a certain number
> >> >> of instructions when i2 is unchanged.  As Segher said in the PR trail,
> >> >> it would make more conceptual sense to apply the limit unconditionally,
> >> >> but I thought it would be better to change as little as possible at
> >> >> this development stage.  Logically, in stage 1, the --param should
> >> >> be applied directly by distribute_links with no input from callers.
> >> >> 
> >> >> As I mentioned in:
> >> >> 
> >> >>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116398#c28
> >> >> 
> >> >> I think it's safe to drop log links even if a use exists.  All
> >> >> processing of log links seems to handle the absence of a link
> >> >> for a particular register in a conservative way.
> >> >> 
> >> >> The initial set-up errs on the side of dropping links, since for example
> >> >> create_log_links has:
> >> >> 
> >> >>              /* flow.c claimed:
> >> >> 
> >> >>                  We don't build a LOG_LINK for hard registers contained
> >> >>                  in ASM_OPERANDs.  If these registers get replaced,
> >> >>                  we might wind up changing the semantics of the insn,
> >> >>                  even if reload can make what appear to be valid
> >> >>                  assignments later.  */
> >> >>               if (regno < FIRST_PSEUDO_REGISTER
> >> >>                   && asm_noperands (PATTERN (use_insn)) >= 0)
> >> >>                 continue;
> >> >> 
> >> >> which excludes combinations by dropping log links, rather than rejecting
> >> >> them during try_combine.  And:
> >> >> 
> >> >>       /* If this register is being initialized using itself, and the
> >> >>          register is uninitialized in this basic block, and there are
> >> >>          no LOG_LINKS which set the register, then part of the
> >> >>          register is uninitialized.  In that case we can't assume
> >> >>          anything about the number of nonzero bits.
> >> >> 
> >> >>          ??? We could do better if we checked this in
> >> >>          reg_{nonzero_bits,num_sign_bit_copies}_for_combine.  Then we
> >> >>          could avoid making assumptions about the insn which initially
> >> >>          sets the register, while still using the information in other
> >> >>          insns.  We would have to be careful to check every insn
> >> >>          involved in the combination.  */
> >> >> 
> >> >>       if (insn
> >> >>           && reg_referenced_p (x, PATTERN (insn))
> >> >>           && !REGNO_REG_SET_P (DF_LR_IN (BLOCK_FOR_INSN (insn)),
> >> >>                                REGNO (x)))
> >> >>         {
> >> >>           struct insn_link *link;
> >> >> 
> >> >>           FOR_EACH_LOG_LINK (link, insn)
> >> >>             if (dead_or_set_p (link->insn, x))
> >> >>               break;
> >> >>           if (!link)
> >> >>             {
> >> >>               rsp->nonzero_bits = GET_MODE_MASK (mode);
> >> >>               rsp->sign_bit_copies = 1;
> >> >>               return;
> >> >>             }
> >> >>         }
> >> >> 
> >> >> treats the lack of a log link as a possible sign of uninitialised data,
> >> >> but that would be a missed optimisation rather than a correctness issue.
> >> >> 
> >> >> One question is what the default --param value should be.  I went with
> >> >> Jakub's suggestion of 3000 from:
> >> >> 
> >> >>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116398#c25
> >> >> 
> >> >> Also, to answer Jakub's question in that comment, I tried bisecting:
> >> >> 
> >> >>   int limit = atoi (getenv ("BISECT"));
> >> >> 
> >> >> (so applying the limit for all calls from try_combine) with an
> >> >> abort in distribute_links if the limit caused a link to be skipped.
> >> >> The minimum BISECT value that allowed an aarch64-linux-gnu bootstrap
> >> >> to succeed with --enable-languages=all --enable-checking=yes,rtl,extra
> >> >> was 142, so much lower than the parameter value.  I realised too late
> >> >> that --enable-checking=release would probably have been a more
> >> >> interesting test.
> >> >> 
> >> >> Some of the try_combine changes come from Richi's patch in:
> >> >> 
> >> >>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523#c53
> >> >> 
> >> >> Bootstrapped & regression-tested on aarch64-linux-gnu,
> >> >> powerpc64le-linux-gnu and x86_64-linux-gnu.  OK to install?
> >> >> 
> >> >> Richard
> >> >> 
> >> >> 
> >> >> gcc/
> >> >>         PR testsuite/116398
> >> >>         * params.opt (-param=max-combine-search-insns=): New param.
> >> >>         * doc/invoke.texi: Document it.
> >> >>         * combine.cc (distribute_links): Add a limit parameter.
> >> >>         (try_combine): Use the new param to limit distribute_links
> >> >>         when i2 is unchanged.  Return i3 rather than i2 if i2 is unchanged.
> >> >> 
> >> >> gcc/testsuite/
> >> >>         * gcc.target/aarch64/popcnt-le-1.c: Account for commutativity of TST.
> >> >>         * gcc.target/aarch64/popcnt-le-3.c: Likewise AND.
> >> >>         * gcc.target/aarch64/sve/pred-not-gen-1.c: Revert previous patch.
> >> >>         * gcc.target/aarch64/sve/pred-not-gen-4.c: Likewise.
> >> >>         * gcc.target/aarch64/sve/var_stride_2.c: Likewise.
> >> >>         * gcc.target/aarch64/sve/var_stride_4.c: Likewise.
> >> >> 
> >> >> Co-authored-by: Richard Biener <rguent...@suse.de>
> >> >> ---
> >> >>  gcc/combine.cc                                | 30 +++++++++----------
> >> >>  gcc/doc/invoke.texi                           |  9 ++++++
> >> >>  gcc/params.opt                                |  4 +++
> >> >>  .../gcc.target/aarch64/popcnt-le-1.c          |  4 +--
> >> >>  .../gcc.target/aarch64/popcnt-le-3.c          |  4 +--
> >> >>  gcc/testsuite/gcc.target/aarch64/pr100056.c   |  4 ++-
> >> >>  .../gcc.target/aarch64/sve/pred-not-gen-1.c   |  4 +--
> >> >>  .../gcc.target/aarch64/sve/pred-not-gen-4.c   |  4 +--
> >> >>  .../gcc.target/aarch64/sve/var_stride_2.c     |  3 +-
> >> >>  .../gcc.target/aarch64/sve/var_stride_4.c     |  3 +-
> >> >>  10 files changed, 42 insertions(+), 27 deletions(-)
> >> >> 
> >> >> diff --git a/gcc/combine.cc b/gcc/combine.cc
> >> >> index ef13f5d5e90..28403d1f9e0 100644
> >> >> --- a/gcc/combine.cc
> >> >> +++ b/gcc/combine.cc
> >> >> @@ -472,7 +472,7 @@ static void move_deaths (rtx, rtx, int, rtx_insn *, rtx *);
> >> >>  static bool reg_bitfield_target_p (rtx, rtx);
> >> >>  static void distribute_notes (rtx, rtx_insn *, rtx_insn *, rtx_insn *,
> >> >>                               rtx, rtx, rtx);
> >> >> -static void distribute_links (struct insn_link *);
> >> >> +static void distribute_links (struct insn_link *, int limit = INT_MAX);
> >> >>  static void mark_used_regs_combine (rtx);
> >> >>  static void record_promoted_value (rtx_insn *, rtx);
> >> >>  static bool unmentioned_reg_p (rtx, rtx);
> >> >> @@ -4208,16 +4208,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
> >> >>        adjust_for_new_dest (i3);
> >> >>      }
> >> >>  
> >> >> -  /* If I2 didn't change, this is not a combination (but a simplification or
> >> >> -     canonicalisation with context), which should not be done here.  Doing
> >> >> -     it here explodes the algorithm.  Don't.  */
> >> >> -  if (rtx_equal_p (newi2pat, PATTERN (i2)))
> >> >> -    {
> >> >> -      if (dump_file)
> >> >> -       fprintf (dump_file, "i2 didn't change, not doing this\n");
> >> >> -      undo_all ();
> >> >> -      return 0;
> >> >> -    }
> >> >> +  bool i2_unchanged = rtx_equal_p (newi2pat, PATTERN (i2));
> >> >>  
> >> >>    /* We now know that we can do this combination.  Merge the insns and
> >> >>       update the status of registers and LOG_LINKS.  */
> >> >> @@ -4593,10 +4584,11 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
> >> >>                             NULL_RTX, NULL_RTX, NULL_RTX);
> >> >>        }
> >> >>  
> >> >> -    distribute_links (i3links);
> >> >> -    distribute_links (i2links);
> >> >> -    distribute_links (i1links);
> >> >> -    distribute_links (i0links);
> >> >> +    int limit = i2_unchanged ? param_max_combine_search_insns : INT_MAX;
> >> >> +    distribute_links (i3links, limit);
> >> >> +    distribute_links (i2links, limit);
> >> >> +    distribute_links (i1links, limit);
> >> >> +    distribute_links (i0links, limit);
> >> >
> >> > I'm not sure I understand how the distribute_links mixes in with
> >> > the change below.
> >> >
> >> > The original issue reported in the PR was that returning i2
> >> > caused every insn between i2 and i3 to be re-processed by the
> >> > main loop, re-trying previously failed combine attempts and
> >> > producing garbage.
> >> >
> >> > The disadvantage of directly returning i3 is that, if added_links_insn
> >> > was in between i2 and i3, we miss re-trying an insn with changed
> >> > links.
> >> 
> >> Can that happen though?  log links are supposed to be attached to the
> >> next use of the register (only), so I wouldn't expect the next use to be
> >> before i3 (putatively the previous first use).
> >
> > But log-links on i2 are for uses in i2 of defs before i2; given that
> > i0/i1/i2 changed or have gone away, we adjust those to eventually end
> > up on insns between i2 and i3 (or indeed after i3).  Combine then
> > wants to try to combine the insns with changed log-links.
> 
> Right.  But I meant in the case where i2 hasn't changed, which
> I thought was the case in contention.

Yes.

> If i2 hasn't changed then
> its uses are the same, and so there is no need to distribute its
> log links.

But we still do that, even with my patch?  I probably fail to fully
understand how log-links are initially set up and how they "evolve"
when successfully applying a combination.

> >> > But what you do now also limits the insn walk in distribute_links
> >> > (IMO a good thing on its own), without taking advantage of that
> >> > when i2 is unchanged (distributing the link to insns between i2 and i3
> >> > is useless in that case).
> >> 
> >> Yeah, it's true that distribute_links could potentially start the
> >> search at a better place.  But that's IMO a third, somewhat independent
> >> optimisation.  And it seems more dangerous than the proposed patch,
> >> since if it turns out that the next use is before i3 in some current
> >> or future circumstances, we could end up generating wrong code.
> >> 
> >> > Ideally we'd distribute links to insns between i2 and i3 in a backward
> >> > walk from i3, picking up the first use if it is within the limit, as we
> >> > want to limit the number of insns we reprocess in the loop of
> >> > try_combine's caller.
> >> >
> >> > Really ideally we'd want to remember all added_links_insn and only
> >> > re-process those, not all other unaffected insns between i2 and i3
> >> > (or earlier).
> >> >
> >> > That said, your change looks like two independent changes?  IIRC Jakub
> >> > at some point proposed to limit the 'return i3' below to cases where
> >> > i2 is too far away or something like that.
> >> 
> >> Yes, it's two independent changes.  The return i3 one is taken directly
> >> from your comment in:
> >> 
> >>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523#c53
> >> 
> >> I think that's the right thing to do, for the reason you explained there
> >> and above, so I didn't want to lose it.  But like you say in the PR,
> >> combine did still take a large slice of time with that change in isolation.
> >> 
> >> Jakub discussed the idea of augmenting that change with one that makes
> >> try_combine check luids.  The distribute_links part of the patch is
> >> supposed to be an alternative to that.
> >
> > I see.  I understood Jakub thinks my change is too aggressive since
> > it does indeed miss combine attempts from changed log-links.
> >
> > I remember that when allowing added_links_insn to override the
> > i3 return for the i2-unchanged case, the bad compile-time resurfaced.  So
> > now I wonder whether, if i2 is unchanged, distribute_links (i2links)
> > does (should do?) anything?  I don't see anything that would avoid
> > adjusting links when there's still a use in newi2pat (IIRC
> > i0 and i1 are always gone?).
> 
> Yeah, that's also what I meant.  But, once the "return i3" is in place,
> the remaining large compile time partly comes from distributing i3's link
> for i2's destination, and I suppose from the general "walk all instructions
> between A and B" that try_combine uses to check dataflow, with A and B
> getting gradually further apart.

Yep.  At least there should be no reason to distribute links on i2
(or newi2pat).  Also, since we know i3 is now the closest use of i2,
there's no need to start the search for the next use at i2; we
can search from i3 instead?  Of course distribute_links only gets passed the
list of log-links, not the use insn they are on, so it starts at
NEXT_INSN (link->insn) instead of NEXT_INSN (use_insn) (in fact,
when there's still a use on use_insn, should we update the link at all?
Currently it doesn't, I guess, since it will find i3 in the search).

So unconditionally doing

  distribute_links (i2links, i2);
  distribute_links (i3links, i3);

etc. would be OK and never change anything?  We'd need to start at
use_insn rather than NEXT_INSN (use_insn) to cover the case where there
are still uses on use_insn.
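
Something along these lines inside distribute_links' per-link loop, just
to make the idea concrete (only a sketch: it leaves out the basic-block
boundary check and the reg_set_p early exit that the real loop has, and
START is the hypothetical extra parameter):

  /* If the caller knows there is no use of REG between LINK->insn and
     START, begin the walk at START instead of NEXT_INSN (link->insn).
     Scanning START itself matters because REG may still be used there,
     in which case the link should end up on START again.  */
  rtx_insn *search_start = start ? start : NEXT_INSN (link->insn);
  for (insn = search_start; insn; insn = NEXT_INSN (insn))
    if (NONDEBUG_INSN_P (insn)
        && reg_overlap_mentioned_p (reg, PATTERN (insn)))
      break;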


Note I generally agree with Jakub that, since this is a P1, we should
go with some change.  I'd split it up though, installing the 'return i3'
part separately from limiting the distribute_links work.  And for the latter
I still wonder about the above and doing

 if (!i2_unchanged)
   distribute_links (i2links/*, i2 */);

thus not distributing the links of i2 if it did not change (probably
not much effect if we start the search at i2 itself).

Richard.


> FWIW, with:
> 
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index ef13f5d5e90..e12a78c8d89 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -4208,16 +4208,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
>        adjust_for_new_dest (i3);
>      }
>  
> -  /* If I2 didn't change, this is not a combination (but a simplification or
> -     canonicalisation with context), which should not be done here.  Doing
> -     it here explodes the algorithm.  Don't.  */
> -  if (rtx_equal_p (newi2pat, PATTERN (i2)))
> -    {
> -      if (dump_file)
> -     fprintf (dump_file, "i2 didn't change, not doing this\n");
> -      undo_all ();
> -      return 0;
> -    }
> +  bool only_i3_changed = !i0 && !i1 && rtx_equal_p (newi2pat, PATTERN (i2));
>  
>    /* We now know that we can do this combination.  Merge the insns and
>       update the status of registers and LOG_LINKS.  */
> @@ -4785,6 +4776,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
>    combine_successes++;
>    undo_commit ();
>  
> +  if (only_i3_changed)
> +    return i3;
> +
>    rtx_insn *ret = newi2pat ? i2 : i3;
>    if (added_links_insn && DF_INSN_LUID (added_links_insn) < DF_INSN_LUID (ret))
>      ret = added_links_insn;
> 
> so essentially your patch, I see an 11.9x improvement in combine's
> compile time in the original testcase for PR101523, going from 88%
> to 41% of total compile time.  Memory usage improves 110x, from
> 97% to 25%.  The output is unchanged.  Adding:
> 
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index e12a78c8d89..becfd0a65ff 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -472,7 +472,7 @@ static void move_deaths (rtx, rtx, int, rtx_insn *, rtx *);
>  static bool reg_bitfield_target_p (rtx, rtx);
>  static void distribute_notes (rtx, rtx_insn *, rtx_insn *, rtx_insn *,
>                             rtx, rtx, rtx);
> -static void distribute_links (struct insn_link *);
> +static void distribute_links (struct insn_link *, rtx_insn * = nullptr);
>  static void mark_used_regs_combine (rtx);
>  static void record_promoted_value (rtx_insn *, rtx);
>  static bool unmentioned_reg_p (rtx, rtx);
> @@ -4209,6 +4209,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
>      }
>  
>    bool only_i3_changed = !i0 && !i1 && rtx_equal_p (newi2pat, PATTERN (i2));
> +  if (only_i3_changed)
> +    swap_i2i3 = false;
>  
>    /* We now know that we can do this combination.  Merge the insns and
>       update the status of registers and LOG_LINKS.  */
> @@ -4584,8 +4586,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
>                           NULL_RTX, NULL_RTX, NULL_RTX);
>        }
>  
> -    distribute_links (i3links);
> -    distribute_links (i2links);
> +    distribute_links (i3links, only_i3_changed ? i3 : nullptr);
> +    distribute_links (i2links, !i0 && !i1 ? i2 : nullptr);
>      distribute_links (i1links);
>      distribute_links (i0links);
>  
> @@ -14978,10 +14980,13 @@ distribute_notes (rtx notes, rtx_insn *from_insn, rtx_insn *i3, rtx_insn *i2,
>  
>  /* Similarly to above, distribute the LOG_LINKS that used to be present on
>     I3, I2, and I1 to new locations.  This is also called to add a link
> -   pointing at I3 when I3's destination is changed.  */
> +   pointing at I3 when I3's destination is changed.
> +
> +   If START is nonnull and an insn, we know that the next location for
> +   each link is no earlier than START.  */
>  
>  static void
> -distribute_links (struct insn_link *links)
> +distribute_links (struct insn_link *links, rtx_insn *start)
>  {
>    struct insn_link *link, *next_link;
>  
> @@ -15047,7 +15052,10 @@ distribute_links (struct insn_link *links)
>        I3 to I2.  Also note that not much searching is typically done here
>        since most links don't point very far away.  */
>  
> -      for (insn = NEXT_INSN (link->insn);
> +      insn = start;
> +      if (!insn || NOTE_P (insn))
> +     insn = NEXT_INSN (link->insn);
> +      for (;
>          (insn && (this_basic_block->next_bb == EXIT_BLOCK_PTR_FOR_FN (cfun)
>                    || BB_HEAD (this_basic_block->next_bb) != insn));
>          insn = NEXT_INSN (insn))
> 
> improves compile time a further 17.7%, going from 41% to 33% of
> compile time.  This still generates the same code for the testcase
> as the unpatched compiler, and I checked that a bootstrap and regression
> test succeeded with the (independent rather than cumulative) patch:
> 
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index ef13f5d5e90..ba9649b9b13 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -472,7 +472,7 @@ static void move_deaths (rtx, rtx, int, rtx_insn *, rtx *);
>  static bool reg_bitfield_target_p (rtx, rtx);
>  static void distribute_notes (rtx, rtx_insn *, rtx_insn *, rtx_insn *,
>                             rtx, rtx, rtx);
> -static void distribute_links (struct insn_link *);
> +static void distribute_links (struct insn_link *, rtx_insn * = nullptr);
>  static void mark_used_regs_combine (rtx);
>  static void record_promoted_value (rtx_insn *, rtx);
>  static bool unmentioned_reg_p (rtx, rtx);
> @@ -4208,16 +4208,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
>        adjust_for_new_dest (i3);
>      }
>  
> -  /* If I2 didn't change, this is not a combination (but a simplification or
> -     canonicalisation with context), which should not be done here.  Doing
> -     it here explodes the algorithm.  Don't.  */
> -  if (rtx_equal_p (newi2pat, PATTERN (i2)))
> -    {
> -      if (dump_file)
> -     fprintf (dump_file, "i2 didn't change, not doing this\n");
> -      undo_all ();
> -      return 0;
> -    }
> +  bool only_i3_changed = !i0 && !i1 && rtx_equal_p (newi2pat, PATTERN (i2));
>  
>    /* We now know that we can do this combination.  Merge the insns and
>       update the status of registers and LOG_LINKS.  */
> @@ -4301,6 +4292,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
>             FOR_EACH_LOG_LINK (link, insn)
>               if (link->insn == i3 && link->regno == regno)
>                 {
> +                 gcc_assert (!only_i3_changed);
>                   link->insn = i2;
>                   done = true;
>                   break;
> @@ -4593,8 +4585,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
>                           NULL_RTX, NULL_RTX, NULL_RTX);
>        }
>  
> -    distribute_links (i3links);
> -    distribute_links (i2links);
> +    distribute_links (i3links, only_i3_changed ? i3 : nullptr);
> +    distribute_links (i2links, !i0 && !i1 ? i2 : nullptr);
>      distribute_links (i1links);
>      distribute_links (i0links);
>  
> @@ -14984,10 +14976,13 @@ distribute_notes (rtx notes, rtx_insn *from_insn, rtx_insn *i3, rtx_insn *i2,
>  
>  /* Similarly to above, distribute the LOG_LINKS that used to be present on
>     I3, I2, and I1 to new locations.  This is also called to add a link
> -   pointing at I3 when I3's destination is changed.  */
> +   pointing at I3 when I3's destination is changed.
> +
> +   If START is nonnull and an insn, we know that the next location for
> +   each link is no earlier than START.  */
>  
>  static void
> -distribute_links (struct insn_link *links)
> +distribute_links (struct insn_link *links, rtx_insn *start)
>  {
>    struct insn_link *link, *next_link;
>  
> @@ -15053,10 +15048,15 @@ distribute_links (struct insn_link *links)
>        I3 to I2.  Also note that not much searching is typically done here
>        since most links don't point very far away.  */
>  
> +      if (link->insn == start)
> +     start = nullptr;
>        for (insn = NEXT_INSN (link->insn);
>          (insn && (this_basic_block->next_bb == EXIT_BLOCK_PTR_FOR_FN (cfun)
>                    || BB_HEAD (this_basic_block->next_bb) != insn));
>          insn = NEXT_INSN (insn))
> +     {
> +       if (insn == start)
> +         start = nullptr;
>       if (DEBUG_INSN_P (insn))
>         continue;
>       else if (INSN_P (insn) && reg_overlap_mentioned_p (reg, PATTERN (insn)))
> @@ -15073,6 +15073,7 @@ distribute_links (struct insn_link *links)
>         }
>       else if (INSN_P (insn) && reg_set_p (reg, insn))
>         break;
> +     }
>  
>        /* If we found a place to put the link, place it there unless there
>        is already a link to the same insn as LINK at that point.  */
> @@ -15081,6 +15082,7 @@ distribute_links (struct insn_link *links)
>       {
>         struct insn_link *link2;
>  
> +       gcc_assert (!start || NOTE_P (start));
>         FOR_EACH_LOG_LINK (link2, place)
>           if (link2->insn == link->insn && link2->regno == link->regno)
>             break;
> 
> But still, 33% is a lot for one pass.  So I think we still need a limit
> of some kind, even though that would drop optimisations in extreme cases.
> If we don't put a limit on distribute_links, and don't put a limit
> on try_combine (as for Jakub's luid patch), we could instead limit
> the number of times that distribute_links distributes a particular link.
> There is a 32-bit hole in insn_link on LP64 hosts.
> 
> That said, I'm kind of becoming more sympathetic to Segher's view that
> these kinds of 2->2 combinations were a mistake, although I still stand
> by my original question of why we don't just revert
> c4c5ad1d6d1e1e1fe7a1c2b3bb097cc269dc7306.
> 
> Thanks,
> Richard
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
