https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523
--- Comment #53 from Richard Biener <rguenth at gcc dot gnu.org> ---
So just to recap, with reverting the change and instead doing

diff --git a/gcc/combine.cc b/gcc/combine.cc
index a4479f8d836..ff25752cac4 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -4186,6 +4186,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
       adjust_for_new_dest (i3);
     }
 
+  bool i2_unchanged = false;
+  if (rtx_equal_p (newi2pat, PATTERN (i2)))
+    i2_unchanged = true;
+
   /* We now know that we can do this combination.  Merge the insns and
      update the status of registers and LOG_LINKS.  */
 
@@ -4752,6 +4756,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
   combine_successes++;
   undo_commit ();
 
+  if (i2_unchanged)
+    return i3;
+
   rtx_insn *ret = newi2pat ? i2 : i3;
   if (added_links_insn && DF_INSN_LUID (added_links_insn) < DF_INSN_LUID (ret))
     ret = added_links_insn;

combine time is down from 79s (93%) to 3.5s (37%), which is still quite a bit
more than with the currently installed patch, which has combine down to
0.02s (0%). But notably peak memory use is down from 9GB to 400MB (installed
patch: 340MB). That was with a cross from x86_64-linux and a release
checking build.

This change should avoid any code generation changes. I do think that if the
pattern doesn't change, what distribute_notes/links does should be a no-op
even to I2, so we can ignore added_{links,notes}_insn (not ignoring them
only provides a 50% speedup).

I like the 0% combine result of the installed patch, but the regressions
observed probably mean this needs to be deferred to stage1.
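
For illustration only, here is a toy model of why the resume point returned by try_combine matters for compile time. It is not GCC code: count_visits and resume_at_i2 are hypothetical names, and the model assumes that insn 0 plays the role of an I2 that every later insn (an I3) combines with exactly once without changing it. Resuming the scan at I2 after each success revisits everything in between, while resuming at I3 does not.

```cpp
#include <cstddef>
#include <vector>

// Toy model, not GCC code.  Insn 0 stands in for I2; every later insn is
// an I3 that can be combined with it exactly once.  A successful
// combination leaves insn 0 unchanged.
// Returns how many insns the scan loop visits in total.
std::size_t count_visits (std::size_t n_insns, bool resume_at_i2)
{
  std::vector<bool> combined (n_insns, false);
  std::size_t visits = 0;
  std::size_t pos = 0;
  while (pos < n_insns)
    {
      ++visits;
      if (pos > 0 && !combined[pos])
        {
          combined[pos] = true;          // the combination succeeded
          pos = resume_at_i2 ? 0 : pos;  // where the scan resumes
          continue;
        }
      ++pos;
    }
  return visits;
}
```

In this model, count_visits (n, false) grows linearly with n while count_visits (n, true) grows quadratically, which is the general shape of the slowdown discussed here; the real pass has more state than this sketch captures.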