https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523
--- Comment #53 from Richard Biener <rguenth at gcc dot gnu.org> ---
So just to recap, with reverting the change and instead doing
diff --git a/gcc/combine.cc b/gcc/combine.cc
index a4479f8d836..ff25752cac4 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -4186,6 +4186,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1,
rtx_insn *i0,
adjust_for_new_dest (i3);
}
+ bool i2_unchanged = false;
+ if (rtx_equal_p (newi2pat, PATTERN (i2)))
+ i2_unchanged = true;
+
/* We now know that we can do this combination. Merge the insns and
update the status of registers and LOG_LINKS. */
@@ -4752,6 +4756,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1,
rtx_insn *i0,
combine_successes++;
undo_commit ();
+ if (i2_unchanged)
+ return i3;
+
rtx_insn *ret = newi2pat ? i2 : i3;
if (added_links_insn && DF_INSN_LUID (added_links_insn) < DF_INSN_LUID
(ret))
ret = added_links_insn;
combine time is down from 79s (93%) to 3.5s (37%), quite a bit more than
with the currently installed patch which has combine down to 0.02s (0%).
But notably peak memory use is down from 9GB to 400MB (installed patch 340MB).
That was with a cross from x86_64-linux and a release checking build.
This change should avoid any code generation changes, I do think if the
pattern doesn't change what distribute_notes/links does should be a no-op
even to I2 so we can ignore added_{links,notes}_insn (not ignoring them
only provides a 50% speedup).
I like the 0% combine result of the installed patch but the regressions
observed probably mean this needs to be defered to stage1.