TL;DR: fixing a misdetection of what is a "simple move".
Looking into performace degradation after de-cc0 for CRIS, I
noticed combine behaving badly; it changed a move and a
right-shift into two right-shifts, where the "combined" move was
not eliminated in later passes, and where the deficiency caused
an extra insn in a hot loop: crcu16 (and crcu32) in coremark.
Before de-cc0, the insns input to combine looked like:
33: r58:SI=r56:SI 0>>r48:SI
REG_DEAD r56:SI
35: r37:HI=r58:SI#0
and after:
33: {r58:SI=r56:SI 0>>r48:SI;clobber dccr:CC;}
REG_DEAD r56:SI
REG_UNUSED dccr:CC
35: {r37:HI=r58:SI#0;clobber dccr:CC;}
REG_UNUSED dccr:CC
That is, there's always a parallel with a clobber of the
condition-codes register. Being a parallel, it's not an
is_just_move, but e.g. a single_set.
For the de-cc0:ed "combination", it ended up as
33: {r58:SI=r56:SI 0>>r48:SI;clobber dccr:CC;}
REG_UNUSED dccr:CC
35: {r37:HI#0=r56:SI 0>>r48:SI;clobber dccr:CC;}
REG_DEAD r56:SI
REG_UNUSED dccr:CC
That is, a move and a shift turned into two shifts; the extra
shift is not eliminated by later passes, while the move was
(with cc0, and "will be again") leading to redundant
instructions.
At first I thought this was due to parallel-ignorant old code
but the "guilty" change is actually pretty recent. Regarding a
parallel with a clobber not being "just" a move, there's only
the two adjacent callers seen in the patch (obviously with the
rename), and their use exactly matches to check that the
argument is a single_set which is a move. It's always applied
to an rtx_insn, so I changed the type and name to avoid the
"just" issue. I had to adjust the type when calling single_set.
I checked the original commit, c4c5ad1d6d1e1e a.k.a r263067 and
it seems parallels-as-sets were just overlooked and that this
patch appears to agree with the intent and the comment at the
use of i2_was_move and i3_was_move, which has a clause saying
"Also do this if we started with two insns neither of which was
a simple move".
With this correction in place, the performance degradation
related to de-cc0 of the CRIS port as measured by coremark is
gone and turned into a small win. N.B.: there certainly is a
code difference in other hot functions, and the swing between
different functions is larger than this difference; to be dealt
with separately.
Tested cris-elf, x86_64-linux, powerpc64le-linux, 2/3 through
aarch64-linux (unexpectedly slow).
Ok to commit?
2020-07-06 Hans-Peter Nilsson <[email protected]>
PR target/93372
* gcc/combine.c (is_move): Rename from is_just_move. Use
single_set instead of of peeking directly into the PATTERN.
---
gcc/combine.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/gcc/combine.c b/gcc/combine.c
index 7da144e..ed90b16 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -2624,13 +2624,15 @@ can_split_parallel_of_n_reg_sets (rtx_insn *insn, int n)
return true;
}
-/* Return whether X is just a single set, with the source
- a general_operand. */
+/* Return whether INSN is just a single set, with the source
+ a general_operand. INSN must be an insn, not stripped to its PATTERN. */
static bool
-is_just_move (rtx x)
+is_move (const rtx_insn *insn)
{
- if (INSN_P (x))
- x = PATTERN (x);
+ rtx x = single_set (insn);
+
+ if (x == NULL_RTX)
+ return false;
return (GET_CODE (x) == SET && general_operand (SET_SRC (x), VOIDmode));
}
@@ -3103,8 +3105,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1,
rtx_insn *i0,
}
/* Record whether i2 and i3 are trivial moves. */
- i2_was_move = is_just_move (i2);
- i3_was_move = is_just_move (i3);
+ i2_was_move = is_move (i2);
+ i3_was_move = is_move (i3);
/* Record whether I2DEST is used in I2SRC and similarly for the other
cases. Knowing this will help in register status updating below. */
--
2.11.0