On 01/26/15 22:11, Segher Boessenkool wrote:
On Mon, Jan 26, 2015 at 08:07:29PM -0700, Jeff Law wrote:
The second change we need is an additional simplification.
If we have
(subreg:M1 (zero_extend:M2 (x))
Where M1 > M2 and both are scalar integer modes. It's advantageous to
strip the SUBREG and instead have a wider extension.
Should you also check M1 is not multiple registers?
We're generally working with pseudos, so we could estimate, but not know
for sure if we're dealing with multiple hard regs. But more
importantly, I'm not sure what that check would buy us.
Bootstrapped and regression tested on x86_64-unknown-linux-gnu.
Thoughts?
It looks fine to me. Well, some comments...
@@ -2643,6 +2644,24 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1,
rtx_insn *i0,
|| GET_CODE (src) == LSHIFTRT)
nshift++;
}
+
+ /* If I0 loads a memory and I3 sets the same memory, then I2 and I3
+ are likely manipulating its value. Ideally we'll be able to combine
+ all four insns into a bitfield insertion of some kind.
+
+ Note the source in I0 might be inside a sign/zero extension and the
+ memory modes in I0 and I3 might be different. So extract the address
+ from the destination of I3 and search for it in the source of I0.
+
+ In the event that there's a match but the source/dest do not actually
+ refer to the same memory, the worst that happens is we try some
+ combinations that we wouldn't have otherwise. */
+ if ((set0 = single_set (i0))
+ && (set3 = single_set (i3))
+ && GET_CODE (SET_DEST (set3)) == MEM
+ && rtx_referenced_p (XEXP (SET_DEST (set3), 0), SET_SRC (set0)))
+ ngood += 2;
I think you should test MEM_P (SET_SRC (set0)), too. Or even just test
rtx_equal_p (SET_DEST (set3), SET_SRC (set0)) ?
Yea, we need a tighter test on set0 to ensure it's a MEM. That code got
twidded before the last testrun. I'll take care of that.
Earlier versions checked reg_equal_p on the MEM. But that's often a
mistake because the modes of the two memory references may be different.
I don't recall which of the various tests, but I was definitely seeing
SImode in the load and HImode in the store.
Similarly you don't want to check reg_equal_p on the addresses as they
aren't necessarily the same either (they're obviously related).
That's how I ultimately settled on rtx_referenced_p form you see above.
I'm still not sure that's 100% what I want, but I don't have any tests
yet which require something more complex.
+
if (ngood < 2 && nshift < 2)
return 0;
}
@@ -5663,6 +5682,25 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, int
in_dest,
return CONST0_RTX (mode);
}
+ /* If we have (subreg:M1 (zero_extend:M2 (x))) or
+ (subreg:M1 (sign_extend: M2 (x))) where M1 is wider
+ then M2, then go ahead and just widen the original extension.
+
+ While the subreg is useful in saying "I don't care about those
+ upper bits. Squashing out the subreg results in simpler RTL that
+ is more easily matched. */
Closing quote missing.
Fixed locally.
+ if ((GET_CODE (SUBREG_REG (x)) == ZERO_EXTEND
+ || GET_CODE (SUBREG_REG (x)) == SIGN_EXTEND)
+ && SCALAR_INT_MODE_P (GET_MODE (x))
+ && SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (x)))
+ && GET_MODE (x) > GET_MODE (SUBREG_REG (x)))
GET_MODE_SIZE instead?
It's work as-is. But using GET_MODE_SIZE shows the intent clearer.
I'll fix that momentarily.
Does this do anything good for the "dec mem" thing on x86? That would
be a nice bonus :-)
It might, but I haven't tested for that specifically. If you've got
sample code or a PR in mind, pass it along and I'll take a look. I'd
think dec mem would generally be handled by 3->1 insn combination code
unless there's something else going on.
jef