On 10/15/23 03:49, Roger Sayle wrote:
Hi Jeff,
Thanks for the speedy review(s).
From: Jeff Law <jeffreya...@gmail.com>
Sent: 15 October 2023 00:03
To: Roger Sayle <ro...@nextmovesoftware.com>; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in
make_compound_operation.
On 10/14/23 16:14, Roger Sayle wrote:
This patch is my proposed solution to PR rtl-optimization/91865.
Normally RTX simplification canonicalizes a ZERO_EXTEND of a
ZERO_EXTEND to a single ZERO_EXTEND, but as shown in this PR it is
possible for combine's make_compound_operation to unintentionally
generate a non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is
unlikely to be matched by the backend.
For the new test case:
const int table[2] = {1, 2};
int foo (char i) { return table[i]; }
compiling with -O2 -mlarge on msp430 we currently see:
Trying 2 -> 7:
2: r25:HI=zero_extend(R12:QI)
REG_DEAD R12:QI
7: r28:PSI=sign_extend(r25:HI)#0
REG_DEAD r25:HI
Failed to match this instruction:
(set (reg:PSI 28 [ iD.1772 ])
(zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))
which results in the following code:
foo: AND #0xff, R12
RLAM.A #4, R12 { RRAM.A #4, R12
RLAM.A #1, R12
MOVX.W table(R12), R12
RETA
With this patch, we now see:
Trying 2 -> 7:
2: r25:HI=zero_extend(R12:QI)
REG_DEAD R12:QI
7: r28:PSI=sign_extend(r25:HI)#0
REG_DEAD r25:HI
Successfully matched this instruction:
(set (reg:PSI 28 [ iD.1772 ])
(zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ]))) allowing
combination of insns 2 and 7 original costs 4 + 8 = 12 replacement
cost 8
foo: MOV.B R12, R12
RLAM.A #1, R12
MOVX.W table(R12), R12
RETA
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures. Ok for mainline?
2023-10-14 Roger Sayle <ro...@nextmovesoftware.com>
gcc/ChangeLog
PR rtl-optimization/91865
* combine.cc (make_compound_operation): Avoid creating a
ZERO_EXTEND of a ZERO_EXTEND.
gcc/testsuite/ChangeLog
PR rtl-optimization/91865
* gcc.target/msp430/pr91865.c: New test case.
Neither an ACK or NAK at this point.
The bug report includes a patch from Segher which purports to fix this in
simplify-
rtx. Any thoughts on Segher's approach and whether or not it should be
considered?
The BZ also indicates that removal of 2 patterns from msp430.md would solve this
too (though it may cause regressions elsewhere?). Any thoughts on that approach
as well?
Great questions. I believe Segher's proposed patch (in comment #4) was an
msp430-specific proof-of-concept workaround rather than intended to be fix.
Eliminating a ZERO_EXTEND simply by changing the mode of a hard register
is not a solution that'll work on many platforms (and therefore not really
suitable
for target-independent middle-end code in the RTL optimizers).
Thanks. I didn't really look at Segher's patch, so thanks for digging
into it. Certainly just flipping the mode of the hard register isn't
correct.
The underlying issue, which is applicable to all targets, is that combine.cc's
make_compound_operation is expected to reverse the local transformations
made by expand_compound_operation. Hence, if an RTL expression is
canonical going into expand_compound_operation, it is expected (hoped)
to be canonical (and equivalent) coming out of make_compound_operation.
In theory, correct.
Hence, rather than be a MSP430 specific issue, no target should expect (or
be expected to see) a ZERO_EXTEND of a ZERO_EXTEND, or a SIGN_EXTEND
of a ZERO_EXTEND in the RTL stream. Much like a binary operator with two
CONST_INT operands, or a shift by zero, it's something the middle-end might
reasonably be expected to clean-up. [Yeah, I know... 😊]
Agreed.
(set (reg:PSI 28 [ iD.1772 ])
(zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))
As a rule of thumb, if the missed optimization bug report includes combine's
diagnostic "Failed to match this instruction:", things can be improved by adding
a pattern (often a define_insn_and_split) that matches the shown RTL.
Yes, but we also need to ponder if that's the right way to fix any given
problem. Sometimes we're going to be better off simplifying elsewhere
in the pipeline. I think we can agree this is one of the cases where
matching the RTL in the backend is not the desired approach.
Occasionally things like those two patterns show up for various reasons.
Hopefully they can be removed :-) Though the first looks awful close
to something I did for the mn102 (not to be confused with the mn103)
eons ago. Partial modes aren't exactly handled well.
In this case the perhaps reasonable assumption is that the above would/should
(normally) match the backend's existing (zero_extend:PSI (reg:QI ...)) insn
pattern.
Or that's my understanding of why this PR is classified as rtl-optimization and
not target.
I wouldn't put a lot of faith in the classification ;-)
Finally, my patch shouldn't block a (updated corrected) variant of Segher's
patch
or other solution to PR 91865. The more (safe) solutions the merrier.
Generally agreed.
jeff