On Mon, Oct 16, 2023 at 9:27 PM Jeff Law <jeffreya...@gmail.com> wrote: > > > > On 10/15/23 03:49, Roger Sayle wrote: > > > > Hi Jeff, > > Thanks for the speedy review(s). > > > >> From: Jeff Law <jeffreya...@gmail.com> > >> Sent: 15 October 2023 00:03 > >> To: Roger Sayle <ro...@nextmovesoftware.com>; gcc-patches@gcc.gnu.org > >> Subject: Re: [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in > >> make_compound_operation. > >> > >> On 10/14/23 16:14, Roger Sayle wrote: > >>> > >>> This patch is my proposed solution to PR rtl-optimization/91865. > >>> Normally RTX simplification canonicalizes a ZERO_EXTEND of a > >>> ZERO_EXTEND to a single ZERO_EXTEND, but as shown in this PR it is > >>> possible for combine's make_compound_operation to unintentionally > >>> generate a non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is > >>> unlikely to be matched by the backend. > >>> > >>> For the new test case: > >>> > >>> const int table[2] = {1, 2}; > >>> int foo (char i) { return table[i]; } > >>> > >>> compiling with -O2 -mlarge on msp430 we currently see: > >>> > >>> Trying 2 -> 7: > >>> 2: r25:HI=zero_extend(R12:QI) > >>> REG_DEAD R12:QI > >>> 7: r28:PSI=sign_extend(r25:HI)#0 > >>> REG_DEAD r25:HI > >>> Failed to match this instruction: > >>> (set (reg:PSI 28 [ iD.1772 ]) > >>> (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ])))) > >>> > >>> which results in the following code: > >>> > >>> foo: AND #0xff, R12 > >>> RLAM.A #4, R12 { RRAM.A #4, R12 > >>> RLAM.A #1, R12 > >>> MOVX.W table(R12), R12 > >>> RETA > >>> > >>> With this patch, we now see: > >>> > >>> Trying 2 -> 7: > >>> 2: r25:HI=zero_extend(R12:QI) > >>> REG_DEAD R12:QI > >>> 7: r28:PSI=sign_extend(r25:HI)#0 > >>> REG_DEAD r25:HI > >>> Successfully matched this instruction: > >>> (set (reg:PSI 28 [ iD.1772 ]) > >>> (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ]))) allowing > >>> combination of insns 2 and 7 original costs 4 + 8 = 12 replacement > >>> cost 8 > >>> > >>> foo: MOV.B R12, R12 > >>> RLAM.A #1, R12 > >>> MOVX.W table(R12), R12 > >>> RETA > >>> > >>> > >>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > >>> and make -k check, both with and without --target_board=unix{-m32} > >>> with no new failures. Ok for mainline? > >>> > >>> 2023-10-14 Roger Sayle <ro...@nextmovesoftware.com> > >>> > >>> gcc/ChangeLog > >>> PR rtl-optimization/91865 > >>> * combine.cc (make_compound_operation): Avoid creating a > >>> ZERO_EXTEND of a ZERO_EXTEND. > >>> > >>> gcc/testsuite/ChangeLog > >>> PR rtl-optimization/91865 > >>> * gcc.target/msp430/pr91865.c: New test case. > >> Neither an ACK or NAK at this point. > >> > >> The bug report includes a patch from Segher which purports to fix this in > >> simplify- > >> rtx. Any thoughts on Segher's approach and whether or not it should be > >> considered? > >> > >> The BZ also indicates that removal of 2 patterns from msp430.md would > >> solve this > >> too (though it may cause regressions elsewhere?). Any thoughts on that > >> approach > >> as well? > >> > > > > Great questions. I believe Segher's proposed patch (in comment #4) was an > > msp430-specific proof-of-concept workaround rather than intended to be fix. > > Eliminating a ZERO_EXTEND simply by changing the mode of a hard register > > is not a solution that'll work on many platforms (and therefore not really > > suitable > > for target-independent middle-end code in the RTL optimizers). > Thanks. I didn't really look at Segher's patch, so thanks for digging > into it. Certainly just flipping the mode of the hard register isn't > correct. > > > > > > The underlying issue, which is applicable to all targets, is that > > combine.cc's > > make_compound_operation is expected to reverse the local transformations > > made by expand_compound_operation. Hence, if an RTL expression is > > canonical going into expand_compound_operation, it is expected (hoped) > > to be canonical (and equivalent) coming out of make_compound_operation. > In theory, correct. > > > > > > Hence, rather than be a MSP430 specific issue, no target should expect (or > > be expected to see) a ZERO_EXTEND of a ZERO_EXTEND, or a SIGN_EXTEND > > of a ZERO_EXTEND in the RTL stream. Much like a binary operator with two > > CONST_INT operands, or a shift by zero, it's something the middle-end might > > reasonably be expected to clean-up. [Yeah, I know... 😊] > Agreed. > > > > > > >>> (set (reg:PSI 28 [ iD.1772 ]) > >>> (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ])))) > > > > As a rule of thumb, if the missed optimization bug report includes combine's > > diagnostic "Failed to match this instruction:", things can be improved by > > adding > > a pattern (often a define_insn_and_split) that matches the shown RTL. > Yes, but we also need to ponder if that's the right way to fix any given > problem. Sometimes we're going to be better off simplifying elsewhere > in the pipeline. I think we can agree this is one of the cases where > matching the RTL in the backend is not the desired approach. > > Occasionally things like those two patterns show up for various reasons. > Hopefully they can be removed :-) Though the first looks awful close > to something I did for the mn102 (not to be confused with the mn103) > eons ago. Partial modes aren't exactly handled well. > > > > > In this case the perhaps reasonable assumption is that the above > > would/should > > (normally) match the backend's existing (zero_extend:PSI (reg:QI ...)) insn > > pattern. > > Or that's my understanding of why this PR is classified as rtl-optimization > > and > > not target. > I wouldn't put a lot of faith in the classification ;-) > > > > > > Finally, my patch shouldn't block a (updated corrected) variant of Segher's > > patch > > or other solution to PR 91865. The more (safe) solutions the merrier. > Generally agreed.
Looking at the patch I wonder whether handling (zero_extend (zero_extend ..)) shouldn't be done by using simplify_unary_operation instead of simplify_const_unary_operation here? If that's by design then I agree the patch looks reasonable (albeit ugly) as long as the reverse still works. But you probably need Seghers ack here. Thanks, Richard. > jeff