On Tue, Apr 1, 2025 at 10:55 AM Hongtao Liu <crazy...@gmail.com> wrote: > > On Tue, Apr 1, 2025 at 4:40 PM Hongyu Wang <hongyu.w...@intel.com> wrote: > > > > Hi, > > > > For spiltter after <rotate_insn><mode>3_mask it now splits the pattern > > to *<rotate_insn><mode>3_mask, causing the splitter doesn't generate > > nf variant. Add corresponding nf counterpart for define_insn_and_split > > to make the splitter also works for nf insn. > > > > Bootstrapped & regtested on x86-64-pc-linux-gnu. > > > > Ok for trunk? > > > > gcc/ChangeLog: > > > > PR target/119539 > > * config/i386/i386.md (*<insn><mode>3_mask_nf): New > > define_insn_and_split. > > (*<insn><mode>3_mask_1_nf): Likewise. > > (*<insn><mode>3_mask): Use force_lowpart_subreg. > > > > gcc/testsuite/ChangeLog: > > > > PR target/119539 > > * gcc.target/i386/apx-nf-pr119539.c: New test. > > --- > > gcc/config/i386/i386.md | 46 ++++++++++++++++++- > > .../gcc.target/i386/apx-nf-pr119539.c | 6 +++ > > 2 files changed, 50 insertions(+), 2 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/apx-nf-pr119539.c > > > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md > > index f7f790d2aeb..42312f0c330 100644 > > --- a/gcc/config/i386/i386.md > > +++ b/gcc/config/i386/i386.md > > @@ -18131,6 +18131,30 @@ (define_expand "<insn><mode>3" > > DONE; > > }) > > > > +;; Avoid useless masking of count operand. > > +(define_insn_and_split "*<insn><mode>3_mask_nf" > > + [(set (match_operand:SWI 0 "nonimmediate_operand") > > + (any_rotate:SWI > > + (match_operand:SWI 1 "nonimmediate_operand") > > + (subreg:QI > > + (and > > + (match_operand 2 "int248_register_operand" "c") > > + (match_operand 3 "const_int_operand")) 0)))] > > + "TARGET_APX_NF > > + && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands) > > + && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) > > + == GET_MODE_BITSIZE (<MODE>mode)-1 > > + && ix86_pre_reload_split ()" > > + "#" > > + "&& 1" > > + [(set (match_dup 0) > > + (any_rotate:SWI (match_dup 1) > > + (match_dup 2)))] > > +{ > > + operands[2] = force_lowpart_subreg (QImode, operands[2], > > + GET_MODE (operands[2])); > > +}) > Can we just change the output in original pattern, I think combine > will still match the pattern even w/ clobber flags. > > like > > @@ -17851,8 +17851,17 @@ (define_insn_and_split "*<insn><mode>3_mask" > (match_dup 2))) > (clobber (reg:CC FLAGS_REG))])] > { > - operands[2] = force_reg (GET_MODE (operands[2]), operands[2]); > - operands[2] = gen_lowpart (QImode, operands[2]); > + if (TARGET_APX_F) > + { > + emit_move_insn (operands[0], > + gen_rtx_<code> (<MODE>mode, operands[1], operands[2])); > + DONE; > + } > + else > + { > + operands[2] = force_reg (GET_MODE (operands[2]), operands[2]); > + operands[2] = gen_lowpart (QImode, operands[2]);
Please note we have a new "force_lowpart_subreg" function that operates on a register operand and (if possible) simplifies a subreg of a subreg, otherwise it forces the operand to register and creates a subreg of the temporary. Similar to the above combination, with a possibility to avoid a temporary reg. > Also we can remove constraint "c" in the original pattern. The constraint is here to handle corner case, where combine propagated an insn RTX with fixed register, other than %ecx, into shift RTX. Register allocator was not able to fix the combination, so TARGET_LEGITIMATE_COMBINED_INSN hook was introduced that rejected unwanted combinations. Please see the comment in ix86_legitimate_combined_insn function. Perhaps the above is not relevant anymore with the new register allocator (LRA), and the constraint can indeed be removed. But please take some caution. Uros.