> -----Original Message-----
> From: Uros Bizjak <ubiz...@gmail.com>
> Sent: Tuesday, April 1, 2025 5:24 PM
> To: Hongtao Liu <crazy...@gmail.com>
> Cc: Wang, Hongyu <hongyu.w...@intel.com>; gcc-patches@gcc.gnu.org; Liu,
> Hongtao <hongtao....@intel.com>
> Subject: Re: [PATCH] APX: add nf counterparts for rotl split pattern [PR
> 119539]
>
> On Tue, Apr 1, 2025 at 10:55 AM Hongtao Liu <crazy...@gmail.com> wrote:
> >
> > On Tue, Apr 1, 2025 at 4:40 PM Hongyu Wang <hongyu.w...@intel.com>
> wrote:
> > >
> > > Hi,
> > >
> > > For spiltter after <rotate_insn><mode>3_mask it now splits the
> > > pattern to *<rotate_insn><mode>3_mask, causing the splitter doesn't
> > > generate nf variant. Add corresponding nf counterpart for
> > > define_insn_and_split to make the splitter also works for nf insn.
> > >
> > > Bootstrapped & regtested on x86-64-pc-linux-gnu.
> > >
> > > Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > > PR target/119539
> > > * config/i386/i386.md (*<insn><mode>3_mask_nf): New
> > > define_insn_and_split.
> > > (*<insn><mode>3_mask_1_nf): Likewise.
> > > (*<insn><mode>3_mask): Use force_lowpart_subreg.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > PR target/119539
> > > * gcc.target/i386/apx-nf-pr119539.c: New test.
> > > ---
> > > gcc/config/i386/i386.md | 46 ++++++++++++++++++-
> > > .../gcc.target/i386/apx-nf-pr119539.c | 6 +++
> > > 2 files changed, 50 insertions(+), 2 deletions(-) create mode
> > > 100644 gcc/testsuite/gcc.target/i386/apx-nf-pr119539.c
> > >
> > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index
> > > f7f790d2aeb..42312f0c330 100644
> > > --- a/gcc/config/i386/i386.md
> > > +++ b/gcc/config/i386/i386.md
> > > @@ -18131,6 +18131,30 @@ (define_expand "<insn><mode>3"
> > > DONE;
> > > })
> > >
> > > +;; Avoid useless masking of count operand.
> > > +(define_insn_and_split "*<insn><mode>3_mask_nf"
> > > + [(set (match_operand:SWI 0 "nonimmediate_operand")
> > > + (any_rotate:SWI
> > > + (match_operand:SWI 1 "nonimmediate_operand")
> > > + (subreg:QI
> > > + (and
> > > + (match_operand 2 "int248_register_operand" "c")
> > > + (match_operand 3 "const_int_operand")) 0)))]
> > > + "TARGET_APX_NF
> > > + && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
> > > + && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
> > > + == GET_MODE_BITSIZE (<MODE>mode)-1
> > > + && ix86_pre_reload_split ()"
> > > + "#"
> > > + "&& 1"
> > > + [(set (match_dup 0)
> > > + (any_rotate:SWI (match_dup 1)
> > > + (match_dup 2)))] {
> > > + operands[2] = force_lowpart_subreg (QImode, operands[2],
> > > + GET_MODE (operands[2]));
> > > +})
> > Can we just change the output in original pattern, I think combine
> > will still match the pattern even w/ clobber flags.
> >
> > like
> >
> > @@ -17851,8 +17851,17 @@ (define_insn_and_split
> "*<insn><mode>3_mask"
> > (match_dup 2)))
> > (clobber (reg:CC FLAGS_REG))])] {
> > - operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
> > - operands[2] = gen_lowpart (QImode, operands[2]);
> > + if (TARGET_APX_F)
> > + {
> > + emit_move_insn (operands[0],
> > + gen_rtx_<code> (<MODE>mode, operands[1], operands[2]));
> > + DONE;
> > + }
> > + else
> > + {
> > + operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
> > + operands[2] = gen_lowpart (QImode, operands[2]);
>
> Please note we have a new "force_lowpart_subreg" function that operates on
> a register operand and (if possible) simplifies a subreg of a subreg,
> otherwise it
> forces the operand to register and creates a subreg of the temporary. Similar
> to the above combination, with a possibility to avoid a temporary reg.
>
> > Also we can remove constraint "c" in the original pattern.
>
> The constraint is here to handle corner case, where combine propagated an
> insn RTX with fixed register, other than %ecx, into shift RTX.
> Register allocator was not able to fix the combination, so
> TARGET_LEGITIMATE_COMBINED_INSN hook was introduced that rejected
> unwanted combinations. Please see the comment in
> ix86_legitimate_combined_insn function.
Oh, thanks for the explanation.
>
> Perhaps the above is not relevant anymore with the new register allocator
> (LRA), and the constraint can indeed be removed. But please take some
> caution.
>
> Uros.