On Wed, Apr 2, 2025 at 2:58 PM Hongyu Wang <wwwhhhyyy...@gmail.com> wrote:
>
> > Can we just change the output in original pattern, I think combine
> > will still match the pattern even w/ clobber flags.
>
> Yes, adjusted and updated the patch in attachment.
Ok.
>
> Liu, Hongtao <hongtao....@intel.com> 于2025年4月2日周三 08:57写道:
> >
> >
> >
> > > -----Original Message-----
> > > From: Uros Bizjak <ubiz...@gmail.com>
> > > Sent: Tuesday, April 1, 2025 5:24 PM
> > > To: Hongtao Liu <crazy...@gmail.com>
> > > Cc: Wang, Hongyu <hongyu.w...@intel.com>; gcc-patches@gcc.gnu.org; Liu,
> > > Hongtao <hongtao....@intel.com>
> > > Subject: Re: [PATCH] APX: add nf counterparts for rotl split pattern [PR
> > > 119539]
> > >
> > > On Tue, Apr 1, 2025 at 10:55 AM Hongtao Liu <crazy...@gmail.com> wrote:
> > > >
> > > > On Tue, Apr 1, 2025 at 4:40 PM Hongyu Wang <hongyu.w...@intel.com>
> > > wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > For spiltter after <rotate_insn><mode>3_mask it now splits the
> > > > > pattern to *<rotate_insn><mode>3_mask, causing the splitter doesn't
> > > > > generate nf variant. Add corresponding nf counterpart for
> > > > > define_insn_and_split to make the splitter also works for nf insn.
> > > > >
> > > > > Bootstrapped & regtested on x86-64-pc-linux-gnu.
> > > > >
> > > > > Ok for trunk?
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >         PR target/119539
> > > > >         * config/i386/i386.md (*<insn><mode>3_mask_nf): New
> > > > >         define_insn_and_split.
> > > > >         (*<insn><mode>3_mask_1_nf): Likewise.
> > > > >         (*<insn><mode>3_mask): Use force_lowpart_subreg.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > >         PR target/119539
> > > > >         * gcc.target/i386/apx-nf-pr119539.c: New test.
> > > > > ---
> > > > >  gcc/config/i386/i386.md                       | 46 
> > > > > ++++++++++++++++++-
> > > > >  .../gcc.target/i386/apx-nf-pr119539.c         |  6 +++
> > > > >  2 files changed, 50 insertions(+), 2 deletions(-)  create mode
> > > > > 100644 gcc/testsuite/gcc.target/i386/apx-nf-pr119539.c
> > > > >
> > > > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index
> > > > > f7f790d2aeb..42312f0c330 100644
> > > > > --- a/gcc/config/i386/i386.md
> > > > > +++ b/gcc/config/i386/i386.md
> > > > > @@ -18131,6 +18131,30 @@ (define_expand "<insn><mode>3"
> > > > >    DONE;
> > > > >  })
> > > > >
> > > > > +;; Avoid useless masking of count operand.
> > > > > +(define_insn_and_split "*<insn><mode>3_mask_nf"
> > > > > +  [(set (match_operand:SWI 0 "nonimmediate_operand")
> > > > > +       (any_rotate:SWI
> > > > > +         (match_operand:SWI 1 "nonimmediate_operand")
> > > > > +         (subreg:QI
> > > > > +           (and
> > > > > +             (match_operand 2 "int248_register_operand" "c")
> > > > > +             (match_operand 3 "const_int_operand")) 0)))]
> > > > > +  "TARGET_APX_NF
> > > > > +   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
> > > > > +   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
> > > > > +      == GET_MODE_BITSIZE (<MODE>mode)-1
> > > > > +   && ix86_pre_reload_split ()"
> > > > > +  "#"
> > > > > +  "&& 1"
> > > > > +  [(set (match_dup 0)
> > > > > +       (any_rotate:SWI (match_dup 1)
> > > > > +                       (match_dup 2)))] {
> > > > > +  operands[2] = force_lowpart_subreg (QImode, operands[2],
> > > > > +                                     GET_MODE (operands[2]));
> > > > > +})
> > > > Can we just change the output in original pattern, I think combine
> > > > will still match the pattern even w/ clobber flags.
> > > >
> > > > like
> > > >
> > > > @@ -17851,8 +17851,17 @@ (define_insn_and_split
> > > "*<insn><mode>3_mask"
> > > >                             (match_dup 2)))
> > > >        (clobber (reg:CC FLAGS_REG))])]  {
> > > > -  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
> > > > -  operands[2] = gen_lowpart (QImode, operands[2]);
> > > > +  if (TARGET_APX_F)
> > > > +    {
> > > > +      emit_move_insn (operands[0],
> > > > +                    gen_rtx_<code> (<MODE>mode, operands[1], 
> > > > operands[2]));
> > > > +      DONE;
> > > > +    }
> > > > +  else
> > > > +    {
> > > > +      operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
> > > > +      operands[2] = gen_lowpart (QImode, operands[2]);
> > >
> > > Please note we have a new "force_lowpart_subreg" function that operates on
> > > a register operand and (if possible) simplifies a subreg of a subreg, 
> > > otherwise it
> > > forces the operand to register and creates a subreg of the temporary. 
> > > Similar
> > > to the above combination, with a possibility to avoid a temporary reg.
> > >
> > > > Also we can remove constraint "c" in the original pattern.
> > >
> > > The constraint is here to handle corner case, where combine propagated an
> > > insn RTX with fixed register, other than %ecx, into shift RTX.
> > > Register allocator was not able to fix the combination, so
> > > TARGET_LEGITIMATE_COMBINED_INSN hook was introduced that rejected
> > > unwanted combinations. Please see the comment in
> > > ix86_legitimate_combined_insn function.
> > Oh, thanks for the explanation.
> > >
> > > Perhaps the above is not relevant anymore with the new register allocator
> > > (LRA), and the constraint can indeed be removed. But please take some
> > > caution.
> > >
> > > Uros.



-- 
BR,
Hongtao

Reply via email to