On Tue, Apr 1, 2025 at 10:55 AM Hongtao Liu <crazy...@gmail.com> wrote:
>
> On Tue, Apr 1, 2025 at 4:40 PM Hongyu Wang <hongyu.w...@intel.com> wrote:
> >
> > Hi,
> >
> > For spiltter after <rotate_insn><mode>3_mask it now splits the pattern
> > to *<rotate_insn><mode>3_mask, causing the splitter doesn't generate
> > nf variant. Add corresponding nf counterpart for define_insn_and_split
> > to make the splitter also works for nf insn.
> >
> > Bootstrapped & regtested on x86-64-pc-linux-gnu.
> >
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> >         PR target/119539
> >         * config/i386/i386.md (*<insn><mode>3_mask_nf): New
> >         define_insn_and_split.
> >         (*<insn><mode>3_mask_1_nf): Likewise.
> >         (*<insn><mode>3_mask): Use force_lowpart_subreg.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         PR target/119539
> >         * gcc.target/i386/apx-nf-pr119539.c: New test.
> > ---
> >  gcc/config/i386/i386.md                       | 46 ++++++++++++++++++-
> >  .../gcc.target/i386/apx-nf-pr119539.c         |  6 +++
> >  2 files changed, 50 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/apx-nf-pr119539.c
> >
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index f7f790d2aeb..42312f0c330 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -18131,6 +18131,30 @@ (define_expand "<insn><mode>3"
> >    DONE;
> >  })
> >
> > +;; Avoid useless masking of count operand.
> > +(define_insn_and_split "*<insn><mode>3_mask_nf"
> > +  [(set (match_operand:SWI 0 "nonimmediate_operand")
> > +       (any_rotate:SWI
> > +         (match_operand:SWI 1 "nonimmediate_operand")
> > +         (subreg:QI
> > +           (and
> > +             (match_operand 2 "int248_register_operand" "c")
> > +             (match_operand 3 "const_int_operand")) 0)))]
> > +  "TARGET_APX_NF
> > +   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
> > +   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
> > +      == GET_MODE_BITSIZE (<MODE>mode)-1
> > +   && ix86_pre_reload_split ()"
> > +  "#"
> > +  "&& 1"
> > +  [(set (match_dup 0)
> > +       (any_rotate:SWI (match_dup 1)
> > +                       (match_dup 2)))]
> > +{
> > +  operands[2] = force_lowpart_subreg (QImode, operands[2],
> > +                                     GET_MODE (operands[2]));
> > +})
> Can we just change the output in original pattern, I think combine
> will still match the pattern even w/ clobber flags.
>
> like
>
> @@ -17851,8 +17851,17 @@ (define_insn_and_split "*<insn><mode>3_mask"
>                             (match_dup 2)))
>        (clobber (reg:CC FLAGS_REG))])]
>  {
> -  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
> -  operands[2] = gen_lowpart (QImode, operands[2]);
> +  if (TARGET_APX_F)
> +    {
> +      emit_move_insn (operands[0],
> +                    gen_rtx_<code> (<MODE>mode, operands[1], operands[2]));
> +      DONE;
> +    }
> +  else
> +    {
> +      operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
> +      operands[2] = gen_lowpart (QImode, operands[2]);

Please note we have a new "force_lowpart_subreg" function that
operates on a register operand and (if possible) simplifies a subreg
of a subreg, otherwise it forces the operand to register and creates a
subreg of the temporary. Similar to the above combination, with a
possibility to avoid a temporary reg.

> Also we can remove constraint "c" in the original pattern.

The constraint is here to handle corner case, where combine propagated
an insn RTX with fixed register, other than %ecx, into shift RTX.
Register allocator was not able to fix the combination, so
TARGET_LEGITIMATE_COMBINED_INSN hook was introduced that rejected
unwanted combinations. Please see the comment in
ix86_legitimate_combined_insn function.

Perhaps the above is not relevant anymore with the new register
allocator (LRA), and the constraint can indeed be removed. But please
take some caution.

Uros.

Reply via email to