On Mon, 14 Oct 2024, Hongtao Liu wrote:

> On Mon, Oct 14, 2024 at 1:50 PM Richard Biener <rguent...@suse.de> wrote:
> >
> > On Mon, 14 Oct 2024, Hongtao Liu wrote:
> >
> > > On Sun, Oct 13, 2024 at 8:02 PM Richard Biener <rguent...@suse.de> wrote:
> > > >
> > > > On Sun, 13 Oct 2024, Hongtao Liu wrote:
> > > >
> > > > > On Fri, Oct 11, 2024 at 8:33 PM Hongtao Liu <crazy...@gmail.com> 
> > > > > wrote:
> > > > > >
> > > > > > On Fri, Oct 11, 2024 at 8:22 PM Richard Biener <rguent...@suse.de> 
> > > > > > wrote:
> > > > > > >
> > > > > > > The following helps the x86 backend by canonicalizing FMAs to have
> > > > > > > any negation done to one of the commutative multiplication 
> > > > > > > operands
> > > > > > > be done to a register (and not a memory operand).  Likewise to
> > > > > > > put a register operand first and a memory operand second;
> > > > > > > swap_commutative_operands_p seems to treat REG_P and MEM_P the
> > > > > > > same but comments indicate "complex expressiosn should be first".
> > > > > > >
> > > > > > > In particular this does (fma MEM REG REG) -> (fma REG MEM REG) and
> > > > > > > (fma (neg MEM) REG REG) -> (fma (neg REG) MEM REG) which are the
> > > > > > > reasons for the testsuite regressions in 
> > > > > > > gcc.target/i386/cond_op_fma*.c
> > > > > > >
> > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > > > > >
> > > > > > > I'm not quite sure this is the correct approach - simplify-rtx
> > > > > > > doesn't seem to do "only canonicalization" but the existing FMA
> > > > > > > case looks odd in that context.
> > > > > > >
> > > > > > > Should the target simply reject cases with wrong 
> > > > > > > "canonicalization"
> > > > > > > or does it need to cope with all variants in the patterns that 
> > > > > > > fail
> > > > > > > matching during combine without the change?
> > > > > > Let me try the backend fix first.
> > > > > The backend fix requires at least 8 more patterns, so I think RTL
> > > > > canonicalization would be better.
> > > > > Please go ahead with the patch.
> > > >
> > > > I'm still looking for insights on how we usually canonicalize on RTL
> > > > (and where we document canonicalizations) and how we maintain RTL
> > > > in canonical form.
> > > >
> > > > I'm also still wondering why the backend accepts "non-canonical" RTL
> > > > instead of rejecting it, giving the middle-end the chance to try
> > > > an alternative variant?
> > > So you mean middle-end will alway canonicalize (fma: reg mem reg) to
> > > (fma: mem reg reg)?
> > > I only saw that RTL will canonicalize (fma: a (neg: b) c) to (fma: (neg 
> > > a) b c).
> > > Or what do you mean about "non-canonical" RTL in the backend?
> >
> > "non-canonical" RTL in the backend is what the patterns in question
> > for this bugreport do not accept.  But maybe I'm missing something here.
> >
> > IIRC there's code somewhere in combine to try several "canonical"
> > varaints of an insns in recog_for_combine, but as said I'm not very
> > familiar with how things work on RTL to decide what's conceptually the
> > correct thing to do here.  I just discvered simplify_rtx already does some
> > minor canonicalization for FMAs ...
> I think FMA itself is ok, reg or mem is also ok, the problem is with
> masked FMA(with mult operands).
> so there are
> (vec_merge: (fma: op1 op2 op3) op1 mask) or (vec_merge: (fma:op2 op1
> op3) op1 mask)
> They're the same instruction since op1 and op2 are commutative.
> Either the middle-end should canonicalize them, or the backend adds an
> extra alternative to match the potential optimization.
> 
> So maybe in combine.cc: maybe_swap_commutative_operands, let's always
> canonicalize to (vec_merge: (fma: op1 op2 op3) op1 mask),  together
> with the backend patch(which relaxes the predicates + some other
> potential changes for the pattern), should fix the issue.

I think only the MULT will get to maybe_swap_commutative_operands
and x86 asks for the "simple" (reg) operand to go
first and the MEM to go last.  Likewise for
(mult (neg (MEM..)) (reg)) to become (mult (neg reg) (MEM..)) which
makes a former simple operand complex - it's also not a simple
swap of operands given we only have FMA and not FNMA like on GIMPLE
but we represent the N as (neg ..) on the first MULT operand.

Other than that it uses swap_commutative_operands_p which doesn't
distinguish MEM from REG and considers (neg ...) complex so put
first (that part works already).

Richard.

> >
> > Richard.
> >
> > > >
> > > > Richard.
> > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Richard.
> > > > > > >
> > > > > > >         PR target/117072
> > > > > > >         * simplify-rtx.cc 
> > > > > > > (simplify_context::simplify_ternary_operation):
> > > > > > >         Adjust FMA canonicalization.
> > > > > > > ---
> > > > > > >  gcc/simplify-rtx.cc | 15 +++++++++++++--
> > > > > > >  1 file changed, 13 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> > > > > > > index e8e60404ef6..8b4fa0d7aa4 100644
> > > > > > > --- a/gcc/simplify-rtx.cc
> > > > > > > +++ b/gcc/simplify-rtx.cc
> > > > > > > @@ -6830,10 +6830,21 @@ 
> > > > > > > simplify_context::simplify_ternary_operation (rtx_code code, 
> > > > > > > machine_mode mode,
> > > > > > >             op0 = tem, op1 = XEXP (op1, 0), any_change = true;
> > > > > > >         }
> > > > > > >
> > > > > > > -      /* Canonicalize the two multiplication operands.  */
> > > > > > > +      /* Canonicalize the two multiplication operands.  A 
> > > > > > > negation
> > > > > > > +        should go first and if possible the negation should be
> > > > > > > +        to a register.  */
> > > > > > >        /* a * -b + c  =>  -b * a + c.  */
> > > > > > > -      if (swap_commutative_operands_p (op0, op1))
> > > > > > > +      if (swap_commutative_operands_p (op0, op1)
> > > > > > > +         || (REG_P (op1) && !REG_P (op0) && GET_CODE (op0) != 
> > > > > > > NEG))
> > > > > > >         std::swap (op0, op1), any_change = true;
> > > > > > > +      else if (GET_CODE (op0) == NEG && !REG_P (XEXP (op0, 0))
> > > > > > > +              && REG_P (op1))
> > > > > > > +       {
> > > > > > > +         op0 = XEXP (op0, 0);
> > > > > > > +         op1 = simplify_gen_unary (NEG, mode, op1, mode);
> > > > > > > +         std::swap (op0, op1);
> > > > > > > +         any_change = true;
> > > > > > > +       }
> > > > > > >
> > > > > > >        if (any_change)
> > > > > > >         return gen_rtx_FMA (mode, op0, op1, op2);
> > > > > > > --
> > > > > > > 2.43.0
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > BR,
> > > > > > Hongtao
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguent...@suse.de>
> > > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 
> > > > Nuernberg,
> > > > Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> > > > HRB 36809 (AG Nuernberg)
> > >
> > >
> > >
> > >
> >
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 
> 
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to