On Mon, Oct 14, 2024 at 1:50 PM Richard Biener <rguent...@suse.de> wrote: > > On Mon, 14 Oct 2024, Hongtao Liu wrote: > > > On Sun, Oct 13, 2024 at 8:02 PM Richard Biener <rguent...@suse.de> wrote: > > > > > > On Sun, 13 Oct 2024, Hongtao Liu wrote: > > > > > > > On Fri, Oct 11, 2024 at 8:33 PM Hongtao Liu <crazy...@gmail.com> wrote: > > > > > > > > > > On Fri, Oct 11, 2024 at 8:22 PM Richard Biener <rguent...@suse.de> > > > > > wrote: > > > > > > > > > > > > The following helps the x86 backend by canonicalizing FMAs to have > > > > > > any negation done to one of the commutative multiplication operands > > > > > > be done to a register (and not a memory operand). Likewise to > > > > > > put a register operand first and a memory operand second; > > > > > > swap_commutative_operands_p seems to treat REG_P and MEM_P the > > > > > > same but comments indicate "complex expressiosn should be first". > > > > > > > > > > > > In particular this does (fma MEM REG REG) -> (fma REG MEM REG) and > > > > > > (fma (neg MEM) REG REG) -> (fma (neg REG) MEM REG) which are the > > > > > > reasons for the testsuite regressions in > > > > > > gcc.target/i386/cond_op_fma*.c > > > > > > > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > > > > > > > > > > > I'm not quite sure this is the correct approach - simplify-rtx > > > > > > doesn't seem to do "only canonicalization" but the existing FMA > > > > > > case looks odd in that context. > > > > > > > > > > > > Should the target simply reject cases with wrong "canonicalization" > > > > > > or does it need to cope with all variants in the patterns that fail > > > > > > matching during combine without the change? > > > > > Let me try the backend fix first. > > > > The backend fix requires at least 8 more patterns, so I think RTL > > > > canonicalization would be better. > > > > Please go ahead with the patch. > > > > > > I'm still looking for insights on how we usually canonicalize on RTL > > > (and where we document canonicalizations) and how we maintain RTL > > > in canonical form. > > > > > > I'm also still wondering why the backend accepts "non-canonical" RTL > > > instead of rejecting it, giving the middle-end the chance to try > > > an alternative variant? > > So you mean middle-end will alway canonicalize (fma: reg mem reg) to > > (fma: mem reg reg)? > > I only saw that RTL will canonicalize (fma: a (neg: b) c) to (fma: (neg a) > > b c). > > Or what do you mean about "non-canonical" RTL in the backend? > > "non-canonical" RTL in the backend is what the patterns in question > for this bugreport do not accept. But maybe I'm missing something here. > > IIRC there's code somewhere in combine to try several "canonical" > varaints of an insns in recog_for_combine, but as said I'm not very > familiar with how things work on RTL to decide what's conceptually the > correct thing to do here. I just discvered simplify_rtx already does some > minor canonicalization for FMAs ... I think FMA itself is ok, reg or mem is also ok, the problem is with masked FMA(with mult operands). so there are (vec_merge: (fma: op1 op2 op3) op1 mask) or (vec_merge: (fma:op2 op1 op3) op1 mask) They're the same instruction since op1 and op2 are commutative. Either the middle-end should canonicalize them, or the backend adds an extra alternative to match the potential optimization.
So maybe in combine.cc: maybe_swap_commutative_operands, let's always canonicalize to (vec_merge: (fma: op1 op2 op3) op1 mask), together with the backend patch(which relaxes the predicates + some other potential changes for the pattern), should fix the issue. > > Richard. > > > > > > > Richard. > > > > > > > > > > > > > > > Thanks, > > > > > > Richard. > > > > > > > > > > > > PR target/117072 > > > > > > * simplify-rtx.cc > > > > > > (simplify_context::simplify_ternary_operation): > > > > > > Adjust FMA canonicalization. > > > > > > --- > > > > > > gcc/simplify-rtx.cc | 15 +++++++++++++-- > > > > > > 1 file changed, 13 insertions(+), 2 deletions(-) > > > > > > > > > > > > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc > > > > > > index e8e60404ef6..8b4fa0d7aa4 100644 > > > > > > --- a/gcc/simplify-rtx.cc > > > > > > +++ b/gcc/simplify-rtx.cc > > > > > > @@ -6830,10 +6830,21 @@ > > > > > > simplify_context::simplify_ternary_operation (rtx_code code, > > > > > > machine_mode mode, > > > > > > op0 = tem, op1 = XEXP (op1, 0), any_change = true; > > > > > > } > > > > > > > > > > > > - /* Canonicalize the two multiplication operands. */ > > > > > > + /* Canonicalize the two multiplication operands. A negation > > > > > > + should go first and if possible the negation should be > > > > > > + to a register. */ > > > > > > /* a * -b + c => -b * a + c. */ > > > > > > - if (swap_commutative_operands_p (op0, op1)) > > > > > > + if (swap_commutative_operands_p (op0, op1) > > > > > > + || (REG_P (op1) && !REG_P (op0) && GET_CODE (op0) != NEG)) > > > > > > std::swap (op0, op1), any_change = true; > > > > > > + else if (GET_CODE (op0) == NEG && !REG_P (XEXP (op0, 0)) > > > > > > + && REG_P (op1)) > > > > > > + { > > > > > > + op0 = XEXP (op0, 0); > > > > > > + op1 = simplify_gen_unary (NEG, mode, op1, mode); > > > > > > + std::swap (op0, op1); > > > > > > + any_change = true; > > > > > > + } > > > > > > > > > > > > if (any_change) > > > > > > return gen_rtx_FMA (mode, op0, op1, op2); > > > > > > -- > > > > > > 2.43.0 > > > > > > > > > > > > > > > > > > > > -- > > > > > BR, > > > > > Hongtao > > > > > > > > > > > > > > > > > > > > > > -- > > > Richard Biener <rguent...@suse.de> > > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, > > > Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; > > > HRB 36809 (AG Nuernberg) > > > > > > > > > > -- > Richard Biener <rguent...@suse.de> > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) -- BR, Hongtao