Andrew Pinski <pins...@gmail.com> writes:
> Hi,
>   I was looking into why we don't produce fmls with a scalar register
> as the last argument but I found a difference in how fnma<mode>4 is
> described in RTL which I think is causing the missed optimization.
> Look at the scalar version:
>
> (define_insn "fnma<mode>4"
>   [(set (match_operand:GPF_F16 0 "register_operand" "=w")
>         (fma:GPF_F16
>           (neg:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w"))
>           (match_operand:GPF_F16 2 "register_operand" "w")
>           (match_operand:GPF_F16 3 "register_operand" "w")))]
>   "TARGET_FLOAT"
>   "fmsub\\t%<s>0, %<s>1, %<s>2, %<s>3"
>   [(set_attr "type" "fmac<stype>")]
> )
>
> vs the vector version:
> (define_insn "fnma<mode>4"
>   [(set (match_operand:VHSDF 0 "register_operand" "=w")
>         (fma:VHSDF
>           (match_operand:VHSDF 1 "register_operand" "w")
>           (neg:VHSDF
>             (match_operand:VHSDF 2 "register_operand" "w"))
>           (match_operand:VHSDF 3 "register_operand" "0")))]
>   "TARGET_SIMD"
>   "fmls\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
>   [(set_attr "type" "neon_fp_mla_<stype><q>")]
> )
>
> Notice how the neg is a different location for both of them.  What is
> the reason for that?

Yeah, that looks weird.  We should be treating the first two operands
of FMA as commutative, which with the normal canonicalization rules
would make the scalar version right and the vector version the one
that should change.

Does that give the output you wanted?  Or does it need to be the
other way around?

Thanks,
Richard