Andrew Pinski <pins...@gmail.com> writes:
> Hi,
>   I was looking into why we don't produce fmls with a scalar register
> as the last argument but I found a difference in how fnma<mode>4 is
> described in RTL which I think is causing the missed optimization.
> Look at the scalar version:
>
> (define_insn "fnma<mode>4"
>   [(set (match_operand:GPF_F16 0 "register_operand" "=w")
>         (fma:GPF_F16
>           (neg:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w"))
>           (match_operand:GPF_F16 2 "register_operand" "w")
>           (match_operand:GPF_F16 3 "register_operand" "w")))]
>   "TARGET_FLOAT"
>   "fmsub\\t%<s>0, %<s>1, %<s>2, %<s>3"
>   [(set_attr "type" "fmac<stype>")]
> )
>
> vs the vector version:
> (define_insn "fnma<mode>4"
>   [(set (match_operand:VHSDF 0 "register_operand" "=w")
>         (fma:VHSDF
>           (match_operand:VHSDF 1 "register_operand" "w")
>           (neg:VHSDF
>             (match_operand:VHSDF 2 "register_operand" "w"))
>           (match_operand:VHSDF 3 "register_operand" "0")))]
>   "TARGET_SIMD"
>   "fmls\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
>   [(set_attr "type" "neon_fp_mla_<stype><q>")]
> )
>
> Notice how the neg is in a different location in each of them.  What is
> the reason for that?

Yeah, that looks weird.  We should be treating the first two operands of
FMA as commutative, which with the normal canonicalization rules (a lone
neg goes in the first operand) would make the scalar version right and
the vector version the one that should change.
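
For concreteness, here is a rough sketch of what the vector pattern might
look like with the neg moved onto operand 1.  It is untested and only an
illustration: everything except the position of the neg is copied from
the pattern quoted above.

```
;; Sketch only (untested): vector fnma<mode>4 with the neg on the first
;; multiplicand, matching the scalar pattern and the canonical RTL form.
(define_insn "fnma<mode>4"
  [(set (match_operand:VHSDF 0 "register_operand" "=w")
        (fma:VHSDF
          (neg:VHSDF (match_operand:VHSDF 1 "register_operand" "w"))
          (match_operand:VHSDF 2 "register_operand" "w")
          (match_operand:VHSDF 3 "register_operand" "0")))]
  "TARGET_SIMD"
  "fmls\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
  [(set_attr "type" "neon_fp_mla_<stype><q>")]
)
```

The asm template and the "0" tie on operand 3 would stay as they are,
since fmls computes %0 - %1 * %2 per lane and it makes no difference
which multiplicand carries the neg.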

Does that give the output you wanted?  Or does it need to be the other
way around?

Thanks,
Richard
