Richard Biener <richard.guent...@gmail.com> writes:
> On Tue, Oct 15, 2024 at 5:30 AM liuhongt <hongtao....@intel.com> wrote:
>>
>> For x86 masked fma, there're 2 rtl representations
>> 1) (vec_merge (fma op2 op1 op3) op1 mask)
>> 2) (vec_merge (fma op1 op2 op3) op1 mask).
>>
>>  5894(define_insn "<avx512>_fmadd_<mode>_mask<round_name>"
>>  5895  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
>>  5896        (vec_merge:VFH_AVX512VL
>>  5897          (fma:VFH_AVX512VL
>>  5898            (match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "0,0")
>>  5899            (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" 
>> "<round_constraint>,v")
>>  5900            (match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>" 
>> "v,<round_constraint>"))
>>  5901          (match_dup 1)
>>  5902          (match_operand:<avx512fmaskmode> 4 "register_operand" 
>> "Yk,Yk")))]
>>  5903  "TARGET_AVX512F && <round_mode_condition>"
>>  5904  "@
>>  5905   vfmadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, 
>> %2<round_op5>}
>>  5906   vfmadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, 
>> %3<round_op5>}"
>>  5907  [(set_attr "type" "ssemuladd")
>>  5908   (set_attr "prefix" "evex")
>>  5909   (set_attr "mode" "<MODE>")])
>>
>> Here op1 has constraint "0", and the scecond op1 is (match_dup 1),
>> we once tried to replace it with (match_operand:M 5
>> "nonimmediate_operand" "0")) to enable more flexibility for pattern
>> match and recog, but it triggered an ICE in reload(reload can handle
>> at most one perand with "0" constraint).
>>
>> So we need either add 2 patterns in the backend or just do the
>> canonicalization in the middle-end.
>
> Nice spot to handle this.  OK with the minor not below fixed
> and in case there are no further comments from CCed folks.
>
> I'll note there's (vec_select (vec_concat (....)) as alternate
> way to perform a (vec_merge ...) but I don't feel strongly
> for supporting that alternative without evidence it's used.
> aarch64 seems to use an UNSPEC for masking but it
> seems to have at least two patterns to merge with
> either the first or the third input but failing to handle the
> other (second) operand of a multiplication (*cond_fma<mode>_2 and _4);
> as both are "register_operand" I don't see how canonicalization
> works there?

We try to handle that in the expander instead:

    /* Swap the multiplication operands if the fallback value is the
       second of the two.  */
    if (rtx_equal_p (operands[3], operands[5]))
      std::swap (operands[2], operands[3]);

I suppose that doesn't help if the equivalence is only discovered
by later RTL optimisations.  Hopefully that should be rare though.

> Of course we can't do anything for UNSPECs.

Yeah.  If we were going to represent the SVE operations in generic RTL,
then I think the natural choice would be if_then_else.  vec_merge wouldn't
really work, since the predicates are vector-like rather than integers,
and since the operands can be variable length.

But there were two reasons for not doing that:

(1) It doesn't accurately describe the behaviour of FP operations.
    E.g. an if_then_else or vec_merge of an fma does a full-vector fma
    and then discards some of the results.  Taken at face value, that
    should raise the same exceptions as a plain fma.  The SVE instructions
    instead only raise exceptions for active lanes.

(2) For SVE, the predicate is often mandatory.  I was worried that if we
    exposed it as generic RTL, simplify-rtx might apply some tricks that
    get rid of the condition/predicate, and so get trapped in dead-end
    recog attempts that made less progress than the UNSPEC versions.

Thanks,
Richard

Reply via email to