Richard Biener <[email protected]> writes:
> On Tue, Oct 15, 2024 at 5:30 AM liuhongt <[email protected]> wrote:
>>
>> For x86 masked fma, there're 2 rtl representations
>> 1) (vec_merge (fma op2 op1 op3) op1 mask)
>> 2) (vec_merge (fma op1 op2 op3) op1 mask).
>>
>> 5894(define_insn "<avx512>_fmadd_<mode>_mask<round_name>"
>> 5895 [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
>> 5896 (vec_merge:VFH_AVX512VL
>> 5897 (fma:VFH_AVX512VL
>> 5898 (match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "0,0")
>> 5899 (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>"
>> "<round_constraint>,v")
>> 5900 (match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>"
>> "v,<round_constraint>"))
>> 5901 (match_dup 1)
>> 5902 (match_operand:<avx512fmaskmode> 4 "register_operand"
>> "Yk,Yk")))]
>> 5903 "TARGET_AVX512F && <round_mode_condition>"
>> 5904 "@
>> 5905 vfmadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3,
>> %2<round_op5>}
>> 5906 vfmadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2,
>> %3<round_op5>}"
>> 5907 [(set_attr "type" "ssemuladd")
>> 5908 (set_attr "prefix" "evex")
>> 5909 (set_attr "mode" "<MODE>")])
>>
>> Here op1 has constraint "0", and the scecond op1 is (match_dup 1),
>> we once tried to replace it with (match_operand:M 5
>> "nonimmediate_operand" "0")) to enable more flexibility for pattern
>> match and recog, but it triggered an ICE in reload(reload can handle
>> at most one perand with "0" constraint).
>>
>> So we need either add 2 patterns in the backend or just do the
>> canonicalization in the middle-end.
>
> Nice spot to handle this. OK with the minor not below fixed
> and in case there are no further comments from CCed folks.
>
> I'll note there's (vec_select (vec_concat (....)) as alternate
> way to perform a (vec_merge ...) but I don't feel strongly
> for supporting that alternative without evidence it's used.
> aarch64 seems to use an UNSPEC for masking but it
> seems to have at least two patterns to merge with
> either the first or the third input but failing to handle the
> other (second) operand of a multiplication (*cond_fma<mode>_2 and _4);
> as both are "register_operand" I don't see how canonicalization
> works there?
We try to handle that in the expander instead:
/* Swap the multiplication operands if the fallback value is the
second of the two. */
if (rtx_equal_p (operands[3], operands[5]))
std::swap (operands[2], operands[3]);
I suppose that doesn't help if the equivalence is only discovered
by later RTL optimisations. Hopefully that should be rare though.
> Of course we can't do anything for UNSPECs.
Yeah. If we were going to represent the SVE operations in generic RTL,
then I think the natural choice would be if_then_else. vec_merge wouldn't
really work, since the predicates are vector-like rather than integers,
and since the operands can be variable length.
But there were two reasons for not doing that:
(1) It doesn't accurately describe the behaviour of FP operations.
E.g. an if_then_else or vec_merge of an fma does a full-vector fma
and then discards some of the results. Taken at face value, that
should raise the same exceptions as a plain fma. The SVE instructions
instead only raise exceptions for active lanes.
(2) For SVE, the predicate is often mandatory. I was worried that if we
exposed it as generic RTL, simplify-rtx might apply some tricks that
get rid of the condition/predicate, and so get trapped in dead-end
recog attempts that made less progress than the UNSPEC versions.
Thanks,
Richard