Richard Biener <richard.guent...@gmail.com> writes: > On Tue, Oct 15, 2024 at 5:30 AM liuhongt <hongtao....@intel.com> wrote: >> >> For x86 masked fma, there're 2 rtl representations >> 1) (vec_merge (fma op2 op1 op3) op1 mask) >> 2) (vec_merge (fma op1 op2 op3) op1 mask). >> >> 5894(define_insn "<avx512>_fmadd_<mode>_mask<round_name>" >> 5895 [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v") >> 5896 (vec_merge:VFH_AVX512VL >> 5897 (fma:VFH_AVX512VL >> 5898 (match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "0,0") >> 5899 (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" >> "<round_constraint>,v") >> 5900 (match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>" >> "v,<round_constraint>")) >> 5901 (match_dup 1) >> 5902 (match_operand:<avx512fmaskmode> 4 "register_operand" >> "Yk,Yk")))] >> 5903 "TARGET_AVX512F && <round_mode_condition>" >> 5904 "@ >> 5905 vfmadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, >> %2<round_op5>} >> 5906 vfmadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, >> %3<round_op5>}" >> 5907 [(set_attr "type" "ssemuladd") >> 5908 (set_attr "prefix" "evex") >> 5909 (set_attr "mode" "<MODE>")]) >> >> Here op1 has constraint "0", and the scecond op1 is (match_dup 1), >> we once tried to replace it with (match_operand:M 5 >> "nonimmediate_operand" "0")) to enable more flexibility for pattern >> match and recog, but it triggered an ICE in reload(reload can handle >> at most one perand with "0" constraint). >> >> So we need either add 2 patterns in the backend or just do the >> canonicalization in the middle-end. > > Nice spot to handle this. OK with the minor not below fixed > and in case there are no further comments from CCed folks. > > I'll note there's (vec_select (vec_concat (....)) as alternate > way to perform a (vec_merge ...) but I don't feel strongly > for supporting that alternative without evidence it's used. > aarch64 seems to use an UNSPEC for masking but it > seems to have at least two patterns to merge with > either the first or the third input but failing to handle the > other (second) operand of a multiplication (*cond_fma<mode>_2 and _4); > as both are "register_operand" I don't see how canonicalization > works there?
We try to handle that in the expander instead: /* Swap the multiplication operands if the fallback value is the second of the two. */ if (rtx_equal_p (operands[3], operands[5])) std::swap (operands[2], operands[3]); I suppose that doesn't help if the equivalence is only discovered by later RTL optimisations. Hopefully that should be rare though. > Of course we can't do anything for UNSPECs. Yeah. If we were going to represent the SVE operations in generic RTL, then I think the natural choice would be if_then_else. vec_merge wouldn't really work, since the predicates are vector-like rather than integers, and since the operands can be variable length. But there were two reasons for not doing that: (1) It doesn't accurately describe the behaviour of FP operations. E.g. an if_then_else or vec_merge of an fma does a full-vector fma and then discards some of the results. Taken at face value, that should raise the same exceptions as a plain fma. The SVE instructions instead only raise exceptions for active lanes. (2) For SVE, the predicate is often mandatory. I was worried that if we exposed it as generic RTL, simplify-rtx might apply some tricks that get rid of the condition/predicate, and so get trapped in dead-end recog attempts that made less progress than the UNSPEC versions. Thanks, Richard