For x86 masked fma, there are 2 RTL representations:
1) (vec_merge (fma op2 op1 op3) op1 mask)
2) (vec_merge (fma op1 op2 op3) op1 mask).
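
The two forms differ only in the order of the fma multiplicands, so they compute the same value. A minimal sketch of that equivalence, assuming an unfused element-wise model of fma (a*b + c) and vec_merge semantics where a set mask bit selects the first vector; the Python helper names are illustrative only and are not GCC APIs:

```python
def fma(a, b, c):
    # Element-wise a*b + c (an unfused model; enough to show operand symmetry).
    return [x * y + z for x, y, z in zip(a, b, c)]

def vec_merge(v1, v2, mask):
    # Element i is taken from v1 if bit i of mask is set, otherwise from v2.
    return [x if (mask >> i) & 1 else y
            for i, (x, y) in enumerate(zip(v1, v2))]

op1 = [1.0, 2.0, 3.0, 4.0]
op2 = [5.0, 6.0, 7.0, 8.0]
op3 = [0.5, 0.5, 0.5, 0.5]
mask = 0b0101

form1 = vec_merge(fma(op2, op1, op3), op1, mask)  # representation 1)
form2 = vec_merge(fma(op1, op2, op3), op1, mask)  # representation 2)
assert form1 == form2
```

Since both forms are interchangeable, a pattern that matches only one of them will miss the other unless something canonicalizes the operand order first.
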
(define_insn "<avx512>_fmadd_<mode>_mask<round_name>"
  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
        (vec_merge:VFH_AVX512VL
          (fma:VFH_AVX512VL
            (match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "0,0")
            (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
            (match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>"))
          (match_dup 1)
          (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
  "TARGET_AVX512F && <round_mode_condition>"
  "@
   vfmadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
   vfmadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
  [(set_attr "type" "ssemuladd")
   (set_attr "prefix" "evex")
   (set_attr "mode" "<MODE>")])
Here op1 has constraint "0", and the second occurrence of op1 is
(match_dup 1). We once tried replacing it with (match_operand:M 5
"nonimmediate_operand" "0") to give pattern matching and recog more
flexibility, but that triggered an ICE in reload (reload can handle
at most one operand with a "0" constraint).
So we need to either add 2 patterns in the backend or do the
canonicalization in the middle-end.
This patch canonicalizes it in combine and adjusts the x86 backend
patterns.
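
To illustrate the rewrite itself, here is a toy sketch on nested tuples. It is not the code added to combine.cc; the tuple encoding and the function name canonicalize_masked_fma are assumptions made for illustration only:

```python
def canonicalize_masked_fma(x):
    """Rewrite (vec_merge (fma a b c) b mask) as (vec_merge (fma b a c) b mask)."""
    if not (isinstance(x, tuple) and len(x) == 4 and x[0] == "vec_merge"):
        return x
    _, inner, passthru, mask = x
    if not (isinstance(inner, tuple) and len(inner) == 4 and inner[0] == "fma"):
        return x
    _, a, b, c = inner
    # Swap only when the pass-through operand is the second multiplicand
    # and is not already the first one.
    if b == passthru and a != passthru:
        return ("vec_merge", ("fma", b, a, c), passthru, mask)
    return x

form1 = ("vec_merge", ("fma", "op2", "op1", "op3"), "op1", "mask")
form2 = ("vec_merge", ("fma", "op1", "op2", "op3"), "op1", "mask")
assert canonicalize_masked_fma(form1) == form2
assert canonicalize_masked_fma(form2) == form2
```

With the operand order fixed this way, the backend needs only one pattern per masked fma variant, the one whose first multiplicand is the (match_dup 1) operand.
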
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Any comments?
liuhongt (2):
Canonicalize (vec_merge (fma op2 op1 op3) op1 mask) to (vec_merge (fma
op1 op2 op3) op1 mask).
[x86] Canonicalize (vec_merge (fma: op2 op1 op3) (match_dup 1)) mask)
to (vec_merge (fma: op1 op2 op3) (match_dup 1)) mask)
gcc/combine.cc | 25 ++++++++++++
gcc/config/i386/sse.md | 86 +++++++++++++++++++++---------------------
2 files changed, 68 insertions(+), 43 deletions(-)
--
2.31.1