https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98348

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
In the light of the recent discussions I've been wondering about doing it as
combine splitters only, like roughly:
--- sse.md.jj   2020-12-03 10:04:35.862093285 +0100
+++ sse.md      2020-12-19 11:00:14.272140859 +0100
@@ -2965,6 +2965,40 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "<MODE>")])

+(define_split
+  [(set (match_operand 0 "register_operand")
+       (vec_merge
+         (match_operand 1 "vector_all_ones_operand")
+         (match_operand 2 "const0_operand")
+         (unspec
+           [(match_operand 3 "register_operand")
+            (match_operand 4 "nonimmediate_operand")
+            (match_operand:SI 5 "const_0_to_31_operand")]
+            UNSPEC_PCMP)))]
+  "TARGET_AVX512VL
+   && GET_MODE_CLASS (GET_MODE (operands[0])) == MODE_VECTOR_INT
+   && (GET_MODE_SIZE (GET_MODE (operands[1])) == 16
+       || GET_MODE_SIZE (GET_MODE (operands[1])) == 32)
+   && GET_MODE (operands[1]) == GET_MODE (operands[0])
+   && GET_MODE (operands[2]) == GET_MODE (operands[0])
+   && GET_MODE_CLASS (GET_MODE (operands[3])) == MODE_VECTOR_FLOAT
+   && (GET_MODE_SIZE (GET_MODE (operands[3]))
+       == GET_MODE_SIZE (GET_MODE (operands[0])))
+   && (GET_MODE_UNIT_SIZE (GET_MODE (operands[3]))
+       == GET_MODE_UNIT_SIZE (GET_MODE (operands[0])))
+   && GET_MODE (operands[4]) == GET_MODE (operands[3])"
+  [(set (match_dup 6) (match_dup 7))
+   (set (match_dup 0) (match_dup 8))]
+{
+  operands[6] = gen_reg_rtx (GET_MODE (operands[3]));
+  operands[7]
+    = gen_rtx_UNSPEC (GET_MODE (operands[3]),
+                     gen_rtvec (3, operands[3], operands[4], operands[5]),
+                     UNSPEC_PCMP);
+  operands[8] = lowpart_subreg (GET_MODE (operands[0]), operands[6],
+                               GET_MODE (operands[3]));
+})
+
 (define_insn "avx_vmcmp<mode>3"
   [(set (match_operand:VF_128 0 "register_operand" "=x")
        (vec_merge:VF_128

The advantage is that a single pattern can then, in theory, handle all (or
half) of the floating point comparison cases.
One problem is that the combiner still doesn't even try splitting when it is
combining only two insns.
Also (though I think this applies to your patch too), vector_all_ones_operand
matches only integral all-ones vectors.  I think we want another predicate
that also matches MEMs holding the floating point version thereof (a NaN with
all bits set).  And we should have splitters not just for the -1 0 order in
VEC_MERGE, but also for the 0 -1 order, by carefully inverting the comparison.
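
To make the last two points concrete, here is a small standalone C sketch (not
GCC code).  It assumes the usual 5-bit VCMPPS/VCMPPD predicate encoding as
documented by Intel, and that "inverting carefully" can be done by flipping
bit 2 of the immediate (imm ^ 4); that choice is my assumption, not something
stated above.  The helper names are made up for illustration.  The sketch
checks that imm ^ 4 yields the exact logical negation (unordered case
included) without changing whether the compare signals on QNaN, and that an
all-ones 32-bit pattern reinterpreted as an IEEE float is a NaN.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Exactly one of these relations holds for any pair of scalars.  */
enum { REL_LT = 1, REL_EQ = 2, REL_GT = 4, REL_UN = 8 };

/* Relations for which predicates 0..15 are true.  Predicates 16..31
   have the same truth sets and differ only in signaling behavior.  */
static const unsigned char pcmp_truth_table[16] = {
  REL_EQ,                            /*  0 EQ_OQ    */
  REL_LT,                            /*  1 LT_OS    */
  REL_LT | REL_EQ,                   /*  2 LE_OS    */
  REL_UN,                            /*  3 UNORD_Q  */
  REL_LT | REL_GT | REL_UN,          /*  4 NEQ_UQ   */
  REL_EQ | REL_GT | REL_UN,          /*  5 NLT_US   */
  REL_GT | REL_UN,                   /*  6 NLE_US   */
  REL_LT | REL_EQ | REL_GT,          /*  7 ORD_Q    */
  REL_EQ | REL_UN,                   /*  8 EQ_UQ    */
  REL_LT | REL_UN,                   /*  9 NGE_US   */
  REL_LT | REL_EQ | REL_UN,          /* 10 NGT_US   */
  0,                                 /* 11 FALSE_OQ */
  REL_LT | REL_GT,                   /* 12 NEQ_OQ   */
  REL_EQ | REL_GT,                   /* 13 GE_OS    */
  REL_GT,                            /* 14 GT_OS    */
  REL_LT | REL_EQ | REL_GT | REL_UN  /* 15 TRUE_UQ  */
};

unsigned pcmp_truth (int imm)
{
  assert (imm >= 0 && imm <= 31);
  return pcmp_truth_table[imm & 15];
}

/* True if the predicate raises an exception on QNaN operands.
   0x6666 has bits 1,2,5,6,9,10,13,14 set: the signaling predicates
   among 0..15.  Bit 4 of the immediate flips that behavior.  */
bool pcmp_signals (int imm)
{
  assert (imm >= 0 && imm <= 31);
  bool s = (0x6666 >> (imm & 15)) & 1;
  return (imm & 16) ? !s : s;
}

/* Assumed inversion rule: flip bit 2 of the immediate.  */
int pcmp_invert (int imm)
{
  assert (imm >= 0 && imm <= 31);
  return imm ^ 4;
}

/* Check all 32 predicates: the inverted predicate must be true for
   exactly the complementary set of relations and must keep the same
   signaling behavior.  */
bool pcmp_inversion_is_exact (void)
{
  for (int imm = 0; imm < 32; imm++)
    {
      int inv = pcmp_invert (imm);
      if ((pcmp_truth (imm) ^ pcmp_truth (inv)) != 15)
        return false;
      if (pcmp_signals (imm) != pcmp_signals (inv))
        return false;
    }
  return true;
}

/* All bits set, viewed as an IEEE binary32 float, is a NaN
   (exponent all ones, nonzero mantissa).  */
bool all_ones_float_is_nan (void)
{
  uint32_t bits = UINT32_MAX;
  float f;
  memcpy (&f, &bits, sizeof f);
  return isnan (f) != 0;
}
```

If the assumption holds, a splitter for the 0 -1 order could reuse the same
replacement sequence with only operands[5] changed, and the floating point
all-ones constant that the new predicate has to recognize is just the NaN
bit pattern checked by the last function.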
