Hi Robin,

+(define_expand "vcond_mask_<mode><vm>"
+  [(set (match_operand:V_VLS 0 "register_operand")
+        (if_then_else:V_VLS
+          (match_operand:<VM> 3 "register_operand")
+          (match_operand:V_VLS 1 "nonmemory_operand")
+          (match_operand:V_VLS 2 "vector_register_or_const_0_operand")))]
+  "TARGET_VECTOR"

Would it hurt to allow any nonmemory operand here and just force the
"unsupported" constants into a register?

Are you asking why operand 2 doesn't use the nonmemory_operand predicate? If so, I think this is because our vmerge.v[vxi]m insns allow operand 1 to be a vector register, a scalar register, or an immediate, but require operand 2 to be a vector register.
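
For reference, the three forms of the instruction (per the RVV spec, with the spec's vd/vs2/vs1 names):

  vmerge.vvm vd, vs2, vs1, v0    # vd[i] = v0.mask[i] ? vs1[i] : vs2[i]
  vmerge.vxm vd, vs2, rs1, v0    # vd[i] = v0.mask[i] ? x[rs1] : vs2[i]
  vmerge.vim vd, vs2, imm, v0    # vd[i] = v0.mask[i] ? imm    : vs2[i]

The "true" value (operand 1) maps to vs1/rs1/imm, but the "false" value (operand 2) always maps to the vector register vs2.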


+  {
+    if (satisfies_constraint_Wc0 (operands[2]))
+      {
+        rtx reg = gen_reg_rtx (<MODE>mode);
+        emit_insn (gen_vec_duplicate_const_0<mode> (reg, operands[2]));

Can't we emit a move_insn directly without going through the new pattern?
Or will that be optimized away in between?

Currently, emitting a move_insn that moves vector const 0 into a vector register produces the RTL below (from vector.md). I would like to keep the simple pattern around until the split1 pass:

 (set (reg:RVVM2HI 149)
         (if_then_else:RVVM2HI (unspec:RVVMF8BI [
                     (const_vector:RVVMF8BI repeat [
                             (const_int 1 [0x1])
                         ])
                     (reg:DI 146)
                     (const_int 2 [0x2]) repeated x2
                     (const_int 1 [0x1])
                     (reg:SI 66 vl)
                     (reg:SI 67 vtype)
                 ] UNSPEC_VPREDICATE)
             (const_vector:RVVM2HI repeat [
                     (const_int 0 [0])
                 ])
             (unspec:RVVM2HI [
                     (reg:SI 0 zero)
                 ] UNSPEC_VUNDEF)))
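
With the dedicated pattern, what stays in the insn stream until split1 is roughly just:

 (set (reg:RVVM2HI 149)
         (const_vector:RVVM2HI repeat [
                 (const_int 0 [0])
             ]))

which is much simpler for later combine patterns to match.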

And the new pattern isn't
actually a duplicate but a move anyway so maybe a force_reg (operands[2]) would
already do?  Probably initial values other than 0 don't work out of the box?

Yes, the name vec_duplicate_const_0 is misleading; renaming it to mov_vec_const_0 would be better.
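
As a rough skeleton of what the renamed pattern could look like (the predicate name and the split body are placeholders here, not the actual patch):

 (define_insn_and_split "*mov_vec_const_0<mode>"
   [(set (match_operand:V_VLS 0 "register_operand" "=vr")
         (match_operand:V_VLS 1 "const_0_operand" "Wc0"))]
   "TARGET_VECTOR"
   "#"
   "&& 1"
   [(const_int 0)]
 {
   /* At split1 time, lower the simple move into the full
      predicated-move RTL shown above.  */
   ...
   DONE;
 })

The simple (set (reg) (const_vector 0)) form survives until split1, so combine only ever sees the clean pattern.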

In any case it wouldn't hurt to describe the "design decisions" (i.e. why
we need one thing and not another) so it's easier to follow the patterns
in the future.
Yes, I should have described the design in more detail. Before this patch, implementing the combine optimization at hand would have required a combine pattern that merges the following three insns:

       (set (reg:RVVM2HI 149)
         (if_then_else:RVVM2HI (unspec:RVVMF8BI [
                     (const_vector:RVVMF8BI repeat [
                             (const_int 1 [0x1])
                         ])
                     (reg:DI 146)
                     (const_int 2 [0x2]) repeated x2
                     (const_int 1 [0x1])
                     (reg:SI 66 vl)
                     (reg:SI 67 vtype)
                 ] UNSPEC_VPREDICATE)
             (const_vector:RVVM2HI repeat [
                     (const_int 0 [0])
                 ])
             (unspec:RVVM2HI [
                     (reg:SI 0 zero)
                 ] UNSPEC_VUNDEF)))

       (set (reg:RVVM2HI 138 [ vect__ifc__38.17 ])
         (if_then_else:RVVM2HI (reg:RVVMF8BI 135 [ mask__18.11 ])
             (sign_extend:RVVM2HI (reg:RVVM1QI 136 [ vect__6.14 ]))
             (reg:RVVM2HI 149)))

       (set (reg:HI 151)
         (unspec:HI [
                 (reg:RVVM2HI 138 [ vect__ifc__38.17 ])
             ] UNSPEC_REDUC_SUM))

into one insn:
   (set (reg:SF 139 [ <retval> ])
        (unspec:SF [
             (if_then_else:RVVM2SF (reg:RVVMF16BI 135 [ mask__11.32 ])
                 (float_extend:RVVM2SF (reg:RVVM1HF 136 [ vect__7.35 ]))
                 (if_then_else:RVVM2SF (unspec:RVVMF16BI [
                             (const_vector:RVVMF16BI repeat [
                                     (const_int 1 [0x1])
                                 ])
                             (reg:DI 143)
                             (const_int 2 [0x2]) repeated x2
                             (const_int 1 [0x1])
                             (reg:SI 66 vl)
                             (reg:SI 67 vtype)
                         ] UNSPEC_VPREDICATE)
                     (const_vector:RVVM2SF repeat [
                             (const_double:SF 0.0 [0x0.0p+0])
                         ])
                     (unspec:RVVM2SF [
                             (reg:SI 0 zero)
                         ] UNSPEC_VUNDEF)))
         ] UNSPEC_REDUC_SUM_UNORDERED))

But I think such a combine pattern is ugly, because the RTL for moving vector const 0 into a vector register carries many extra operands we don't care about, so I want to introduce a simple pattern for it. I had two options here. One is to modify the generic mov pattern, changing the define_expand into a define_insn_and_split to keep the pattern simple. I did not choose this one because the mov pattern is used in many places, including the reload pass, so the impact would be relatively large.

The other option, which this patch chooses, is to specialize the vcond_mask expander for this particular case: allow vec const 0 in operand 2 and then emit it through the simple mov pattern. The point of splitting the vcond_mask pattern into three patterns is precisely to generate that simple mov pattern; simply letting operand 2 of the existing vcond_mask accept vec const 0 would invalidate other existing combine patterns, because they only allow that operand to be a register. I hope these details explain the design.
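
To make the benefit concrete: once the constant reaches combine through the simple mov pattern, combine can substitute the (const_vector 0) directly, so the merged reduction insn looks roughly like this (a sketch in the shape of the integer example above, not the exact final RTL):

 (set (reg:HI 151)
         (unspec:HI [
                 (if_then_else:RVVM2HI (reg:RVVMF8BI 135)
                     (sign_extend:RVVM2HI (reg:RVVM1QI 136))
                     (const_vector:RVVM2HI repeat [
                             (const_int 0 [0])
                         ]))
             ] UNSPEC_REDUC_SUM))

instead of dragging the whole UNSPEC_VPREDICATE form into the combine pattern.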

--
Best,
Lehua
