Hi Robin,
+(define_expand "vcond_mask_<mode><vm>"
+ [(set (match_operand:V_VLS 0 "register_operand")
+ (if_then_else:V_VLS
+ (match_operand:<VM> 3 "register_operand")
+ (match_operand:V_VLS 1 "nonmemory_operand")
+ (match_operand:V_VLS 2 "vector_register_or_const_0_operand")))]
+ "TARGET_VECTOR"
Would it hurt to allow any nonmemory operand here and just force the
"unsupported" constants into a register?
Are you talking about why operand 2 doesn't use nonmemory_operand
predicate? If so, I think this is because our vmerge.v[vxi]m insns only
supports that operand 1 is a scalar and operand 2 must be a vector register.
+ {
+ if (satisfies_constraint_Wc0 (operands[2]))
+ {
+ rtx reg = gen_reg_rtx (<MODE>mode);
+ emit_insn (gen_vec_duplicate_const_0<mode> (reg, operands[2]));
Can't we emit a move_insn directly without going through the new pattern?
Or will that be optimized away in between?
Currently, the move_insn of moving vector const 0 to a vector register
will produce the bellow rtl from vector.md, I hope keep the simple
pattern before split1 pass:
(set (reg:RVVM2HI 149)
(if_then_else:RVVM2HI (unspec:RVVMF8BI [
(const_vector:RVVMF8BI repeat [
(const_int 1 [0x1])
])
(reg:DI 146)
(const_int 2 [0x2]) repeated x2
(const_int 1 [0x1])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(const_vector:RVVM2HI repeat [
(const_int 0 [0])
])
(unspec:RVVM2HI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF)))
And the new pattern isn't
actually a duplicate but a move anyway so maybe a force_reg (operands[2]) would
already do? Probably initial values other than 0 don't work out of the box?
Yes, the name vec_duplicate_const_0 is missleading, maybe change it to
mov_vec_const_0 is better.
In any case it wouldn't hurt to describe the "design decisions" (i.e. why
we need one thing and not another) so it's easier to follow the patterns
in the future.
Yes, I should have described the design in more detail. Before this
patch, if I wanted to implement the current combine optimization, I
would need to create a combine pattern that combines the following three
instructions:
(set (reg:RVVM2HI 149)
(if_then_else:RVVM2HI (unspec:RVVMF8BI [
(const_vector:RVVMF8BI repeat [
(const_int 1 [0x1])
])
(reg:DI 146)
(const_int 2 [0x2]) repeated x2
(const_int 1 [0x1])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(const_vector:RVVM2HI repeat [
(const_int 0 [0])
])
(unspec:RVVM2HI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF)))
(set (reg:RVVM2HI 138 [ vect__ifc__38.17 ])
(if_then_else:RVVM2HI (reg:RVVMF8BI 135 [ mask__18.11 ])
(sign_extend:RVVM2HI (reg:RVVM1QI 136 [ vect__6.14 ]))
(reg:RVVM2HI 149)))
(set (reg:HI 151)
(unspec:HI [
(reg:RVVM2HI 138 [ vect__ifc__38.17 ])
] UNSPEC_REDUC_SUM))
into one insn:
(set (reg:SF 139 [ <retval> ])
(unspec:SF [
(if_then_else:RVVM2SF (reg:RVVMF16BI 135 [ mask__11.32 ])
(float_extend:RVVM2SF (reg:RVVM1HF 136 [ vect__7.35 ]))
(if_then_else:RVVM2SF (unspec:RVVMF16BI [
(const_vector:RVVMF16BI repeat [
(const_int 1 [0x1])
])
(reg:DI 143)
(const_int 2 [0x2]) repeated x2
(const_int 1 [0x1])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(const_vector:RVVM2SF repeat [
(const_double:SF 0.0 [0x0.0p+0])
])
(unspec:RVVM2SF [
(reg:SI 0 zero)
] UNSPEC_VUNDEF)))
] UNSPEC_REDUC_SUM_UNORDERED))
But I think the combine pattern is ugly because the
move_vector_const_0_to_vector_registers has many extra operands we don't
care about. So I want to introduce a simple pattern of it. I have two
options here, one is to modify the generic mov pattern, changing
define_expand to define_insn_and_split to keep pattern simple. The
reason why I did not choose this one is that mov pattern uses a lot of
places, including reload pass. The impact is relatively big.
The other option (this patch chooses) is to split the vcond_mask pattern
for this particular case, allowing vec_const_0 to be passed into operand
2, and then generating it into a simple mov pattern. The purpose of
splitting the vcond_mask pattern into three patterns is to generate a
simple mov pattern. Simply allowing operand 2 of vcond_mask to allow vec
const 0 will invalidate other existing combine patterns because they
only allow the operand to be register. I wonder if these details explain
why?
--
Best,
Lehua