On 18/10/2024 15:22, Robin Dapp wrote:
This patch adds an undefined else operand to the masked loads.
@@ -4027,7 +4025,8 @@ (define_expand "mask_gather_load<mode><vnsi>"
(match_operand:<VnSI> 2 "register_operand")
(match_operand 3 "immediate_operand")
(match_operand:SI 4 "gcn_alu_operand")
- (match_operand:DI 5 "")]
+ (match_operand:DI 5 "")
+ (match_operand:V_MOV 6 "maskload_else_operand")]
""
{
rtx exec = force_reg (DImode, operands[5]);
@@ -4036,9 +4035,6 @@ (define_expand "mask_gather_load<mode><vnsi>"
operands[2], operands[4],
INTVAL (operands[3]), exec);
- /* Masked lanes are required to hold zero. */
- emit_move_insn (operands[0], gcn_vec_constant (<MODE>mode, 0));
-
if (GET_MODE (addr) == <VnDI>mode)
emit_insn (gen_gather<mode>_insn_1offset_exec (operands[0], addr,
const0_rtx, const0_rtx,
I'm not sure how this differs from just deleting the zero-initializer,
which I have already tested and which gave some random behaviour.
I'm not even sure why it makes any difference, because the "init-regs"
pass sees the read-before-write and inserts a zero-initialize
instruction anyway.
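For reference, a scalar C model of the semantics being discussed. This is only a sketch: the function name, the bit-per-lane mask layout, and the convention that a NULL else pointer means "undefined" are assumptions made for illustration and are not taken from GCC or the GCN backend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a masked gather load; not GCC code.
   Lane i is loaded from base[idx[i]] when bit i of exec is set.
   Inactive lanes take else_vals[i] when else_vals is non-NULL and are
   left untouched otherwise, which models an "undefined" else operand.  */
static void
model_mask_gather_load (int32_t *dest, const int32_t *base,
                        const uint32_t *idx, uint64_t exec,
                        const int32_t *else_vals, size_t n)
{
  assert (n <= 64);  /* One mask bit per lane.  */
  for (size_t i = 0; i < n; i++)
    {
      if ((exec >> i) & 1)
        dest[i] = base[idx[i]];
      else if (else_vals != NULL)
        dest[i] = else_vals[i];
      /* Otherwise dest[i] keeps whatever it held before the call.  */
    }
}
```

In this model, passing a vector of zeros as else_vals corresponds to the old "masked lanes are required to hold zero" behaviour, while passing NULL leaves the inactive lanes holding whatever was previously in the destination.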
diff --git a/gcc/config/gcn/predicates.md b/gcc/config/gcn/predicates.md
index 3f59396a649..21beeb586a4 100644
--- a/gcc/config/gcn/predicates.md
+++ b/gcc/config/gcn/predicates.md
@@ -228,3 +228,5 @@ (define_predicate "ascending_zero_int_parallel"
return gcn_stepped_zero_int_parallel_p (op, 1);
})
+(define_predicate "maskload_else_operand"
+ (match_operand 0 "scratch_operand"))
Is this just a way to pass "undefined" to a pattern?
Anyway, I have some tests running, so we'll see what happens.
Andrew