Indu Bhagat <indu.bha...@oracle.com> writes:
>>> starting bb 3
>>>   33: {cc:CC=cmp(r121:DI,0x10);r121:DI=r121:DI-0x10;}
>>>   32: r122:DI=r122:DI+0x10
>>>   31: [r122:DI+0]=unspec/v[[r122:DI+0],r120:DI] 17
>>> mem count failure
>>> mem count failure
>>
>> Yeah, we'd need to update auto-inc-dec for this case.
>>
>> But I'd forgotten that match_dup applies strict rtx_equal_p equality,
>> whereas here we'd need the looser operands_match_p equality, with:
>>
>>   /* If two operands must match, because they are really a single
>>      operand of an assembler insn, then two postincrements are invalid
>>      because the assembler insn would increment only once.
>>      On the other hand, a postincrement matches ordinary indexing
>>      if the postincrement is the output operand.  */
>>   if (code == POST_DEC || code == POST_INC || code == POST_MODIFY)
>>     return operands_match_p (XEXP (x, 0), y);
>>   /* Two preincrements are invalid
>>      because the assembler insn would increment only once.
>>      On the other hand, a preincrement matches ordinary indexing
>>      if the preincrement is the input operand.
>>      In this case, return 2, since some callers need to do special
>>      things when this happens.  */
>>   if (GET_CODE (y) == PRE_DEC || GET_CODE (y) == PRE_INC
>>       || GET_CODE (y) == PRE_MODIFY)
>>     return operands_match_p (x, XEXP (y, 0)) ? 2 : 0;
>>
>> So we probably do want two operands with matching constraints after all,
>> rather than a simple match_dup:
>>
>>   (match_operand:TI 1 "..._memory_operand" "0")
>>
>> It's a long time since I worked on a target that wanted to use this
>> matching feature though.
>>
>> An alternative would be to define separate instructions that do the
>> register increment in parallel with the memory operation, like the
>> ldp/stp patterns.
>>
>
> Just to make sure I understand correctly: And we generate postfix
> stg/st2g via a gen_rtx_PARALLEL (stg, inc) rtx in aarch64.cc, right ?
>
> So with something like:
>
> (define_insn "*stg_postfix_wb"
>   [(set (match_operand:TI 0 "aarch64_granule16_memory_operand" "+Umg")
> 	(unspec_volatile:TI
> 	  [(match_operand:TI 1 "aarch64_granule16_memory_operand" "0")
> 	   (match_operand:DI 2 "register_operand" "rk")]
> 	  UNSPECV_TAG_SPACE))
>    (set (match_operand:DI 3 "register_operand" "=rk")
> 	(plus:DI (match_dup 3)
> 		 (match_operand:DI 4 "aarch64_granule16_simm9" "i")))]
>   "TARGET_MEMTAG && (operands[3] == XEXP (operands[0], 0))"
>   "stg\\t%2, [%3], %4"
>   [(set_attr "type" "memtag")]
> )
I think the addition conventionally goes before the load or store.
(It doesn't make any semantic difference, but it makes things easier
for load/store multiple patterns.)

But the rtl pattern needs to directly tie the memory address to the
register that is being incremented, and the source of the increment
should be handled using matching constraints.  So it would be
something like:

(define_insn "*stg_postfix_wb"
  [(set (match_operand:DI 0 "register_operand" "=rk")
	(plus:DI (match_operand:DI 1 "register_operand" "0")
		 (match_operand:DI 2 "aarch64_granule16_simm9")))
   (set (mem:TI (match_dup 1))
	(unspec_volatile:TI
	  [(mem:TI (match_dup 1))
	   (match_operand:DI 3 "register_operand" "rk")]
	  UNSPECV_TAG_SPACE))]
  "TARGET_MEMTAG"
  "stg\t%3, [%0], %2"
  [(set_attr "type" "memtag")]
)

(completely untested, so probably off)

The pre-increment version would be similar, except that the addresses
would be:

  (plus:DI (match_dup 1) (match_dup 2))

Thanks,
Richard
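[For concreteness, spelling out the pre-increment variant described
above gives something like the sketch below.  It is equally untested;
the pattern name "*stg_prefix_wb" and the pre-index assembly syntax
"[%0, %2]!" are assumptions, not taken from the thread.  Only the
memory addresses change relative to the post-increment pattern; the
register-update set is identical.]

(define_insn "*stg_prefix_wb"
  [(set (match_operand:DI 0 "register_operand" "=rk")
	(plus:DI (match_operand:DI 1 "register_operand" "0")
		 (match_operand:DI 2 "aarch64_granule16_simm9")))
   (set (mem:TI (plus:DI (match_dup 1) (match_dup 2)))
	(unspec_volatile:TI
	  [(mem:TI (plus:DI (match_dup 1) (match_dup 2)))
	   (match_operand:DI 3 "register_operand" "rk")]
	  UNSPECV_TAG_SPACE))]
  "TARGET_MEMTAG"
  "stg\t%3, [%0, %2]!"
  [(set_attr "type" "memtag")]
)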