Indu Bhagat <indu.bha...@oracle.com> writes:
> On 4/15/25 11:52 AM, Richard Sandiford wrote:
>> Indu Bhagat <indu.bha...@oracle.com> writes:
>>> Using post-index st2g is a faster way of memory tagging/untagging.
>>> Because a post-index 'st2g tag, [addr], #32' is equivalent to:
>>>    stg tag, [addr, #0]
>>>    stg tag, [addr, #16]
>>>    add addr, addr, #32
>>>
>>> TBD:
>>>   - Currently generated in the aarch64 backend.  Not sure if this is
>>>     the right way to do it.
>>
>> If we do go for the "aarch64_granule_memory_operand" approach that
>> I described for patch 3, then that predicate (and the associated constraint)
>> could handle PRE_MODIFY and POST_MODIFY addresses, which would remove
>> the need for separate patterns.
>>
>
> I think I understand :) I will try it out.  I guess one of the unknowns
> for me is whether the PRE_MODIFY / POST_MODIFY will be generated as
> expected, even when the involved instructions have an unspec...

I wouldn't expect an ordinary unspec to matter, but yeah, perhaps the
unspec_volatile would cause issues.  If it turns out to be necessary,
we could add a target hook that says that auto inc/dec address adjustments
are allowed for a given insn.

>>>   - Also not clear how to weave in the generation of stz2g.
>>
>> I think stz2g could be:
>>
>>   (set (match_operand:OI 0 "aarch64_granule_memory_operand" "+<new constraint>")
>>        (unspec_volatile:OI
>>          [(const_int 0)
>>           (match_operand:DI 1 "register_operand" "rk")]
>>          UNSPECV...))
>>
>
> The question I have is what changes will be necessary to have the
> compiler DTRT:
>
> i.e. for the zero-init case, instead of
>     stg x1, [x1, #0]
>     str wzr, [x1]
> generate
>     stzg x0, [x0]
>
> Similarly for the value init case, instead of
>     stg x0, [x0, #0]
>     mov w1, 42
>     str w1, [x0]
> generate
>     mov w1, #42
>     stgp x1, xzr, [x0]
>
> I guess once I have worked out the patterns for above, I should see the
> combiner in action DTRT, but I don't know for sure if something else in
> the compiler will also need adjustments for these new MTE insns.

Yeah, I'd expect combine to handle the STZG example.  But the STGP
example might need some changes to aarch64-ldp-fusion.cc.

Thanks,
Richard
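
For readers following the thread, the post-index equivalence quoted at the
top (one 'st2g tag, [addr], #32' versus two stg plus an add) can be checked
with a toy model in plain C.  This is only a sketch under stated assumptions:
it is not GCC code and not real MTE semantics.  The type tag_memory and the
functions model_stg, model_st2g_post and st2g_matches_stg_pair are
hypothetical names invented for illustration, the tag is modelled as a bare
4-bit value rather than bits taken from a pointer, and tag storage is
modelled as one byte per 16-byte granule of a small fake address space.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model only (assumption): one tag byte per 16-byte granule of a
   small fake address space.  Real MTE keeps tags in separate storage.  */
enum { GRANULE = 16, NUM_GRANULES = 64 };

typedef struct {
  uint8_t tags[NUM_GRANULES];
} tag_memory;

/* Hypothetical model of "stg tag, [addr, #offset]": tag one granule.  */
static void
model_stg (tag_memory *mem, uint8_t tag, uint64_t addr, uint64_t offset)
{
  uint64_t target = addr + offset;
  assert (target % GRANULE == 0);
  assert (target / GRANULE < NUM_GRANULES);
  mem->tags[target / GRANULE] = tag & 0xf;
}

/* Hypothetical model of post-index "st2g tag, [addr], #32": tag two
   consecutive granules, then advance the base address by 32.  */
static void
model_st2g_post (tag_memory *mem, uint8_t tag, uint64_t *addr)
{
  assert (*addr % GRANULE == 0);
  assert (*addr / GRANULE + 1 < NUM_GRANULES);
  mem->tags[*addr / GRANULE] = tag & 0xf;
  mem->tags[*addr / GRANULE + 1] = tag & 0xf;
  *addr += 2 * GRANULE;
}

/* Return 1 if the st2g model and the stg/stg/add sequence leave the
   same tags and the same final address, 0 otherwise.  */
int
st2g_matches_stg_pair (uint8_t tag, uint64_t start)
{
  tag_memory a, b;
  memset (&a, 0, sizeof a);
  memset (&b, 0, sizeof b);

  /* Single post-index st2g.  */
  uint64_t addr_a = start;
  model_st2g_post (&a, tag, &addr_a);

  /* The three-instruction sequence from the patch description.  */
  uint64_t addr_b = start;
  model_stg (&b, tag, addr_b, 0);
  model_stg (&b, tag, addr_b, 16);
  addr_b += 32;

  return addr_a == addr_b
	 && memcmp (a.tags, b.tags, sizeof a.tags) == 0;
}
```

Calling st2g_matches_stg_pair (tag, start) with any 16-byte-aligned start
that leaves room for two granules compares the two sequences in this model.
It says nothing about whether the auto-inc passes or combine will actually
form the post-index address; that is the open question discussed above.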