Indu Bhagat <indu.bha...@oracle.com> writes:
> On 4/15/25 11:52 AM, Richard Sandiford wrote:
>> Indu Bhagat <indu.bha...@oracle.com> writes:
>>> Using post-index st2g is a faster way of memory tagging/untagging.
>>> Because a post-index 'st2g tag, [addr], #32' is equivalent to:
>>>     stg tag, [addr, #0]
>>>     stg tag, [addr, #16]
>>>     add addr, addr, #32
>>>
>>> TBD:
>>>    - Currently generated in the aarch64 backend.  Not sure if this is
>>>      the right way to do it.
>> 
>> If we do go for the "aarch64_granule_memory_operand" approach that
>> I described for patch 3, then that predicate (and the associated constraint)
>> could handle PRE_MODIFY and POST_MODIFY addresses, which would remove
>> the need for separate patterns.
>> 
>
> I think I understand :)  I will try it out.  I guess one of the unknowns 
> for me is whether the PRE_MODIFY / POST_MODIFY will be generated as 
> expected, even when the involved instructions have an unspec...

I wouldn't expect an ordinary unspec to matter, but yeah, perhaps the
unspec_volatile would cause issues.  If it turns out to be necessary,
we could add a target hook that says that auto inc/dec address adjustments
are allowed for a given insn.
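
For reference, here is a toy C model of what the POST_MODIFY form would
have to express, i.e. the equivalence from the patch description.  It is
only a sketch: the tag store, the helper names and the 8-granule size are
all made up for illustration and have nothing to do with GCC internals.

```c
#include <stdint.h>
#include <string.h>

#define GRANULE   16u   /* MTE tag granule size in bytes */
#define NGRANULES 8u    /* size of the toy tag store (assumption) */

/* Hypothetical tag store: one tag per 16-byte granule.  */
typedef struct { uint8_t tags[NGRANULES]; } tag_mem;

/* Model of 'stg tag, [addr, #off]': tag a single granule.  */
static void model_stg(tag_mem *m, uint64_t addr, uint64_t off, uint8_t tag)
{
    m->tags[(addr + off) / GRANULE] = tag;
}

/* Model of post-index 'st2g tag, [addr], #imm': tag the two granules
   starting at addr, then return the written-back address.  */
static uint64_t model_st2g_post(tag_mem *m, uint64_t addr, uint64_t imm,
                                uint8_t tag)
{
    m->tags[addr / GRANULE] = tag;
    m->tags[addr / GRANULE + 1] = tag;
    return addr + imm;
}

/* Model of the three-instruction sequence: stg, stg, add.  */
static uint64_t model_stg_stg_add(tag_mem *m, uint64_t addr, uint8_t tag)
{
    model_stg(m, addr, 0, tag);
    model_stg(m, addr, 16, tag);
    return addr + 32;
}

/* Returns 1 if both forms leave the same tags and the same address.
   addr must be granule-aligned and leave room for two granules.  */
int st2g_post_equivalence_holds(uint64_t addr, uint8_t tag)
{
    tag_mem a, b;
    memset(&a, 0, sizeof a);
    memset(&b, 0, sizeof b);
    uint64_t ra = model_st2g_post(&a, addr, 32, tag);
    uint64_t rb = model_stg_stg_add(&b, addr, tag);
    return ra == rb && memcmp(&a, &b, sizeof a) == 0;
}
```

The address update is the part that auto-inc/dec would need to fold into
the memory operand; the tag writes themselves are unchanged.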

>>>    - Also not clear how to weave in the generation of stz2g.
>> 
>> I think stz2g could be:
>> 
>> (set (match_operand:OI 0 "aarch64_granule_memory_operand" "+<new constraint>")
>>       (unspec_volatile:OI
>>         [(const_int 0)
>>          (match_operand:DI 1 "register_operand" "rk")]
>>         UNSPECV...))
>> 
>
> The question I have is what changes will be necessary to have the 
> compiler DTRT:
>
>    i.e. for the zero-init case, instead of
>          stg     x1, [x1, #0]
>          str     wzr, [x1]
>    generate
>          stzg x0, [x0]
>
>    Similarly for the value init case, instead of
>          stg     x0, [x0, #0]
>          mov     w1, 42
>          str     w1, [x0]
>     generate
>          mov  w1, #42
>          stgp x1, xzr, [x0]
>
> I guess once I have worked out the patterns for the above, I should see the 
> combiner DTRT, but I don't know for sure whether something else in 
> the compiler will also need adjustments for these new MTE insns.

Yeah, I'd expect combine to handle the STZG example.  But the STGP
example might need some changes to aarch64-ldp-fusion.cc.
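
One thing to keep in mind for the STZG case: STZG zeroes the whole 16-byte
granule, whereas the original sequence only zeroes the 4 bytes of the
object.  A toy C model of the difference is below.  It is a sketch only;
the struct and function names are invented for illustration, and it
assumes that nothing else lives in the rest of the granule.

```c
#include <stdint.h>
#include <string.h>

#define GRANULE 16u   /* MTE tag granule size in bytes */

/* Hypothetical model of one granule: its data bytes plus its tag.  */
typedef struct {
    uint8_t data[GRANULE];
    uint8_t tag;
} granule_model;

/* Model of 'stg xN, [xN, #0]' followed by 'str wzr, [xN]':
   set the tag, then zero only the first 4 bytes.  */
static void model_stg_then_str_wzr(granule_model *g, uint8_t tag)
{
    g->tag = tag;
    memset(g->data, 0, 4);
}

/* Model of 'stzg xN, [xN]': set the tag and zero the whole granule.  */
static void model_stzg(granule_model *g, uint8_t tag)
{
    g->tag = tag;
    memset(g->data, 0, GRANULE);
}

/* Run both forms on granules pre-filled with 'fill' and compare the
   tag plus the first 'nbytes' data bytes.  Returns 1 if they agree.  */
static int stzg_forms_agree(uint8_t tag, uint8_t fill, size_t nbytes)
{
    granule_model a, b;
    memset(a.data, fill, GRANULE);
    memset(b.data, fill, GRANULE);
    a.tag = 0;
    b.tag = 0;
    model_stg_then_str_wzr(&a, tag);
    model_stzg(&b, tag);
    return a.tag == b.tag && memcmp(a.data, b.data, nbytes) == 0;
}

/* Agreement on the 4-byte object only.  */
int stzg_agrees_on_object(uint8_t tag, uint8_t fill)
{
    return stzg_forms_agree(tag, fill, 4);
}

/* Agreement on the full 16-byte granule.  */
int stzg_agrees_on_granule(uint8_t tag, uint8_t fill)
{
    return stzg_forms_agree(tag, fill, GRANULE);
}
```

The two forms always agree on the object itself, but they agree on the
full granule only if the other 12 bytes were already zero (or are dead).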

Thanks,
Richard
