On 5/1/25 11:48 AM, Richard Sandiford wrote:
Indu Bhagat <indu.bha...@oracle.com> writes:
The MEMTAG sanitizer, which is based on the HWASAN sanitizer, will invoke
the target-specific hooks to create a random tag, add a tag to a memory
address, and finally tag and untag memory.
Implement the target hooks to emit MTE instructions if the MEMTAG sanitizer
is in effect. Continue to use the default target hooks if HWASAN is being
used. The following target hooks are implemented:
- TARGET_MEMTAG_INSERT_RANDOM_TAG
- TARGET_MEMTAG_ADD_TAG
- TARGET_MEMTAG_TAG_MEMORY
Apart from the target-specific hooks, set the following to the values
defined by the Memory Tagging Extension (MTE) on aarch64:
- TARGET_MEMTAG_TAG_SIZE
- TARGET_MEMTAG_GRANULE_SIZE
How about using "bits"/"bytes" rather than "size", since the two hooks
use different units? Alternatively, we could follow the machine_mode
convention of using "size" for bytes and "bitsize" for bits.
Sure. I can add a patch that renames the uses of TARGET_MEMTAG_TAG_SIZE
(and the matching target hooks ix86_memtag_tag_size () and
aarch64_memtag_tag_size ()) to TARGET_MEMTAG_TAG_BITSIZE (and _bitsize ()
in the target hooks).
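On the aarch64 side the rename would look roughly like this (a sketch only;
the body and return type stay as in the existing tag_size hook, and 4 is
MTE's tag width in bits):

   #define AARCH64_MEMTAG_TAG_BITSIZE 4

   /* Implement TARGET_MEMTAG_TAG_BITSIZE.  */
   static uint8_t
   aarch64_memtag_tag_bitsize ()
   {
     return AARCH64_MEMTAG_TAG_BITSIZE;
   }

   #undef TARGET_MEMTAG_TAG_BITSIZE
   #define TARGET_MEMTAG_TAG_BITSIZE aarch64_memtag_tag_bitsize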
As noted earlier, TARGET_MEMTAG_TAG_MEMORY is a target-specific hook whose
_only_ user is the MEMTAG sanitizer. On aarch64, TARGET_MEMTAG_TAG_MEMORY
will emit MTE instructions to tag/untag memory of a given size. The
TARGET_MEMTAG_TAG_MEMORY implementation may emit an actual loop to tag/untag
memory when the size of the memory block is an expression that has to be
evaluated at runtime. Both aarch64_memtag_tag_memory () and
aarch64_memtag_tag_memory_via_loop () may generate stg or st2g, depending
on the number of iterations.
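From the sanitizer's perspective the hook is reached via the target hook
vector, e.g. for a small granule-aligned block (a sketch only, with
hypothetical BASE/TAGGED_PTR operands; the targetm.memtag.tag_memory
spelling just follows the usual hook naming convention):

   /* Tag two 16-byte granules starting at BASE using the tag held in
      TAGGED_PTR; within the threshold, aarch64 emits a single st2g.  */
   targetm.memtag.tag_memory (base, GEN_INT (32), tagged_ptr);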
TBD:
- rtx generation in the target hooks is not yet well tested. WIP.
- See how AARCH64_MEMTAG_TAG_MEMORY_LOOP_THRESHOLD is defined and then
  used to generate a loop to tag/untag memory. Is there a better way
  to do this?
gcc/ChangeLog:
* asan.cc (memtag_sanitize_p): New definition.
* asan.h (memtag_sanitize_p): New declaration.
* config/aarch64/aarch64.cc (AARCH64_MEMTAG_GRANULE_SIZE):
Define.
(AARCH64_MEMTAG_TAG_SIZE): Define.
(aarch64_can_tag_addresses): Add MEMTAG specific handling.
(aarch64_memtag_tag_size): Likewise.
(aarch64_memtag_granule_size): Likewise.
(aarch64_memtag_insert_random_tag): Generate irg insn.
(aarch64_memtag_add_tag): Generate addg/subg insn.
(AARCH64_MEMTAG_TAG_MEMORY_LOOP_THRESHOLD): Define.
(aarch64_memtag_tag_memory_via_loop): New definition.
(aarch64_memtag_tag_memory): Likewise.
(aarch64_gen_tag_memory_postindex): Likewise.
(TARGET_MEMTAG_TAG_SIZE): Define target-hook.
(TARGET_MEMTAG_GRANULE_SIZE): Likewise.
(TARGET_MEMTAG_INSERT_RANDOM_TAG): Likewise.
(TARGET_MEMTAG_ADD_TAG): Likewise.
(TARGET_MEMTAG_TAG_MEMORY): Likewise.
[...]
+/* Implement TARGET_MEMTAG_INSERT_RANDOM_TAG. */
+rtx
+aarch64_memtag_insert_random_tag (rtx untagged, rtx target)
+{
+ rtx ret;
+ if (memtag_sanitize_p ())
+ {
+ gcc_assert (param_memtag_instrument_stack || param_memtag_instrument_allocas);
It might be better to drop this assert. The hook should be prepared
to do what it's told without inquiring into the reason.
+ if (!target)
+ target = gen_reg_rtx (Pmode);
+
+ rtx insn = gen_irg (target, untagged, untagged);
+ emit_insn (insn);
Usually, this kind of hook would allow "untagged" to be any reasonable
operand, and similarly "target". They might not match the predicates
on irg.
So I think this should use the expand_insn machinery, with
create_output_operand for the result and create_input_operand
for the input. This will also avoid the need to call gen_reg_rtx
explicitly.
Same for the other uses of gen_* in the patch.
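For the irg case that would be roughly (just a sketch, keeping the
three-operand shape of the existing gen_irg call and Pmode for the
operands; expand_insn creates the result register itself when TARGET
is null):

   expand_operand ops[3];
   create_output_operand (&ops[0], target, Pmode);
   create_input_operand (&ops[1], untagged, Pmode);
   create_input_operand (&ops[2], untagged, Pmode);
   expand_insn (CODE_FOR_irg, 3, ops);
   return ops[0].value;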
+
+ ret = XEXP (insn, 0);
+ }
+ else
+ ret = default_memtag_insert_random_tag (untagged, target);
+
+ return ret;
+}
+
+/* Implement TARGET_MEMTAG_ADD_TAG. */
+rtx
+aarch64_memtag_add_tag (rtx base, poly_int64 offset, uint8_t tag_offset)
+{
+ rtx offset_rtx, tag_offset_rtx;
+ rtx target, insn;
+ poly_int64 abs_offset;
+ enum rtx_code code;
+ bool neg_p;
+ rtx ret;
+
+ if (memtag_sanitize_p ())
+ {
+ target = gen_reg_rtx (Pmode);
+ tag_offset_rtx = gen_int_mode (tag_offset, DImode);
+ gcc_assert (aarch64_memtag_tag_offset (tag_offset_rtx, DImode));
+
+ neg_p = known_lt (offset, 0);
+ abs_offset = neg_p ? offset * (-1) : offset;
+ offset_rtx = gen_int_mode (abs_offset, DImode);
+
+ if (!aarch64_granule16_uimm6 (offset_rtx, DImode))
+ {
+ /* Emit addr arithmetic prior to addg/subg. */
+ code = neg_p ? MINUS : PLUS;
+ insn = expand_simple_binop (Pmode, code, base, offset_rtx,
+ target, true, OPTAB_LIB_WIDEN);
+ offset_rtx = const0_rtx;
+ }
+
+ /* Addr offset must be within bounds at this time. */
+ gcc_assert (aarch64_granule16_uimm6 (offset_rtx, DImode));
+
+ /* Even if tag_offset_rtx is CONST0_RTX, generate a subg/addg; this
+ provides better opportunity for combining instructions later. */
+ if (neg_p)
+ insn = gen_subg (target, base, offset_rtx, tag_offset_rtx);
+ else
+ insn = gen_addg (target, base, offset_rtx, tag_offset_rtx);
+ emit_insn (insn);
FWIW, this should hopefully be simpler with the combined addg/subg
pattern that we discussed earlier. It should be possible to keep
offset unnegated and always use expand_simple_binop with PLUS
rather than MINUS.
Yes it looks simpler now. Thanks.
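i.e. something along these lines (a sketch only; the
aarch64_granule16_uimm6 check is what the current patch uses and may need
adjusting for the offset range of the combined pattern):

   rtx offset_rtx = gen_int_mode (offset, DImode);
   if (!aarch64_granule16_uimm6 (offset_rtx, DImode))
     {
       /* Fold an out-of-range byte offset into BASE up front, always
          with PLUS; OFFSET itself is never negated.  */
       base = expand_simple_binop (Pmode, PLUS, base, offset_rtx,
                                   NULL_RTX, true, OPTAB_LIB_WIDEN);
       offset_rtx = const0_rtx;
     }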
+
+ ret = XEXP (insn, 0);
+ }
+ else
+ ret = default_memtag_add_tag (base, offset, tag_offset);
+
+ return ret;
+}
+
+/* FIXME - What's a good threshold? Is there another way to do this? */
+/* Threshold in number of granules beyond which an explicit loop for
+ tagging a memory block is emitted. */
+#define AARCH64_MEMTAG_TAG_MEMORY_LOOP_THRESHOLD 10
I wouldn't want to guess a better value :) But yeah, using a macro like
this is IMO the right choice until we find a need for something more
complex.
+
+static void
+aarch64_memtag_tag_memory_via_loop (rtx base, rtx size, rtx tagged_pointer);
+
+/* The default implementation of TARGET_MEMTAG_TAG_MEMORY. */
Not the default :)
+rtx
+aarch64_memtag_tag_memory (rtx base, rtx size, rtx tagged_pointer)
+{
+ rtx stg_rtx;
+ HOST_WIDE_INT factor;
+ HOST_WIDE_INT len, offset;
+ unsigned HOST_WIDE_INT granule_sz;
+ unsigned HOST_WIDE_INT iters;
+
+ granule_sz = (HOST_WIDE_INT) AARCH64_MEMTAG_GRANULE_SIZE;
+
+ /* FIXME check predicate on offset (from base) + size. */
+ if (CONST_INT_P (size) && aarch64_granule16_simm9 (size, DImode))
+ {
+ len = INTVAL (size);
+ /* The amount of memory to tag must be aligned to granule size by now.  */
+ gcc_assert (abs_hwi (len) % granule_sz == 0);
+
+ factor = (known_le (len, 0)) ? -1 : 1;
+ iters = abs_hwi (len) / granule_sz;
+
+ offset = 0;
+
+ if (iters > AARCH64_MEMTAG_TAG_MEMORY_LOOP_THRESHOLD)
+ goto emit_loop_tag_memory;
+
+ /* gen_stg / gen_st2g expects a simple PLUS (reg, offset) as addr operand. */
+ if (!REG_P (base))
+ {
+ rtx addr = simplify_gen_binary (PLUS, Pmode, base,
+ gen_int_mode (offset, Pmode));
+ if (!CONST_INT_P (XEXP (addr, 1)))
+ {
+ emit_insn (addr);
+ base = XEXP (addr, 0);
+ }
+ else
+ {
+ base = XEXP (addr, 0);
+ offset += INTVAL (XEXP (addr, 1));
+ }
+ }
+
+ while (iters)
+ {
+ if (iters / 2)
+ {
+ stg_rtx = gen_st2g (tagged_pointer, base,
+ gen_int_mode (offset, Pmode),
+ gen_int_mode (offset - 16, Pmode));
+ iters -= 2;
+ }
+ else
+ {
+ stg_rtx = gen_stg (tagged_pointer, base, gen_int_mode (offset, Pmode));
+ iters -= 1;
+ }
+
+ emit_insn (stg_rtx);
+ offset = granule_sz * iters * factor;
+ }
+
+ return stg_rtx;
+ }
+
+emit_loop_tag_memory:
+ /* FIXME - stg_rtx to be returned. Update signature to return void / bool?  */
+ stg_rtx = NULL;
+ aarch64_memtag_tag_memory_via_loop (base, size, tagged_pointer);
+
+ return stg_rtx;
+}
I suppose this is personal preference, but this could be written:
if (rtx stg_rtx = aarch64_memtag_tag_memory_noloop (base, size, tagged_pointer))
return stg_rtx;
return aarch64_memtag_tag_memory_via_loop (base, size, tagged_pointer);
(I'm not anti-goto though.)
Yes, I plan to refactor these functions for tagging memory towards the end,
once the other items (machine description, optimal generation of the various
st*g* instructions) are settled.