Indu Bhagat <[email protected]> writes:
> The MEMTAG sanitizer, which is based on the HWASAN sanitizer, invokes
> the target-specific hooks to create a random tag, add a tag to a
> memory address, and finally tag and untag memory.
>
> Implement the target hooks to emit MTE instructions when the MEMTAG
> sanitizer is in effect, and continue to use the default target hooks
> when HWASAN is being used. The following target hooks are implemented:
> - TARGET_MEMTAG_INSERT_RANDOM_TAG
> - TARGET_MEMTAG_ADD_TAG
> - TARGET_MEMTAG_TAG_MEMORY
>
> Apart from the target-specific hooks, set the following to the values
> defined by the Memory Tagging Extension (MTE) on aarch64:
> - TARGET_MEMTAG_TAG_SIZE
> - TARGET_MEMTAG_GRANULE_SIZE
How about using "bits"/"bytes" rather than "size", since the two hooks
use different units? Alternatively, we could follow the machine_mode
convention of using "size" for bytes and "bitsize" for bits.
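For illustration only, the machine_mode-style naming with MTE's values
(4-bit tags, 16-byte granules) would read something like this (the
names here are hypothetical):

  #define AARCH64_MEMTAG_TAG_BITSIZE 4      /* Tag width in bits.  */
  #define AARCH64_MEMTAG_GRANULE_SIZE 16    /* Granule size in bytes.  */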
> As noted earlier, TARGET_MEMTAG_TAG_MEMORY is a target-specific hook
> whose _only_ user is the MEMTAG sanitizer. On aarch64,
> TARGET_MEMTAG_TAG_MEMORY emits MTE instructions to tag/untag memory of
> a given size. The TARGET_MEMTAG_TAG_MEMORY implementation may emit an
> actual loop to tag/untag memory when the size of the memory block is
> an expression that must be evaluated at run time. Both
> aarch64_memtag_tag_memory () and aarch64_memtag_tag_memory_via_loop ()
> may generate stg or st2g, depending on the number of iterations.
>
> TBD:
> - rtx generation in the target hooks is not yet well tested. WIP.
> - See how AARCH64_MEMTAG_TAG_MEMORY_LOOP_THRESHOLD is defined and then
>   used to generate a loop to tag/untag memory. Is there a better way
>   to do this?
>
> gcc/ChangeLog:
>
> * asan.cc (memtag_sanitize_p): New definition.
> * asan.h (memtag_sanitize_p): New declaration.
> * config/aarch64/aarch64.cc (AARCH64_MEMTAG_GRANULE_SIZE):
> Define.
> (AARCH64_MEMTAG_TAG_SIZE): Define.
> (aarch64_can_tag_addresses): Add MEMTAG specific handling.
> (aarch64_memtag_tag_size): Likewise.
> (aarch64_memtag_granule_size): Likewise.
> (aarch64_memtag_insert_random_tag): Generate irg insn.
> (aarch64_memtag_add_tag): Generate addg/subg insn.
> (AARCH64_MEMTAG_TAG_MEMORY_LOOP_THRESHOLD): Define.
> (aarch64_memtag_tag_memory_via_loop): New definition.
> (aarch64_memtag_tag_memory): Likewise.
> (aarch64_gen_tag_memory_postindex): Likewise.
> (TARGET_MEMTAG_TAG_SIZE): Define target-hook.
> (TARGET_MEMTAG_GRANULE_SIZE): Likewise.
> (TARGET_MEMTAG_INSERT_RANDOM_TAG): Likewise.
> (TARGET_MEMTAG_ADD_TAG): Likewise.
> (TARGET_MEMTAG_TAG_MEMORY): Likewise.
>
> [...]
> +/* Implement TARGET_MEMTAG_INSERT_RANDOM_TAG. */
> +rtx
> +aarch64_memtag_insert_random_tag (rtx untagged, rtx target)
> +{
> + rtx ret;
> + if (memtag_sanitize_p ())
> + {
> +      gcc_assert (param_memtag_instrument_stack || param_memtag_instrument_allocas);
It might be better to drop this assert. The hook should be prepared
to do what it's told without inquiring into the reason.
> + if (!target)
> + target = gen_reg_rtx (Pmode);
> +
> + rtx insn = gen_irg (target, untagged, untagged);
> + emit_insn (insn);
Usually, this kind of hook would allow "untagged" to be any reasonable
operand, and similarly "target". They might not match the predicates
on irg.
So I think this should use the expand_insn machinery, with
create_output_operand for the result and create_input_operand
for the input. This will also avoid the need to call gen_reg_rtx
explicitly.
Same for the other uses of gen_* in the patch.
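I.e. something like this untested sketch, assuming irg keeps the three
operands used above:

  expand_operand ops[3];
  create_output_operand (&ops[0], target, Pmode);
  create_input_operand (&ops[1], untagged, Pmode);
  create_input_operand (&ops[2], untagged, Pmode);
  expand_insn (CODE_FOR_irg, 3, ops);
  return ops[0].value;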
> +
> + ret = XEXP (insn, 0);
> + }
> + else
> + ret = default_memtag_insert_random_tag (untagged, target);
> +
> + return ret;
> +}
> +
> +/* Implement TARGET_MEMTAG_ADD_TAG. */
> +rtx
> +aarch64_memtag_add_tag (rtx base, poly_int64 offset, uint8_t tag_offset)
> +{
> + rtx offset_rtx, tag_offset_rtx;
> + rtx target, insn;
> + poly_int64 abs_offset;
> + enum rtx_code code;
> + bool neg_p;
> + rtx ret;
> +
> + if (memtag_sanitize_p ())
> + {
> + target = gen_reg_rtx (Pmode);
> + tag_offset_rtx = gen_int_mode (tag_offset, DImode);
> + gcc_assert (aarch64_memtag_tag_offset (tag_offset_rtx, DImode));
> +
> + neg_p = known_lt (offset, 0);
> + abs_offset = neg_p ? offset * (-1) : offset;
> + offset_rtx = gen_int_mode (abs_offset, DImode);
> +
> + if (!aarch64_granule16_uimm6 (offset_rtx, DImode))
> + {
> + /* Emit addr arithmetic prior to addg/subg. */
> + code = neg_p ? MINUS : PLUS;
> + insn = expand_simple_binop (Pmode, code, base, offset_rtx,
> + target, true, OPTAB_LIB_WIDEN);
> + offset_rtx = const0_rtx;
> + }
> +
> +      /* Addr offset must be within bounds at this time. */
> + gcc_assert (aarch64_granule16_uimm6 (offset_rtx, DImode));
> +
> + /* Even if tag_offset_rtx is CONST0_RTX, generate a subg/addg; this
> + provides better opportunity for combining instructions later. */
> + if (neg_p)
> + insn = gen_subg (target, base, offset_rtx, tag_offset_rtx);
> + else
> + insn = gen_addg (target, base, offset_rtx, tag_offset_rtx);
> + emit_insn (insn);
FWIW, this should hopefully be simpler with the combined addg/subg
pattern that we discussed earlier. It should be possible to keep
offset unnegated and always use expand_simple_binop with PLUS
rather than MINUS.
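Sketch only (aarch64_granule16_simm6 is a placeholder name for
whatever signed predicate the combined pattern would use):

  rtx offset_rtx = gen_int_mode (offset, Pmode);
  if (!aarch64_granule16_simm6 (offset_rtx, Pmode))
    {
      /* Emit the address arithmetic up front, always as a PLUS.  */
      base = expand_simple_binop (Pmode, PLUS, base, offset_rtx,
                                  NULL_RTX, true, OPTAB_LIB_WIDEN);
      offset_rtx = const0_rtx;
    }
  /* ...then expand the combined pattern with base, offset_rtx and
     tag_offset_rtx.  */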
> +
> + ret = XEXP (insn, 0);
> + }
> + else
> + ret = default_memtag_add_tag (base, offset, tag_offset);
> +
> + return ret;
> +}
> +
> +/* FIXME - What's a good threshold? Is there another way to do this? */
> +/* Threshold in number of granules beyond which an explicit loop for
> + tagging a memory block is emitted. */
> +#define AARCH64_MEMTAG_TAG_MEMORY_LOOP_THRESHOLD 10
I wouldn't want to guess a better value :) But yeah, using a macro like
this is IMO the right choice until we find a need for something more
complex.
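(If it ever does need to be tunable, a target --param would be the
natural next step; a hypothetical sketch of an entry in
config/aarch64/aarch64.opt:

  -param=aarch64-memtag-tag-memory-loop-threshold=
  Target Joined UInteger Var(aarch64_memtag_tag_memory_loop_threshold) Init(10) Param
  Maximum number of granules tagged inline before switching to a loop.

But the macro seems fine until then.)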
> +
> +static void
> +aarch64_memtag_tag_memory_via_loop (rtx base, rtx size, rtx tagged_pointer);
> +
> +/* The default implementation of TARGET_MEMTAG_TAG_MEMORY. */
Not the default :)
> +rtx
> +aarch64_memtag_tag_memory (rtx base, rtx size, rtx tagged_pointer)
> +{
> + rtx stg_rtx;
> + HOST_WIDE_INT factor;
> + HOST_WIDE_INT len, offset;
> + unsigned HOST_WIDE_INT granule_sz;
> + unsigned HOST_WIDE_INT iters;
> +
> + granule_sz = (HOST_WIDE_INT) AARCH64_MEMTAG_GRANULE_SIZE;
> +
> + /* FIXME check predicate on offset (from base) + size. */
> + if (CONST_INT_P (size) && aarch64_granule16_simm9 (size, DImode))
> + {
> + len = INTVAL (size);
> +      /* The amount of memory to tag must be aligned to granule size by now.  */
> + gcc_assert (abs_hwi (len) % granule_sz == 0);
> +
> + factor = (known_le (len, 0)) ? -1 : 1;
> + iters = abs_hwi (len) / granule_sz;
> +
> + offset = 0;
> +
> + if (iters > AARCH64_MEMTAG_TAG_MEMORY_LOOP_THRESHOLD)
> + goto emit_loop_tag_memory;
> +
> +      /* gen_stg / gen_st2g expects a simple PLUS (reg, offset) as addr operand. */
> + if (!REG_P (base))
> + {
> + rtx addr = simplify_gen_binary (PLUS, Pmode, base,
> + gen_int_mode (offset, Pmode));
> + if (!CONST_INT_P (XEXP (addr, 1)))
> + {
> + emit_insn (addr);
> + base = XEXP (addr, 0);
> + }
> + else
> + {
> + base = XEXP (addr, 0);
> + offset += INTVAL (XEXP (addr, 1));
> + }
> + }
> +
> + while (iters)
> + {
> + if (iters / 2)
> + {
> + stg_rtx = gen_st2g (tagged_pointer, base,
> + gen_int_mode (offset, Pmode),
> + gen_int_mode (offset - 16, Pmode));
> + iters -= 2;
> + }
> + else
> + {
> +	      stg_rtx = gen_stg (tagged_pointer, base, gen_int_mode (offset, Pmode));
> + iters -= 1;
> + }
> +
> + emit_insn (stg_rtx);
> + offset = granule_sz * iters * factor;
> + }
> +
> + return stg_rtx;
> + }
> +
> +emit_loop_tag_memory:
> +  /* FIXME - stg_rtx to be returned. Update signature to return void / bool? */
> + stg_rtx = NULL;
> + aarch64_memtag_tag_memory_via_loop (base, size, tagged_pointer);
> +
> + return stg_rtx;
> +}
I suppose this is personal preference, but this could be written:
  if (rtx stg_rtx = aarch64_memtag_tag_memory_noloop (base, size,
                                                      tagged_pointer))
    return stg_rtx;
  return aarch64_memtag_tag_memory_via_loop (base, size, tagged_pointer);
(I'm not anti-goto though.)
Thanks,
Richard