Indu Bhagat <[email protected]> writes:
> The MEMTAG sanitizer, which is based on the HWASAN sanitizer, invokes
> the target-specific hooks to create a random tag, add a tag to a
> memory address, and finally tag and untag memory.
>
> Implement the target hooks to emit MTE instructions when the MEMTAG
> sanitizer is in effect, and continue to use the default target hooks
> when HWASAN is being used. The following target hooks are implemented:
> - TARGET_MEMTAG_INSERT_RANDOM_TAG
> - TARGET_MEMTAG_ADD_TAG
> - TARGET_MEMTAG_TAG_MEMORY
>
> Apart from the target-specific hooks, set the following to the values
> defined by the Memory Tagging Extension (MTE) on aarch64:
> - TARGET_MEMTAG_TAG_SIZE
> - TARGET_MEMTAG_GRANULE_SIZE
How about using "bits"/"bytes" rather than "size", since the two hooks
use different units? Alternatively, we could follow the machine_mode
convention of using "size" for bytes and "bitsize" for bits.
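For illustration only, the machine_mode-style naming with MTE's values
(4-bit tags, 16-byte granules) would read something like this (the
names here are hypothetical):

  #define AARCH64_MEMTAG_TAG_BITSIZE 4      /* Tag width in bits.  */
  #define AARCH64_MEMTAG_GRANULE_SIZE 16    /* Granule size in bytes.  */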
> As noted earlier, TARGET_MEMTAG_TAG_MEMORY is a target-specific hook
> whose _only_ user is the MEMTAG sanitizer. On aarch64,
> TARGET_MEMTAG_TAG_MEMORY emits MTE instructions to tag/untag memory of
> a given size. The TARGET_MEMTAG_TAG_MEMORY implementation may emit an
> actual loop to tag/untag memory when the size of the memory block is
> an expression that must be evaluated at run time. Both
> aarch64_memtag_tag_memory () and aarch64_memtag_tag_memory_via_loop ()
> may generate stg or st2g, depending on the number of iterations.
>
> TBD:
> - rtx generation in the target hooks is not yet well tested. WIP.
> - See how AARCH64_MEMTAG_TAG_MEMORY_LOOP_THRESHOLD is defined and then
>   used to generate a loop to tag/untag memory. Is there a better way
>   to do this?
>
> gcc/ChangeLog:
>
> * asan.cc (memtag_sanitize_p): New definition.
> * asan.h (memtag_sanitize_p): New declaration.
> * config/aarch64/aarch64.cc (AARCH64_MEMTAG_GRANULE_SIZE):
> Define.
> (AARCH64_MEMTAG_TAG_SIZE): Define.
> (aarch64_can_tag_addresses): Add MEMTAG specific handling.
> (aarch64_memtag_tag_size): Likewise.
> (aarch64_memtag_granule_size): Likewise.
> (aarch64_memtag_insert_random_tag): Generate irg insn.
> (aarch64_memtag_add_tag): Generate addg/subg insn.
> (AARCH64_MEMTAG_TAG_MEMORY_LOOP_THRESHOLD): Define.
> (aarch64_memtag_tag_memory_via_loop): New definition.
> (aarch64_memtag_tag_memory): Likewise.
> (aarch64_gen_tag_memory_postindex): Likewise.
> (TARGET_MEMTAG_TAG_SIZE): Define target-hook.
> (TARGET_MEMTAG_GRANULE_SIZE): Likewise.
> (TARGET_MEMTAG_INSERT_RANDOM_TAG): Likewise.
> (TARGET_MEMTAG_ADD_TAG): Likewise.
> (TARGET_MEMTAG_TAG_MEMORY): Likewise.
>
> [...]
> +/* Implement TARGET_MEMTAG_INSERT_RANDOM_TAG. */
> +rtx
> +aarch64_memtag_insert_random_tag (rtx untagged, rtx target)
> +{
> + rtx ret;
> + if (memtag_sanitize_p ())
> + {
> +      gcc_assert (param_memtag_instrument_stack || param_memtag_instrument_allocas);
It might be better to drop this assert. The hook should be prepared
to do what it's told without inquiring into the reason.
> + if (!target)
> + target = gen_reg_rtx (Pmode);
> +
> + rtx insn = gen_irg (target, untagged, untagged);
> + emit_insn (insn);
Usually, this kind of hook would allow "untagged" to be any reasonable
operand, and similarly "target". They might not match the predicates
on irg.
So I think this should use the expand_insn machinery, with
create_output_operand for the result and create_input_operand
for the input. This will also avoid the need to call gen_reg_rtx
explicitly.
Same for the other uses of gen_* in the patch.
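I.e. something like this untested sketch, assuming irg keeps the three
operands used above:

  expand_operand ops[3];
  create_output_operand (&ops[0], target, Pmode);
  create_input_operand (&ops[1], untagged, Pmode);
  create_input_operand (&ops[2], untagged, Pmode);
  expand_insn (CODE_FOR_irg, 3, ops);
  return ops[0].value;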
> +
> + ret = XEXP (insn, 0);
> + }
> + else
> + ret = default_memtag_insert_random_tag (untagged, target);
> +
> + return ret;
> +}
> +
> +/* Implement TARGET_MEMTAG_ADD_TAG. */
> +rtx
> +aarch64_memtag_add_tag (rtx base, poly_int64 offset, uint8_t tag_offset)
> +{
> + rtx offset_rtx, tag_offset_rtx;
> + rtx target, insn;
> + poly_int64 abs_offset;
> + enum rtx_code code;
> + bool neg_p;
> + rtx ret;
> +
> + if (memtag_sanitize_p ())
> + {
> + target = gen_reg_rtx (Pmode);
> + tag_offset_rtx = gen_int_mode (tag_offset, DImode);
> + gcc_assert (aarch64_memtag_tag_offset (tag_offset_rtx, DImode));
> +
> + neg_p = known_lt (offset, 0);
> + abs_offset = neg_p ? offset * (-1) : offset;
> + offset_rtx = gen_int_mode (abs_offset, DImode);
> +
> + if (!aarch64_granule16_uimm6 (offset_rtx, DImode))
> + {
> + /* Emit addr arithmetic prior to addg/subg. */
> + code = neg_p ? MINUS : PLUS;
> + insn = expand_simple_binop (Pmode, code, base, offset_rtx,
> + target, true, OPTAB_LIB_WIDEN);
> + offset_rtx = const0_rtx;
> + }
> +
> +      /* Addr offset must be within bounds at this time. */
> + gcc_assert (aarch64_granule16_uimm6 (offset_rtx, DImode));
> +
> + /* Even if tag_offset_rtx is CONST0_RTX, generate a subg/addg; this
> + provides better opportunity for combining instructions later. */
> + if (neg_p)
> + insn = gen_subg (target, base, offset_rtx, tag_offset_rtx);
> + else
> + insn = gen_addg (target, base, offset_rtx, tag_offset_rtx);
> + emit_insn (insn);
FWIW, this should hopefully be simpler with the combined addg/subg
pattern that we discussed earlier. It should be possible to keep
offset unnegated and always use expand_simple_binop with PLUS
rather than MINUS.
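Sketch only (aarch64_granule16_simm6 is a placeholder name for
whatever signed predicate the combined pattern would use):

  rtx offset_rtx = gen_int_mode (offset, Pmode);
  if (!aarch64_granule16_simm6 (offset_rtx, Pmode))
    {
      /* Emit the address arithmetic up front, always as a PLUS.  */
      base = expand_simple_binop (Pmode, PLUS, base, offset_rtx,
                                  NULL_RTX, true, OPTAB_LIB_WIDEN);
      offset_rtx = const0_rtx;
    }
  /* ...then expand the combined pattern with base, offset_rtx and
     tag_offset_rtx.  */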
> +
> + ret = XEXP (insn, 0);
> + }
> + else
> + ret = default_memtag_add_tag (base, offset, tag_offset);
> +
> + return ret;
> +}
> +
> +/* FIXME - What's a good threshold? Is there another way to do this? */
> +/* Threshold in number of granules beyond which an explicit loop for
> + tagging a memory block is emitted. */
> +#define AARCH64_MEMTAG_TAG_MEMORY_LOOP_THRESHOLD 10
I wouldn't want to guess a better value :) But yeah, using a macro like
this is IMO the right choice until we find a need for something more
complex.
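(If it ever does need to be tunable, a target --param would be the
natural next step; a hypothetical sketch of an entry in
config/aarch64/aarch64.opt:

  -param=aarch64-memtag-tag-memory-loop-threshold=
  Target Joined UInteger Var(aarch64_memtag_tag_memory_loop_threshold) Init(10) Param
  Maximum number of granules tagged inline before switching to a loop.

But the macro seems fine until then.)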
> +
> +static void
> +aarch64_memtag_tag_memory_via_loop (rtx base, rtx size, rtx tagged_pointer);
> +
> +/* The default implementation of TARGET_MEMTAG_TAG_MEMORY. */
Not the default :)
> +rtx
> +aarch64_memtag_tag_memory (rtx base, rtx size, rtx tagged_pointer)
> +{
> + rtx stg_rtx;
> + HOST_WIDE_INT factor;
> + HOST_WIDE_INT len, offset;
> + unsigned HOST_WIDE_INT granule_sz;
> + unsigned HOST_WIDE_INT iters;
> +
> + granule_sz = (HOST_WIDE_INT) AARCH64_MEMTAG_GRANULE_SIZE;
> +
> + /* FIXME check predicate on offset (from base) + size. */
> + if (CONST_INT_P (size) && aarch64_granule16_simm9 (size, DImode))
> + {
> + len = INTVAL (size);
> +      /* The amount of memory to tag must be aligned to granule size by now.  */
> + gcc_assert (abs_hwi (len) % granule_sz == 0);
> +
> + factor = (known_le (len, 0)) ? -1 : 1;
> + iters = abs_hwi (len) / granule_sz;
> +
> + offset = 0;
> +
> + if (iters > AARCH64_MEMTAG_TAG_MEMORY_LOOP_THRESHOLD)
> + goto emit_loop_tag_memory;
> +
> +      /* gen_stg / gen_st2g expects a simple PLUS (reg, offset) as addr operand. */
> + if (!REG_P (base))
> + {
> + rtx addr = simplify_gen_binary (PLUS, Pmode, base,
> + gen_int_mode (offset, Pmode));
> + if (!CONST_INT_P (XEXP (addr, 1)))
> + {
> + emit_insn (addr);
> + base = XEXP (addr, 0);
> + }
> + else
> + {
> + base = XEXP (addr, 0);
> + offset += INTVAL (XEXP (addr, 1));
> + }
> + }
> +
> + while (iters)
> + {
> + if (iters / 2)
> + {
> + stg_rtx = gen_st2g (tagged_pointer, base,
> + gen_int_mode (offset, Pmode),
> + gen_int_mode (offset - 16, Pmode));
> + iters -= 2;
> + }
> + else
> + {
> +	      stg_rtx = gen_stg (tagged_pointer, base, gen_int_mode (offset, Pmode));
> + iters -= 1;
> + }
> +
> + emit_insn (stg_rtx);
> + offset = granule_sz * iters * factor;
> + }
> +
> + return stg_rtx;
> + }
> +
> +emit_loop_tag_memory:
> +  /* FIXME - stg_rtx to be returned. Update signature to return void / bool? */
> + stg_rtx = NULL;
> + aarch64_memtag_tag_memory_via_loop (base, size, tagged_pointer);
> +
> + return stg_rtx;
> +}
I suppose this is personal preference, but this could be written:
  if (rtx stg_rtx = aarch64_memtag_tag_memory_noloop (base, size,
                                                      tagged_pointer))
    return stg_rtx;
  return aarch64_memtag_tag_memory_via_loop (base, size, tagged_pointer);
(I'm not anti-goto though.)
Thanks,
Richard