Indu Bhagat <indu.bha...@oracle.com> writes:
> MEMTAG sanitizer, which is based on the HWASAN sanitizer, will invoke
> the target-specific hooks to create a random tag, add tag to memory
> address, and finally tag and untag memory.
>
> Implement the target hooks to emit MTE instructions if MEMTAG sanitizer
> is in effect.  Continue to use the default target hook if HWASAN is
> being used.  Following target hooks are implemented:
>   - TARGET_MEMTAG_INSERT_RANDOM_TAG
>   - TARGET_MEMTAG_ADD_TAG
>   - TARGET_MEMTAG_TAG_MEMORY
>
> Apart from the target-specific hooks, set the following to values
> defined by the Memory Tagging Extension (MTE) in aarch64:
>   - TARGET_MEMTAG_TAG_SIZE
>   - TARGET_MEMTAG_GRANULE_SIZE

How about using "bits"/"bytes" rather than "size", since the two hooks
use different units?  Alternatively, we could follow the machine_mode
convention of using "size" for bytes and "bitsize" for bits.
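FWIW, with the size/bitsize convention the two aarch64 hooks would end
up reading something like this (untested, and the "bitsize" names are
illustrative only; MTE has 4-bit tags and 16-byte granules):

  /* Implement TARGET_MEMTAG_TAG_BITSIZE (name illustrative): tag width
     in bits.  */
  uint8_t
  aarch64_memtag_tag_bitsize ()
  {
    return memtag_sanitize_p () ? 4 : default_memtag_tag_size ();
  }

  /* Implement TARGET_MEMTAG_GRANULE_SIZE: granule size in bytes.  */
  uint8_t
  aarch64_memtag_granule_size ()
  {
    return memtag_sanitize_p () ? 16 : default_memtag_granule_size ();
  }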
> As noted earlier, TARGET_MEMTAG_TAG_MEMORY is a target-specific hook,
> the _only_ use of which is done by the MEMTAG sanitizer.  On aarch64,
> TARGET_MEMTAG_TAG_MEMORY will emit MTE instructions to tag/untag memory
> of a given size.  TARGET_MEMTAG_TAG_MEMORY target hook implementation
> may emit an actual loop to tag/untag memory when size of memory block is
> an expression to be evaluated.  Both aarch64_memtag_tag_memory () and
> aarch64_memtag_tag_memory_via_loop () may generate stg or st2g,
> depending on the number of iterations.
>
> TBD:
> - rtx generation in the target-hooks not tested well. WIP.
> - See how AARCH64_MEMTAG_TAG_MEMORY_LOOP_THRESHOLD is defined and then
>   used to generate a loop to tag/untag memory.  Is there a better way
>   to do this ?
>
> gcc/ChangeLog:
>
>         * asan.cc (memtag_sanitize_p): New definition.
>         * asan.h (memtag_sanitize_p): New declaration.
>         * config/aarch64/aarch64.cc (AARCH64_MEMTAG_GRANULE_SIZE):
>         Define.
>         (AARCH64_MEMTAG_TAG_SIZE): Define.
>         (aarch64_can_tag_addresses): Add MEMTAG specific handling.
>         (aarch64_memtag_tag_size): Likewise.
>         (aarch64_memtag_granule_size): Likewise.
>         (aarch64_memtag_insert_random_tag): Generate irg insn.
>         (aarch64_memtag_add_tag): Generate addg/subg insn.
>         (AARCH64_MEMTAG_TAG_MEMORY_LOOP_THRESHOLD): Define.
>         (aarch64_memtag_tag_memory_via_loop): New definition.
>         (aarch64_memtag_tag_memory): Likewise.
>         (aarch64_gen_tag_memory_postindex): Likewise.
>         (TARGET_MEMTAG_TAG_SIZE): Define target-hook.
>         (TARGET_MEMTAG_GRANULE_SIZE): Likewise.
>         (TARGET_MEMTAG_INSERT_RANDOM_TAG): Likewise.
>         (TARGET_MEMTAG_ADD_TAG): Likewise.
>         (TARGET_MEMTAG_TAG_MEMORY): Likewise.
>
> [...]
> +/* Implement TARGET_MEMTAG_INSERT_RANDOM_TAG.  */
> +rtx
> +aarch64_memtag_insert_random_tag (rtx untagged, rtx target)
> +{
> +  rtx ret;
> +  if (memtag_sanitize_p ())
> +    {
> +      gcc_assert (param_memtag_instrument_stack
> +                  || param_memtag_instrument_allocas);

It might be better to drop this assert.  The hook should be prepared
to do what it's told without inquiring into the reason.

> +      if (!target)
> +        target = gen_reg_rtx (Pmode);
> +
> +      rtx insn = gen_irg (target, untagged, untagged);
> +      emit_insn (insn);

Usually, this kind of hook would allow "untagged" to be any reasonable
operand, and similarly "target".  They might not match the predicates
on irg.  So I think this should use the expand_insn machinery, with
create_output_operand for the result and create_input_operand for the
input.  This will also avoid the need to call gen_reg_rtx explicitly.

Same for the other uses of gen_* in the patch.
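For the irg case that might look something like this (an untested
sketch; the duplicated "untagged" operand just mirrors the gen_irg
call above):

  /* Let expand_insn legitimize the operands instead of calling
     gen_irg directly.  */
  expand_operand ops[3];
  create_output_operand (&ops[0], target, Pmode);  /* target can be null.  */
  create_input_operand (&ops[1], untagged, Pmode);
  create_input_operand (&ops[2], untagged, Pmode);
  expand_insn (CODE_FOR_irg, 3, ops);
  ret = ops[0].value;

create_output_operand copes with a null "target", so the explicit
gen_reg_rtx call becomes unnecessary, and expand_insn copies any
operand that doesn't already match the pattern's predicates into
a fresh register.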
> +
> +      ret = XEXP (insn, 0);
> +    }
> +  else
> +    ret = default_memtag_insert_random_tag (untagged, target);
> +
> +  return ret;
> +}
> +
> +/* Implement TARGET_MEMTAG_ADD_TAG.  */
> +rtx
> +aarch64_memtag_add_tag (rtx base, poly_int64 offset, uint8_t tag_offset)
> +{
> +  rtx offset_rtx, tag_offset_rtx;
> +  rtx target, insn;
> +  poly_int64 abs_offset;
> +  enum rtx_code code;
> +  bool neg_p;
> +  rtx ret;
> +
> +  if (memtag_sanitize_p ())
> +    {
> +      target = gen_reg_rtx (Pmode);
> +      tag_offset_rtx = gen_int_mode (tag_offset, DImode);
> +      gcc_assert (aarch64_memtag_tag_offset (tag_offset_rtx, DImode));
> +
> +      neg_p = known_lt (offset, 0);
> +      abs_offset = neg_p ? offset * (-1) : offset;
> +      offset_rtx = gen_int_mode (abs_offset, DImode);
> +
> +      if (!aarch64_granule16_uimm6 (offset_rtx, DImode))
> +        {
> +          /* Emit addr arithmetic prior to addg/subg.  */
> +          code = neg_p ? MINUS : PLUS;
> +          insn = expand_simple_binop (Pmode, code, base, offset_rtx,
> +                                      target, true, OPTAB_LIB_WIDEN);
> +          offset_rtx = const0_rtx;
> +        }
> +
> +      /* Addr offset offset must be within bounds at this time.  */
> +      gcc_assert (aarch64_granule16_uimm6 (offset_rtx, DImode));
> +
> +      /* Even if tag_offset_rtx is CONST0_RTX, generate a subg/addg; this
> +         provides better opportunity for combining instructions later.  */
> +      if (neg_p)
> +        insn = gen_subg (target, base, offset_rtx, tag_offset_rtx);
> +      else
> +        insn = gen_addg (target, base, offset_rtx, tag_offset_rtx);
> +      emit_insn (insn);

FWIW, this should hopefully be simpler with the combined addg/subg
pattern that we discussed earlier.  It should be possible to keep
offset unnegated and always use expand_simple_binop with PLUS rather
than MINUS.
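Roughly (again untested, with "addg_or_subg" standing in for whatever
we end up calling the combined pattern):

  /* Keep OFFSET signed and let the combined pattern handle both signs;
     CODE_FOR_addg_or_subg is a placeholder name.  */
  rtx offset_rtx = gen_int_mode (offset, DImode);
  if (!insn_operand_matches (CODE_FOR_addg_or_subg, 2, offset_rtx))
    {
      /* Out-of-range offset: add it up front, always with PLUS.  */
      base = expand_simple_binop (Pmode, PLUS, base, offset_rtx,
                                  NULL_RTX, true, OPTAB_LIB_WIDEN);
      offset_rtx = const0_rtx;
    }
  expand_operand ops[4];
  create_output_operand (&ops[0], NULL_RTX, Pmode);
  create_input_operand (&ops[1], base, Pmode);
  create_input_operand (&ops[2], offset_rtx, DImode);
  create_input_operand (&ops[3], tag_offset_rtx, DImode);
  expand_insn (CODE_FOR_addg_or_subg, 4, ops);
  ret = ops[0].value;

That would let neg_p, abs_offset and the MINUS path disappear entirely.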
> +
> +      ret = XEXP (insn, 0);
> +    }
> +  else
> +    ret = default_memtag_add_tag (base, offset, tag_offset);
> +
> +  return ret;
> +}
> +
> +/* FIXME - Whats a good threshold ? Is there another way to do this? */
> +/* Threshold in number of granules beyond which an explicit loop for
> +   tagging a memory block is emitted.  */
> +#define AARCH64_MEMTAG_TAG_MEMORY_LOOP_THRESHOLD 10

I wouldn't want to guess a better value :)  But yeah, using a macro
like this is IMO the right choice until we find a need for something
more complex.

> +
> +static void
> +aarch64_memtag_tag_memory_via_loop (rtx base, rtx size, rtx tagged_pointer);
> +
> +/* The default implementation of TARGET_MEMTAG_TAG_MEMORY.  */

Not the default :)

> +rtx
> +aarch64_memtag_tag_memory (rtx base, rtx size, rtx tagged_pointer)
> +{
> +  rtx stg_rtx;
> +  HOST_WIDE_INT factor;
> +  HOST_WIDE_INT len, offset;
> +  unsigned HOST_WIDE_INT granule_sz;
> +  unsigned HOST_WIDE_INT iters;
> +
> +  granule_sz = (HOST_WIDE_INT) AARCH64_MEMTAG_GRANULE_SIZE;
> +
> +  /* FIXME check predicate on offset (from base) + size.  */
> +  if (CONST_INT_P (size) && aarch64_granule16_simm9 (size, DImode))
> +    {
> +      len = INTVAL (size);
> +      /* The amount of memory to tag must be aligned to granule size
> +         by now.  */
> +      gcc_assert (abs_hwi (len) % granule_sz == 0);
> +
> +      factor = (known_le (len, 0)) ? -1 : 1;
> +      iters = abs_hwi (len) / granule_sz;
> +
> +      offset = 0;
> +
> +      if (iters > AARCH64_MEMTAG_TAG_MEMORY_LOOP_THRESHOLD)
> +        goto emit_loop_tag_memory;
> +
> +      /* gen_stg / gen_st2g expects a simple PLUS (reg, offset) as
> +         addr operand.  */
> +      if (!REG_P (base))
> +        {
> +          rtx addr = simplify_gen_binary (PLUS, Pmode, base,
> +                                          gen_int_mode (offset, Pmode));
> +          if (!CONST_INT_P (XEXP (addr, 1)))
> +            {
> +              emit_insn (addr);
> +              base = XEXP (addr, 0);
> +            }
> +          else
> +            {
> +              base = XEXP (addr, 0);
> +              offset += INTVAL (XEXP (addr, 1));
> +            }
> +        }
> +
> +      while (iters)
> +        {
> +          if (iters / 2)
> +            {
> +              stg_rtx = gen_st2g (tagged_pointer, base,
> +                                  gen_int_mode (offset, Pmode),
> +                                  gen_int_mode (offset - 16, Pmode));
> +              iters -= 2;
> +            }
> +          else
> +            {
> +              stg_rtx = gen_stg (tagged_pointer, base,
> +                                 gen_int_mode (offset, Pmode));
> +              iters -= 1;
> +            }
> +
> +          emit_insn (stg_rtx);
> +          offset = granule_sz * iters * factor;
> +        }
> +
> +      return stg_rtx;
> +    }
> +
> +emit_loop_tag_memory:
> +  /* FIXME - stg_rtx to be returned. Update signature to return
> +     void / bool ?  */
> +  stg_rtx = NULL;
> +  aarch64_memtag_tag_memory_via_loop (base, size, tagged_pointer);
> +
> +  return stg_rtx;
> +}

I suppose this is personal preference, but this could be written:

  if (rtx stg_rtx = aarch64_memtag_tag_memory_noloop (base, size,
                                                      tagged_pointer))
    return stg_rtx;

  return aarch64_memtag_tag_memory_via_loop (base, size, tagged_pointer);

(I'm not anti-goto though.)

Thanks,
Richard