Hi Claudiu,
> -----Original Message-----
> From: [email protected] <claudiu.zissulescu-
> [email protected]>
> Sent: 09 December 2025 10:58
> To: [email protected]
> Cc: [email protected]; [email protected]; Tamar Christina
> <[email protected]>; Wilco Dijkstra <[email protected]>
> Subject: [PATCH 1/2] aarch64: Add support for memetag-stack sanitizer using
> MTE insns
>
> From: Claudiu Zissulescu <[email protected]>
>
> The MEMTAG sanitizer, which is based on the HWASAN sanitizer, invokes
> the target-specific hooks to create a random tag, add a tag to a memory
> address, and finally tag and untag memory.
>
> Implement the target hooks to emit MTE instructions if the MEMTAG
> sanitizer is in effect. Continue to use the default target hooks if
> HWASAN is being used. The following target hooks are implemented:
> - TARGET_MEMTAG_INSERT_RANDOM_TAG
> - TARGET_MEMTAG_ADD_TAG
> - TARGET_MEMTAG_EXTRACT_TAG
>
> Apart from the target-specific hooks, set the following to values
> defined by the Memory Tagging Extension (MTE) in aarch64:
> - TARGET_MEMTAG_TAG_BITSIZE
> - TARGET_MEMTAG_GRANULE_SIZE
>
> The following instructions were (re-)defined:
> - addg/subg (used by the TARGET_MEMTAG_ADD_TAG and
> TARGET_MEMTAG_COMPOSE_OFFSET_TAG hooks)
> - stg/st2g Used to tag/untag one (stg) or two (st2g) memory granules.
> - tag_memory A target-specific instruction; it will emit MTE
> instructions to tag/untag memory of a given size.
> - compose_tag A target-specific instruction that computes a tagged
> address as an offset from a base (tagged) address.
> - gmi Used to compute the exclusion mask when inserting a random tag.
> - irg Used to insert the random tag itself.
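
(For readers new to MTE, a minimal sketch of the kind of sequence these
patterns enable; registers and offsets below are illustrative, not the
exact code the compiler emits:

    irg   x0, sp             // insert a random tag into bits 59:56 of x0
    addg  x1, x0, #16, #1    // x1 = x0 + 16, with the tag bumped by 1
    stg   x1, [x1]           // update the allocation tag of one granule
    st2g  x1, [x1, #16]      // likewise for the next two 16-byte granules
)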
>
> gcc/
>
> * config/aarch64/aarch64.md (addg): Update pattern to use
> addg/subg instructions.
> (stg): Update pattern.
> (st2g): New pattern.
> (tag_memory): Likewise.
> (compose_tag): Likewise.
> (irg): Update pattern to accept the xzr register.
> (gmi): Likewise.
> (UNSPECV_TAG_SPACE): Define.
> * config/aarch64/aarch64.cc (AARCH64_MEMTAG_GRANULE_SIZE):
> Define.
> (AARCH64_MEMTAG_TAG_BITSIZE): Likewise.
> (aarch64_override_options_internal): Error out if MTE instructions
> are not available.
> (aarch64_post_cfi_startproc): Emit .cfi_mte_tagged_frame.
> (aarch64_can_tag_addresses): Add MEMTAG specific handling.
> (aarch64_memtag_tag_bitsize): New function.
> (aarch64_memtag_granule_size): Likewise.
> (aarch64_memtag_insert_random_tag): Likewise.
> (aarch64_memtag_add_tag): Likewise.
> (aarch64_memtag_extract_tag): Likewise.
> (aarch64_granule16_memory_address_p): Likewise.
> (aarch64_emit_stxg_insn): Likewise.
> (aarch64_tag_memory_via_loop): New definition.
> (aarch64_expand_tag_memory): Likewise.
> (aarch64_check_memtag_ops): Likewise.
> (TARGET_MEMTAG_TAG_BITSIZE): Likewise.
> (TARGET_MEMTAG_GRANULE_SIZE): Likewise.
> (TARGET_MEMTAG_INSERT_RANDOM_TAG): Likewise.
> (TARGET_MEMTAG_ADD_TAG): Likewise.
> (TARGET_MEMTAG_EXTRACT_TAG): Likewise.
> * config/aarch64/aarch64-builtins.cc
> (aarch64_expand_builtin_memtag): Update set tag builtin logic.
> * config/aarch64/aarch64-linux.h: Pass memtag-stack sanitizer
> specific options to the linker.
> * config/aarch64/aarch64-protos.h
> (aarch64_granule16_memory_address_p): New prototype.
> (aarch64_check_memtag_ops): Likewise.
> (aarch64_expand_tag_memory): Likewise.
> * config/aarch64/constraints.md (Umg): New memory constraint.
> (Uag): New constraint.
> (Ung): Likewise.
> * config/aarch64/predicates.md (aarch64_memtag_tag_offset):
> Refactor it.
> (aarch64_granule16_imm6): Rename from aarch64_granule16_uimm6
> and refactor it.
> (aarch64_granule16_memory_operand): New predicate.
> * config/aarch64/iterators.md (MTE_PP): New code iterator to be
> used for MTE instructions.
> (stg_ops): New code attributes.
> (st2g_ops): Likewise.
> (mte_name): Likewise.
> * config/aarch64/aarch64.opt (aarch64-tag-memory-loop-threshold):
> New parameter.
>
> doc/
> * invoke.texi: Update documentation.
>
> gcc/testsuite:
>
> * gcc.target/aarch64/acle/memtag_1.c: Update test.
>
> Co-authored-by: Indu Bhagat <[email protected]>
> Signed-off-by: Claudiu Zissulescu <[email protected]>
> ---
> gcc/config/aarch64/aarch64-builtins.cc | 7 +-
> gcc/config/aarch64/aarch64-linux.h | 4 +-
> gcc/config/aarch64/aarch64-protos.h | 3 +
> gcc/config/aarch64/aarch64.cc | 322 +++++++++++++++++-
> gcc/config/aarch64/aarch64.md | 127 +++++--
> gcc/config/aarch64/aarch64.opt | 5 +
> gcc/config/aarch64/constraints.md | 21 ++
> gcc/config/aarch64/iterators.md | 20 ++
> gcc/config/aarch64/predicates.md | 13 +-
> gcc/doc/invoke.texi | 11 +-
> .../gcc.target/aarch64/acle/memtag_1.c | 4 +-
> 11 files changed, 493 insertions(+), 44 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
> index 408099a50e8..31431693cf2 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -3680,8 +3680,11 @@ aarch64_expand_builtin_memtag (int fcode, tree exp, rtx target)
> pat = GEN_FCN (icode) (target, op0, const0_rtx);
> break;
> case AARCH64_MEMTAG_BUILTIN_SET_TAG:
> - pat = GEN_FCN (icode) (op0, op0, const0_rtx);
> - break;
> + {
> + rtx mem = gen_rtx_MEM (TImode, op0);
> + pat = GEN_FCN (icode) (mem, op0);
> + break;
> + }
> default:
> gcc_unreachable();
> }
> diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
> index 116bb4e69f3..4fa78e0b2f5 100644
> --- a/gcc/config/aarch64/aarch64-linux.h
> +++ b/gcc/config/aarch64/aarch64-linux.h
> @@ -48,7 +48,9 @@
> %{static-pie:-Bstatic -pie --no-dynamic-linker -z text} \
> -X \
> %{mbig-endian:-EB} %{mlittle-endian:-EL} \
> - -maarch64linux%{mabi=ilp32:32}%{mbig-endian:b}"
> + -maarch64linux%{mabi=ilp32:32}%{mbig-endian:b} \
> + %{%:sanitize(memtag-stack):%{!fsanitize-memtag-mode:-z memtag-stack -z memtag-mode=sync}} \
> + %{%:sanitize(memtag-stack):%{fsanitize-memtag-mode=*:-z memtag-stack -z memtag-mode=%*}}"
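
(For the archives: with this spec a plain -fsanitize=memtag-stack link
should pass

    -z memtag-stack -z memtag-mode=sync

to the linker, while e.g. -fsanitize-memtag-mode=async should become
-z memtag-mode=async via the %* substitution above; the async value is
my illustrative assumption.)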
>
>
> #define LINK_SPEC LINUX_TARGET_LINK_SPEC AARCH64_ERRATA_LINK_SPEC
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index a9e407ba340..a316e6af4aa 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -1127,6 +1127,9 @@ void aarch64_expand_sve_vec_cmp_float (rtx, rtx_code, rtx, rtx);
>
> bool aarch64_prepare_sve_int_fma (rtx *, rtx_code);
> bool aarch64_prepare_sve_cond_int_fma (rtx *, rtx_code);
> +
> +bool aarch64_granule16_memory_address_p (rtx mem);
> +void aarch64_expand_tag_memory (rtx, rtx, rtx);
> #endif /* RTX_CODE */
>
> bool aarch64_process_target_attr (tree);
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 9d2c3431ad3..82005a97380 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -19108,6 +19108,10 @@ aarch64_override_options_internal (struct gcc_options *opts)
> #endif
> }
>
> + if (flag_sanitize & SANITIZE_MEMTAG_STACK && !TARGET_MEMTAG)
> + error ("%<-fsanitize=memtag-stack%> requires the ISA extension %qs",
> + "memtag");
> +
> aarch64_feature_flags isa_flags = aarch64_get_isa_flags (opts);
> if ((isa_flags & (AARCH64_FL_SM_ON | AARCH64_FL_ZA_ON))
> && !(isa_flags & AARCH64_FL_SME))
> @@ -25679,6 +25683,19 @@ aarch64_asm_output_external (FILE *stream, tree decl, const char* name)
> aarch64_asm_output_variant_pcs (stream, decl, name);
> }
>
> +/* Implement TARGET_MEMTAG_CAN_TAG_ADDRESSES. Here we tell the rest of the
> + compiler that we automatically ignore the top byte of our pointers, which
> + allows using -fsanitize=hwaddress. In the case of -fsanitize=memtag-stack,
> + we additionally ensure that the target supports MEMTAG insns. */
> +
> +bool
> +aarch64_can_tag_addresses ()
> +{
> + if (memtag_sanitize_p ())
> + return !TARGET_ILP32 && TARGET_MEMTAG;
> + return !TARGET_ILP32;
> +}
> +
> /* Triggered after a .cfi_startproc directive is emitted into the assembly file.
> Used to output the .cfi_b_key_frame directive when signing the current
> function with the B key. */
> @@ -25689,6 +25706,10 @@ aarch64_post_cfi_startproc (FILE *f, tree ignored ATTRIBUTE_UNUSED)
> if (cfun->machine->frame.laid_out && aarch64_return_address_signing_enabled ()
> && aarch64_ra_sign_key == AARCH64_KEY_B)
> asm_fprintf (f, "\t.cfi_b_key_frame\n");
> + if (cfun->machine->frame.laid_out && aarch64_can_tag_addresses ()
NIT: can you move the && to the next line?
Patch is OK with that change.
Thanks for working on this!
Tamar
> + && memtag_sanitize_p ()
> + && !known_eq (cfun->machine->frame.frame_size, 0))
> + asm_fprintf (f, "\t.cfi_mte_tagged_frame\n");
> }
>
> /* Implements TARGET_ASM_FILE_START. Output the assembly header. */
> @@ -30365,13 +30386,289 @@ aarch64_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
> return NULL;
> }
>
> -/* Implement TARGET_MEMTAG_CAN_TAG_ADDRESSES. Here we tell the rest of the
> - compiler that we automatically ignore the top byte of our pointers, which
> - allows using -fsanitize=hwaddress. */
> +#define AARCH64_MEMTAG_GRANULE_SIZE 16
> +#define AARCH64_MEMTAG_TAG_BITSIZE 4
> +
> +/* Implement TARGET_MEMTAG_TAG_BITSIZE. */
> +unsigned char
> +aarch64_memtag_tag_bitsize ()
> +{
> + if (memtag_sanitize_p ())
> + return AARCH64_MEMTAG_TAG_BITSIZE;
> + return default_memtag_tag_bitsize ();
> +}
> +
> +/* Implement TARGET_MEMTAG_GRANULE_SIZE. */
> +unsigned char
> +aarch64_memtag_granule_size ()
> +{
> + if (memtag_sanitize_p ())
> + return AARCH64_MEMTAG_GRANULE_SIZE;
> + return default_memtag_granule_size ();
> +}
> +
> +/* Implement TARGET_MEMTAG_INSERT_RANDOM_TAG. In the case of MTE instructions,
> + make sure the gmi and irg instructions are generated when
> + -fsanitize=memtag-stack is used. The first argument UNTAGGED can be a
> + tagged pointer, and its tag is used in the exclusion set. Thus, the TARGET
> + doesn't use the same tag. */
> +rtx
> +aarch64_memtag_insert_random_tag (rtx untagged, rtx target)
> +{
> + if (memtag_sanitize_p ())
> + {
> + insn_code icode = CODE_FOR_gmi;
> + expand_operand ops_gmi[3];
> + rtx tmp = gen_reg_rtx (Pmode);
> + create_output_operand (&ops_gmi[0], tmp, Pmode);
> + create_input_operand (&ops_gmi[1], untagged, Pmode);
> + create_integer_operand (&ops_gmi[2], 0);
> + expand_insn (icode, 3, ops_gmi);
> +
> + icode = CODE_FOR_irg;
> + expand_operand ops_irg[3];
> + create_output_operand (&ops_irg[0], target, Pmode);
> + create_input_operand (&ops_irg[1], untagged, Pmode);
> + create_input_operand (&ops_irg[2], ops_gmi[0].value, Pmode);
> + expand_insn (icode, 3, ops_irg);
> + return ops_irg[0].value;
> + }
> + else
> + return default_memtag_insert_random_tag (untagged, target);
> +}
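
(For reference, with the frame base in x0 this should expand to a pair
along the lines of the following; registers are illustrative:

    gmi   x1, x0, xzr    // x1 = 1 << tag(x0): exclude x0's current tag
    irg   x0, x0, x1     // insert a random tag outside the exclusion set
)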
> +
> +/* Implement TARGET_MEMTAG_ADD_TAG. For memtag sanitizer, emit addg/subg
> + instructions, otherwise fall back on the default implementation. */
> +rtx
> +aarch64_memtag_add_tag (rtx base, poly_int64 offset, uint8_t tag_offset)
> +{
> + if (memtag_sanitize_p ())
> + {
> + rtx target = NULL;
> + poly_int64 addr_offset = offset;
> + rtx offset_rtx = gen_int_mode (addr_offset, DImode);
> +
> + if (!aarch64_granule16_imm6 (offset_rtx, DImode))
> + {
> + /* Emit addr arithmetic prior to addg/subg. */
> + base = expand_simple_binop (Pmode, PLUS, base, offset_rtx,
> + NULL, true, OPTAB_LIB_WIDEN);
> + addr_offset = 0;
> + }
> +
> + insn_code icode = CODE_FOR_addg;
> + expand_operand ops[4];
> + create_output_operand (&ops[0], target, DImode);
> + create_input_operand (&ops[1], base, DImode);
> + create_integer_operand (&ops[2], addr_offset);
> + create_integer_operand (&ops[3], tag_offset);
> + /* Addr offset and tag offset must be within bounds at this time. */
> + gcc_assert (aarch64_memtag_tag_offset (ops[3].value, DImode));
> +
> + expand_insn (icode, 4, ops);
> + return ops[0].value;
> + }
> + else
> + return default_memtag_add_tag (base, offset, tag_offset);
> +}
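
(So an in-range offset folds into a single instruction, with the new
constraints selecting subg for the negative case; illustrative examples:

    addg  x1, x0, #32, #2    // x1 = x0 + 32, tag = tag(x0) + 2 (mod 16)
    subg  x1, x0, #48, #2    // x1 = x0 - 48, same tag adjustment

while an offset outside [-1008, 1008] is first materialized with a plain
add and then tagged with addg x1, x1, #0, #<tag_offset>.)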
> +
> +/* Implement TARGET_MEMTAG_EXTRACT_TAG. In the case of memtag sanitizer, MTE
> + instructions allow us to work with the tag-address tuple, so there is no
> + need to extract the tag; emit a simple move. */
> +rtx
> +aarch64_memtag_extract_tag (rtx tagged_pointer, rtx target)
> +{
> + if (memtag_sanitize_p ())
> + {
> + rtx ret = gen_reg_rtx (DImode);
> + emit_move_insn (ret, gen_lowpart (DImode, tagged_pointer));
> + return ret;
> + }
> + else
> + return default_memtag_extract_tag (tagged_pointer, target);
> +}
> +
> +/* Return TRUE if x is a valid memory address form for memtag loads and
> + stores. */
> bool
> -aarch64_can_tag_addresses ()
> +aarch64_granule16_memory_address_p (rtx x)
> {
> - return !TARGET_ILP32;
> + struct aarch64_address_info addr;
> +
> + if (!MEM_P (x)
> + || !aarch64_classify_address (&addr, XEXP (x, 0), GET_MODE (x), false))
> + return false;
> +
> + /* Check that the offset, if any, is encodable as a 9-bit immediate. */
> + switch (addr.type)
> + {
> + case ADDRESS_REG_IMM:
> + return aarch64_granule16_simm9 (gen_int_mode (addr.const_offset, DImode),
> + DImode);
> +
> + case ADDRESS_REG_REG:
> + return addr.shift == 0;
> +
> + default:
> + break;
> + }
> + return false;
> +}
> +
> +/* Helper to emit either stg or st2g instruction. */
> +static void
> +aarch64_emit_stxg_insn (machine_mode mode, rtx nxt, rtx addr, rtx tagp)
> +{
> + rtx pat;
> + rtx mem_addr = gen_rtx_MEM (mode, nxt);
> + rtvec vec = gen_rtvec (2, gen_rtx_MEM (mode, addr), tagp);
> + rtx unspec = gen_rtx_UNSPEC_VOLATILE (mode, vec, UNSPECV_TAG_SPACE);
> +
> + if (!rtx_equal_p (nxt, addr))
> + {
> + rtx tmp = gen_rtx_CLOBBER (VOIDmode, addr);
> + rtvec parv = gen_rtvec (2, gen_rtx_SET (mem_addr, unspec), tmp);
> + pat = gen_rtx_PARALLEL (VOIDmode, parv);
> + }
> + else
> + {
> + pat = gen_rtx_SET (mem_addr, unspec);
> + }
> + emit_insn (pat);
> +}
> +
> +/* Tag the memory via an explicit loop. This is used when tag_memory expand
> + is invoked for:
> + - non-constant size, or
> + - constant but not encodable size (!aarch64_granule16_simm9 ()), or
> + - constant and encodable size (aarch64_granule16_simm9 ()), but over the
> + unroll threshold (aarch64_tag_memory_loop_threshold). */
> +
> +static void
> +aarch64_tag_memory_via_loop (rtx base, rtx size, rtx tagged_pointer)
> +{
> + rtx_code_label *top_label, *bottom_label;
> + machine_mode iter_mode;
> + rtx next;
> +
> + iter_mode = GET_MODE (size);
> + if (iter_mode == VOIDmode)
> + iter_mode = word_mode;
> +
> + /* Prepare the addr operand for tagging memory. */
> + rtx addr_reg = gen_reg_rtx (Pmode);
> + emit_move_insn (addr_reg, base);
> +
> + rtx size_reg = gen_reg_rtx (iter_mode);
> + emit_move_insn (size_reg, size);
> +
> + /*
> + tbz size, 4, label1
> + stg tag,[addr], #16
> + label1:
> + */
> + auto *label1 = gen_label_rtx ();
> + auto branch = aarch64_gen_test_and_branch (EQ, size_reg, 4, label1);
> + auto jump = emit_jump_insn (branch);
> + JUMP_LABEL (jump) = label1;
> +
> + next = gen_rtx_POST_INC (Pmode, addr_reg);
> + aarch64_emit_stxg_insn (TImode, next, addr_reg, tagged_pointer);
> +
> + emit_label (label1);
> +
> + /*
> + asr iter, size, 5
> + cbz iter, label2
> + */
> + rtx iter = gen_reg_rtx (iter_mode);
> + emit_insn (gen_rtx_SET (iter,
> + gen_rtx_ASHIFTRT (iter_mode, size_reg, GEN_INT (5))));
> + bottom_label = gen_label_rtx ();
> + branch = aarch64_gen_compare_zero_and_branch (EQ, iter, bottom_label);
> + aarch64_emit_unlikely_jump (branch);
> +
> + /*
> + top_label:
> + st2g tag, [addr], #32
> + subs iter, iter, #1
> + bne top_label
> + */
> + top_label = gen_label_rtx ();
> + emit_label (top_label);
> +
> + /* Tag Memory using post-index st2g. */
> + next = gen_rtx_POST_INC (Pmode, addr_reg);
> + aarch64_emit_stxg_insn (OImode, next, addr_reg, tagged_pointer);
> +
> + /* Decrement ITER. */
> + emit_insn (gen_subdi3_compare1_imm (iter, iter, CONST1_RTX (iter_mode),
> + CONSTM1_RTX (iter_mode)));
> +
> + rtx cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
> + rtx x = gen_rtx_fmt_ee (NE, CCmode, cc_reg, const0_rtx);
> + jump = emit_jump_insn (gen_aarch64_bcond (x, cc_reg, top_label));
> + JUMP_LABEL (jump) = top_label;
> +
> + emit_label (bottom_label);
> +}
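
(Putting the fragments above together, for the size in x1, the address
in x0 and the tag source in x2, the emitted loop should be roughly:

            tbz   x1, #4, 1f        // odd number of granules?
            stg   x2, [x0], #16     // tag one granule, post-increment
    1:      asr   x3, x1, #5        // iterations = size / 32
            cbz   x3, 3f
    2:      st2g  x2, [x0], #32     // tag two granules per iteration
            subs  x3, x3, #1
            b.ne  2b
    3:

registers are illustrative.)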
> +
> +/* Implement expand for tag_memory. */
> +void
> +aarch64_expand_tag_memory (rtx base, rtx tagged_pointer, rtx size)
> +{
> + rtx addr;
> + HOST_WIDE_INT len, offset;
> + unsigned HOST_WIDE_INT granule_size;
> + unsigned HOST_WIDE_INT iters = 0;
> +
> + granule_size = (HOST_WIDE_INT) AARCH64_MEMTAG_GRANULE_SIZE;
> +
> + if (!REG_P (tagged_pointer))
> + tagged_pointer = force_reg (Pmode, tagged_pointer);
> +
> + if (!REG_P (base))
> + base = force_reg (Pmode, base);
> +
> + /* If the size is small enough, we can unroll the loop using stg/st2g
> + instructions. */
> + if (CONST_INT_P (size))
> + {
> + len = INTVAL (size);
> + if (len == 0)
> + return; /* Nothing to do. */
> +
> + /* The amount of memory to tag must be aligned to granule size by now. */
> + gcc_assert (len % granule_size == 0);
> +
> + iters = len / granule_size;
> + }
> +
> + /* Check predicate on max offset possible: offset (in base rtx) + size. */
> + rtx end_addr = simplify_gen_binary (PLUS, Pmode, base, size);
> + end_addr = gen_rtx_MEM (TImode, end_addr);
> + if (iters > 0
> + && iters <= (unsigned HOST_WIDE_INT) aarch64_tag_memory_loop_threshold
> + && aarch64_granule16_memory_address_p (end_addr))
> + {
> + offset = 0;
> + while (iters)
> + {
> + machine_mode mode = TImode;
> + if (iters / 2)
> + {
> + mode = OImode;
> + iters--;
> + }
> + iters--;
> + addr = plus_constant (Pmode, base, offset);
> + offset += GET_MODE_SIZE (mode).to_constant ();
> + aarch64_emit_stxg_insn (mode, addr, addr, tagged_pointer);
> + }
> + }
> + else
> + aarch64_tag_memory_via_loop (base, size, tagged_pointer);
> }
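
(As an illustrative example, a 48-byte object below the threshold should
unroll to something like:

    st2g  x2, [x0]          // granules 0 and 1
    stg   x2, [x0, #32]     // granule 2
)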
>
> /* Implement TARGET_ASM_FILE_END for AArch64. This adds the AArch64 GNU NOTE
> @@ -32806,6 +33103,21 @@ aarch64_libgcc_floating_mode_supported_p
> #undef TARGET_MEMTAG_CAN_TAG_ADDRESSES
> #define TARGET_MEMTAG_CAN_TAG_ADDRESSES aarch64_can_tag_addresses
>
> +#undef TARGET_MEMTAG_TAG_BITSIZE
> +#define TARGET_MEMTAG_TAG_BITSIZE aarch64_memtag_tag_bitsize
> +
> +#undef TARGET_MEMTAG_GRANULE_SIZE
> +#define TARGET_MEMTAG_GRANULE_SIZE aarch64_memtag_granule_size
> +
> +#undef TARGET_MEMTAG_INSERT_RANDOM_TAG
> +#define TARGET_MEMTAG_INSERT_RANDOM_TAG aarch64_memtag_insert_random_tag
> +
> +#undef TARGET_MEMTAG_ADD_TAG
> +#define TARGET_MEMTAG_ADD_TAG aarch64_memtag_add_tag
> +
> +#undef TARGET_MEMTAG_EXTRACT_TAG
> +#define TARGET_MEMTAG_EXTRACT_TAG aarch64_memtag_extract_tag
> +
> #if CHECKING_P
> #undef TARGET_RUN_TARGET_SELFTESTS
> #define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 98c65a74c8e..534c5b766d6 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -412,6 +412,7 @@ (define_c_enum "unspecv" [
> UNSPECV_GCSPOPM ; Represent GCSPOPM.
> UNSPECV_GCSSS1 ; Represent GCSSS1 Xt.
> UNSPECV_GCSSS2 ; Represent GCSSS2 Xt.
> + UNSPECV_TAG_SPACE ; Represent MTE tag memory space.
> UNSPECV_TSTART ; Represent transaction start.
> UNSPECV_TCOMMIT ; Represent transaction commit.
> UNSPECV_TCANCEL ; Represent transaction cancel.
> @@ -8608,46 +8609,48 @@ (define_insn "aarch64_rndrrs"
> ;; Memory Tagging Extension (MTE) instructions.
>
> (define_insn "irg"
> - [(set (match_operand:DI 0 "register_operand" "=rk")
> + [(set (match_operand:DI 0 "register_operand")
> (ior:DI
> - (and:DI (match_operand:DI 1 "register_operand" "rk")
> + (and:DI (match_operand:DI 1 "register_operand")
> (const_int MEMTAG_TAG_MASK))
> - (ashift:DI (unspec:QI [(match_operand:DI 2 "register_operand" "r")]
> + (ashift:DI (unspec:QI [(match_operand:DI 2 "aarch64_reg_or_zero")]
> UNSPEC_GEN_TAG_RND)
> (const_int 56))))]
> "TARGET_MEMTAG"
> - "irg\\t%0, %1, %2"
> - [(set_attr "type" "memtag")]
> + {@ [ cons: =0, 1, 2 ; attrs: type ]
> + [ rk , rk, r ; memtag ] irg\\t%0, %1, %2
> + [ rk , rk, Z ; memtag ] irg\\t%0, %1
> + }
> )
>
> (define_insn "gmi"
> [(set (match_operand:DI 0 "register_operand" "=r")
> - (ior:DI (ashift:DI
> - (const_int 1)
> - (and:QI (lshiftrt:DI
> - (match_operand:DI 1 "register_operand" "rk")
> - (const_int 56)) (const_int 15)))
> - (match_operand:DI 2 "register_operand" "r")))]
> + (ior:DI
> + (unspec:DI [(match_operand:DI 1 "register_operand" "rk")
> + (const_int 0)]
> + UNSPEC_GEN_TAG)
> + (match_operand:DI 2 "aarch64_reg_or_zero" "rZ")))]
> "TARGET_MEMTAG"
> - "gmi\\t%0, %1, %2"
> + "gmi\\t%0, %1, %x2"
> [(set_attr "type" "memtag")]
> )
>
> (define_insn "addg"
> - [(set (match_operand:DI 0 "register_operand" "=rk")
> + [(set (match_operand:DI 0 "register_operand")
> (ior:DI
> - (and:DI (plus:DI (match_operand:DI 1 "register_operand" "rk")
> - (match_operand:DI 2 "aarch64_granule16_uimm6" "i"))
> - (const_int -1080863910568919041)) ;; 0xf0ff...
> + (and:DI (plus:DI (match_operand:DI 1 "register_operand")
> + (match_operand:DI 2 "aarch64_granule16_imm6"))
> + (const_int MEMTAG_TAG_MASK))
> (ashift:DI
> - (unspec:QI
> - [(and:QI (lshiftrt:DI (match_dup 1) (const_int 56)) (const_int 15))
> - (match_operand:QI 3 "aarch64_memtag_tag_offset" "i")]
> - UNSPEC_GEN_TAG)
> + (unspec:DI [(match_dup 1)
> + (match_operand:QI 3 "aarch64_memtag_tag_offset")]
> + UNSPEC_GEN_TAG)
> (const_int 56))))]
> "TARGET_MEMTAG"
> - "addg\\t%0, %1, #%2, #%3"
> - [(set_attr "type" "memtag")]
> + {@ [ cons: =0 , 1 , 2 , 3 ; attrs: type ]
> + [ rk , rk , Uag , ; memtag ] addg\t%0, %1, #%2, #%3
> + [ rk , rk , Ung , ; memtag ] subg\t%0, %1, #%n2, #%3
> + }
> )
>
> (define_insn "subp"
> @@ -8681,17 +8684,83 @@ (define_insn "ldg"
> ;; STG doesn't align the address but aborts with alignment fault
> ;; when the address is not 16-byte aligned.
> (define_insn "stg"
> - [(set (mem:QI (unspec:DI
> - [(plus:DI (match_operand:DI 1 "register_operand" "rk")
> - (match_operand:DI 2 "aarch64_granule16_simm9" "i"))]
> - UNSPEC_TAG_SPACE))
> - (and:QI (lshiftrt:DI (match_operand:DI 0 "register_operand" "rk")
> - (const_int 56)) (const_int 15)))]
> + [(set (match_operand:TI 0 "aarch64_granule16_memory_operand" "+Umg")
> + (unspec_volatile:TI
> + [(match_dup 0)
> + (match_operand:DI 1 "register_operand" "rk")]
> + UNSPECV_TAG_SPACE))]
> "TARGET_MEMTAG"
> - "stg\\t%0, [%1, #%2]"
> + "stg\\t%1, %0"
> [(set_attr "type" "memtag")]
> )
>
> +(define_insn "stg_<mte_name>"
> + [(set (mem:TI (MTE_PP:DI (match_operand:DI 0 "register_operand" "+rk")))
> + (unspec_volatile:TI
> + [(mem:TI (match_dup 0))
> + (match_operand:DI 1 "register_operand" "rk")]
> + UNSPECV_TAG_SPACE))
> + (clobber (match_dup 0))]
> + "TARGET_MEMTAG"
> + "stg\\t%1, <stg_ops>"
> + [(set_attr "type" "memtag")]
> +)
> +
> +;; ST2G updates allocation tags for two memory granules (i.e. 32 bytes) at
> +;; once, without zero initialization.
> +(define_insn "st2g"
> + [(set (match_operand:OI 0 "aarch64_granule16_memory_operand" "+Umg")
> + (unspec_volatile:OI
> + [(match_dup 0)
> + (match_operand:DI 1 "register_operand" "rk")]
> + UNSPECV_TAG_SPACE))]
> + "TARGET_MEMTAG"
> + "st2g\\t%1, %0"
> + [(set_attr "type" "memtag")]
> +)
> +
> +(define_insn "st2g_<mte_name>"
> + [(set (mem:OI (MTE_PP:DI (match_operand:DI 0 "register_operand" "+rk")))
> + (unspec_volatile:OI
> + [(mem:OI (match_dup 0))
> + (match_operand:DI 1 "register_operand" "rk")]
> + UNSPECV_TAG_SPACE))
> + (clobber (match_dup 0))]
> + "TARGET_MEMTAG"
> + "st2g\\t%1, <st2g_ops>"
> + [(set_attr "type" "memtag")]
> +)
> +
> +(define_expand "tag_memory"
> + [(match_operand:DI 0 "register_operand" "")
> + (match_operand:DI 1 "nonmemory_operand" "")
> + (match_operand:DI 2 "nonmemory_operand" "")]
> + ""
> +{
> + aarch64_expand_tag_memory (operands[0], operands[1], operands[2]);
> + DONE;
> +})
> +
> +(define_expand "compose_tag"
> + [(set (match_operand:DI 0 "register_operand")
> + (ior:DI
> + (and:DI (plus:DI (match_operand:DI 1 "register_operand")
> + (const_int 0))
> + (const_int MEMTAG_TAG_MASK))
> + (ashift:DI
> + (unspec:DI [(match_dup 1)
> + (match_operand 2 "immediate_operand")]
> + UNSPEC_GEN_TAG)
> + (const_int 56))))]
> + ""
> +{
> + if (INTVAL (operands[2]) == 0)
> + {
> + emit_move_insn (operands[0], operands[1]);
> + DONE;
> + }
> +})
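
(A note for readers: compose_tag lets the middle end rebuild the tagged
address of one stack object from an already-tagged base; for a nonzero
tag offset its RTL matches the addg pattern above, e.g. the illustrative

    addg  x1, x0, #0, #3    // same address, tag bumped by 3
)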
> +
> ;; Load/Store 64-bit (LS64) instructions.
> (define_insn "ld64b"
> [(set (match_operand:V8DI 0 "register_operand" "=r")
> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
> index 8aae953e60d..135b6753ac5 100644
> --- a/gcc/config/aarch64/aarch64.opt
> +++ b/gcc/config/aarch64/aarch64.opt
> @@ -443,6 +443,11 @@ individual writeback accesses where possible. A value of two means we
> also try to opportunistically form writeback opportunities by folding in
> trailing destructive updates of the base register used by a pair.
>
> +-param=aarch64-tag-memory-loop-threshold=
> +Target Joined UInteger Var(aarch64_tag_memory_loop_threshold) Init(10) IntegerRange(0, 65536) Param
> +Param to control the threshold in number of granules beyond which an
> +explicit loop for tagging a memory block is emitted.
> +
> Wexperimental-fmv-target
> Target Var(warn_experimental_fmv) Warning Init(1)
> This option is deprecated.
> diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
> index 7b9e5583bc7..94d2ff4d847 100644
> --- a/gcc/config/aarch64/constraints.md
> +++ b/gcc/config/aarch64/constraints.md
> @@ -346,6 +346,12 @@ (define_memory_constraint "Ump"
> (match_test "aarch64_legitimate_address_p (GET_MODE (op), XEXP (op,
> 0),
> true,
> ADDR_QUERY_LDP_STP)")))
>
> +(define_memory_constraint "Umg"
> + "@internal
> + A memory address for MTE load/store tag operation."
> + (and (match_code "mem")
> + (match_test "aarch64_granule16_memory_address_p (op)")))
> +
> ;; Used for storing or loading pairs in an AdvSIMD register using an STP/LDP
> ;; as a vector-concat. The address mode uses the same constraints as if it
> ;; were for a single value.
> @@ -600,6 +606,21 @@ (define_address_constraint "Dp"
> An address valid for a prefetch instruction."
> (match_test "aarch64_address_valid_for_prefetch_p (op, true)"))
>
> +(define_constraint "Uag"
> + "@internal
> + A constant that can be used as address offset for an ADDG operation."
> + (and (match_code "const_int")
> + (match_test "IN_RANGE (ival, 0, 1008)
> + && !(ival & 0xf)")))
> +
> +(define_constraint "Ung"
> + "@internal
> + A constant that can be used as address offset for a SUBG operation (once
> + negated)."
> + (and (match_code "const_int")
> + (match_test "IN_RANGE (ival, -1008, -1)
> + && !(ival & 0xf)")))
> +
> (define_constraint "vgb"
> "@internal
> A constraint that matches an immediate offset valid for SVE LD1B
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 332e7ffd2ea..586c3bc3285 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -2887,6 +2887,9 @@ (define_code_iterator SVE_UNPRED_FP_BINARY [plus minus mult])
> ;; SVE integer comparisons.
> (define_code_iterator SVE_INT_CMP [lt le eq ne ge gt ltu leu geu gtu])
>
> +;; Pre/post-{inc,dec} for MTE instructions.
> +(define_code_iterator MTE_PP [post_inc post_dec pre_inc pre_dec])
> +
> ;; -------------------------------------------------------------------
> ;; Code Attributes
> ;; -------------------------------------------------------------------
> @@ -3233,6 +3236,23 @@ (define_code_attr SVE_COND_FP [(plus "UNSPEC_COND_FADD")
> (minus "UNSPEC_COND_FSUB")
> (mult "UNSPEC_COND_FMUL")])
>
> +;; Map MTE pre/post to the right asm format
> +(define_code_attr stg_ops [(post_inc "[%0], 16")
> + (post_dec "[%0], -16")
> + (pre_inc "[%0, 16]!")
> + (pre_dec "[%0, -16]!")])
> +
> +(define_code_attr st2g_ops [(post_inc "[%0], 32")
> + (post_dec "[%0], -32")
> + (pre_inc "[%0, 32]!")
> + (pre_dec "[%0, -32]!")])
> +
> +;; Map MTE pre/post to names
> +(define_code_attr mte_name [(post_inc "postinc")
> + (post_dec "postdec")
> + (pre_inc "preinc")
> + (pre_dec "predec")])
> +
> ;; -------------------------------------------------------------------
> ;; Int Iterators.
> ;; -------------------------------------------------------------------
> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
> index 42304cef439..dca0baf75e0 100644
> --- a/gcc/config/aarch64/predicates.md
> +++ b/gcc/config/aarch64/predicates.md
> @@ -1066,13 +1066,20 @@ (define_predicate "aarch64_bytes_per_sve_vector_operand"
> (match_test "known_eq (wi::to_poly_wide (op, mode),
> BYTES_PER_SVE_VECTOR)")))
>
> +;; The uimm4 field is a 4-bit field that only accepts immediates in the
> +;; range 0..15.
> (define_predicate "aarch64_memtag_tag_offset"
> (and (match_code "const_int")
> - (match_test "IN_RANGE (INTVAL (op), 0, 15)")))
> + (match_test "UINTVAL (op) <= 15")))
> +
> +(define_predicate "aarch64_granule16_memory_operand"
> + (and (match_test "TARGET_MEMTAG")
> + (match_code "mem")
> + (match_test "aarch64_granule16_memory_address_p (op)")))
>
> -(define_predicate "aarch64_granule16_uimm6"
> +(define_predicate "aarch64_granule16_imm6"
> (and (match_code "const_int")
> - (match_test "IN_RANGE (INTVAL (op), 0, 1008)
> + (match_test "IN_RANGE (INTVAL (op), -1008, 1008)
> && !(INTVAL (op) & 0xf)")))
>
> (define_predicate "aarch64_granule16_simm9"
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 0bc22695931..526c04404da 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -17917,6 +17917,11 @@ would be beneficial to unroll the main vectorized loop and by how much. This
> parameter set's the upper bound of how much the vectorizer will unroll the main
> loop. The default value is four.
>
> +@item aarch64-tag-memory-loop-threshold
> +Param to control the threshold in number of granules beyond which an
> +explicit loop for tagging a memory block is emitted. The memory block
> +is tagged using MTE instructions.
> +
> @end table
>
> The following choices of @var{name} are available on GCN targets:
> @@ -18396,8 +18401,10 @@ for a list of supported options.
> The option cannot be combined with @option{-fsanitize=thread} or
> @option{-fsanitize=hwaddress}. Note that the only targets
> @option{-fsanitize=hwaddress} is currently supported on are x86-64
> -(only with @code{-mlam=u48} or @code{-mlam=u57} options) and AArch64,
> -in both cases only in ABIs with 64-bit pointers.
> +(only with @code{-mlam=u48} or @code{-mlam=u57} options) and AArch64, in both
> +cases only in ABIs with 64-bit pointers. Similarly,
> +@option{-fsanitize=memtag-stack} is currently only supported on AArch64 ABIs
> +with 64-bit pointers.
>
> When compiling with @option{-fsanitize=address}, you should also
> use @option{-g} to produce more meaningful output.
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/memtag_1.c b/gcc/testsuite/gcc.target/aarch64/acle/memtag_1.c
> index f8368690032..e94a2220fe3 100644
> --- a/gcc/testsuite/gcc.target/aarch64/acle/memtag_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/memtag_1.c
> @@ -54,9 +54,9 @@ test_memtag_6 (void *p)
> __arm_mte_set_tag (p);
> }
>
> -/* { dg-final { scan-assembler-times {irg\tx..?, x..?, x..?\n} 1 } } */
> +/* { dg-final { scan-assembler-times {irg\tx..?, x..?\n} 1 } } */
> /* { dg-final { scan-assembler-times {gmi\tx..?, x..?, x..?\n} 1 } } */
> /* { dg-final { scan-assembler-times {subp\tx..?, x..?, x..?\n} 1 } } */
> /* { dg-final { scan-assembler-times {addg\tx..?, x..?, #0, #1\n} 1 } } */
> /* { dg-final { scan-assembler-times {ldg\tx..?, \[x..?, #0\]\n} 1 } } */
> -/* { dg-final { scan-assembler-times {stg\tx..?, \[x..?, #0\]\n} 1 } } */
> \ No newline at end of file
> +/* { dg-final { scan-assembler-times {stg\tx..?, \[x..?\]\n} 1 } } */
> --
> 2.52.0