Hi Ard, Thanks for working on this, comments inline.
> -----Original Message----- > From: Gcc-patches <gcc-patches- > bounces+kyrylo.tkachov=arm....@gcc.gnu.org> On Behalf Of Ard > Biesheuvel via Gcc-patches > Sent: Thursday, October 28, 2021 12:27 PM > To: linux-harden...@vger.kernel.org > Cc: keesc...@chromium.org; Richard Sandiford > <richard.sandif...@arm.com>; thomas.preudho...@celest.fr; Keith > Packard <keith...@amazon.com>; gcc-patches@gcc.gnu.org; Ard Biesheuvel > <a...@kernel.org> > Subject: [PATCH v4 1/1] [ARM] Add support for TLS register based stack > protector canary access > > Add support for accessing the stack canary value via the TLS register, > so that multiple threads running in the same address space can use > distinct canary values. This is intended for the Linux kernel running in > SMP mode, where processes entering the kernel are essentially threads > running the same program concurrently: using a global variable for the > canary in that context is problematic because it can never be rotated, > and so the OS is forced to use the same value as long as it remains up. > > Using the TLS register to index the stack canary helps with this, as it > allows each CPU to context switch the TLS register along with the rest > of the process, permitting each process to use its own value for the > stack canary. > > 2021-10-28 Ard Biesheuvel <a...@kernel.org> > > * config/arm/arm-opts.h (enum stack_protector_guard): New > * config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem): > New > * config/arm/arm.c (TARGET_STACK_PROTECT_GUARD): Define > (arm_option_override_internal): Handle and put in error checks > for stack protector guard options. > (arm_option_reconfigure_globals): Likewise > (arm_stack_protect_tls_canary_mem): New > (arm_stack_protect_guard): New > * config/arm/arm.md (stack_protect_set): New > (stack_protect_set_tls): Likewise > (stack_protect_test): Likewise > (stack_protect_test_tls): Likewise > (reload_tp_hard): Likewise > * config/arm/arm.opt (-mstack-protector-guard): New > (-mstack-protector-guard-offset): New. > * doc/invoke.texi: Document new options > How has this been tested? The code looks mostly okay to me, but the rules for patches require a bootstrap and run of the testsuite: https://gcc.gnu.org/contribute.html#testing If you don't have access to an arm machine, the GCC compile farm may be of use: https://gcc.gnu.org/wiki/CompileFarm In terms of tests, like Qing says we'd like to see some additions to the testsuite. These would go into the testsuite/gcc.target/arm directory. You can grep for "mstack-protector-guard" in the testsuite/ directory to see how various targets test this functionality and copy/adapt some tests for arm. > Signed-off-by: Ard Biesheuvel <a...@kernel.org> > --- > gcc/config/arm/arm-opts.h | 6 ++ > gcc/config/arm/arm-protos.h | 2 + > gcc/config/arm/arm.c | 55 +++++++++++++++ > gcc/config/arm/arm.md | 71 +++++++++++++++++++- > gcc/config/arm/arm.opt | 22 ++++++ > gcc/doc/invoke.texi | 9 +++ > 6 files changed, 163 insertions(+), 2 deletions(-) > > diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h > index 5c4b62f404f7..581ba3c4fbbb 100644 > --- a/gcc/config/arm/arm-opts.h > +++ b/gcc/config/arm/arm-opts.h > @@ -69,4 +69,10 @@ enum arm_tls_type { > TLS_GNU, > TLS_GNU2 > }; > + > +/* Where to get the canary for the stack protector. */ > +enum stack_protector_guard { > + SSP_TLSREG, /* per-thread canary in TLS register */ > + SSP_GLOBAL /* global canary */ > +}; > #endif > diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h > index 9b1f61394ad7..d8d605920c97 100644 > --- a/gcc/config/arm/arm-protos.h > +++ b/gcc/config/arm/arm-protos.h > @@ -195,6 +195,8 @@ extern void arm_split_atomic_op (enum rtx_code, > rtx, rtx, rtx, rtx, rtx, rtx); > extern rtx arm_load_tp (rtx); > extern bool arm_coproc_builtin_available (enum unspecv); > extern bool arm_coproc_ldc_stc_legitimate_address (rtx); > +extern rtx arm_stack_protect_tls_canary_mem (bool); > + > > #if defined TREE_CODE > extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree); > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index c4ff06b087eb..6a659d81a6fe 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -829,6 +829,9 @@ static const struct attribute_spec > arm_attribute_table[] = > > #undef TARGET_MD_ASM_ADJUST > #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust > + > +#undef TARGET_STACK_PROTECT_GUARD > +#define TARGET_STACK_PROTECT_GUARD arm_stack_protect_guard > > > > /* Obstack for minipool constant handling. */ > static struct obstack minipool_obstack; > @@ -3155,6 +3158,26 @@ arm_option_override_internal (struct > gcc_options *opts, > if (TARGET_THUMB2_P (opts->x_target_flags)) > opts->x_inline_asm_unified = true; > > + if (arm_stack_protector_guard == SSP_GLOBAL > + && opts->x_arm_stack_protector_guard_offset_str) > + { > + error ("incompatible options %'-mstack-protector-guard=global%' and" > + "%'-mstack-protector-guard-offset=%qs%'", > + arm_stack_protector_guard_offset_str); > + } > + > + if (opts->x_arm_stack_protector_guard_offset_str) > + { > + char *end; > + const char *str = arm_stack_protector_guard_offset_str; > + errno = 0; > + long offs = strtol (arm_stack_protector_guard_offset_str, &end, 0); > + if (!*str || *end || errno) > + error ("%qs is not a valid offset in %qs", str, > + "-mstack-protector-guard-offset="); > + arm_stack_protector_guard_offset = offs; > + } The arm target supports a bigger diversity of configurations than aarch64. Do we need to error out here if the user tries to specify the option for targets like M-profile, Thumb-1, older Arm architectures etc? If you want to gate all this on TARGET_32BIT that will restrict it to A32 and T32. > + > #ifdef SUBTARGET_OVERRIDE_INTERNAL_OPTIONS > SUBTARGET_OVERRIDE_INTERNAL_OPTIONS; > #endif > @@ -3822,6 +3845,10 @@ arm_option_reconfigure_globals (void) > else > target_thread_pointer = TP_SOFT; > } > + > + if (arm_stack_protector_guard == SSP_TLSREG > + && target_thread_pointer != TP_CP15) > + error("%'-mstack-protector-guard=tls%' needs a hardware TLS register"); > } We can use the more compact (&& !TARGET_HARD_TP) check here instead. > > /* Perform some validation between the desired architecture and the rest of > the > @@ -8087,6 +8114,22 @@ legitimize_pic_address (rtx orig, machine_mode > mode, rtx reg, rtx pic_reg, > } > > > +rtx > +arm_stack_protect_tls_canary_mem (bool reload) New functions should have a function comment describing the arguments and return value. See other functions in this file for the recommended format. > +{ > + rtx tp = gen_reg_rtx (SImode); > + if (reload) > + emit_insn (gen_reload_tp_hard (tp)); > + else > + emit_insn (gen_load_tp_hard (tp)); > + > + rtx reg = gen_reg_rtx (SImode); > + rtx offset = GEN_INT (arm_stack_protector_guard_offset); > + emit_set_insn (reg, gen_rtx_PLUS (SImode, tp, offset)); You can write this more compactly as: emit_set_insn (gen_addsi3 (reg, tp, offset)); > + return gen_rtx_MEM (SImode, reg); > +} > + > + > /* Whether a register is callee saved or not. This is necessary because high > registers are marked as caller saved when optimizing for size on Thumb-1 > targets despite being callee saved in order to avoid using them. */ > @@ -34054,6 +34097,18 @@ arm_run_selftests (void) > #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests > #endif /* CHECKING_P */ > > +/* Implement TARGET_STACK_PROTECT_GUARD. In case of a > + global variable based guard use the default else > + return a null tree. */ > +static tree > +arm_stack_protect_guard (void) > +{ > + if (arm_stack_protector_guard == SSP_GLOBAL) > + return default_stack_protect_guard (); > + > + return NULL_TREE; > +} > + > /* Worker function for TARGET_MD_ASM_ADJUST, while in thumb1 mode. > Unlike the arm version, we do NOT implement asm flag outputs. */ > > diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md > index 4adc976b8b67..d31349cd2614 100644 > --- a/gcc/config/arm/arm.md > +++ b/gcc/config/arm/arm.md > @@ -9183,7 +9183,7 @@ (define_expand "stack_protect_combined_set" > UNSPEC_SP_SET)) > (clobber (match_scratch:SI 2 "")) > (clobber (match_scratch:SI 3 ""))])] > - "" > + "arm_stack_protector_guard == SSP_GLOBAL" > "" > ) > > @@ -9267,7 +9267,7 @@ (define_expand "stack_protect_combined_test" > (clobber (match_scratch:SI 3 "")) > (clobber (match_scratch:SI 4 "")) > (clobber (reg:CC CC_REGNUM))])] > - "" > + "arm_stack_protector_guard == SSP_GLOBAL" > "" > ) > > @@ -9361,6 +9361,64 @@ (define_insn "arm_stack_protect_test_insn" > (set_attr "arch" "t,32")] > ) > > +(define_expand "stack_protect_set" > + [(match_operand:SI 0 "memory_operand") > + (match_operand:SI 1 "memory_operand")] > + "arm_stack_protector_guard == SSP_TLSREG" > + " > +{ > + operands[1] = arm_stack_protect_tls_canary_mem (false /* reload */); > + emit_insn (gen_stack_protect_set_tls (operands[0], operands[1])); > + DONE; > +}" > +) > + > +;; DO NOT SPLIT THIS PATTERN. It is important for security reasons that the > +;; canary value does not live beyond the life of this sequence. > +(define_insn "stack_protect_set_tls" > + [(set (match_operand:SI 0 "memory_operand" "=m") > + (unspec:SI [(match_operand:SI 1 "memory_operand" "m")] > + UNSPEC_SP_SET)) > + (set (match_scratch:SI 2 "=&r") (const_int 0))] > + "" > + "ldr\\t%2, %1\;str\\t%2, %0\;mov\t%2, #0" > + [(set_attr "length" "12") > + (set_attr "conds" "nocond") I think this should be "unconditional" as we don't want the late if-conversion pass to try making this conditional (though it'll likely stay away as this is a multi-insn parallel anyway). I know that the existing stack_protect_set_insn has "nocond" here, but I think that's wrong, and we can fix that separately. > + (set_attr "type" "multiple")] > +) > + > +(define_expand "stack_protect_test" > + [(match_operand:SI 0 "memory_operand") > + (match_operand:SI 1 "memory_operand") > + (match_operand:SI 2)] > + "arm_stack_protector_guard == SSP_TLSREG" > + " > +{ > + operands[1] = arm_stack_protect_tls_canary_mem (true /* reload */); > + emit_insn (gen_stack_protect_test_tls (operands[0], operands[1])); > + > + rtx cc_reg = gen_rtx_REG (CC_Zmode, CC_REGNUM); > + rtx eq = gen_rtx_EQ (CC_Zmode, cc_reg, const0_rtx); > + emit_jump_insn (gen_arm_cond_branch (operands[2], eq, cc_reg)); > + DONE; > +}" > +) > + > +(define_insn "stack_protect_test_tls" > + [(set (reg:CC_Z CC_REGNUM) > + (compare:CC_Z (unspec:SI [(match_operand:SI 0 "memory_operand" > "m") > + (match_operand:SI 1 "memory_operand" > "m")] > + UNSPEC_SP_TEST) > + (const_int 0))) > + (clobber (match_scratch:SI 2 "=&r")) > + (clobber (match_scratch:SI 3 "=&r"))] > + "" > + "ldr\t%2, %0\;ldr\t%3, %1\;eors\t%2, %3, %2\;mov\t%3, #0" > + [(set_attr "length" "16") > + (set_attr "conds" "set") > + (set_attr "type" "multiple")] > +) > + > (define_expand "casesi" > [(match_operand:SI 0 "s_register_operand") ; index to jump on > (match_operand:SI 1 "const_int_operand") ; lower bound > @@ -12133,6 +12191,15 @@ (define_insn "load_tp_hard" > (set_attr "type" "mrs")] > ) > > +;; Used by the TLS register based stack protector > +(define_insn "reload_tp_hard" > + [(set (match_operand:SI 0 "register_operand" "=r") > + (unspec_volatile [(const_int 0)] VUNSPEC_MRC))] The unspec_volatile should have a mode: (unspec_volatile:SI ...) > + "TARGET_HARD_TP" > + "mrc\\tp15, 0, %0, c13, c0, 3\\t@ reload_tp_hard" > + [(set_attr "type" "mrs")] > +) > + > ;; Doesn't clobber R1-R3. Must use r0 for the first operand. > (define_insn "load_tp_soft_fdpic" > [(set (reg:SI 0) (unspec:SI [(const_int 0)] UNSPEC_TLS)) > diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt > index a7677eeb45c8..4b3e17bc319c 100644 > --- a/gcc/config/arm/arm.opt > +++ b/gcc/config/arm/arm.opt > @@ -311,3 +311,25 @@ Generate code which uses the core registers only > (r0-r14). > mfdpic > Target Mask(FDPIC) > Enable Function Descriptor PIC mode. > + > +mstack-protector-guard= > +Target RejectNegative Joined Enum(stack_protector_guard) > Var(arm_stack_protector_guard) Init(SSP_GLOBAL) > +Use given stack-protector guard. > + > +Enum > +Name(stack_protector_guard) Type(enum stack_protector_guard) > +Valid arguments to -mstack-protector-guard=: > + > +EnumValue > +Enum(stack_protector_guard) String(tls) Value(SSP_TLSREG) > + > +EnumValue > +Enum(stack_protector_guard) String(global) Value(SSP_GLOBAL) > + > +mstack-protector-guard-offset= > +Target Joined RejectNegative String > Var(arm_stack_protector_guard_offset_str) > +Use an immediate to offset from the TLS register. This option is for use with > +fstack-protector-guard=tls and not for use in user-land code. This text should go (or be copied) to... > + > +TargetVariable > +long arm_stack_protector_guard_offset = 0 > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 71992b8c5974..46d009376018 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -810,6 +810,7 @@ Objective-C and Objective-C++ Dialects}. > -mpure-code @gol > -mcmse @gol > -mfix-cmse-cve-2021-35465 @gol > +-mstack-protector-guard=@var{guard} -mstack-protector-guard- > offset=@var{offset} @gol > -mfdpic} > > @emph{AVR Options} > @@ -20946,6 +20947,14 @@ enabled by default when the option @option{- > mcpu=} is used with > @code{cortex-m33}, @code{cortex-m35p} or @code{cortex-m55}. The > option > @option{-mno-fix-cmse-cve-2021-35465} can be used to disable the > mitigation. > > +@item -mstack-protector-guard=@var{guard} > +@itemx -mstack-protector-guard-offset=@var{offset} > +@opindex mstack-protector-guard > +@opindex mstack-protector-guard-offset > +Generate stack protection code using canary at @var{guard}. Supported > +locations are @samp{global} for a global canary or @samp{tls} for a > +canary accessible via the TLS register. ... here as this is the actual user-visible documentation. Thanks, Kyrill > + > @item -mfdpic > @itemx -mno-fdpic > @opindex mfdpic > -- > 2.30.2