On Fri, Apr 11, 2025 at 12:26 PM Richard Sandiford <richard.sandif...@arm.com> wrote: > > Richard Sandiford <richard.sandif...@arm.com> writes: > > Richard Biener <richard.guent...@gmail.com> writes: > >> On Fri, Apr 11, 2025 at 11:10 AM Richard Sandiford > >> <richard.sandif...@arm.com> wrote: > >>> > >>> Richard Biener <richard.guent...@gmail.com> writes: > >>> > On Thu, Apr 10, 2025 at 10:10 PM Richard Sandiford > >>> > <richard.sandif...@arm.com> wrote: > >>> >> > >>> >> PR119610 is about incorrect CFI output for a stack probe when that > >>> >> probe is not the initial allocation. The main aarch64 stack probe > >>> >> function, aarch64_allocate_and_probe_stack_space, implicitly assumed > >>> >> that the incoming stack pointer pointed to the top of the frame, > >>> >> and thus held the CFA. > >>> >> > >>> >> aarch64_save_callee_saves and aarch64_restore_callee_saves use a > >>> >> parameter called bytes_below_sp to track how far the stack pointer > >>> >> is above the base of the static frame. This patch does the same > >>> >> thing for aarch64_allocate_and_probe_stack_space. > >>> >> > >>> >> Also, I noticed that the SVE path was attaching the first CFA note > >>> >> to the wrong instruction: it was attaching the note to the calculation > >>> >> of the stack size, rather than to the r11<-sp copy. > >>> >> > >>> >> Bootstrapped & regression-tested on aarch64-linux-gnu. I'll push on > >>> >> Monday if there are no comments before then. I'd appreciate a second > >>> >> pair of eyes though, since this is a sensitive area. > >>> > > >>> > Do you happen to know if the backports to older branches you provided > >>> > for > >>> > the change that triggered this issue (in particular to GCC 7) are also > >>> > affected? > >>> > >>> GCC 7 and GCC 8 should be ok. The bug relies on stack protection being > >>> enabled (-fstack-protector-strong for the testcase in the PR, but just > >>> -fstack-protector for others) and that was added in GCC 9. > >> > >> Hmm, I see -fstack-protector[-strong] working with GCC 7 on aarch64. > > > > I think it's just being silently accepted as a nop. At least: > > > >> Specifically > >> the testcase produces for foo(): > >> > >> .text > >> .align 2 > >> .global _Z3foov > >> .type _Z3foov, %function > >> _Z3foov: > >> .LFB0: > >> .cfi_startproc > >> stp x29, x30, [sp, -16]! > >> .cfi_def_cfa_offset 16 > >> .cfi_offset 29, -16 > >> .cfi_offset 30, -8 > >> add x29, sp, 0 > >> .cfi_def_cfa_register 29 > >> sub sp, sp, #16 > >> sub sp, sp, #524288 > > > > ...the function isn't probing the stack here. It ought to be doing > > something like the code that Alex quoted in the PR. Compare the > > GCC 8 and GCC 9 output in https://godbolt.org/z/cWbsG3eGb . > > Gah, Alex pointed out off-list that I'd once again got the options > the wrong way around: -fstack-clash-protection is the thing that > was added in GCC 9. > > I find these option names really confusing :-(
True. Note we backported rough -fstack-clash-protection to GCC 7, but middle-end only (attached for reference). Richard. > > Richard
Index: gcc/common.opt =================================================================== --- gcc/common.opt (revision 244266) +++ gcc/common.opt (working copy) @@ -2295,6 +2299,10 @@ fstack-check Common Alias(fstack-check=, specific, no) Insert stack checking code into the program. Same as -fstack-check=specific. +fstack-clash-protection +Common Report Var(flag_stack_clash_protection) +Insert probes per page for dynamically allocated stack space + fstack-limit Common Var(common_deferred_options) Defer Index: gcc/explow.c =================================================================== --- gcc/explow.c (revision 244266) +++ gcc/explow.c (working copy) @@ -1277,6 +1277,8 @@ get_dynamic_stack_size (rtx *psize, unsi *psize = size; } +#define PROBE_INTERVAL (1 << STACK_CHECK_PROBE_INTERVAL_EXP) + /* Return an rtx representing the address of an area of memory dynamically pushed on the stack. @@ -1305,6 +1307,8 @@ allocate_dynamic_stack_space (rtx size, HOST_WIDE_INT stack_usage_size = -1; rtx_code_label *final_label; rtx final_target, target; + rtx loop_lab, end_lab, skip_lab, last_size, before_skip; + int probe_pass = 0; /* If we're asking for zero bytes, it doesn't matter what we point to since we can't dereference it. But return a reasonable @@ -1440,6 +1444,30 @@ allocate_dynamic_stack_space (rtx size, /* Don't let anti_adjust_stack emit notes. */ suppress_reg_args_size = true; + if (flag_stack_clash_protection) + { +#ifndef STACK_GROWS_DOWNWARD + sorry("-fstack-clash-protection is incompatible with upward growing stack"); +#endif + size = copy_to_mode_reg (Pmode, convert_to_mode (Pmode, size, 1)); + loop_lab = gen_label_rtx (); + end_lab = gen_label_rtx (); + skip_lab = gen_label_rtx (); + /* We insert 'target = virtual_stack_dynamic_rtx' here, but target + is changed later, so that insn can be constructed only later. */ + before_skip = get_last_insn (); + emit_cmp_and_jump_insns (size, CONST0_RTX (Pmode), EQ, NULL_RTX, + Pmode, 1, skip_lab); + emit_label (loop_lab); + emit_cmp_and_jump_insns (size, GEN_INT (PROBE_INTERVAL), LTU, + NULL_RTX, Pmode, 1, end_lab); + last_size = expand_binop (Pmode, sub_optab, size, GEN_INT (PROBE_INTERVAL), size, + 1, OPTAB_WIDEN); + gcc_assert (last_size == size); + size = GEN_INT (PROBE_INTERVAL); + } + +again: /* Perform the required allocation from the stack. Some systems do this differently than simply incrementing/decrementing from the stack pointer, such as acquiring the space by calling malloc(). */ @@ -1499,6 +1527,15 @@ allocate_dynamic_stack_space (rtx size, if (STACK_GROWS_DOWNWARD) emit_move_insn (target, virtual_stack_dynamic_rtx); } + if (flag_stack_clash_protection && probe_pass == 0) + { + probe_pass = 1; + emit_stack_probe (target); + emit_jump (loop_lab); + emit_label (end_lab); + size = last_size; + goto again; + } suppress_reg_args_size = false; @@ -1510,6 +1547,17 @@ allocate_dynamic_stack_space (rtx size, emit_label (final_label); target = final_target; } + if (flag_stack_clash_protection) + { + rtx seq; + emit_stack_probe (target); + emit_label (skip_lab); + start_sequence (); + emit_move_insn (target, virtual_stack_dynamic_rtx); + seq = get_insns (); + end_sequence (); + emit_insn_after (seq, before_skip); + } target = align_dynamic_address (target, required_align); @@ -1593,8 +1641,6 @@ emit_stack_probe (rtx address) the current stack pointer. STACK_GROWS_DOWNWARD says whether to add or subtract them from the stack pointer. */ -#define PROBE_INTERVAL (1 << STACK_CHECK_PROBE_INTERVAL_EXP) - #if STACK_GROWS_DOWNWARD #define STACK_GROW_OP MINUS #define STACK_GROW_OPTAB sub_optab