On Fri, Apr 11, 2025 at 12:26 PM Richard Sandiford
<[email protected]> wrote:
>
> Richard Sandiford <[email protected]> writes:
> > Richard Biener <[email protected]> writes:
> >> On Fri, Apr 11, 2025 at 11:10 AM Richard Sandiford
> >> <[email protected]> wrote:
> >>>
> >>> Richard Biener <[email protected]> writes:
> >>> > On Thu, Apr 10, 2025 at 10:10 PM Richard Sandiford
> >>> > <[email protected]> wrote:
> >>> >>
> >>> >> PR119610 is about incorrect CFI output for a stack probe when that
> >>> >> probe is not the initial allocation. The main aarch64 stack probe
> >>> >> function, aarch64_allocate_and_probe_stack_space, implicitly assumed
> >>> >> that the incoming stack pointer pointed to the top of the frame,
> >>> >> and thus held the CFA.
> >>> >>
> >>> >> aarch64_save_callee_saves and aarch64_restore_callee_saves use a
> >>> >> parameter called bytes_below_sp to track how far the stack pointer
> >>> >> is above the base of the static frame. This patch does the same
> >>> >> thing for aarch64_allocate_and_probe_stack_space.
> >>> >>
> >>> >> Also, I noticed that the SVE path was attaching the first CFA note
> >>> >> to the wrong instruction: it was attaching the note to the calculation
> >>> >> of the stack size, rather than to the r11<-sp copy.
> >>> >>
> >>> >> Bootstrapped & regression-tested on aarch64-linux-gnu. I'll push on
> >>> >> Monday if there are no comments before then. I'd appreciate a second
> >>> >> pair of eyes though, since this is a sensitive area.
> >>> >
> >>> > Do you happen to know if the backports to older branches you provided
> >>> > for
> >>> > the change that triggered this issue (in particular to GCC 7) are also
> >>> > affected?
> >>>
> >>> GCC 7 and GCC 8 should be ok. The bug relies on stack protection being
> >>> enabled (-fstack-protector-strong for the testcase in the PR, but just
> >>> -fstack-protector for others) and that was added in GCC 9.
> >>
> >> Hmm, I see -fstack-protector[-strong] working with GCC 7 on aarch64.
> >
> > I think it's just being silently accepted as a nop. At least:
> >
> >> Specifically
> >> the testcase produces for foo():
> >>
> >> .text
> >> .align 2
> >> .global _Z3foov
> >> .type _Z3foov, %function
> >> _Z3foov:
> >> .LFB0:
> >> .cfi_startproc
> >> stp x29, x30, [sp, -16]!
> >> .cfi_def_cfa_offset 16
> >> .cfi_offset 29, -16
> >> .cfi_offset 30, -8
> >> add x29, sp, 0
> >> .cfi_def_cfa_register 29
> >> sub sp, sp, #16
> >> sub sp, sp, #524288
> >
> > ...the function isn't probing the stack here. It ought to be doing
> > something like the code that Alex quoted in the PR. Compare the
> > GCC 8 and GCC 9 output in https://godbolt.org/z/cWbsG3eGb .
>
> Gah, Alex pointed out off-list that I'd once again got the options
> the wrong way around: -fstack-clash-protection is the thing that
> was added in GCC 9.
>
> I find these option names really confusing :-(
True. Note we backported rough -fstack-clash-protection to GCC 7,
but middle-end only (attached for reference).
Richard.
>
> Richard
Index: gcc/common.opt
===================================================================
--- gcc/common.opt (revision 244266)
+++ gcc/common.opt (working copy)
@@ -2295,6 +2299,10 @@ fstack-check
Common Alias(fstack-check=, specific, no)
Insert stack checking code into the program. Same as -fstack-check=specific.
+fstack-clash-protection
+Common Report Var(flag_stack_clash_protection)
+Insert probes per page for dynamically allocated stack space
+
fstack-limit
Common Var(common_deferred_options) Defer
Index: gcc/explow.c
===================================================================
--- gcc/explow.c (revision 244266)
+++ gcc/explow.c (working copy)
@@ -1277,6 +1277,8 @@ get_dynamic_stack_size (rtx *psize, unsi
*psize = size;
}
+#define PROBE_INTERVAL (1 << STACK_CHECK_PROBE_INTERVAL_EXP)
+
/* Return an rtx representing the address of an area of memory dynamically
pushed on the stack.
@@ -1305,6 +1307,8 @@ allocate_dynamic_stack_space (rtx size,
HOST_WIDE_INT stack_usage_size = -1;
rtx_code_label *final_label;
rtx final_target, target;
+ rtx loop_lab, end_lab, skip_lab, last_size, before_skip;
+ int probe_pass = 0;
/* If we're asking for zero bytes, it doesn't matter what we point
to since we can't dereference it. But return a reasonable
@@ -1440,6 +1444,30 @@ allocate_dynamic_stack_space (rtx size,
/* Don't let anti_adjust_stack emit notes. */
suppress_reg_args_size = true;
+ if (flag_stack_clash_protection)
+ {
+#ifndef STACK_GROWS_DOWNWARD
+ sorry("-fstack-clash-protection is incompatible with upward growing stack");
+#endif
+ size = copy_to_mode_reg (Pmode, convert_to_mode (Pmode, size, 1));
+ loop_lab = gen_label_rtx ();
+ end_lab = gen_label_rtx ();
+ skip_lab = gen_label_rtx ();
+ /* We insert 'target = virtual_stack_dynamic_rtx' here, but target
+ is changed later, so that insn can be constructed only later. */
+ before_skip = get_last_insn ();
+ emit_cmp_and_jump_insns (size, CONST0_RTX (Pmode), EQ, NULL_RTX,
+ Pmode, 1, skip_lab);
+ emit_label (loop_lab);
+ emit_cmp_and_jump_insns (size, GEN_INT (PROBE_INTERVAL), LTU,
+ NULL_RTX, Pmode, 1, end_lab);
+ last_size = expand_binop (Pmode, sub_optab, size, GEN_INT (PROBE_INTERVAL), size,
+ 1, OPTAB_WIDEN);
+ gcc_assert (last_size == size);
+ size = GEN_INT (PROBE_INTERVAL);
+ }
+
+again:
/* Perform the required allocation from the stack. Some systems do
this differently than simply incrementing/decrementing from the
stack pointer, such as acquiring the space by calling malloc(). */
@@ -1499,6 +1527,15 @@ allocate_dynamic_stack_space (rtx size,
if (STACK_GROWS_DOWNWARD)
emit_move_insn (target, virtual_stack_dynamic_rtx);
}
+ if (flag_stack_clash_protection && probe_pass == 0)
+ {
+ probe_pass = 1;
+ emit_stack_probe (target);
+ emit_jump (loop_lab);
+ emit_label (end_lab);
+ size = last_size;
+ goto again;
+ }
suppress_reg_args_size = false;
@@ -1510,6 +1547,17 @@ allocate_dynamic_stack_space (rtx size,
emit_label (final_label);
target = final_target;
}
+ if (flag_stack_clash_protection)
+ {
+ rtx seq;
+ emit_stack_probe (target);
+ emit_label (skip_lab);
+ start_sequence ();
+ emit_move_insn (target, virtual_stack_dynamic_rtx);
+ seq = get_insns ();
+ end_sequence ();
+ emit_insn_after (seq, before_skip);
+ }
target = align_dynamic_address (target, required_align);
@@ -1593,8 +1641,6 @@ emit_stack_probe (rtx address)
the current stack pointer. STACK_GROWS_DOWNWARD says whether to add
or subtract them from the stack pointer. */
-#define PROBE_INTERVAL (1 << STACK_CHECK_PROBE_INTERVAL_EXP)
-
#if STACK_GROWS_DOWNWARD
#define STACK_GROW_OP MINUS
#define STACK_GROW_OPTAB sub_optab