On Fri, Apr 11, 2025 at 12:26 PM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Richard Sandiford <richard.sandif...@arm.com> writes:
> > Richard Biener <richard.guent...@gmail.com> writes:
> >> On Fri, Apr 11, 2025 at 11:10 AM Richard Sandiford
> >> <richard.sandif...@arm.com> wrote:
> >>>
> >>> Richard Biener <richard.guent...@gmail.com> writes:
> >>> > On Thu, Apr 10, 2025 at 10:10 PM Richard Sandiford
> >>> > <richard.sandif...@arm.com> wrote:
> >>> >>
> >>> >> PR119610 is about incorrect CFI output for a stack probe when that
> >>> >> probe is not the initial allocation.  The main aarch64 stack probe
> >>> >> function, aarch64_allocate_and_probe_stack_space, implicitly assumed
> >>> >> that the incoming stack pointer pointed to the top of the frame,
> >>> >> and thus held the CFA.
> >>> >>
> >>> >> aarch64_save_callee_saves and aarch64_restore_callee_saves use a
> >>> >> parameter called bytes_below_sp to track how far the stack pointer
> >>> >> is above the base of the static frame.  This patch does the same
> >>> >> thing for aarch64_allocate_and_probe_stack_space.
> >>> >>
> >>> >> Also, I noticed that the SVE path was attaching the first CFA note
> >>> >> to the wrong instruction: it was attaching the note to the calculation
> >>> >> of the stack size, rather than to the r11<-sp copy.
> >>> >>
> >>> >> Bootstrapped & regression-tested on aarch64-linux-gnu.  I'll push on
> >>> >> Monday if there are no comments before then.  I'd appreciate a second
> >>> >> pair of eyes though, since this is a sensitive area.
> >>> >
> >>> > Do you happen to know if the backports to older branches you provided 
> >>> > for
> >>> > the change that triggered this issue (in particular to GCC 7) are also 
> >>> > affected?
> >>>
> >>> GCC 7 and GCC 8 should be ok.  The bug relies on stack protection being
> >>> enabled (-fstack-protector-strong for the testcase in the PR, but just
> >>> -fstack-protector for others) and that was added in GCC 9.
> >>
> >> Hmm, I see -fstack-protector[-strong] working with GCC 7 on aarch64.
> >
> > I think it's just being silently accepted as a nop.  At least:
> >
> >> Specifically
> >> the testcase produces for foo():
> >>
> >>         .text
> >>         .align  2
> >>         .global _Z3foov
> >>         .type   _Z3foov, %function
> >> _Z3foov:
> >> .LFB0:
> >>         .cfi_startproc
> >>         stp     x29, x30, [sp, -16]!
> >>         .cfi_def_cfa_offset 16
> >>         .cfi_offset 29, -16
> >>         .cfi_offset 30, -8
> >>         add     x29, sp, 0
> >>         .cfi_def_cfa_register 29
> >>         sub     sp, sp, #16
> >>         sub     sp, sp, #524288
> >
> > ...the function isn't probing the stack here.  It ought to be doing
> > something like the code that Alex quoted in the PR.  Compare the
> > GCC 8 and GCC 9 output in https://godbolt.org/z/cWbsG3eGb .
>
> Gah, Alex pointed out off-list that I'd once again got the options
> the wrong way around: -fstack-clash-protection is the thing that
> was added in GCC 9.
>
> I find these option names really confusing :-(

True.  Note we backported rough -fstack-clash-protection to GCC 7,
but middle-end only (attached for reference).

Richard.

>
> Richard
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 244266)
+++ gcc/common.opt	(working copy)
@@ -2295,6 +2299,10 @@ fstack-check
 Common Alias(fstack-check=, specific, no)
 Insert stack checking code into the program.  Same as -fstack-check=specific.
 
+fstack-clash-protection
+Common Report Var(flag_stack_clash_protection)
+Insert probes per page for dynamically allocated stack space
+
 fstack-limit
 Common Var(common_deferred_options) Defer
 
Index: gcc/explow.c
===================================================================
--- gcc/explow.c	(revision 244266)
+++ gcc/explow.c	(working copy)
@@ -1277,6 +1277,8 @@ get_dynamic_stack_size (rtx *psize, unsi
   *psize = size;
 }
 
+#define PROBE_INTERVAL (1 << STACK_CHECK_PROBE_INTERVAL_EXP)
+
 /* Return an rtx representing the address of an area of memory dynamically
    pushed on the stack.
 
@@ -1305,6 +1307,8 @@ allocate_dynamic_stack_space (rtx size,
   HOST_WIDE_INT stack_usage_size = -1;
   rtx_code_label *final_label;
   rtx final_target, target;
+  rtx loop_lab, end_lab, skip_lab, last_size, before_skip;
+  int probe_pass = 0;
 
   /* If we're asking for zero bytes, it doesn't matter what we point
      to since we can't dereference it.  But return a reasonable
@@ -1440,6 +1444,30 @@ allocate_dynamic_stack_space (rtx size,
   /* Don't let anti_adjust_stack emit notes.  */
   suppress_reg_args_size = true;
 
+  if (flag_stack_clash_protection)
+    {
+#ifndef STACK_GROWS_DOWNWARD
+      sorry("-fstack-clash-protection is incompatible with upward growing stack");
+#endif
+      size = copy_to_mode_reg (Pmode, convert_to_mode (Pmode, size, 1));
+      loop_lab = gen_label_rtx ();
+      end_lab = gen_label_rtx ();
+      skip_lab = gen_label_rtx ();
+      /* We insert 'target = virtual_stack_dynamic_rtx' here, but target
+	 is changed later, so that insn can be constructed only later.  */
+      before_skip = get_last_insn ();
+      emit_cmp_and_jump_insns (size, CONST0_RTX (Pmode), EQ, NULL_RTX,
+			       Pmode, 1, skip_lab);
+      emit_label (loop_lab);
+      emit_cmp_and_jump_insns (size, GEN_INT (PROBE_INTERVAL), LTU,
+			       NULL_RTX, Pmode, 1, end_lab);
+      last_size = expand_binop (Pmode, sub_optab, size, GEN_INT (PROBE_INTERVAL), size,
+				1, OPTAB_WIDEN);
+      gcc_assert (last_size == size);
+      size = GEN_INT (PROBE_INTERVAL);
+    }
+
+again:
   /* Perform the required allocation from the stack.  Some systems do
      this differently than simply incrementing/decrementing from the
      stack pointer, such as acquiring the space by calling malloc().  */
@@ -1499,6 +1527,15 @@ allocate_dynamic_stack_space (rtx size,
       if (STACK_GROWS_DOWNWARD)
 	emit_move_insn (target, virtual_stack_dynamic_rtx);
     }
+  if (flag_stack_clash_protection && probe_pass == 0)
+    {
+      probe_pass = 1;
+      emit_stack_probe (target);
+      emit_jump (loop_lab);
+      emit_label (end_lab);
+      size = last_size;
+      goto again;
+    }
 
   suppress_reg_args_size = false;
 
@@ -1510,6 +1547,17 @@ allocate_dynamic_stack_space (rtx size,
       emit_label (final_label);
       target = final_target;
     }
+  if (flag_stack_clash_protection)
+    {
+      rtx seq;
+      emit_stack_probe (target);
+      emit_label (skip_lab);
+      start_sequence ();
+      emit_move_insn (target, virtual_stack_dynamic_rtx);
+      seq = get_insns ();
+      end_sequence ();
+      emit_insn_after (seq, before_skip);
+    }
 
   target = align_dynamic_address (target, required_align);
 
@@ -1593,8 +1641,6 @@ emit_stack_probe (rtx address)
    the current stack pointer.  STACK_GROWS_DOWNWARD says whether to add
    or subtract them from the stack pointer.  */
 
-#define PROBE_INTERVAL (1 << STACK_CHECK_PROBE_INTERVAL_EXP)
-
 #if STACK_GROWS_DOWNWARD
 #define STACK_GROW_OP MINUS
 #define STACK_GROW_OPTAB sub_optab

Reply via email to