Ping (with Szabolcs' remarks addressed).
On 07/02/2018 16:07, Adhemerval Zanella wrote:
> Changes from previous version:
>
> - Changed the way __morestack is called to use a branch with link
> instead of a simple branch. This allows using a call instruction and
> avoids possible issues with later optimization passes, which might
> see a branch outside the instruction block (as noticed in previous
> iterations while building a more complex workload such as SPECcpu2006).
>
> - Changed the return address to use the branch-with-link value and
> set x12 to save x30. This simplifies the instructions required
> to set up/save the return address.
>
> --
>
> This patch adds split-stack support on aarch64 (PR #67877). As for
> other ports, this patch should be used along with glibc and gold support.
>
> The support is done similarly to other architectures: a split-stack field
> is allocated before the TCB by glibc, a target-specific __morestack
> implementation and helper functions are added in libgcc, and compiler
> support is adjusted (split-stack prologue, va_start for argument handling).
> I also plan to send the gold support to adjust stack allocation across
> calls between split-stack and default code.
>
> The current approach is to set the final stack adjustment using at most
> 2 instructions (mov/movk), which limits stack allocation to an upper bound
> of 4GB. The morestack call is non-standard: x10 holds the requested stack
> pointer, x11 the argument pointer (if required), and x12 the link register
> (x30) on function entry, which __morestack restores when returning to the
> caller. Unwinding is handled by a personality routine that
> knows how to find stack segments.
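To make the 4GB limit concrete, here is a small C sketch (not part of the patch) that encodes an allocation as the mov/movk pair: for a 16-bit immediate, `mov x10, #imm` assembles to MOVZ, and MOVK fills bits 16-31. The helper names are hypothetical; the MOVZ/MOVK bit layouts follow the A64 wide-move encoding (hw in bits 22:21, imm16 in bits 20:5, Rd in bits 4:0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 64-bit A64 wide-move opcodes (sf = 1): MOVZ and MOVK.  */
#define A64_MOVZ_X 0xD2800000u
#define A64_MOVK_X 0xF2800000u

/* Build one wide-move instruction: HW selects the 16-bit lane
   (shift = 16 * HW), IMM16 is the immediate, RD the target register.  */
static uint32_t
a64_wide_move (uint32_t base, unsigned hw, uint16_t imm16, unsigned rd)
{
  return base | (hw << 21) | ((uint32_t) imm16 << 5) | (rd & 0x1f);
}

/* Hypothetical helper: encode "mov xRD, #lo16; movk xRD, #hi16, lsl #16"
   for ALLOC.  Returns false when ALLOC needs more than 32 bits, which is
   the 4GB limit described above.  */
static bool
encode_split_stack_alloc (uint64_t alloc, unsigned rd, uint32_t insn[2])
{
  if (alloc >= ((uint64_t) 1 << 32))
    return false;
  insn[0] = a64_wide_move (A64_MOVZ_X, 0, (uint16_t) (alloc & 0xffff), rd);
  insn[1] = a64_wide_move (A64_MOVK_X, 1, (uint16_t) (alloc >> 16), rd);
  return true;
}

/* Recover the shifted immediate contributed by one wide-move insn.  */
static uint64_t
decode_wide_move_imm (uint32_t insn)
{
  unsigned hw = (insn >> 21) & 0x3;
  uint64_t imm16 = (insn >> 5) & 0xffff;
  return imm16 << (16 * hw);
}
```

Because the prologue always emits both instructions, even a small frame keeps a fixed-size slot that a linker can patch without changing code size.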
>
> The split-stack prologue on function entry is as follows (this goes before
> the usual function prologue):
>
> function:
> mrs x9, tpidr_el0
> ldur x9, [x9, -8]
> mov x10, <required stack allocation>
> movk x10, #0x0, lsl #16
> sub x10, sp, x10
> mov x11, sp # if function has stacked arguments
> mov x12, x30
> cmp x10, x9
> bcc .LX
> main_fn_entry:
> [function prologue]
> .LX:
> bl __morestack
> b main_fn_entry
>
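The decision the prologue makes can be modelled in C as below. This is only a sketch with hypothetical names; it assumes, as the morestack-c.c helpers suggest, that the guard stored at tpidr_el0 - 8 marks how far down the current segment may grow, and that addresses are compared unsigned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the split-stack prologue check.
   GUARD: value loaded from [tpidr_el0, -8] into x9.
   SP:    stack pointer on function entry.
   FRAME: required stack allocation (x10 before the sub).
   Returns true when the prologue has to branch to the __morestack call.  */
static bool
needs_morestack (uintptr_t guard, uintptr_t sp, uint64_t frame)
{
  /* sub x10, sp, x10: the stack pointer after the allocation.  */
  uintptr_t requested = sp - (uintptr_t) frame;

  /* Enough room when the requested stack pointer is at or above the
     guard (unsigned); otherwise a new segment is needed.  */
  return requested < guard;
}
```

A requested stack pointer exactly equal to the guard still counts as enough space, matching the unsigned "higher or same" test used in the prologue expansion.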
> Notes:
>
> 1. Even if a function does not allocate a stack frame, a split-stack prologue
> is created. This avoids issues with tail calls to external symbols, which
> might require linker adjustment (libgo/runtime/go-varargs.c).
>
> 2. Basic-block reordering (enabled with -O2) will move the split-stack TCB
> ldur to after the required stack calculation.
>
> 3. Similar to powerpc, when the linker detects a call from split-stack to
> non-split-stack code, it adds 16k (or more) to the value found in the
> "allocate" instructions (so non-split-stack code gets a larger stack). The
> amount is tunable by a linker option. This feature is only implemented in
> the GNU gold linker.
>
> 4. AArch64 does not initially handle stacks larger than 4G. Although it is
> possible to implement, limiting to 4G allows the allocation to be
> materialized with only 2 instructions (mov + movk), thus simplifying the
> required linker adjustments. Supporting multiple threads each requiring
> more than 4G of stack is probably not that important, and would likely
> OOM at run time.
>
> 5. The TCB support on GLIBC is meant to be included in version 2.28.
>
> 6. Besides the regression tests, I also checked a SPECcpu2006 run with the
> additional -fsplit-stack option. I saw no regressions besides 416.gamess,
> which fails on trunk as well (possibly due to a misconfiguration in my
> environment).
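Notes 3 and 4 interact: the linker can only grow the allocation while it still fits in the two 16-bit immediates. The sketch below illustrates such a rewrite in C. It is not gold's implementation; the function name is hypothetical, and it assumes the pair is exactly a MOVZ (shift 0) followed by a MOVK with LSL #16, as the prologue emits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical linker-side rewrite of the split-stack "allocate" pair:
     insn[0]: movz xN, #lo16            (hw = 0)
     insn[1]: movk xN, #hi16, lsl #16   (hw = 1)
   Adds EXTRA bytes to the encoded allocation in place.  Returns false,
   leaving the instructions untouched, if the result exceeds 32 bits.  */
static bool
grow_split_stack_alloc (uint32_t insn[2], uint64_t extra)
{
  const uint32_t imm_mask = 0xffffu << 5;   /* imm16 field, bits 20:5 */
  uint64_t lo = (insn[0] >> 5) & 0xffff;
  uint64_t hi = (insn[1] >> 5) & 0xffff;
  uint64_t alloc = (hi << 16) | lo;

  alloc += extra;
  if (alloc >= ((uint64_t) 1 << 32))
    return false;   /* would need a third instruction */

  insn[0] = (insn[0] & ~imm_mask) | ((uint32_t) (alloc & 0xffff) << 5);
  insn[1] = (insn[1] & ~imm_mask) | ((uint32_t) (alloc >> 16) << 5);
  return true;
}
```

Only the immediate fields change; opcode, shift and destination register bits are preserved, so the code size and register usage of the prologue stay the same.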
>
> libgcc/ChangeLog:
>
> * libgcc/config.host: Use t-stack and t-stack-aarch64 for
> aarch64*-*-linux.
> * libgcc/config/aarch64/morestack-c.c: New file.
> * libgcc/config/aarch64/morestack.S: Likewise.
> * libgcc/config/aarch64/t-stack-aarch64: Likewise.
> * libgcc/generic-morestack.c (__splitstack_find): Add aarch64-specific
> code.
>
> gcc/ChangeLog:
>
> * common/config/aarch64/aarch64-common.c
> (aarch64_supports_split_stack): New function.
> (TARGET_SUPPORTS_SPLIT_STACK): New macro.
> * gcc/config/aarch64/aarch64-linux.h (TARGET_ASM_FILE_END): Remove
> macro.
> * gcc/config/aarch64/aarch64-protos.h: Add
> aarch64_expand_split_stack_prologue and
> aarch64_split_stack_space_check.
> * gcc/config/aarch64/aarch64.c (aarch64_expand_builtin_va_start): Use
> internal argument pointer instead of virtual_incoming_args_rtx.
> (morestack_ref): New symbol.
> (aarch64_load_split_stack_value): New function.
> (aarch64_expand_split_stack_prologue): Likewise.
> (aarch64_internal_arg_pointer): Likewise.
> (aarch64_file_end): New function; emit the split-stack note sections.
> (aarch64_split_stack_space_check): New function.
> (TARGET_ASM_FILE_END): New macro.
> (TARGET_INTERNAL_ARG_POINTER): Likewise.
> * gcc/config/aarch64/aarch64.h (aarch64_frame): Add
> split_stack_arg_pointer to setup the argument pointer when using
> split-stack.
> * gcc/config/aarch64/aarch64.md
> (UNSPEC_STACK_CHECK): New unspec.
> (split_stack_prologue): New expand.
> (split_stack_space_check): Likewise.
> ---
> gcc/common/config/aarch64/aarch64-common.c | 28 +++-
> gcc/config/aarch64/aarch64-linux.h | 2 -
> gcc/config/aarch64/aarch64-protos.h | 2 +
> gcc/config/aarch64/aarch64.c | 182 ++++++++++++++++++++-
> gcc/config/aarch64/aarch64.h | 3 +
> gcc/config/aarch64/aarch64.md | 29 ++++
> libgcc/config.host | 1 +
> libgcc/config/aarch64/morestack-c.c | 87 ++++++++++
> libgcc/config/aarch64/morestack.S | 254 +++++++++++++++++++++++++++++
> libgcc/config/aarch64/t-stack-aarch64 | 3 +
> libgcc/generic-morestack.c | 1 +
> 11 files changed, 588 insertions(+), 4 deletions(-)
> create mode 100644 libgcc/config/aarch64/morestack-c.c
> create mode 100644 libgcc/config/aarch64/morestack.S
> create mode 100644 libgcc/config/aarch64/t-stack-aarch64
>
> diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
> index 71d3953..cf17e2f 100644
> --- a/gcc/common/config/aarch64/aarch64-common.c
> +++ b/gcc/common/config/aarch64/aarch64-common.c
> @@ -107,6 +107,33 @@ aarch64_handle_option (struct gcc_options *opts,
> }
> }
>
> +/* -fsplit-stack uses a TCB field available on glibc-2.28. GLIBC also
> + exports a symbol, __libc_tcb_private_ss, to signal that the field is
> + available in the TCB block. This aims to prevent binaries linked against
> + newer GLIBC from running on unsupported ones. */
> +
> +static bool
> +aarch64_supports_split_stack (bool report ATTRIBUTE_UNUSED,
> + struct gcc_options *opts ATTRIBUTE_UNUSED)
> +{
> +#ifndef TARGET_GLIBC_MAJOR
> +#define TARGET_GLIBC_MAJOR 0
> +#endif
> +#ifndef TARGET_GLIBC_MINOR
> +#define TARGET_GLIBC_MINOR 0
> +#endif
> + /* Note: Can't test DEFAULT_ABI here, it isn't set until later. */
> + if (TARGET_GLIBC_MAJOR * 1000 + TARGET_GLIBC_MINOR >= 2028)
> + return true;
> +
> + if (report)
> + error ("%<-fsplit-stack%> currently only supported on AArch64 GNU/Linux with glibc-2.28 or later");
> + return false;
> +}
> +
> +#undef TARGET_SUPPORTS_SPLIT_STACK
> +#define TARGET_SUPPORTS_SPLIT_STACK aarch64_supports_split_stack
> +
> struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
>
> /* An ISA extension in the co-processor and main instruction set space. */
> @@ -340,4 +367,3 @@ aarch64_rewrite_mcpu (int argc, const char **argv)
> }
>
> #undef AARCH64_CPU_NAME_LENGTH
> -
> diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
> index bf1327e..1189bfe 100644
> --- a/gcc/config/aarch64/aarch64-linux.h
> +++ b/gcc/config/aarch64/aarch64-linux.h
> @@ -81,8 +81,6 @@
> } \
> while (0)
>
> -#define TARGET_ASM_FILE_END file_end_indicate_exec_stack
> -
> /* Uninitialized common symbols in non-PIE executables, even with
> strong definitions in dependent shared libraries, will resolve
> to COPY relocated symbol in the executable. See PR65780. */
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index cda2895..20fe10e 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -450,6 +450,8 @@ void aarch64_expand_sve_mem_move (rtx, rtx, machine_mode);
> bool aarch64_maybe_expand_sve_subreg_move (rtx, rtx);
> void aarch64_split_sve_subreg_move (rtx, rtx, rtx);
> void aarch64_expand_prologue (void);
> +void aarch64_expand_split_stack_prologue (void);
> +void aarch64_split_stack_space_check (rtx, rtx);
> void aarch64_expand_vector_init (rtx, rtx);
> void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
> const_tree, unsigned);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 7c9c6e5..c653755 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -71,6 +71,7 @@
> #include "selftest.h"
> #include "selftest-rtl.h"
> #include "rtx-vector-builder.h"
> +#include "except.h"
>
> /* This file should be included last. */
> #include "target-def.h"
> @@ -12073,7 +12074,7 @@ aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
> /* Emit code to initialize STACK, which points to the next varargs stack
> argument. CUM->AAPCS_STACK_SIZE gives the number of stack words used
> by named arguments. STACK is 8-byte aligned. */
> - t = make_tree (TREE_TYPE (stack), virtual_incoming_args_rtx);
> + t = make_tree (TREE_TYPE (stack), crtl->args.internal_arg_pointer);
> if (cum->aapcs_stack_size > 0)
> t = fold_build_pointer_plus_hwi (t, cum->aapcs_stack_size *
> UNITS_PER_WORD);
> t = build2 (MODIFY_EXPR, TREE_TYPE (stack), stack, t);
> @@ -17351,6 +17352,179 @@ aarch64_select_early_remat_modes (sbitmap modes)
> }
> }
>
> +/* -fsplit-stack support. */
> +
> +/* A SYMBOL_REF for __morestack. */
> +static GTY(()) rtx morestack_ref;
> +
> +/* Load the split-stack area from the thread pointer position. The
> + split-stack area is allocated just before the thread pointer. */
> +
> +static rtx
> +aarch64_load_split_stack_value (bool use_hard_reg)
> +{
> + /* Offset from thread pointer to split-stack area. */
> + const int psso = -8;
> +
> + rtx ssvalue = use_hard_reg
> + ? gen_rtx_REG (Pmode, R9_REGNUM) : gen_reg_rtx (Pmode);
> + ssvalue = aarch64_load_tp (ssvalue);
> + rtx mem = gen_rtx_MEM (Pmode, plus_constant (Pmode, ssvalue, psso));
> + emit_move_insn (ssvalue, mem);
> + return ssvalue;
> +}
> +
> +/* Emit -fsplit-stack prologue, which goes before the regular function
> + prologue. */
> +
> +void
> +aarch64_expand_split_stack_prologue (void)
> +{
> + rtx ssvalue, reg10, reg11, reg12, cc, jump;
> + HOST_WIDE_INT allocate;
> + rtx_code_label *ok_label;
> + rtx_insn *insn;
> +
> + gcc_assert (flag_split_stack && reload_completed);
> +
> + /* Limit the total stack allocation to 4G so its value can be
> + materialized using at most two instructions (mov/movk). The linker
> + may rewrite them to add extra space when split-stack code calls
> + non-split-stack functions. */
> + allocate = constant_lower_bound (cfun->machine->frame.frame_size);
> + if (allocate >= ((int64_t)1 << 32))
> + {
> + sorry ("stack frame of 4G or larger is not supported with %<-fsplit-stack%>");
> + return;
> + }
> +
> + if (morestack_ref == NULL_RTX)
> + {
> + morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack");
> + SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL
> + | SYMBOL_FLAG_FUNCTION);
> + }
> +
> + ssvalue = aarch64_load_split_stack_value (true);
> +
> + /* Always emit two insns to calculate the requested stack, so the linker
> + can edit them when adjusting size for calling non-split-stack code. */
> + reg10 = gen_rtx_REG (Pmode, R10_REGNUM);
> + emit_insn (gen_rtx_SET (reg10, GEN_INT (allocate & 0xffff)));
> + emit_insn (gen_insv_immdi (reg10, GEN_INT (16),
> + GEN_INT ((allocate & 0xffff0000) >> 16)));
> + emit_insn (gen_sub3_insn (reg10, stack_pointer_rtx, reg10));
> +
> + ok_label = gen_label_rtx ();
> +
> + /* If the function uses stacked arguments, save the old stack pointer so
> + morestack can return it. */
> + reg11 = gen_rtx_REG (Pmode, R11_REGNUM);
> + if (maybe_gt(crtl->args.size, 0)
> + || maybe_gt(cfun->machine->frame.saved_varargs_size, 0))
> + emit_move_insn (reg11, stack_pointer_rtx);
> +
> + /* x12 holds the function entry x30 which will be restored by morestack. */
> + reg12 = gen_rtx_REG (Pmode, R12_REGNUM);
> + emit_move_insn (reg12, gen_rtx_REG (Pmode, R30_REGNUM));
> +
> + ok_label = gen_label_rtx ();
> + cc = aarch64_gen_compare_reg (GEU, reg10, ssvalue);
> + jump = gen_rtx_IF_THEN_ELSE (VOIDmode,
> + gen_rtx_fmt_ee (GEU, VOIDmode, cc, const0_rtx),
> + gen_rtx_LABEL_REF (VOIDmode, ok_label),
> + pc_rtx);
> + insn = emit_jump_insn (gen_rtx_SET (pc_rtx, jump));
> + JUMP_LABEL (insn) = ok_label;
> + /* Mark the jump as very likely to be taken. */
> + add_reg_br_prob_note (insn, profile_probability::very_likely ());
> +
> + insn = emit_call_insn (gen_call (gen_rtx_MEM (Pmode, morestack_ref),
> + const0_rtx, const0_rtx));
> +
> + rtx call_fusage = NULL_RTX;
> + use_reg (&call_fusage, reg10);
> + use_reg (&call_fusage, reg11);
> + use_reg (&call_fusage, reg12);
> + add_function_usage_to (insn, call_fusage);
> + /* Indicate that this function can't jump to non-local gotos. */
> + make_reg_eh_region_note_nothrow_nononlocal (insn);
> +
> + emit_label (ok_label);
> + LABEL_NUSES (ok_label)++;
> +}
> +
> +/* Implement TARGET_ASM_FILE_END. */
> +
> +static void
> +aarch64_file_end (void)
> +{
> + file_end_indicate_exec_stack ();
> +
> + if (flag_split_stack)
> + {
> + file_end_indicate_split_stack ();
> +
> + switch_to_section (data_section);
> + fprintf (asm_out_file, "\t.align 3\n");
> + fprintf (asm_out_file, "\t.quad __libc_tcb_private_ss\n");
> + }
> +}
> +
> +/* Return the internal arg pointer used for function incoming arguments. */
> +
> +static rtx
> +aarch64_internal_arg_pointer (void)
> +{
> + if (flag_split_stack
> + && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl))
> + == NULL))
> + {
> + if (cfun->machine->frame.split_stack_arg_pointer == NULL_RTX)
> + {
> + rtx pat;
> +
> + cfun->machine->frame.split_stack_arg_pointer = gen_reg_rtx (Pmode);
> + REG_POINTER (cfun->machine->frame.split_stack_arg_pointer) = 1;
> +
> + /* Put the pseudo initialization right after the note at the
> + beginning of the function. */
> + pat = gen_rtx_SET (cfun->machine->frame.split_stack_arg_pointer,
> + gen_rtx_REG (Pmode, R11_REGNUM));
> + push_topmost_sequence ();
> + emit_insn_after (pat, get_insns ());
> + pop_topmost_sequence ();
> + }
> + return plus_constant (Pmode, cfun->machine->frame.split_stack_arg_pointer,
> + FIRST_PARM_OFFSET (current_function_decl));
> + }
> + return virtual_incoming_args_rtx;
> +}
> +
> +/* Emit -fsplit-stack dynamic stack allocation space check. */
> +
> +void
> +aarch64_split_stack_space_check (rtx size, rtx label)
> +{
> + rtx ssvalue, cc, cmp, jump;
> + rtx requested = gen_reg_rtx (Pmode);
> +
> + /* Load __private_ss from TCB. */
> + ssvalue = aarch64_load_split_stack_value (false);
> +
> + /* REQUESTED is the stack pointer once SIZE bytes are allocated. */
> + size = force_reg (Pmode, size);
> + emit_move_insn (requested, gen_rtx_MINUS (Pmode, stack_pointer_rtx, size));
> +
> + /* Jump to LABEL if REQUESTED is at or above the guard, i.e. there is
> + enough space left on the current stack segment. Addresses are
> + compared unsigned. */
> + cc = aarch64_gen_compare_reg (GEU, requested, ssvalue);
> + cmp = gen_rtx_fmt_ee (GEU, VOIDmode, cc, const0_rtx);
> + jump = emit_jump_insn (gen_condjump (cmp, cc, label));
> + JUMP_LABEL (jump) = label;
> +}
> +
> /* Target-specific selftests. */
>
> #if CHECKING_P
> @@ -17423,6 +17597,9 @@ aarch64_run_selftests (void)
> #undef TARGET_ASM_FILE_START
> #define TARGET_ASM_FILE_START aarch64_start_file
>
> +#undef TARGET_ASM_FILE_END
> +#define TARGET_ASM_FILE_END aarch64_file_end
> +
> #undef TARGET_ASM_OUTPUT_MI_THUNK
> #define TARGET_ASM_OUTPUT_MI_THUNK aarch64_output_mi_thunk
>
> @@ -17513,6 +17690,9 @@ aarch64_run_selftests (void)
> #undef TARGET_FUNCTION_VALUE_REGNO_P
> #define TARGET_FUNCTION_VALUE_REGNO_P aarch64_function_value_regno_p
>
> +#undef TARGET_INTERNAL_ARG_POINTER
> +#define TARGET_INTERNAL_ARG_POINTER aarch64_internal_arg_pointer
> +
> #undef TARGET_GIMPLE_FOLD_BUILTIN
> #define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin
>
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index e3c52f6..20ef441 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -675,6 +675,9 @@ struct GTY (()) aarch64_frame
> unsigned wb_candidate2;
>
> bool laid_out;
> +
> + /* Alternative internal arg pointer for -fsplit-stack. */
> + rtx split_stack_arg_pointer;
> };
>
> typedef struct GTY (()) machine_function
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 5a2a930..3104ed4 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -169,6 +169,7 @@
> UNSPEC_CLASTB
> UNSPEC_FADDA
> UNSPEC_REV_SUBREG
> + UNSPEC_STACK_CHECK
> ])
>
> (define_c_enum "unspecv" [
> @@ -6010,6 +6011,34 @@
> (match_operand 1))
> (clobber (reg:CC CC_REGNUM))])])
>
> +;; Handle -fsplit-stack
> +(define_expand "split_stack_prologue"
> + [(const_int 0)]
> + ""
> +{
> + aarch64_expand_split_stack_prologue ();
> + DONE;
> +})
> +
> +;; If there are operand 0 bytes available on the stack, jump to
> +;; operand 1.
> +(define_expand "split_stack_space_check"
> + [(set (match_dup 2)
> + (unspec [(const_int 0)] UNSPEC_STACK_CHECK))
> + (set (match_dup 3)
> + (minus (reg SP_REGNUM)
> + (match_operand 0)))
> + (set (match_dup 4) (compare:CC (match_dup 3) (match_dup 2)))
> + (set (pc) (if_then_else
> + (geu (match_dup 4) (const_int 0))
> + (label_ref (match_operand 1))
> + (pc)))]
> + ""
> +{
> + aarch64_split_stack_space_check (operands[0], operands[1]);
> + DONE;
> +})
> +
> ;; AdvSIMD Stuff
> (include "aarch64-simd.md")
>
> diff --git a/libgcc/config.host b/libgcc/config.host
> index 96d55a4..d6a2d15 100644
> --- a/libgcc/config.host
> +++ b/libgcc/config.host
> @@ -355,6 +355,7 @@ aarch64*-*-linux*)
> md_unwind_header=aarch64/linux-unwind.h
> tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
> tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
> + tmake_file="${tmake_file} t-stack aarch64/t-stack-aarch64"
> ;;
> alpha*-*-linux*)
> tmake_file="${tmake_file} alpha/t-alpha alpha/t-ieee t-crtfm alpha/t-linux"
> diff --git a/libgcc/config/aarch64/morestack-c.c b/libgcc/config/aarch64/morestack-c.c
> new file mode 100644
> index 0000000..8de531f
> --- /dev/null
> +++ b/libgcc/config/aarch64/morestack-c.c
> @@ -0,0 +1,87 @@
> +/* AArch64 support for -fsplit-stack.
> + * Copyright (C) 2018 Free Software Foundation, Inc.
> + *
> + * This file is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the
> + * Free Software Foundation; either version 3, or (at your option) any
> + * later version.
> + *
> + * This file is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * Under Section 7 of GPL version 3, you are granted additional
> + * permissions described in the GCC Runtime Library Exception, version
> + * 3.1, as published by the Free Software Foundation.
> + *
> + * You should have received a copy of the GNU General Public License and
> + * a copy of the GCC Runtime Library Exception along with this program;
> + * see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
> + * <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef inhibit_libc
> +
> +#include <stdint.h>
> +#include <stdlib.h>
> +#include <stddef.h>
> +#include "generic-morestack.h"
> +
> +#define INITIAL_STACK_SIZE 0x4000
> +#define BACKOFF 0x1000
> +
> +void __generic_morestack_set_initial_sp (void *sp, size_t len);
> +void *__morestack_get_guard (void);
> +void __morestack_set_guard (void *);
> +void *__morestack_make_guard (void *stack, size_t size);
> +void __morestack_load_mmap (void);
> +
> +/* split-stack area position from thread pointer. */
> +static inline void *
> +ss_pointer (void)
> +{
> +#define SS_OFFSET (-8)
> + return (void*) ((uintptr_t) __builtin_thread_pointer() + SS_OFFSET);
> +}
> +
> +/* Initialize the stack guard when the program starts or when a new
> + thread starts. This is called from a constructor in the .ctors section. */
> +void
> +__stack_split_initialize (void)
> +{
> + register uintptr_t* sp __asm__ ("sp");
> + uintptr_t *ss = ss_pointer ();
> + *ss = (uintptr_t)sp - INITIAL_STACK_SIZE;
> + __generic_morestack_set_initial_sp (sp, INITIAL_STACK_SIZE);
> +}
> +
> +/* Return current __private_ss. */
> +void *
> +__morestack_get_guard (void)
> +{
> + void **ss = ss_pointer ();
> + return *ss;
> +}
> +
> +/* Set __private_ss to ptr. */
> +void
> +__morestack_set_guard (void *ptr)
> +{
> + void **ss = ss_pointer ();
> + *ss = ptr;
> +}
> +
> +/* Return the stack guard value for given stack. */
> +void *
> +__morestack_make_guard (void *stack, size_t size)
> +{
> + return (void*)((uintptr_t) stack - size + BACKOFF);
> +}
> +
> +/* Make __stack_split_initialize a high priority constructor. */
> +static void (*const ctors [])
> + __attribute__ ((used, section (".ctors.65535"), aligned (sizeof (void *))))
> + = { __stack_split_initialize, __morestack_load_mmap };
> +
> +#endif /* !defined (inhibit_libc) */
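The guard arithmetic in morestack-c.c can be exercised on its own. This sketch copies INITIAL_STACK_SIZE and BACKOFF from the file above but replaces the TCB slot at tpidr_el0 - 8 with an ordinary variable, so it runs without the glibc support; the model_* functions and fake_tcb_ss are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INITIAL_STACK_SIZE 0x4000
#define BACKOFF 0x1000

/* Stand-in for the split-stack field at tpidr_el0 - 8 (assumption:
   no glibc TCB support, so a plain global is used instead).  */
static uintptr_t fake_tcb_ss;

/* Mirrors __stack_split_initialize: the initial guard sits
   INITIAL_STACK_SIZE bytes below the stack pointer seen at start-up.  */
static void
model_initialize (uintptr_t sp)
{
  fake_tcb_ss = sp - INITIAL_STACK_SIZE;
}

/* Mirrors __morestack_make_guard: the lowest address of the segment
   (STACK - SIZE) plus BACKOFF bytes of slack, which morestack.S reserves
   for the ld.so resolver during lazy PLT resolution.  */
static uintptr_t
model_make_guard (uintptr_t stack, size_t size)
{
  return stack - size + BACKOFF;
}
```

Keeping BACKOFF above the true segment bottom means a function that passes the prologue check can still call through an unresolved PLT entry without overrunning the segment.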
> diff --git a/libgcc/config/aarch64/morestack.S b/libgcc/config/aarch64/morestack.S
> new file mode 100644
> index 0000000..59a6391
> --- /dev/null
> +++ b/libgcc/config/aarch64/morestack.S
> @@ -0,0 +1,254 @@
> +# AArch64 support for -fsplit-stack.
> +# Copyright (C) 2018 Free Software Foundation, Inc.
> +
> +# This file is part of GCC.
> +
> +# GCC is free software; you can redistribute it and/or modify it under
> +# the terms of the GNU General Public License as published by the Free
> +# Software Foundation; either version 3, or (at your option) any later
> +# version.
> +
> +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +# WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> +# for more details.
> +
> +# Under Section 7 of GPL version 3, you are granted additional
> +# permissions described in the GCC Runtime Library Exception, version
> +# 3.1, as published by the Free Software Foundation.
> +
> +# You should have received a copy of the GNU General Public License and
> +# a copy of the GCC Runtime Library Exception along with this program;
> +# see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
> +# <http://www.gnu.org/licenses/>.
> +
> +/* Define an entry point visible from C. */
> +#define ENTRY(name) \
> + .globl name; \
> + .type name,%function; \
> + .align 4; \
> + name##:
> +
> +#define END(name) \
> + .size name,.-name
> +
> +/* __morestack frame size. */
> +#define MORESTACK_FRAMESIZE 112
> +/* Offset from __morestack frame where the new stack size is saved and
> + passed to __generic_morestack. */
> +#define NEWSTACK_SAVE 96
> +
> +# Excess space needed to call ld.so resolver for lazy plt resolution.
> +# Go uses sigaltstack so this doesn't need to also cover signal frame size.
> +#define BACKOFF 0x1000
> +# Large excess allocated when calling non-split-stack code.
> +#define NON_SPLIT_STACK 0x100000
> +
> +/* split-stack area position from thread pointer. */
> +#define SPLITSTACK_PTR_TP -8
> +
> + .text
> +ENTRY(__morestack_non_split)
> + .cfi_startproc
> +# We use a cleanup to restore the TCB split stack field if an exception
> +# passes through this code.
> + sub x10, x10, NON_SPLIT_STACK
> + .cfi_endproc
> +END(__morestack_non_split)
> +# Fall through into __morestack
> +
> +# This function is called with a non-standard calling convention: on entry
> +# x10 is the requested stack pointer, x11 is the previous stack pointer (if
> +# the function has stacked arguments which need to be restored), and x12 is
> +# the caller's link register on function entry (which __morestack restores
> +# when returning to the caller). The split-stack prologue has
> +# the form:
> +#
> +# function:
> +# mrs x9, tpidr_el0
> +# ldur x9, [x9, #-8]
> +# mov x10, <required stack allocation>
> +# movk x10, #0x0, lsl #16
> +# sub x10, sp, x10
> +# mov x11, sp # if function has stacked arguments
> +# mov x12, x30
> +# cmp x10, x9
> +# bcc .LX
> +# main_fn_entry:
> +# [function body]
> +# .LX:
> +# bl __morestack
> +# b main_fn_entry
> +#
> +# The N bit is also restored to indicate that the function is called
> +# (so the prologue addition can set up the argument pointer correctly).
> +
> +ENTRY(__morestack)
> +.LFB1:
> + .cfi_startproc
> +
> +#ifdef __PIC__
> + .cfi_personality 0x9b,DW.ref.__gcc_personality_v0
> + .cfi_lsda 0x1b,.LLSDA1
> +#else
> + .cfi_personality 0x3,__gcc_personality_v0
> + .cfi_lsda 0x3,.LLSDA1
> +#endif
> + # Calculate requested stack size.
> + sub x10, sp, x10
> +
> + # Save parameters
> + stp x29, x12, [sp, -MORESTACK_FRAMESIZE]!
> + .cfi_def_cfa_offset MORESTACK_FRAMESIZE
> + .cfi_offset 29, -MORESTACK_FRAMESIZE
> + .cfi_offset 30, -MORESTACK_FRAMESIZE+8
> + add x29, sp, 0
> + .cfi_def_cfa_register 29
> + # Adjust the requested stack size for the frame pointer save.
> + stp x0, x1, [x29, 16]
> + stp x2, x3, [x29, 32]
> + add x10, x10, BACKOFF
> + stp x4, x5, [x29, 48]
> + stp x6, x7, [x29, 64]
> + stp x8, x30, [x29, 80]
> + str x10, [x29, 96]
> +
> + # void __morestack_block_signals (void)
> + bl __morestack_block_signals
> +
> + # void *__generic_morestack (size_t *pframe_size,
> + # void *old_stack,
> + # size_t param_size)
> + # pframe_size: points to the size of the required stack frame; on
> + # return it holds the space available on the new stack segment.
> + # old_stack: points at the parameters on the old stack.
> + # param_size: size in bytes of parameters to copy to the new stack.
> + add x0, x29, NEWSTACK_SAVE
> + add x1, x29, MORESTACK_FRAMESIZE
> + mov x2, 0
> + bl __generic_morestack
> +
> + # Start using new stack
> + mov sp, x0
> +
> + # Set __private_ss stack guard for the new stack.
> + ldr x9, [x29, NEWSTACK_SAVE]
> + add x0, x0, BACKOFF
> + sub x0, x0, x9
> +.LEHB0:
> + mrs x1, tpidr_el0
> + str x0, [x1, SPLITSTACK_PTR_TP]
> +
> + # void __morestack_unblock_signals (void)
> + bl __morestack_unblock_signals
> +
> + # Set up for a call to the target function.
> + ldp x0, x1, [x29, 16]
> + ldp x2, x3, [x29, 32]
> + ldp x4, x5, [x29, 48]
> + ldp x6, x7, [x29, 64]
> + ldp x8, x12, [x29, 80]
> + add x11, x29, MORESTACK_FRAMESIZE
> + ldr x30, [x29, 8]
> + # Indicate __morestack was called.
> + cmp x12, 0
> + blr x12
> +
> + stp x0, x1, [x29, 16]
> + stp x2, x3, [x29, 32]
> + stp x4, x5, [x29, 48]
> + stp x6, x7, [x29, 64]
> +
> + bl __morestack_block_signals
> +
> + # void *__generic_releasestack (size_t *pavailable)
> + add x0, x29, NEWSTACK_SAVE
> + bl __generic_releasestack
> +
> + # Reset __private_ss stack guard to value for old stack
> + ldr x9, [x29, NEWSTACK_SAVE]
> + add x0, x0, BACKOFF
> + sub x0, x0, x9
> +
> + # Update TCB split stack field
> +.LEHE0:
> + mrs x1, tpidr_el0
> + str x0, [x1, SPLITSTACK_PTR_TP]
> +
> + bl __morestack_unblock_signals
> +
> + # Use old stack again.
> + add sp, x29, MORESTACK_FRAMESIZE
> +
> + ldp x0, x1, [x29, 16]
> + ldp x2, x3, [x29, 32]
> + ldp x4, x5, [x29, 48]
> + ldp x6, x7, [x29, 64]
> + ldp x29, x30, [x29]
> +
> + .cfi_remember_state
> + .cfi_restore 30
> + .cfi_restore 29
> + .cfi_def_cfa 31, 0
> +
> + ret
> +
> +# This is the cleanup code called by the stack unwinder when
> +# unwinding through code between .LEHB0 and .LEHE0 above.
> +cleanup:
> + .cfi_restore_state
> + # Reuse the new stack allocation to save/restore the
> + # exception header
> + str x0, [x29, NEWSTACK_SAVE]
> + # size_t __generic_findstack (void *stack)
> + add x0, x29, MORESTACK_FRAMESIZE
> + bl __generic_findstack
> + sub x0, x29, x0
> + add x0, x0, BACKOFF
> + # Restore split-stack guard value
> + mrs x1, tpidr_el0
> + str x0, [x1, SPLITSTACK_PTR_TP]
> + ldr x0, [x29, NEWSTACK_SAVE]
> + b _Unwind_Resume
> + .cfi_endproc
> +END(__morestack)
> +
> + .section .gcc_except_table,"a",@progbits
> + .align 4
> +.LLSDA1:
> + # @LPStart format (omit)
> + .byte 0xff
> + # @TType format (omit)
> + .byte 0xff
> + # Call-site format (uleb128)
> + .byte 0x1
> + # Call-site table length
> + .uleb128 .LLSDACSE1-.LLSDACSB1
> +.LLSDACSB1:
> + # region 0 start
> + .uleb128 .LEHB0-.LFB1
> + # length
> + .uleb128 .LEHE0-.LEHB0
> + # landing pad
> + .uleb128 cleanup-.LFB1
> + # no action (ie a cleanup)
> + .uleb128 0
> +.LLSDACSE1:
> +
> +
> + .global __gcc_personality_v0
> +#ifdef __PIC__
> + # Build a position independent reference to the personality function.
> + .hidden DW.ref.__gcc_personality_v0
> + .weak DW.ref.__gcc_personality_v0
> + .section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat
> + .type DW.ref.__gcc_personality_v0, @object
> + .align 3
> +DW.ref.__gcc_personality_v0:
> + .size DW.ref.__gcc_personality_v0, 8
> + .quad __gcc_personality_v0
> +#endif
> +
> + .section .note.GNU-stack,"",@progbits
> + .section .note.GNU-split-stack,"",@progbits
> + .section .note.GNU-no-split-stack,"",@progbits
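The __morestack frame layout used above (x29/x12 at offset 0, x0-x8 and x30 from offset 16, the requested size at NEWSTACK_SAVE) can be cross-checked with a C struct and static assertions. The struct and its field names are hypothetical; only the offsets and the two constants come from morestack.S.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MORESTACK_FRAMESIZE 112
#define NEWSTACK_SAVE 96

/* Hypothetical view of the __morestack frame, addressed from x29.  */
struct morestack_frame
{
  uint64_t x29;       /* [x29, 0]  caller's frame pointer */
  uint64_t entry_lr;  /* [x29, 8]  x12: x30 on function entry */
  uint64_t args[8];   /* [x29, 16] x0-x7 argument registers */
  uint64_t x8;        /* [x29, 80] indirect result register */
  uint64_t cont_lr;   /* [x29, 88] x30 from "bl __morestack" */
  uint64_t newstack;  /* [x29, 96] requested size, updated by
                         __generic_morestack */
  uint64_t pad;       /* keeps the frame a multiple of 16 bytes */
};

_Static_assert (offsetof (struct morestack_frame, args) == 16, "x0 slot");
_Static_assert (offsetof (struct morestack_frame, x8) == 80, "x8 slot");
_Static_assert (offsetof (struct morestack_frame, cont_lr) == 88, "x30 slot");
_Static_assert (offsetof (struct morestack_frame, newstack) == NEWSTACK_SAVE,
                "NEWSTACK_SAVE offset");
_Static_assert (sizeof (struct morestack_frame) == MORESTACK_FRAMESIZE,
                "frame size");
_Static_assert (MORESTACK_FRAMESIZE % 16 == 0, "sp must stay 16-byte aligned");
```

If MORESTACK_FRAMESIZE or NEWSTACK_SAVE is ever changed in morestack.S, a check like this fails at compile time instead of corrupting saved registers at run time.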
> diff --git a/libgcc/config/aarch64/t-stack-aarch64 b/libgcc/config/aarch64/t-stack-aarch64
> new file mode 100644
> index 0000000..4babb4e
> --- /dev/null
> +++ b/libgcc/config/aarch64/t-stack-aarch64
> @@ -0,0 +1,3 @@
> +# Makefile fragment to support -fsplit-stack for aarch64.
> +LIB2ADD_ST += $(srcdir)/config/aarch64/morestack.S \
> + $(srcdir)/config/aarch64/morestack-c.c
> diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c
> index 80bfd7f..574f58d 100644
> --- a/libgcc/generic-morestack.c
> +++ b/libgcc/generic-morestack.c
> @@ -943,6 +943,7 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len,
> nsp -= 2 * 160;
> #elif defined __s390__
> nsp -= 2 * 96;
> +#elif defined __aarch64__
> #else
> #error "unrecognized target"
> #endif
>