Re: [RFC][AArch64] Add support for system register based stack protector canary access
On Mon, 3 Dec 2018 at 10:55, Ramana Radhakrishnan wrote: > > For quite sometime the kernel guys, (more specifically Ard) have been > talking about using a system register (sp_el0) and an offset from that > for a canary based access. This patchset adds support for a new set of > command line options similar to how powerpc has done this. > > I don't intend to change the defaults in userland, we've discussed this > for user-land in the past and as far as glibc and userland is concerned > we stick to the options as currently existing. The system register > option is really for the kernel to use along with an offset as they > control their ABI and this is a decision for them to make. > > I did consider sticking this all under a mcmodel=kernel-small option but > thought that would be a bit too aggressive. There is very little error > checking I can do in terms of the system register being used and really > the assembler would barf quite quickly in case things go wrong. I've > managed to rebuild Ard's kernel tree with an additional patch that > I will send to him. I haven't managed to boot this kernel. > > There was an additional question asked about the performance > characteristics of this but it's a security feature and the kernel > doesn't have the luxury of a hidden symbol. Further since the kernel > uses sp_el0 for access everywhere and if they choose to use the same > register I don't think the performance characteristics would be too bad, > but that's a decision for the kernel folks to make when taking in the > feature into the kernel. > > I still need to add some tests and documentation in invoke.texi but > this is at the stage where it would be nice for some other folks > to look at this. > > The difference in code generated is as below. > > extern void bar (char *); > int foo (void) > { >char a[100]; >bar (&a); > } > > $GCC -O2 -fstack-protector-strong vs > -mstack-protector-guard-reg=sp_el0 -mstack-protector-guard=sysreg > -mstack-protector-guard-offset=1024 -fstack-protector-strong > > > --- tst.s 2018-12-03 09:46:21.174167443 + > +++ tst.s.1 2018-12-03 09:46:03.546257203 + > @@ -15,15 +15,14 @@ > mov x29, sp > str x19, [sp, 16] > .cfi_offset 19, -128 > - adrpx19, __stack_chk_guard > - add x19, x19, :lo12:__stack_chk_guard > - ldr x0, [x19] > - str x0, [sp, 136] > - mov x0,0 > + mrs x19, sp_el0 > add x0, sp, 32 > + ldr x1, [x19, 1024] > + str x1, [sp, 136] > + mov x1,0 > bl bar > ldr x0, [sp, 136] > - ldr x1, [x19] > + ldr x1, [x19, 1024] > eor x1, x0, x1 > cbnzx1, .L5 > > > > > I will be afk tomorrow and day after but this is to elicit some comments > and for Ard to try this out with his kernel patches. > Thanks Ramana. I managed to build and run a complete kernel (including modules) on a bare metal system, and everything works as expected. The only thing I'd like to confirm with you is the logic wrt the command line arguments, more specifically, if/when all 3 arguments have to appear, and whether they are permitted to appear if -fstack-protector is not set. This is relevant given that we invoke the compiler in 3 different ways: - at the configure stage, we invoke the compiler with some/all of these options to decide whether the feature is supported, but the actual offset is not known, but also irrelevant - we invoke the compiler to build the header file that actually gives us the offset to pass to later invocations - finally, all kernel objects are built with all 3 arguments passed on the command line It looks like your code permits -mstack-protector-guard-reg at any time, but only permits -mstack-protector-guard-offset if -mstack-protector-guard is set to sysreg (and thus set explicitly, since the default is global). Is that intentional? Can we expect this to remain like that?
Re: [PATCH] i386: Prefer PLT indirection for __fentry__ calls under -fPIC
On Sun, 6 Apr 2025 at 12:58, Alexander Monakov wrote: > > On Fri, 4 Apr 2025, Ard Biesheuvel wrote: > > > From: Ard Biesheuvel > > > > Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling > > __fentry__") updated the logic that emits mcount() / __fentry__() calls > > into function prologues when profiling is enabled, to avoid GOT-based > > indirect calls when a direct call would suffice. > > > > There are two problems with that change: > > - it relies on -mdirect-extern-access rather than -fno-plt to decide > > whether or not a direct [PLT based] call is appropriate; > > - for the PLT case, it falls through to x86_print_call_or_nop(), which > > does not emit the @PLT suffix, resulting in the wrong relocation to be > > used (R_X86_64_PC32 instead of R_X86_64_PLT32) > > > > Fix this by testing flag_plt instead of ix86_direct_extern_access, and > > updating x86_print_call_or_nop() to take flag_pic and flag_plt into > > account. This ensures that -mnop-mcount works as expected when emitting > > the PLT based profiling calls. > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119386 > > > > Signed-off-by: Ard Biesheuvel > > > > gcc/ChangeLog: > > > >* config/i386/i386.cc (x86_function_profiler): Take > >ix86_direct_extern_access into account when generating calls > >to __fentry__() > > Wrong changelog (reused from the earlier patch?) > The entry should mention PR category/number. > Thanks, How should I generate this PR, and this changelog? > > --- a/gcc/config/i386/i386.cc > > +++ b/gcc/config/i386/i386.cc > > @@ -23154,6 +23154,8 @@ x86_print_call_or_nop (FILE *file, const char > > *target) > >if (flag_nop_mcount || !strcmp (target, "nop")) > > /* 5 byte nop: nopl 0(%[re]ax,%[re]ax,1) */ > > fprintf (file, "1:" ASM_BYTE "0x0f, 0x1f, 0x44, 0x00, 0x00\n"); > > + else if (!TARGET_PECOFF && flag_pic && flag_plt) > > +fprintf (file, "1:\tcall\t%s@PLT\n", target); > > I guess this is written this way because the caller is supposed to handle the > !flat_plt case, but then this should be an assert: > > else if (!TARGET_PECOFF && flag_pic) >{ > gcc_assert (flat_plt); > fprintf (file, "1:\tcall\t%s@PLT\n", target); >} > I'll update this. > Note that this patch touches only 64-bit codegen, -m32 will yield > call *mcount@GOT as before (the previous patch also touched only 64-bit case). > I will mention this in the commit log.
[PATCH v2] i386: Prefer PLT indirection for __fentry__ calls under -fPIC
From: Ard Biesheuvel Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling __fentry__") updated the logic that emits mcount() / __fentry__() calls into function prologues when profiling is enabled, to avoid GOT-based indirect calls when a direct call would suffice. There are two problems with that change: - it relies on -mdirect-extern-access rather than -fno-plt to decide whether or not a direct [PLT based] call is appropriate; - for the PLT case, it falls through to x86_print_call_or_nop(), which does not emit the @PLT suffix, resulting in the wrong relocation to be used (R_X86_64_PC32 instead of R_X86_64_PLT32) Fix this by testing flag_plt instead of ix86_direct_extern_access, and updating x86_print_call_or_nop() to take flag_pic and flag_plt into account. This ensures that -mnop-mcount works as expected when emitting the PLT based profiling calls. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119386 Signed-off-by: Ard Biesheuvel gcc/ChangeLog: PR target/119386 * config/i386/i386.cc (x86_print_call_or_nop): Add @PLT suffix where appropriate. (x86_function_profiler): Fall through to x86_print_call_or_nop() for PIC codegen when flag_plt is set. gcc/testsuite/ChangeLog: PR target/119386 * gcc.target/i386/pr119386-1.c: New test. * gcc.target/i386/pr119386-2.c: New test. --- gcc/config/i386/i386.cc| 8 +++- gcc/testsuite/gcc.target/i386/pr119386-1.c | 11 +++ gcc/testsuite/gcc.target/i386/pr119386-2.c | 11 +++ 3 files changed, 29 insertions(+), 1 deletion(-) diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index be5e27fc391..0b238c3dddc 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -23154,6 +23154,12 @@ x86_print_call_or_nop (FILE *file, const char *target) if (flag_nop_mcount || !strcmp (target, "nop")) /* 5 byte nop: nopl 0(%[re]ax,%[re]ax,1) */ fprintf (file, "1:" ASM_BYTE "0x0f, 0x1f, 0x44, 0x00, 0x00\n"); + else if (!TARGET_PECOFF && flag_pic) +{ + gcc_assert (flag_plt); + + fprintf (file, "1:\tcall\t%s@PLT\n", target); +} else fprintf (file, "1:\tcall\t%s\n", target); } @@ -23317,7 +23323,7 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED) break; case CM_SMALL_PIC: case CM_MEDIUM_PIC: - if (!ix86_direct_extern_access) + if (!flag_plt) { if (ASSEMBLER_DIALECT == ASM_INTEL) fprintf (file, "1:\tcall\t[QWORD PTR %s@GOTPCREL[rip]]\n", diff --git a/gcc/testsuite/gcc.target/i386/pr119386-1.c b/gcc/testsuite/gcc.target/i386/pr119386-1.c new file mode 100644 index 000..7930fc6f28d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr119386-1.c @@ -0,0 +1,11 @@ +/* PR target/119386 */ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-options "-O2 -fpic -pg" } */ +/* { dg-final { scan-assembler "mcount@PLT" } } */ + +int +main () +{ + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr119386-2.c b/gcc/testsuite/gcc.target/i386/pr119386-2.c new file mode 100644 index 000..6334b9b9072 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr119386-2.c @@ -0,0 +1,11 @@ +/* PR target/119386 */ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-options "-O2 -fpic -fno-plt -pg" } */ +/* { dg-final { scan-assembler "mcount@GOTPCREL" } } */ + +int +main () +{ + return 0; +} -- 2.49.0.504.g3bcea36a83-goog
[PATCH v5 2/2] i386: Enable -mnop-mcount for -fpic with PLTs
From: Ard Biesheuvel -mnop-mcount can be trivially enabled for -fPIC codegen as long as PLTs are being used, given that the instruction encodings are identical, only the target may resolve differently depending on how the linker decides to incorporate the object file. So relax the option check, and add a test to ensure that 5-byte NOPs are emitted when -mnop-mcount is being used. Signed-off-by: Ard Biesheuvel gcc/ChangeLog: PR target/119386 * config/i386/i386-options.cc: Permit -mnop-mcount when using -fpic with PLTs. gcc/testsuite/ChangeLog: PR target/119386 * gcc.target/i386/pr119386-3.c: New test. --- gcc/config/i386/i386-options.cc| 4 ++-- gcc/testsuite/gcc.target/i386/pr119386-3.c | 10 ++ 2 files changed, 12 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr119386-3.c diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc index a9fac011f3d..964449fa8cd 100644 --- a/gcc/config/i386/i386-options.cc +++ b/gcc/config/i386/i386-options.cc @@ -2828,8 +2828,8 @@ ix86_option_override_internal (bool main_args_p, if (flag_nop_mcount) error ("%<-mnop-mcount%> is not compatible with this target"); #endif - if (flag_nop_mcount && flag_pic) -error ("%<-mnop-mcount%> is not implemented for %<-fPIC%>"); + if (flag_nop_mcount && flag_pic && !flag_plt) +error ("%<-mnop-mcount%> is not implemented for %<-fno-plt%>"); /* Accept -msseregparm only if at least SSE support is enabled. */ if (TARGET_SSEREGPARM_P (opts->x_target_flags) diff --git a/gcc/testsuite/gcc.target/i386/pr119386-3.c b/gcc/testsuite/gcc.target/i386/pr119386-3.c new file mode 100644 index 000..287410b951a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr119386-3.c @@ -0,0 +1,10 @@ +/* PR target/119386 */ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fpic -pg -mnop-mcount" } */ +/* { dg-final { scan-assembler ".byte\[ \t\]+0x0f, 0x1f, 0x44, 0x00, 0x00" } } */ + +int +main () +{ + return 0; +} -- 2.49.0.504.g3bcea36a83-goog
[PATCH v5 1/2] i386: Prefer PLT indirection for __fentry__ calls under -fPIC
From: Ard Biesheuvel Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling __fentry__") updated the logic that emits mcount() / __fentry__() calls into function prologues when profiling is enabled, to avoid GOT-based indirect calls when a direct call would suffice. There are two problems with that change: - it relies on -mdirect-extern-access rather than -fno-plt to decide whether or not a direct [PLT based] call is appropriate; - for the PLT case, it falls through to x86_print_call_or_nop(), which does not emit the @PLT suffix, resulting in the wrong relocation to be used (R_X86_64_PC32 instead of R_X86_64_PLT32) Fix this by testing flag_plt instead of ix86_direct_extern_access, and updating x86_print_call_or_nop() to take flag_pic and flag_plt into account. This also ensures that -mnop-mcount works as expected when emitting the PLT based profiling calls. While at it, fix the 32-bit logic as well, and issue a PLT call unless PLTs are explicitly disabled. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119386 Signed-off-by: Ard Biesheuvel gcc/ChangeLog: PR target/119386 * config/i386/i386.cc (x86_print_call_or_nop): Add @PLT suffix where appropriate. (x86_function_profiler): Fall through to x86_print_call_or_nop() for PIC codegen when flag_plt is set. gcc/testsuite/ChangeLog: PR target/119386 * gcc.target/i386/pr119386-1.c: New test. * gcc.target/i386/pr119386-2.c: New test. --- gcc/config/i386/i386.cc| 12 ++-- gcc/testsuite/gcc.target/i386/pr119386-1.c | 10 ++ gcc/testsuite/gcc.target/i386/pr119386-2.c | 12 3 files changed, 32 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr119386-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr119386-2.c diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 4f8380c4a58..20059b775b9 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -23158,6 +23158,12 @@ x86_print_call_or_nop (FILE *file, const char *target) if (flag_nop_mcount || !strcmp (target, "nop")) /* 5 byte nop: nopl 0(%[re]ax,%[re]ax,1) */ fprintf (file, "1:" ASM_BYTE "0x0f, 0x1f, 0x44, 0x00, 0x00\n"); + else if (!TARGET_PECOFF && flag_pic) +{ + gcc_assert (flag_plt); + + fprintf (file, "1:\tcall\t%s@PLT\n", target); +} else fprintf (file, "1:\tcall\t%s\n", target); } @@ -23321,7 +23327,7 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED) break; case CM_SMALL_PIC: case CM_MEDIUM_PIC: - if (!ix86_direct_extern_access) + if (!flag_plt) { if (ASSEMBLER_DIALECT == ASM_INTEL) fprintf (file, "1:\tcall\t[QWORD PTR %s@GOTPCREL[rip]]\n", @@ -23352,7 +23358,9 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED) "\tleal\t%sP%d@GOTOFF(%%ebx), %%" PROFILE_COUNT_REGISTER "\n", LPREFIX, labelno); #endif - if (ASSEMBLER_DIALECT == ASM_INTEL) + if (flag_plt) + x86_print_call_or_nop (file, mcount_name); + else if (ASSEMBLER_DIALECT == ASM_INTEL) fprintf (file, "1:\tcall\t[DWORD PTR %s@GOT[ebx]]\n", mcount_name); else fprintf (file, "1:\tcall\t*%s@GOT(%%ebx)\n", mcount_name); diff --git a/gcc/testsuite/gcc.target/i386/pr119386-1.c b/gcc/testsuite/gcc.target/i386/pr119386-1.c new file mode 100644 index 000..9a0dc64b5b9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr119386-1.c @@ -0,0 +1,10 @@ +/* PR target/119386 */ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fpic -pg" } */ +/* { dg-final { scan-assembler "call\[ \t\]+mcount@PLT" } } */ + +int +main () +{ + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr119386-2.c b/gcc/testsuite/gcc.target/i386/pr119386-2.c new file mode 100644 index 000..3ea978ecfdf --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr119386-2.c @@ -0,0 +1,12 @@ +/* PR target/119386 */ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fpic -fno-plt -pg" } */ +/* { dg-final { scan-assembler "call\[ \t\]+\\*mcount@GOTPCREL\\(" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler "call\[ \t\]+\\*mcount@GOT\\(" { target ia32 } } } */ + + +int +main () +{ + return 0; +} -- 2.49.0.504.g3bcea36a83-goog
Re: [PATCH v3] i386: Prefer PLT indirection for __fentry__ calls under -fPIC
On Tue, 8 Apr 2025 at 15:33, H.J. Lu wrote: > > On Tue, Apr 8, 2025 at 3:46 AM Ard Biesheuvel wrote: > > > > From: Ard Biesheuvel > > > > Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling > > __fentry__") updated the logic that emits mcount() / __fentry__() calls > > into function prologues when profiling is enabled, to avoid GOT-based > > indirect calls when a direct call would suffice. > > > > There are two problems with that change: > > - it relies on -mdirect-extern-access rather than -fno-plt to decide > > whether or not a direct [PLT based] call is appropriate; > > - for the PLT case, it falls through to x86_print_call_or_nop(), which > > does not emit the @PLT suffix, resulting in the wrong relocation to be > > used (R_X86_64_PC32 instead of R_X86_64_PLT32) > > > > Fix this by testing flag_plt instead of ix86_direct_extern_access, and > > updating x86_print_call_or_nop() to take flag_pic and flag_plt into > > account. This also ensures that -mnop-mcount works as expected when > > emitting the PLT based profiling calls. > > > > Note that only 64-bit codegen is affected by this change or by the > > commit referenced above; -m32 will yield 'call *mcount@GOT()' as before. > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119386 > > > > Signed-off-by: Ard Biesheuvel > > > > gcc/ChangeLog: > > > > PR target/119386 > > * config/i386/i386.cc (x86_print_call_or_nop): Add @PLT suffix > > where appropriate. > > (x86_function_profiler): Fall through to x86_print_call_or_nop() > > for PIC codegen when flag_plt is set. > > > > gcc/testsuite/ChangeLog: > > > > PR target/119386 > > * gcc.target/i386/pr119386-1.c: New test. > > * gcc.target/i386/pr119386-2.c: New test. > > --- > > gcc/config/i386/i386.cc| 8 +++- > > gcc/testsuite/gcc.target/i386/pr119386-1.c | 11 +++ > > gcc/testsuite/gcc.target/i386/pr119386-2.c | 11 +++ > > 3 files changed, 29 insertions(+), 1 deletion(-) > > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > > index be5e27fc391..0b238c3dddc 100644 > > --- a/gcc/config/i386/i386.cc > > +++ b/gcc/config/i386/i386.cc > > @@ -23154,6 +23154,12 @@ x86_print_call_or_nop (FILE *file, const char > > *target) > >if (flag_nop_mcount || !strcmp (target, "nop")) > > /* 5 byte nop: nopl 0(%[re]ax,%[re]ax,1) */ > > fprintf (file, "1:" ASM_BYTE "0x0f, 0x1f, 0x44, 0x00, 0x00\n"); > > + else if (!TARGET_PECOFF && flag_pic) > > +{ > > + gcc_assert (flag_plt); > > + > > + fprintf (file, "1:\tcall\t%s@PLT\n", target); > > +} > >else > > fprintf (file, "1:\tcall\t%s\n", target); > > } > > @@ -23317,7 +23323,7 @@ x86_function_profiler (FILE *file, int labelno > > ATTRIBUTE_UNUSED) > > break; > > case CM_SMALL_PIC: > > case CM_MEDIUM_PIC: > > - if (!ix86_direct_extern_access) > > + if (!flag_plt) > > { > > if (ASSEMBLER_DIALECT == ASM_INTEL) > > fprintf (file, "1:\tcall\t[QWORD PTR > > %s@GOTPCREL[rip]]\n", > > diff --git a/gcc/testsuite/gcc.target/i386/pr119386-1.c > > b/gcc/testsuite/gcc.target/i386/pr119386-1.c > > new file mode 100644 > > index 000..7930fc6f28d > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/i386/pr119386-1.c > > @@ -0,0 +1,11 @@ > > +/* PR target/119386 */ > > +/* { dg-do compile { target *-*-linux* } } */ > > +/* { dg-require-effective-target lp64 } */ > > Can this be dropped? > I copied that from another test, assuming it would limit the testing to x86_64. Is there a better way to achieve that? > > +/* { dg-options "-O2 -fpic -pg" } */ > > +/* { dg-final { scan-assembler "mcount@PLT" } } */ > > + > > +int > > +main () > > +{ > > + return 0; > > +} > > diff --git a/gcc/testsuite/gcc.target/i386/pr119386-2.c > > b/gcc/testsuite/gcc.target/i386/pr119386-2.c > > new file mode 100644 > > index 000..6334b9b9072 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/i386/pr119386-2.c > > @@ -0,0 +1,11 @@ > > +/* PR target/119386 */ > > +/* { dg-do compile { target *-*-linux* } } */ > > +/* { dg-require-effective-target lp64 } */ > > Can this be dropped? > > > +/* { dg-options "-O2 -fpic -fno-plt -pg" } */ > > +/* { dg-final { scan-assembler "mcount@GOTPCREL" } } */ > > Different scans for ia32 and ! ia32? > I did not consider IA32 at all - can we just omit it?
Re: [PATCH v4 1/2] i386: Prefer PLT indirection for __fentry__ calls under -fPIC
On Wed, 9 Apr 2025 at 16:46, H.J. Lu wrote: > > On Wed, Apr 9, 2025 at 1:53 AM Ard Biesheuvel wrote: > > > > From: Ard Biesheuvel > > > > Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling > > __fentry__") updated the logic that emits mcount() / __fentry__() calls > > into function prologues when profiling is enabled, to avoid GOT-based > > indirect calls when a direct call would suffice. > > > > There are two problems with that change: > > - it relies on -mdirect-extern-access rather than -fno-plt to decide > > whether or not a direct [PLT based] call is appropriate; > > - for the PLT case, it falls through to x86_print_call_or_nop(), which > > does not emit the @PLT suffix, resulting in the wrong relocation to be > > used (R_X86_64_PC32 instead of R_X86_64_PLT32) > > > > Fix this by testing flag_plt instead of ix86_direct_extern_access, and > > updating x86_print_call_or_nop() to take flag_pic and flag_plt into > > account. This also ensures that -mnop-mcount works as expected when > > emitting the PLT based profiling calls. > > > > Note that only 64-bit codegen is affected by this change or by the > > commit referenced above; -m32 will yield 'call *mcount@GOT()' as before. > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119386 > > > > Signed-off-by: Ard Biesheuvel > > > > gcc/ChangeLog: > > > > PR target/119386 > > * config/i386/i386.cc (x86_print_call_or_nop): Add @PLT suffix > > where appropriate. > > (x86_function_profiler): Fall through to x86_print_call_or_nop() > > for PIC codegen when flag_plt is set. > > > > gcc/testsuite/ChangeLog: > > > > PR target/119386 > > * gcc.target/i386/pr119386-1.c: New test. > > * gcc.target/i386/pr119386-2.c: New test. > > --- > > gcc/config/i386/i386.cc| 8 +++- > > gcc/testsuite/gcc.target/i386/pr119386-1.c | 11 +++ > > gcc/testsuite/gcc.target/i386/pr119386-2.c | 12 > > 3 files changed, 30 insertions(+), 1 deletion(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/pr119386-1.c > > create mode 100644 gcc/testsuite/gcc.target/i386/pr119386-2.c > > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > > index be5e27fc391..0b238c3dddc 100644 > > --- a/gcc/config/i386/i386.cc > > +++ b/gcc/config/i386/i386.cc > > @@ -23154,6 +23154,12 @@ x86_print_call_or_nop (FILE *file, const char > > *target) > >if (flag_nop_mcount || !strcmp (target, "nop")) > > /* 5 byte nop: nopl 0(%[re]ax,%[re]ax,1) */ > > fprintf (file, "1:" ASM_BYTE "0x0f, 0x1f, 0x44, 0x00, 0x00\n"); > > + else if (!TARGET_PECOFF && flag_pic) > > +{ > > + gcc_assert (flag_plt); > > + > > + fprintf (file, "1:\tcall\t%s@PLT\n", target); > > +} > >else > > fprintf (file, "1:\tcall\t%s\n", target); > > } > > @@ -23317,7 +23323,7 @@ x86_function_profiler (FILE *file, int labelno > > ATTRIBUTE_UNUSED) > > break; > > case CM_SMALL_PIC: > > case CM_MEDIUM_PIC: > > - if (!ix86_direct_extern_access) > > + if (!flag_plt) > > { > > if (ASSEMBLER_DIALECT == ASM_INTEL) > > fprintf (file, "1:\tcall\t[QWORD PTR > > %s@GOTPCREL[rip]]\n", > > diff --git a/gcc/testsuite/gcc.target/i386/pr119386-1.c > > b/gcc/testsuite/gcc.target/i386/pr119386-1.c > > new file mode 100644 > > index 000..174d00f1e27 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/i386/pr119386-1.c > > @@ -0,0 +1,11 @@ > > +/* PR target/119386 */ > > +/* { dg-do compile { target *-*-linux* } } */ > > +/* { dg-options "-O2 -fpic -pg" } */ > > +/* { dg-final { scan-assembler "call\[ \t\]mcount@PLT" { target { ! ia32 } > > } } } */ > > +/* { dg-final { scan-assembler "call\[ \t\]\\*mcount@GOT\\(" { target ia32 > > } } } */ > > This is wrong for ia32, which should also generate "call mcount@PLT". > But it hasn't done that for a long time - it is hard to figure out from the Git history how long but at least 20 years IIUC So do you think this change should fix IA-32 as well? Note that the issue is about emitting 'call mcount' on 64-bit where 'call mcount@PLT' is needed, not about changing the indirect GOT based call to a PLT call.
Re: [PATCH v3] i386: Prefer PLT indirection for __fentry__ calls under -fPIC
On Tue, 8 Apr 2025 at 18:44, H.J. Lu wrote: > > On Tue, Apr 8, 2025 at 9:39 AM Ard Biesheuvel wrote: > > > > On Tue, 8 Apr 2025 at 15:33, H.J. Lu wrote: > > > > > > On Tue, Apr 8, 2025 at 3:46 AM Ard Biesheuvel wrote: > > > > > > > > From: Ard Biesheuvel > > > > > > > > Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling > > > > __fentry__") updated the logic that emits mcount() / __fentry__() calls > > > > into function prologues when profiling is enabled, to avoid GOT-based > > > > indirect calls when a direct call would suffice. > > > > > > > > There are two problems with that change: > > > > - it relies on -mdirect-extern-access rather than -fno-plt to decide > > > > whether or not a direct [PLT based] call is appropriate; > > > > - for the PLT case, it falls through to x86_print_call_or_nop(), which > > > > does not emit the @PLT suffix, resulting in the wrong relocation to be > > > > used (R_X86_64_PC32 instead of R_X86_64_PLT32) > > > > > > > > Fix this by testing flag_plt instead of ix86_direct_extern_access, and > > > > updating x86_print_call_or_nop() to take flag_pic and flag_plt into > > > > account. This also ensures that -mnop-mcount works as expected when > > > > emitting the PLT based profiling calls. > > > > > > > > Note that only 64-bit codegen is affected by this change or by the > > > > commit referenced above; -m32 will yield 'call *mcount@GOT()' as before. > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119386 > > > > > > > > Signed-off-by: Ard Biesheuvel > > > > > > > > gcc/ChangeLog: > > > > > > > > PR target/119386 > > > > * config/i386/i386.cc (x86_print_call_or_nop): Add @PLT suffix > > > > where appropriate. > > > > (x86_function_profiler): Fall through to x86_print_call_or_nop() > > > > for PIC codegen when flag_plt is set. > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > PR target/119386 > > > > * gcc.target/i386/pr119386-1.c: New test. > > > > * gcc.target/i386/pr119386-2.c: New test. > > > > --- > > > > gcc/config/i386/i386.cc| 8 +++- > > > > gcc/testsuite/gcc.target/i386/pr119386-1.c | 11 +++ > > > > gcc/testsuite/gcc.target/i386/pr119386-2.c | 11 +++ > > > > 3 files changed, 29 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > > > > index be5e27fc391..0b238c3dddc 100644 > > > > --- a/gcc/config/i386/i386.cc > > > > +++ b/gcc/config/i386/i386.cc > > > > @@ -23154,6 +23154,12 @@ x86_print_call_or_nop (FILE *file, const char > > > > *target) > > > >if (flag_nop_mcount || !strcmp (target, "nop")) > > > > /* 5 byte nop: nopl 0(%[re]ax,%[re]ax,1) */ > > > > fprintf (file, "1:" ASM_BYTE "0x0f, 0x1f, 0x44, 0x00, 0x00\n"); > > > > + else if (!TARGET_PECOFF && flag_pic) > > > > +{ > > > > + gcc_assert (flag_plt); > > > > + > > > > + fprintf (file, "1:\tcall\t%s@PLT\n", target); > > > > +} > > > >else > > > > fprintf (file, "1:\tcall\t%s\n", target); > > > > } > > > > @@ -23317,7 +23323,7 @@ x86_function_profiler (FILE *file, int labelno > > > > ATTRIBUTE_UNUSED) > > > > break; > > > > case CM_SMALL_PIC: > > > > case CM_MEDIUM_PIC: > > > > - if (!ix86_direct_extern_access) > > > > + if (!flag_plt) > > > > { > > > > if (ASSEMBLER_DIALECT == ASM_INTEL) > > > > fprintf (file, "1:\tcall\t[QWORD PTR > > > > %s@GOTPCREL[rip]]\n", > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr119386-1.c > > > > b/gcc/testsuite/gcc.target/i386/pr119386-1.c > > > > new file mode 100644 > > > > index 000..7930fc6f28d > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.target/i386/pr119386-1.c > > > > @@ -0,0 +1,11 @@ > > &
[PATCH v4 2/2] i386: Enable -mnop-mcount for -fpic with PLTs
From: Ard Biesheuvel On 64-bit, -mnop-mcount can be trivially enabled for -fPIC codegen as long as PLTs are being used, given that the instruction encodings are identical, only the target may resolve differently depending on how the linker decides to incorporate the object file. So relax the option check, and add a test to ensure that 5-byte NOPs are emitted in the 64-bit case. On 32-bit, PLTs are never used so -mnop-mcount remains disallowed in combination with -fPIC. Signed-off-by: Ard Biesheuvel gcc/ChangeLog: PR target/119386 * config/i386/i386-options.cc: Permit -mnop-mcount on 64-bit when using -fpic with PLTs. gcc/testsuite/ChangeLog: PR target/119386 * gcc.target/i386/pr119386-3.c: New test. --- gcc/config/i386/i386-options.cc| 7 ++- gcc/testsuite/gcc.target/i386/pr119386-3.c | 11 +++ 2 files changed, 17 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr119386-3.c diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc index fc4d2d2529e..8ecaff8d929 100644 --- a/gcc/config/i386/i386-options.cc +++ b/gcc/config/i386/i386-options.cc @@ -2819,7 +2819,12 @@ ix86_option_override_internal (bool main_args_p, error ("%<-mnop-mcount%> is not compatible with this target"); #endif if (flag_nop_mcount && flag_pic) -error ("%<-mnop-mcount%> is not implemented for %<-fPIC%>"); +{ + if (!TARGET_64BIT) + error ("%<-mnop-mcount%> is not implemented for %<-fPIC%>"); + else if (!flag_plt) + error ("%<-mnop-mcount%> is not implemented for %<-fno-plt%>"); +} /* Accept -msseregparm only if at least SSE support is enabled. */ if (TARGET_SSEREGPARM_P (opts->x_target_flags) diff --git a/gcc/testsuite/gcc.target/i386/pr119386-3.c b/gcc/testsuite/gcc.target/i386/pr119386-3.c new file mode 100644 index 000..b7999c34ce1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr119386-3.c @@ -0,0 +1,11 @@ +/* PR target/119386 */ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fpic -pg -mnop-mcount" } */ +/* { dg-final { scan-assembler ".byte\[ \t\]0x0f, 0x1f, 0x44, 0x00, 0x00" { target { ! ia32 } } } } */ +/* { dg-xfail-if "'-mnop-mcount' is not implemented for '-fPIC'" { ia32 } } */ + +int +main () +{ + return 0; +} -- 2.49.0.504.g3bcea36a83-goog
[PATCH v4 1/2] i386: Prefer PLT indirection for __fentry__ calls under -fPIC
From: Ard Biesheuvel Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling __fentry__") updated the logic that emits mcount() / __fentry__() calls into function prologues when profiling is enabled, to avoid GOT-based indirect calls when a direct call would suffice. There are two problems with that change: - it relies on -mdirect-extern-access rather than -fno-plt to decide whether or not a direct [PLT based] call is appropriate; - for the PLT case, it falls through to x86_print_call_or_nop(), which does not emit the @PLT suffix, resulting in the wrong relocation to be used (R_X86_64_PC32 instead of R_X86_64_PLT32) Fix this by testing flag_plt instead of ix86_direct_extern_access, and updating x86_print_call_or_nop() to take flag_pic and flag_plt into account. This also ensures that -mnop-mcount works as expected when emitting the PLT based profiling calls. Note that only 64-bit codegen is affected by this change or by the commit referenced above; -m32 will yield 'call *mcount@GOT()' as before. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119386 Signed-off-by: Ard Biesheuvel gcc/ChangeLog: PR target/119386 * config/i386/i386.cc (x86_print_call_or_nop): Add @PLT suffix where appropriate. (x86_function_profiler): Fall through to x86_print_call_or_nop() for PIC codegen when flag_plt is set. gcc/testsuite/ChangeLog: PR target/119386 * gcc.target/i386/pr119386-1.c: New test. * gcc.target/i386/pr119386-2.c: New test. --- gcc/config/i386/i386.cc| 8 +++- gcc/testsuite/gcc.target/i386/pr119386-1.c | 11 +++ gcc/testsuite/gcc.target/i386/pr119386-2.c | 12 3 files changed, 30 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr119386-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr119386-2.c diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index be5e27fc391..0b238c3dddc 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -23154,6 +23154,12 @@ x86_print_call_or_nop (FILE *file, const char *target) if (flag_nop_mcount || !strcmp (target, "nop")) /* 5 byte nop: nopl 0(%[re]ax,%[re]ax,1) */ fprintf (file, "1:" ASM_BYTE "0x0f, 0x1f, 0x44, 0x00, 0x00\n"); + else if (!TARGET_PECOFF && flag_pic) +{ + gcc_assert (flag_plt); + + fprintf (file, "1:\tcall\t%s@PLT\n", target); +} else fprintf (file, "1:\tcall\t%s\n", target); } @@ -23317,7 +23323,7 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED) break; case CM_SMALL_PIC: case CM_MEDIUM_PIC: - if (!ix86_direct_extern_access) + if (!flag_plt) { if (ASSEMBLER_DIALECT == ASM_INTEL) fprintf (file, "1:\tcall\t[QWORD PTR %s@GOTPCREL[rip]]\n", diff --git a/gcc/testsuite/gcc.target/i386/pr119386-1.c b/gcc/testsuite/gcc.target/i386/pr119386-1.c new file mode 100644 index 000..174d00f1e27 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr119386-1.c @@ -0,0 +1,11 @@ +/* PR target/119386 */ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fpic -pg" } */ +/* { dg-final { scan-assembler "call\[ \t\]mcount@PLT" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler "call\[ \t\]\\*mcount@GOT\\(" { target ia32 } } } */ + +int +main () +{ + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr119386-2.c b/gcc/testsuite/gcc.target/i386/pr119386-2.c new file mode 100644 index 000..808dd9298ce --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr119386-2.c @@ -0,0 +1,12 @@ +/* PR target/119386 */ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fpic -fno-plt -pg" } */ +/* { dg-final { scan-assembler "call\[ \t\]\\*mcount@GOTPCREL\\(" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler "call\[ \t\]\\*mcount@GOT\\(" { target ia32 } } } */ + + +int +main () +{ + return 0; +} -- 2.49.0.504.g3bcea36a83-goog
[PATCH] i386: Prefer PLT indirection for __fentry__ calls under -fPIC
From: Ard Biesheuvel Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling __fentry__") updated the logic that emits mcount() / __fentry__() calls into function prologues when profiling is enabled, to avoid GOT-based indirect calls when a direct call would suffice. There are two problems with that change: - it relies on -mdirect-extern-access rather than -fno-plt to decide whether or not a direct [PLT based] call is appropriate; - for the PLT case, it falls through to x86_print_call_or_nop(), which does not emit the @PLT suffix, resulting in the wrong relocation to be used (R_X86_64_PC32 instead of R_X86_64_PLT32) Fix this by testing flag_plt instead of ix86_direct_extern_access, and updating x86_print_call_or_nop() to take flag_pic and flag_plt into account. This ensures that -mnop-mcount works as expected when emitting the PLT based profiling calls. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119386 Signed-off-by: Ard Biesheuvel gcc/ChangeLog: * config/i386/i386.cc (x86_function_profiler): Take ix86_direct_extern_access into account when generating calls to __fentry__() --- gcc/config/i386/i386.cc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index be5e27fc391..6be6ddcbb3c 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -23154,6 +23154,8 @@ x86_print_call_or_nop (FILE *file, const char *target) if (flag_nop_mcount || !strcmp (target, "nop")) /* 5 byte nop: nopl 0(%[re]ax,%[re]ax,1) */ fprintf (file, "1:" ASM_BYTE "0x0f, 0x1f, 0x44, 0x00, 0x00\n"); + else if (!TARGET_PECOFF && flag_pic && flag_plt) +fprintf (file, "1:\tcall\t%s@PLT\n", target); else fprintf (file, "1:\tcall\t%s\n", target); } @@ -23317,7 +23319,7 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED) break; case CM_SMALL_PIC: case CM_MEDIUM_PIC: - if (!ix86_direct_extern_access) + if (!flag_plt) { if (ASSEMBLER_DIALECT == ASM_INTEL) fprintf (file, "1:\tcall\t[QWORD PTR %s@GOTPCREL[rip]]\n", -- 2.49.0.504.g3bcea36a83-goog
Re: [PATCH v3] i386: Prefer PLT indirection for __fentry__ calls under -fPIC
On Wed, 9 Apr 2025 at 09:12, Uros Bizjak wrote: > > On Tue, Apr 8, 2025 at 6:59 PM Ard Biesheuvel wrote: > > > > On Tue, 8 Apr 2025 at 18:44, H.J. Lu wrote: > > > > > > On Tue, Apr 8, 2025 at 9:39 AM Ard Biesheuvel wrote: > > > > > > > > On Tue, 8 Apr 2025 at 15:33, H.J. Lu wrote: > > > > > > > > > > On Tue, Apr 8, 2025 at 3:46 AM Ard Biesheuvel > > > > > wrote: > > > > > > > > > > > > From: Ard Biesheuvel > > > > > > > > > > > > Commit bde21de1205 ("i386: Honour -mdirect-extern-access when > > > > > > calling > > > > > > __fentry__") updated the logic that emits mcount() / __fentry__() > > > > > > calls > > > > > > into function prologues when profiling is enabled, to avoid > > > > > > GOT-based > > > > > > indirect calls when a direct call would suffice. > > > > > > > > > > > > There are two problems with that change: > > > > > > - it relies on -mdirect-extern-access rather than -fno-plt to decide > > > > > > whether or not a direct [PLT based] call is appropriate; > > > > > > - for the PLT case, it falls through to x86_print_call_or_nop(), > > > > > > which > > > > > > does not emit the @PLT suffix, resulting in the wrong relocation > > > > > > to be > > > > > > used (R_X86_64_PC32 instead of R_X86_64_PLT32) > > > > > > > > > > > > Fix this by testing flag_plt instead of ix86_direct_extern_access, > > > > > > and > > > > > > updating x86_print_call_or_nop() to take flag_pic and flag_plt into > > > > > > account. This also ensures that -mnop-mcount works as expected when > > > > > > emitting the PLT based profiling calls. > > > > > > > > > > > > Note that only 64-bit codegen is affected by this change or by the > > > > > > commit referenced above; -m32 will yield 'call *mcount@GOT()' as > > > > > > before. > > > > > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119386 > > > > > > > > > > > > Signed-off-by: Ard Biesheuvel > > > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > > > PR target/119386 > > > > > > * config/i386/i386.cc (x86_print_call_or_nop): Add @PLT > > > > > > suffix > > > > > > where appropriate. > > > > > > (x86_function_profiler): Fall through to > > > > > > x86_print_call_or_nop() > > > > > > for PIC codegen when flag_plt is set. > > > > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > > > > > PR target/119386 > > > > > > * gcc.target/i386/pr119386-1.c: New test. > > > > > > * gcc.target/i386/pr119386-2.c: New test. > > > > > > --- > > > > > > gcc/config/i386/i386.cc| 8 +++- > > > > > > gcc/testsuite/gcc.target/i386/pr119386-1.c | 11 +++ > > > > > > gcc/testsuite/gcc.target/i386/pr119386-2.c | 11 +++ > > > > > > 3 files changed, 29 insertions(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > > > > > > index be5e27fc391..0b238c3dddc 100644 > > > > > > --- a/gcc/config/i386/i386.cc > > > > > > +++ b/gcc/config/i386/i386.cc > > > > > > @@ -23154,6 +23154,12 @@ x86_print_call_or_nop (FILE *file, const > > > > > > char *target) > > > > > >if (flag_nop_mcount || !strcmp (target, "nop")) > > > > > > /* 5 byte nop: nopl 0(%[re]ax,%[re]ax,1) */ > > > > > > fprintf (file, "1:" ASM_BYTE "0x0f, 0x1f, 0x44, 0x00, 0x00\n"); > > > > > > + else if (!TARGET_PECOFF && flag_pic) > > > > > > +{ > > > > > > + gcc_assert (flag_plt); > > > > > > + > > > > > > + fprintf (file, "1:\tcall\t%s@PLT\n", target); > > > > > > +} > > > > > >else
[PATCH v3] i386: Prefer PLT indirection for __fentry__ calls under -fPIC
From: Ard Biesheuvel Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling __fentry__") updated the logic that emits mcount() / __fentry__() calls into function prologues when profiling is enabled, to avoid GOT-based indirect calls when a direct call would suffice. There are two problems with that change: - it relies on -mdirect-extern-access rather than -fno-plt to decide whether or not a direct [PLT based] call is appropriate; - for the PLT case, it falls through to x86_print_call_or_nop(), which does not emit the @PLT suffix, resulting in the wrong relocation to be used (R_X86_64_PC32 instead of R_X86_64_PLT32) Fix this by testing flag_plt instead of ix86_direct_extern_access, and updating x86_print_call_or_nop() to take flag_pic and flag_plt into account. This also ensures that -mnop-mcount works as expected when emitting the PLT based profiling calls. Note that only 64-bit codegen is affected by this change or by the commit referenced above; -m32 will yield 'call *mcount@GOT()' as before. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119386 Signed-off-by: Ard Biesheuvel gcc/ChangeLog: PR target/119386 * config/i386/i386.cc (x86_print_call_or_nop): Add @PLT suffix where appropriate. (x86_function_profiler): Fall through to x86_print_call_or_nop() for PIC codegen when flag_plt is set. gcc/testsuite/ChangeLog: PR target/119386 * gcc.target/i386/pr119386-1.c: New test. * gcc.target/i386/pr119386-2.c: New test. --- gcc/config/i386/i386.cc| 8 +++- gcc/testsuite/gcc.target/i386/pr119386-1.c | 11 +++ gcc/testsuite/gcc.target/i386/pr119386-2.c | 11 +++ 3 files changed, 29 insertions(+), 1 deletion(-) diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index be5e27fc391..0b238c3dddc 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -23154,6 +23154,12 @@ x86_print_call_or_nop (FILE *file, const char *target) if (flag_nop_mcount || !strcmp (target, "nop")) /* 5 byte nop: nopl 0(%[re]ax,%[re]ax,1) */ fprintf (file, "1:" ASM_BYTE "0x0f, 0x1f, 0x44, 0x00, 0x00\n"); + else if (!TARGET_PECOFF && flag_pic) +{ + gcc_assert (flag_plt); + + fprintf (file, "1:\tcall\t%s@PLT\n", target); +} else fprintf (file, "1:\tcall\t%s\n", target); } @@ -23317,7 +23323,7 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED) break; case CM_SMALL_PIC: case CM_MEDIUM_PIC: - if (!ix86_direct_extern_access) + if (!flag_plt) { if (ASSEMBLER_DIALECT == ASM_INTEL) fprintf (file, "1:\tcall\t[QWORD PTR %s@GOTPCREL[rip]]\n", diff --git a/gcc/testsuite/gcc.target/i386/pr119386-1.c b/gcc/testsuite/gcc.target/i386/pr119386-1.c new file mode 100644 index 000..7930fc6f28d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr119386-1.c @@ -0,0 +1,11 @@ +/* PR target/119386 */ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-options "-O2 -fpic -pg" } */ +/* { dg-final { scan-assembler "mcount@PLT" } } */ + +int +main () +{ + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr119386-2.c b/gcc/testsuite/gcc.target/i386/pr119386-2.c new file mode 100644 index 000..6334b9b9072 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr119386-2.c @@ -0,0 +1,11 @@ +/* PR target/119386 */ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-options "-O2 -fpic -fno-plt -pg" } */ +/* { dg-final { scan-assembler "mcount@GOTPCREL" } } */ + +int +main () +{ + return 0; +} -- 2.49.0.504.g3bcea36a83-goog
Re: [PATCH v5 1/2] i386: Prefer PLT indirection for __fentry__ calls under -fPIC
On Tue, 15 Apr 2025 at 09:48, Uros Bizjak wrote: > > On Thu, Apr 10, 2025 at 2:27 PM Ard Biesheuvel wrote: > > > > From: Ard Biesheuvel > > > > Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling > > __fentry__") updated the logic that emits mcount() / __fentry__() calls > > into function prologues when profiling is enabled, to avoid GOT-based > > indirect calls when a direct call would suffice. > > > > There are two problems with that change: > > - it relies on -mdirect-extern-access rather than -fno-plt to decide > > whether or not a direct [PLT based] call is appropriate; > > - for the PLT case, it falls through to x86_print_call_or_nop(), which > > does not emit the @PLT suffix, resulting in the wrong relocation to be > > used (R_X86_64_PC32 instead of R_X86_64_PLT32) > > > > Fix this by testing flag_plt instead of ix86_direct_extern_access, and > > updating x86_print_call_or_nop() to take flag_pic and flag_plt into > > account. This also ensures that -mnop-mcount works as expected when > > emitting the PLT based profiling calls. > > > > While at it, fix the 32-bit logic as well, and issue a PLT call unless > > PLTs are explicitly disabled. > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119386 > > > > Signed-off-by: Ard Biesheuvel > > > > gcc/ChangeLog: > > > > PR target/119386 > > * config/i386/i386.cc (x86_print_call_or_nop): Add @PLT suffix > > where appropriate. > > (x86_function_profiler): Fall through to x86_print_call_or_nop() > > for PIC codegen when flag_plt is set. > > > > gcc/testsuite/ChangeLog: > > > > PR target/119386 > > * gcc.target/i386/pr119386-1.c: New test. > > * gcc.target/i386/pr119386-2.c: New test. > > OK if there are no further comments in the next day or two. > Thanks > BTW: Do you have commit rights? > No I do not.
Re: [PATCH] i386: Honour -mdirect-extern-access when calling __fentry__
On Thu, 11 May 2023 at 08:08, Uros Bizjak wrote: > > On Thu, May 11, 2023 at 12:04 AM H.J. Lu wrote: > > > > On Wed, May 10, 2023 at 2:17 AM Uros Bizjak wrote: > > > > > > On Tue, May 9, 2023 at 10:58 AM Ard Biesheuvel wrote: > > > > > > > > The small and medium PIC code models generate profiling calls that > > > > always load the address of __fentry__() via the GOT, even if > > > > -mdirect-extern-access is in effect. > > > > > > > > This deviates from the behavior with respect to other external > > > > references, and results in a longer opcode that relies on linker > > > > relaxation to eliminate the GOT load. In this particular case, the > > > > transformation replaces an indirect 'CALL *__fentry__@GOTPCREL(%rip)' > > > > with either 'CALL __fentry__; NOP' or 'NOP; CALL __fentry__', where the > > > > NOP is a 1 byte NOP that preserves the 6 byte length of the sequence. > > > > > > > > This is problematic for the Linux kernel, which generally relies on > > > > -mdirect-extern-access and hidden visibility to eliminate GOT based > > > > symbol references in code generated with -fpie/-fpic, without having to > > > > depend on linker relaxation. > > > > > > > > The Linux kernel relies on code patching to replace these opcodes with > > > > NOPs at runtime, and this is complicated code that we'd prefer not to > > > > complicate even more by adding support for patching both 5 and 6 byte > > > > sequences as well as parsing the instruction stream to decide which > > > > variant of CALL+NOP we are dealing with. > > > > > > > > So let's honour -mdirect-extern-access, and only load the address of > > > > __fentry__ via the GOT if direct references to external symbols are not > > > > permitted. > > > > > > > > Note that the GOT reference in question is in fact a data reference: we > > > > explicitly load the address of __fentry__ from the GOT, which amounts to > > > > eager binding, rather than emitting a PLT call that could bind eagerly, > > > > lazily or directly at link time. > > > > > > > > gcc/ChangeLog: > > > > > > > > * config/i386/i386.cc (x86_function_profiler): Take > > > > ix86_direct_extern_access into account when generating calls > > > > to __fentry__() > > > > > > HJ, is the patch OK with you? > > > > LGTM. > > OK then. > Thanks all. Is anything expected of me at this point?
Re: [PATCH] i386: Honour -mdirect-extern-access when calling __fentry__
On Fri, 12 May 2023 at 19:05, Uros Bizjak wrote: > > On Fri, May 12, 2023 at 4:07 PM Ard Biesheuvel wrote: > > > > > > > Note that the GOT reference in question is in fact a data > > > > > > reference: we > > > > > > explicitly load the address of __fentry__ from the GOT, which > > > > > > amounts to > > > > > > eager binding, rather than emitting a PLT call that could bind > > > > > > eagerly, > > > > > > lazily or directly at link time. > > > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > > > * config/i386/i386.cc (x86_function_profiler): Take > > > > > > ix86_direct_extern_access into account when generating > > > > > > calls > > > > > > to __fentry__() > > > > > > > > > > HJ, is the patch OK with you? > > > > > > > > LGTM. > > > > > > OK then. > > > > > > > Thanks all. Is anything expected of me at this point? > > Do you have write access to the repository? If not I can commit the > patch for you Yes, please. , but you should state this [1] in your patch submission. > > [1] https://gcc.gnu.org/contribute.html > Signed-off-by: Ard Biesheuvel Thanks,
Re: [PATCH v5 1/1] [ARM] Add support for TLS register based stack protector canary access
On Wed, 19 Jan 2022 at 17:54, Kyrylo Tkachov wrote: > > Hi Ard, > > > -Original Message- > > From: Gcc-patches > bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Ard > > Biesheuvel via Gcc-patches > > Sent: Monday, November 15, 2021 6:04 PM > > To: linux-harden...@vger.kernel.org > > Cc: Richard Sandiford ; > > thomas.preudho...@celest.fr; Keith Packard ; > > gcc-patches@gcc.gnu.org; Kyrylo Tkachov ; Ard > > Biesheuvel > > Subject: [PATCH v5 1/1] [ARM] Add support for TLS register based stack > > protector canary access > > > > Add support for accessing the stack canary value via the TLS register, > > so that multiple threads running in the same address space can use > > distinct canary values. This is intended for the Linux kernel running in > > SMP mode, where processes entering the kernel are essentially threads > > running the same program concurrently: using a global variable for the > > canary in that context is problematic because it can never be rotated, > > and so the OS is forced to use the same value as long as it remains up. > > > > Using the TLS register to index the stack canary helps with this, as it > > allows each CPU to context switch the TLS register along with the rest > > of the process, permitting each process to use its own value for the > > stack canary. > > I've tested this patch on an arm-none-linux-gnueabihf target and the results > look clean. > Have you tested this patch with a kernel build as well? (since the > functionality is intended for that use). Of course. > If so, the patch is okay but please rebase it and repost so that we can > commit it taking into account > Will do.
[PATCH v6 0/1] implement TLS register based stack canary for ARM
Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102352 In the Linux kernel, user processes calling into the kernel are essentially threads running in the same address space, of a program that never terminates. This means that using a global variable for the stack protector canary value is problematic on SMP systems, as we can never change it unless we reboot the system. (Processes that sleep for any reason will do so on a call into the kernel, which means that there will always be live kernel stack frames carrying copies of the canary taken when the function was entered) AArch64 implements -mstack-protector-guard=sysreg for this purpose, as this permits the kernel to use different memory addresses for the stack canary for each CPU, and context switch the chosen system register with the rest of the process, allowing each process to use its own unique value for the stack canary. This patch implements something similar, but for the 32-bit ARM kernel, which will start using the user space TLS register TPIDRURO to index per-process metadata while running in the kernel. This means we can just add an offset to TPIDRURO to obtain the address from which to load the canary value. Changes since v5: - rebase onto latest changes, including .c -> .cc rename - ensure that tests execute only on targets that can support them Changes since v4: - add a couple of test cases - incorporate feedback received from Qing and Kyrylo Changes since v3: - force a reload of the TLS register before performing the stack protector check, so that we never rely on the stack for the address of the canary Changes since v2: - fix the template for stack_protect_test_tls so it correctly conveys the fact that it sets the Z flag Cc: Keith Packard Cc: thomas.preudho...@celest.fr Cc: adhemerval.zane...@linaro.org Cc: Qing Zhao Cc: Richard Sandiford Cc: Kyrylo Tkachov Cc: Richard Earnshaw Cc: gcc-patches@gcc.gnu.org Ard Biesheuvel (1): [ARM] Add support for TLS register based stack protector canary access gcc/config/arm/arm-opts.h| 6 ++ gcc/config/arm/arm-protos.h | 2 + gcc/config/arm/arm.cc| 55 +++ gcc/config/arm/arm.md| 71 +++- gcc/config/arm/arm.opt | 22 ++ gcc/doc/invoke.texi | 11 +++ gcc/testsuite/gcc.target/arm/stack-protector-7.c | 12 gcc/testsuite/gcc.target/arm/stack-protector-8.c | 7 ++ 8 files changed, 184 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/arm/stack-protector-7.c create mode 100644 gcc/testsuite/gcc.target/arm/stack-protector-8.c -- 2.30.2
[PATCH v6 1/1] [ARM] Add support for TLS register based stack protector canary access
Add support for accessing the stack canary value via the TLS register, so that multiple threads running in the same address space can use distinct canary values. This is intended for the Linux kernel running in SMP mode, where processes entering the kernel are essentially threads running the same program concurrently: using a global variable for the canary in that context is problematic because it can never be rotated, and so the OS is forced to use the same value as long as it remains up. Using the TLS register to index the stack canary helps with this, as it allows each CPU to context switch the TLS register along with the rest of the process, permitting each process to use its own value for the stack canary. 2022-01-19 Ard Biesheuvel * config/arm/arm-opts.h (enum stack_protector_guard): New * config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem): New * config/arm/arm.cc (TARGET_STACK_PROTECT_GUARD): Define (arm_option_override_internal): Handle and put in error checks for stack protector guard options. (arm_option_reconfigure_globals): Likewise (arm_stack_protect_tls_canary_mem): New (arm_stack_protect_guard): New * config/arm/arm.md (stack_protect_set): New (stack_protect_set_tls): Likewise (stack_protect_test): Likewise (stack_protect_test_tls): Likewise (reload_tp_hard): Likewise * config/arm/arm.opt (-mstack-protector-guard): New (-mstack-protector-guard-offset): New. * doc/invoke.texi: Document new options gcc/testsuite/ChangeLog: * gcc.target/arm/stack-protector-7.c: New test. * gcc.target/arm/stack-protector-8.c: New test. Signed-off-by: Ard Biesheuvel --- gcc/config/arm/arm-opts.h| 6 ++ gcc/config/arm/arm-protos.h | 2 + gcc/config/arm/arm.cc| 55 +++ gcc/config/arm/arm.md| 71 +++- gcc/config/arm/arm.opt | 22 ++ gcc/doc/invoke.texi | 11 +++ gcc/testsuite/gcc.target/arm/stack-protector-7.c | 12 gcc/testsuite/gcc.target/arm/stack-protector-8.c | 7 ++ 8 files changed, 184 insertions(+), 2 deletions(-) diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h index c50d5e56a181..24d12fafdec8 100644 --- a/gcc/config/arm/arm-opts.h +++ b/gcc/config/arm/arm-opts.h @@ -69,4 +69,10 @@ enum arm_tls_type { TLS_GNU, TLS_GNU2 }; + +/* Where to get the canary for the stack protector. */ +enum stack_protector_guard { + SSP_TLSREG, /* per-thread canary in TLS register */ + SSP_GLOBAL /* global canary */ +}; #endif diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index cd55a9f6ca54..881c72c988bd 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -195,6 +195,8 @@ extern void arm_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx); extern rtx arm_load_tp (rtx); extern bool arm_coproc_builtin_available (enum unspecv); extern bool arm_coproc_ldc_stc_legitimate_address (rtx); +extern rtx arm_stack_protect_tls_canary_mem (bool); + #if defined TREE_CODE extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree); diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc index 7825e364c01e..c192894ff33e 100644 --- a/gcc/config/arm/arm.cc +++ b/gcc/config/arm/arm.cc @@ -829,6 +829,9 @@ static const struct attribute_spec arm_attribute_table[] = #undef TARGET_MD_ASM_ADJUST #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust + +#undef TARGET_STACK_PROTECT_GUARD +#define TARGET_STACK_PROTECT_GUARD arm_stack_protect_guard /* Obstack for minipool constant handling. */ static struct obstack minipool_obstack; @@ -3176,6 +3179,26 @@ arm_option_override_internal (struct gcc_options *opts, if (TARGET_THUMB2_P (opts->x_target_flags)) opts->x_inline_asm_unified = true; + if (arm_stack_protector_guard == SSP_GLOBAL + && opts->x_arm_stack_protector_guard_offset_str) +{ + error ("incompatible options %'-mstack-protector-guard=global%' and" +"%'-mstack-protector-guard-offset=%qs%'", +arm_stack_protector_guard_offset_str); +} + + if (opts->x_arm_stack_protector_guard_offset_str) +{ + char *end; + const char *str = arm_stack_protector_guard_offset_str; + errno = 0; + long offs = strtol (arm_stack_protector_guard_offset_str, &end, 0); + if (!*str || *end || errno) + error ("%qs is not a valid offset in %qs", str, + "-mstack-protector-guard-offset="); + arm_stack_protector_guard_offset = offs; +} + #ifdef SUBTARGET_OVERRIDE_INTERNAL_OPTIONS SUBTARGET_OVERRIDE_INTERNAL_OPTIONS; #endif @@ -3843,6 +3866,9 @@ arm_option_reconfigure_globals (void)
Re: [PATCH v5 1/1] [ARM] Add support for TLS register based stack protector canary access
On Wed, 19 Jan 2022 at 18:02, Ard Biesheuvel wrote: > > On Wed, 19 Jan 2022 at 17:54, Kyrylo Tkachov wrote: > > > > Hi Ard, > > > > > -Original Message- > > > From: Gcc-patches > > bounces+kyrylo.tkachov=arm....@gcc.gnu.org> On Behalf Of Ard > > > Biesheuvel via Gcc-patches > > > Sent: Monday, November 15, 2021 6:04 PM > > > To: linux-harden...@vger.kernel.org > > > Cc: Richard Sandiford ; > > > thomas.preudho...@celest.fr; Keith Packard ; > > > gcc-patches@gcc.gnu.org; Kyrylo Tkachov ; Ard > > > Biesheuvel > > > Subject: [PATCH v5 1/1] [ARM] Add support for TLS register based stack > > > protector canary access > > > > > > Add support for accessing the stack canary value via the TLS register, > > > so that multiple threads running in the same address space can use > > > distinct canary values. This is intended for the Linux kernel running in > > > SMP mode, where processes entering the kernel are essentially threads > > > running the same program concurrently: using a global variable for the > > > canary in that context is problematic because it can never be rotated, > > > and so the OS is forced to use the same value as long as it remains up. > > > > > > Using the TLS register to index the stack canary helps with this, as it > > > allows each CPU to context switch the TLS register along with the rest > > > of the process, permitting each process to use its own value for the > > > stack canary. > > > > I've tested this patch on an arm-none-linux-gnueabihf target and the > > results look clean. > > Have you tested this patch with a kernel build as well? (since the > > functionality is intended for that use). > > Of course. > > > If so, the patch is okay but please rebase it and repost so that we can > > commit it taking into account > > > > Will do. I have sent out my v6 - please let me know if there is anything else I need to do to get this landed. Thanks, Ard.
Re: [PATCH v6 1/1] [ARM] Add support for TLS register based stack protector canary access
On Fri, 21 Jan 2022 at 11:47, Kyrylo Tkachov wrote: > > > -Original Message- > > From: Gcc-patches > bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Ard > > Biesheuvel via Gcc-patches > > Sent: Wednesday, January 19, 2022 5:44 PM > > To: linux-harden...@vger.kernel.org > > Cc: Richard Earnshaw ; Richard Sandiford > > ; thomas.preudho...@celest.fr; Keith > > Packard ; gcc-patches@gcc.gnu.org; Kyrylo Tkachov > > ; Ard Biesheuvel > > Subject: [PATCH v6 1/1] [ARM] Add support for TLS register based stack > > protector canary access > > > > Add support for accessing the stack canary value via the TLS register, > > so that multiple threads running in the same address space can use > > distinct canary values. This is intended for the Linux kernel running in > > SMP mode, where processes entering the kernel are essentially threads > > running the same program concurrently: using a global variable for the > > canary in that context is problematic because it can never be rotated, > > and so the OS is forced to use the same value as long as it remains up. > > > > Using the TLS register to index the stack canary helps with this, as it > > allows each CPU to context switch the TLS register along with the rest > > of the process, permitting each process to use its own value for the > > stack canary. > > > > 2022-01-19 Ard Biesheuvel > > > > * config/arm/arm-opts.h (enum stack_protector_guard): New > > * config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem): > > New > > * config/arm/arm.cc (TARGET_STACK_PROTECT_GUARD): Define > > (arm_option_override_internal): Handle and put in error checks > > for stack protector guard options. > > (arm_option_reconfigure_globals): Likewise > > (arm_stack_protect_tls_canary_mem): New > > (arm_stack_protect_guard): New > > * config/arm/arm.md (stack_protect_set): New > > (stack_protect_set_tls): Likewise > > (stack_protect_test): Likewise > > (stack_protect_test_tls): Likewise > > (reload_tp_hard): Likewise > > * config/arm/arm.opt (-mstack-protector-guard): New > > (-mstack-protector-guard-offset): New. > > * doc/invoke.texi: Document new options > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/arm/stack-protector-7.c: New test. > > * gcc.target/arm/stack-protector-8.c: New test. > > > > Signed-off-by: Ard Biesheuvel > > Thanks. One final bit. Given that you're using the Signed-off-by tag this > means that you're contributing this patch under the DCO rules. > Can you please confirm that you intend to contribute this patch under the > rules in https://gcc.gnu.org/dco.html Yes, I am making this contribution under DCO 1.1 terms. > If you're happy with that I'll push the patch for you. > Thanks, > Kyrill > Thanks! > > --- > > gcc/config/arm/arm-opts.h| 6 ++ > > gcc/config/arm/arm-protos.h | 2 + > > gcc/config/arm/arm.cc| 55 +++ > > gcc/config/arm/arm.md| 71 +++- > > gcc/config/arm/arm.opt | 22 ++ > > gcc/doc/invoke.texi | 11 +++ > > gcc/testsuite/gcc.target/arm/stack-protector-7.c | 12 > > gcc/testsuite/gcc.target/arm/stack-protector-8.c | 7 ++ > > 8 files changed, 184 insertions(+), 2 deletions(-) > > > > diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h > > index c50d5e56a181..24d12fafdec8 100644 > > --- a/gcc/config/arm/arm-opts.h > > +++ b/gcc/config/arm/arm-opts.h > > @@ -69,4 +69,10 @@ enum arm_tls_type { > >TLS_GNU, > >TLS_GNU2 > > }; > > + > > +/* Where to get the canary for the stack protector. */ > > +enum stack_protector_guard { > > + SSP_TLSREG, /* per-thread canary in TLS register */ > > + SSP_GLOBAL /* global canary */ > > +}; > > #endif > > diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h > > index cd55a9f6ca54..881c72c988bd 100644 > > --- a/gcc/config/arm/arm-protos.h > > +++ b/gcc/config/arm/arm-protos.h > > @@ -195,6 +195,8 @@ extern void arm_split_atomic_op (enum rtx_code, > > rtx, rtx, rtx, rtx, rtx, rtx); > > extern rtx arm_load_tp (rtx); > > extern bool arm_coproc_builtin_available (enum unspecv); > > extern bool arm_coproc_ldc_stc_legitimate_address (rtx); > &
Re: [PING^3][PATCH,v2,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack
On Wed, 26 Jan 2022 at 08:53, Dan Li wrote: > > Hi, all, > > Sorry for bothering. > > I'm trying to commit aarch64 scs code to the gcc and there is an issue > that I'm not sure about, could someone give me some suggestions? > (To avoid noise, I did't cc PING^3 [1] to the kernel mail list :) ) > > When clang enables scs, the following instructions are usually generated: > > str x30, [x18], 8 > ldp x29, x30, [sp], 16 > .. > ldp x29, x30, [sp], 16 ## x30 pop > ldr x30, [x18, -8]! ## x30 pop again > ret > > The x30 register is popped twice here, Richard suggested that we can > omit the first x30 pop here. > > AFAICT, it seems fine and also safe for SCS. But I'm not sure if I'm > missing something with the kernel, could someone give some suggestions? > > The previous discussion can be found here [1]. > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589257.html > As was pointed out in the discussion, binary patching is in fact a concern for the Linux kernel. E.g., Android uses generic binary builds, but we would like to be able to switch between pointer authentication and shadow call stack at boot time, rather than always support both, and take the SCS performance hit on systems that implement PAC as well. However, it seems more straight-forward to patch PACIASP and AUTIASP instructions into SCS push/pop instructions rather than the other way around, as we can force the use of these exact opcodes [in the NOP space]), as well as rely on existing unwind annotations to locate any such instruction in the binary. So omitting the load of X30 from the ordinary stack seems fine to me. > On 1/25/22 22:51, Dan Li wrote: > > > > > > On 1/25/22 02:19, Richard Sandiford wrote: > >> Dan Li writes: > > + > > if (flag_stack_usage_info) > > current_function_static_stack_size = constant_lower_bound > > (frame_size); > > @@ -9066,6 +9089,10 @@ aarch64_expand_epilogue (bool for_sibcall) > > RTX_FRAME_RELATED_P (insn) = 1; > > } > > + /* Pop return address from shadow call stack. */ > > + if (aarch64_shadow_call_stack_enabled ()) > > +emit_insn (gen_scs_pop ()); > > + > > This looks correct, but following on from the above, I guess we continue > to restore x30 from the frame in the traditional way first, and then > overwrite it with the SCS value. I think the output code would be > slightly neater if we suppressed the first restore of x30. > > >>> Yes, the current epilogue is like: > >>> ... > >>> ldp x29, x30, [sp], #16 > >>> + ldr x30, [x18, #-8]! > >>> > >>> I think may be we can keep the two x30 pops here, for two reasons: > >>> 1) The implementation of scs in clang is to pop x30 twice, it might be > >>> better to be consistent with clang here[1]. > >> > >> This doesn't seem a very compelling reason on its own, unless it's > >> explicitly meant to be observable behaviour. (But then, observed how?) > >> > > > > Well, probably sticking to pop x30 twice is not a good idea. > > AFAICT, there doesn't seem to be an explicit requirement that > > this behavior must be followed. > > > > BTW: > > Do we still need to emit the .cfi_restore 30 directive here if we > > want to save a pop x30? (Sorry I'm not familiar enough with DWARF.) > > > > Since the aarch64 linux kernel always enables -fno-omit-frame-pointer, > > the generated assembly code may be as follows: > > > > str x30, [x18], 8 > > ldp x29, x30, [sp], 16 > > .. > > ldr x29, [sp], 16 > > ## Do we still need a .cfi_restore 30 here > > .cfi_restore 29 > > .cfi_def_cfa_offset 0 > > ldr x30, [x18, -8]! > > ## Or may be a non-CFA based directive here > > ret > > > >>> 2) If we keep the first restore of x30, it may provide some flexibility > >>> for other scenarios. For example, we can directly patch the scs_push/pop > >>> insns in the binary to turn it into a binary without scs; > >> > >> Is that a supported (and ideally documented) use case? If it is, > >> then it becomes important for correctness that the compiler emits > >> exactly the opcodes in the original patch. If it isn't, then the > >> compiler can emit other instructions that have the same effect. > >> > > > > Oh, no, this is just a little trick that can be used for compatibility. > > (Maybe some scenarios such as our company sometimes need to be > > compatible with some non-open source binaries from different > > manufacturers, two pops could make life easier :). ) > > > > If this is not a consideration for our community, please ignore > > this request. > > > >> I'd like a definitive ruling on this from the kernel folks before > >> the patch goes in. > >> > > > > Ok, I'll cc some kernel folks to make sure I didn't miss something. > > > >> If binary patching is supposed to be possible then scs_push and > >> scs_pop *do* need to be separate define_in
Re: [PING^3][PATCH,v2,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack
On Wed, 26 Jan 2022 at 11:40, Dan Li wrote: > > Thanks, Ard, > > On 1/26/22 00:10, Ard Biesheuvel wrote: > > On Wed, 26 Jan 2022 at 08:53, Dan Li wrote: > >> > >> Hi, all, > >> > >> Sorry for bothering. > >> > >> I'm trying to commit aarch64 scs code to the gcc and there is an issue > >> that I'm not sure about, could someone give me some suggestions? > >> (To avoid noise, I did't cc PING^3 [1] to the kernel mail list :) ) > >> > >> When clang enables scs, the following instructions are usually generated: > >> > >> str x30, [x18], 8 > >> ldp x29, x30, [sp], 16 > >> .. > >> ldp x29, x30, [sp], 16 ## x30 pop > >> ldr x30, [x18, -8]! ## x30 pop again > >> ret > >> > >> The x30 register is popped twice here, Richard suggested that we can > >> omit the first x30 pop here. > >> > >> AFAICT, it seems fine and also safe for SCS. But I'm not sure if I'm > >> missing something with the kernel, could someone give some suggestions? > >> > >> The previous discussion can be found here [1]. > >> > >> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589257.html > >> > > > > As was pointed out in the discussion, binary patching is in fact a > > concern for the Linux kernel. E.g., Android uses generic binary > > builds, but we would like to be able to switch between pointer > > authentication and shadow call stack at boot time, rather than always > > support both, and take the SCS performance hit on systems that > > implement PAC as well. > > > > However, it seems more straight-forward to patch PACIASP and AUTIASP > > instructions into SCS push/pop instructions rather than the other way > > around, as we can force the use of these exact opcodes [in the NOP > > space]), as well as rely on existing unwind annotations to locate any > > such instruction in the binary. > > > > Well, then I think I don't need to submit a kernel patch to > enable SCS for gcc :) > Not entirely. > BTW: > Do we have a plan to submit patches of dynamic patch PAC into > the kernel recently? > At the moment, there are just some ideas floating around. I did implement a proof of concept that uses unwind data, but it hit some issues with cfi_negate_ra_state being emitted imprecisely (GCC) or not at all (Clang) in some cases. Using objtool would be another possibility. So in summary, getting SCS support proper into GCC is definitely worth the effort. -- Ard.
Re: [PATCH v5 1/1] [ARM] Add support for TLS register based stack protector canary access
(+ Ramana) On Mon, 15 Nov 2021 at 19:04, Ard Biesheuvel wrote: > > Add support for accessing the stack canary value via the TLS register, > so that multiple threads running in the same address space can use > distinct canary values. This is intended for the Linux kernel running in > SMP mode, where processes entering the kernel are essentially threads > running the same program concurrently: using a global variable for the > canary in that context is problematic because it can never be rotated, > and so the OS is forced to use the same value as long as it remains up. > > Using the TLS register to index the stack canary helps with this, as it > allows each CPU to context switch the TLS register along with the rest > of the process, permitting each process to use its own value for the > stack canary. > > 2021-11-15 Ard Biesheuvel > > * config/arm/arm-opts.h (enum stack_protector_guard): New > * config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem): > New > * config/arm/arm.c (TARGET_STACK_PROTECT_GUARD): Define > (arm_option_override_internal): Handle and put in error checks > for stack protector guard options. > (arm_option_reconfigure_globals): Likewise > (arm_stack_protect_tls_canary_mem): New > (arm_stack_protect_guard): New > * config/arm/arm.md (stack_protect_set): New > (stack_protect_set_tls): Likewise > (stack_protect_test): Likewise > (stack_protect_test_tls): Likewise > (reload_tp_hard): Likewise > * config/arm/arm.opt (-mstack-protector-guard): New > (-mstack-protector-guard-offset): New. > * doc/invoke.texi: Document new options > > gcc/testsuite/ChangeLog: > > * gcc.target/arm/stack-protector-7.c: New test. > * gcc.target/arm/stack-protector-8.c: New test. > > Signed-off-by: Ard Biesheuvel > --- > gcc/config/arm/arm-opts.h| 6 ++ > gcc/config/arm/arm-protos.h | 2 + > gcc/config/arm/arm.c | 55 +++ > gcc/config/arm/arm.md| 71 +++- > gcc/config/arm/arm.opt | 22 ++ > gcc/doc/invoke.texi | 11 +++ > gcc/testsuite/gcc.target/arm/stack-protector-7.c | 10 +++ > gcc/testsuite/gcc.target/arm/stack-protector-8.c | 5 ++ > 8 files changed, 180 insertions(+), 2 deletions(-) > > diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h > index 5c4b62f404f7..581ba3c4fbbb 100644 > --- a/gcc/config/arm/arm-opts.h > +++ b/gcc/config/arm/arm-opts.h > @@ -69,4 +69,10 @@ enum arm_tls_type { >TLS_GNU, >TLS_GNU2 > }; > + > +/* Where to get the canary for the stack protector. */ > +enum stack_protector_guard { > + SSP_TLSREG, /* per-thread canary in TLS register */ > + SSP_GLOBAL /* global canary */ > +}; > #endif > diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h > index 9b1f61394ad7..d8d605920c97 100644 > --- a/gcc/config/arm/arm-protos.h > +++ b/gcc/config/arm/arm-protos.h > @@ -195,6 +195,8 @@ extern void arm_split_atomic_op (enum rtx_code, rtx, rtx, > rtx, rtx, rtx, rtx); > extern rtx arm_load_tp (rtx); > extern bool arm_coproc_builtin_available (enum unspecv); > extern bool arm_coproc_ldc_stc_legitimate_address (rtx); > +extern rtx arm_stack_protect_tls_canary_mem (bool); > + > > #if defined TREE_CODE > extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree); > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index a5b403eb3e49..e5077348ce07 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -829,6 +829,9 @@ static const struct attribute_spec arm_attribute_table[] = > > #undef TARGET_MD_ASM_ADJUST > #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust > + > +#undef TARGET_STACK_PROTECT_GUARD > +#define TARGET_STACK_PROTECT_GUARD arm_stack_protect_guard > > /* Obstack for minipool constant handling. */ > static struct obstack minipool_obstack; > @@ -3176,6 +3179,26 @@ arm_option_override_internal (struct gcc_options *opts, >if (TARGET_THUMB2_P (opts->x_target_flags)) > opts->x_inline_asm_unified = true; > > + if (arm_stack_protector_guard == SSP_GLOBAL > + && opts->x_arm_stack_protector_guard_offset_str) > +{ > + error ("incompatible options %'-mstack-protector-guard=global%' and" > +"%'-mstack-protector-guard-offset=%qs%'", > +arm_stack_protector_guard_offset_str); > +} > + > + if (op
Re: [PATCH v4 0/1] implement TLS register based stack canary for ARM
On Thu, 28 Oct 2021 at 13:27, Ard Biesheuvel wrote: > > Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102352 > > In the Linux kernel, user processes calling into the kernel are > essentially threads running in the same address space, of a program that > never terminates. This means that using a global variable for the stack > protector canary value is problematic on SMP systems, as we can never > change it unless we reboot the system. (Processes that sleep for any > reason will do so on a call into the kernel, which means that there will > always be live kernel stack frames carrying copies of the canary taken > when the function was entered) > > AArch64 implements -mstack-protector-guard=sysreg for this purpose, as > this permits the kernel to use different memory addresses for the stack > canary for each CPU, and context switch the chosen system register with > the rest of the process, allowing each process to use its own unique > value for the stack canary. > > This patch implements something similar, but for the 32-bit ARM kernel, > which will start using the user space TLS register TPIDRURO to index > per-process metadata while running in the kernel. This means we can just > add an offset to TPIDRURO to obtain the address from which to load the > canary value. > > Changes since v3: > - force a reload of the TLS register before performing the stack > protector check, so that we never rely on the stack for the address of > the canary > Changes since v2: > - fix the template for stack_protect_test_tls so it correctly conveys > the fact that it sets the Z flag > > Comments/suggestions welcome. > > Cc: Keith Packard > Cc: thomas.preudho...@celest.fr > Cc: adhemerval.zane...@linaro.org > Cc: Qing Zhao > Cc: Richard Sandiford > Cc: gcc-patches@gcc.gnu.org > Note to reviewers: this feature has been accepted in LLVM/Clang, and so the exact command line options introduced by this patch to enable this feature can no longer be changed easily. I don't expect this to be an issue, given that they are the same as x86, but I thought I should note it nonetheless.
Re: [PATCH v4 1/1] [ARM] Add support for TLS register based stack protector canary access
On Tue, 9 Nov 2021 at 21:45, Qing Zhao wrote: > > Hi, Ard, > > Sorry for the late reply (since I don’t have the right to approve a patch, I > has been waiting for any arm port maintainer to review this patch). > The following is the arm port maintainer information I got from MAINTAINERS > file (you might want to explicitly cc’ing one of them for a review) > > arm portNick Clifton > arm portRichard Earnshaw > arm portRamana Radhakrishnan > arm portKyrylo Tkachov > > I see that Ramana implemented the similar patch for aarch64 (commit > cd0b2d361df82c848dc7e1c3078651bb0624c3c6), So, I am CCing him with this > email. Hopefully he will review this patch. > Thank you Qing. But I know Ramana well, and I know he no longer works on GCC. I collaborated with him on the AArch64 implementation at the time (but he wrote all the code) > Anyway, I briefly read your patch (version 4), and have the following > questions and comments: > > 1. When the option -mstack-protector-guard=tls presents, should the option > mstack-protector-guard-offset=.. be required to present? > If it’s required to present, you might want to add such requirement to > the documentation, and also issue errors when it’s not present. > It’s not clear right now from the current implementation, so, you might > need to update both "arm_option_override_internal “ in arm.c > and doc/invoke.texi to make this clear. > An offset of 0x0 is a reasonable default, so I don't think it is necessary to require the offset param to be passed in that case. > 2. For arm, is there only one system register can be used for this purpose? > There are other registers that might be used in the same way, but the TLS register is the obvious choice. On AArch64, we decided to use 'sysreg' and permit the user to specify the register because the Linux kernel uses the user space stack pointer (SP_EL0), which is kind of odd so we did not want to hard code that. > 3. For the functionality you added, I didn’t see any testing cases added, I > think testing cases are needed. > Yes, I am aware of that. I'm just not sure I know how to proceed here: any pointers? > More comments are embedded below: > > > On Oct 28, 2021, at 6:27 AM, Ard Biesheuvel wrote: > > > > Add support for accessing the stack canary value via the TLS register, > > so that multiple threads running in the same address space can use > > distinct canary values. This is intended for the Linux kernel running in > > SMP mode, where processes entering the kernel are essentially threads > > running the same program concurrently: using a global variable for the > > canary in that context is problematic because it can never be rotated, > > and so the OS is forced to use the same value as long as it remains up. > > > > Using the TLS register to index the stack canary helps with this, as it > > allows each CPU to context switch the TLS register along with the rest > > of the process, permitting each process to use its own value for the > > stack canary. > > > > 2021-10-28 Ard Biesheuvel > > > > * config/arm/arm-opts.h (enum stack_protector_guard): New > > * config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem): > > New > > * config/arm/arm.c (TARGET_STACK_PROTECT_GUARD): Define > > (arm_option_override_internal): Handle and put in error checks > > for stack protector guard options. > > (arm_option_reconfigure_globals): Likewise > > (arm_stack_protect_tls_canary_mem): New > > (arm_stack_protect_guard): New > > * config/arm/arm.md (stack_protect_set): New > > (stack_protect_set_tls): Likewise > > (stack_protect_test): Likewise > > (stack_protect_test_tls): Likewise > > (reload_tp_hard): Likewise > > * config/arm/arm.opt (-mstack-protector-guard): New > > (-mstack-protector-guard-offset): New. > > * doc/invoke.texi: Document new options > > > > Signed-off-by: Ard Biesheuvel > > --- > > gcc/config/arm/arm-opts.h | 6 ++ > > gcc/config/arm/arm-protos.h | 2 + > > gcc/config/arm/arm.c| 55 +++ > > gcc/config/arm/arm.md | 71 +++- > > gcc/config/arm/arm.opt | 22 ++ > > gcc/doc/invoke.texi | 9 +++ > > 6 files changed, 163 insertions(+), 2 deletions(-) > > > > diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h > > index 5c4b62f404f7..581ba3c4fbbb 100644 > > --- a/gcc/config/arm/arm-opts.h > > +++ b/gcc/conf
[PATCH v5 0/1] implement TLS register based stack canary for ARM
Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102352 In the Linux kernel, user processes calling into the kernel are essentially threads running in the same address space, of a program that never terminates. This means that using a global variable for the stack protector canary value is problematic on SMP systems, as we can never change it unless we reboot the system. (Processes that sleep for any reason will do so on a call into the kernel, which means that there will always be live kernel stack frames carrying copies of the canary taken when the function was entered) AArch64 implements -mstack-protector-guard=sysreg for this purpose, as this permits the kernel to use different memory addresses for the stack canary for each CPU, and context switch the chosen system register with the rest of the process, allowing each process to use its own unique value for the stack canary. This patch implements something similar, but for the 32-bit ARM kernel, which will start using the user space TLS register TPIDRURO to index per-process metadata while running in the kernel. This means we can just add an offset to TPIDRURO to obtain the address from which to load the canary value. Changes since v4: - add a couple of test cases - incorporate feedback received from Qing and Kyrylo Changes since v3: - force a reload of the TLS register before performing the stack protector check, so that we never rely on the stack for the address of the canary Changes since v2: - fix the template for stack_protect_test_tls so it correctly conveys the fact that it sets the Z flag Cc: Keith Packard Cc: thomas.preudho...@celest.fr Cc: adhemerval.zane...@linaro.org Cc: Qing Zhao Cc: Richard Sandiford Cc: Kyrylo Tkachov Cc: gcc-patches@gcc.gnu.org Ard Biesheuvel (1): [ARM] Add support for TLS register based stack protector canary access gcc/config/arm/arm-opts.h| 6 ++ gcc/config/arm/arm-protos.h | 2 + gcc/config/arm/arm.c | 55 +++ gcc/config/arm/arm.md| 71 +++- gcc/config/arm/arm.opt | 22 ++ gcc/doc/invoke.texi | 11 +++ gcc/testsuite/gcc.target/arm/stack-protector-7.c | 10 +++ gcc/testsuite/gcc.target/arm/stack-protector-8.c | 5 ++ 8 files changed, 180 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/arm/stack-protector-7.c create mode 100644 gcc/testsuite/gcc.target/arm/stack-protector-8.c -- 2.30.2
[PATCH v5 1/1] [ARM] Add support for TLS register based stack protector canary access
Add support for accessing the stack canary value via the TLS register, so that multiple threads running in the same address space can use distinct canary values. This is intended for the Linux kernel running in SMP mode, where processes entering the kernel are essentially threads running the same program concurrently: using a global variable for the canary in that context is problematic because it can never be rotated, and so the OS is forced to use the same value as long as it remains up. Using the TLS register to index the stack canary helps with this, as it allows each CPU to context switch the TLS register along with the rest of the process, permitting each process to use its own value for the stack canary. 2021-11-15 Ard Biesheuvel * config/arm/arm-opts.h (enum stack_protector_guard): New * config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem): New * config/arm/arm.c (TARGET_STACK_PROTECT_GUARD): Define (arm_option_override_internal): Handle and put in error checks for stack protector guard options. (arm_option_reconfigure_globals): Likewise (arm_stack_protect_tls_canary_mem): New (arm_stack_protect_guard): New * config/arm/arm.md (stack_protect_set): New (stack_protect_set_tls): Likewise (stack_protect_test): Likewise (stack_protect_test_tls): Likewise (reload_tp_hard): Likewise * config/arm/arm.opt (-mstack-protector-guard): New (-mstack-protector-guard-offset): New. * doc/invoke.texi: Document new options gcc/testsuite/ChangeLog: * gcc.target/arm/stack-protector-7.c: New test. * gcc.target/arm/stack-protector-8.c: New test. Signed-off-by: Ard Biesheuvel --- gcc/config/arm/arm-opts.h| 6 ++ gcc/config/arm/arm-protos.h | 2 + gcc/config/arm/arm.c | 55 +++ gcc/config/arm/arm.md| 71 +++- gcc/config/arm/arm.opt | 22 ++ gcc/doc/invoke.texi | 11 +++ gcc/testsuite/gcc.target/arm/stack-protector-7.c | 10 +++ gcc/testsuite/gcc.target/arm/stack-protector-8.c | 5 ++ 8 files changed, 180 insertions(+), 2 deletions(-) diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h index 5c4b62f404f7..581ba3c4fbbb 100644 --- a/gcc/config/arm/arm-opts.h +++ b/gcc/config/arm/arm-opts.h @@ -69,4 +69,10 @@ enum arm_tls_type { TLS_GNU, TLS_GNU2 }; + +/* Where to get the canary for the stack protector. */ +enum stack_protector_guard { + SSP_TLSREG, /* per-thread canary in TLS register */ + SSP_GLOBAL /* global canary */ +}; #endif diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 9b1f61394ad7..d8d605920c97 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -195,6 +195,8 @@ extern void arm_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx); extern rtx arm_load_tp (rtx); extern bool arm_coproc_builtin_available (enum unspecv); extern bool arm_coproc_ldc_stc_legitimate_address (rtx); +extern rtx arm_stack_protect_tls_canary_mem (bool); + #if defined TREE_CODE extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index a5b403eb3e49..e5077348ce07 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -829,6 +829,9 @@ static const struct attribute_spec arm_attribute_table[] = #undef TARGET_MD_ASM_ADJUST #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust + +#undef TARGET_STACK_PROTECT_GUARD +#define TARGET_STACK_PROTECT_GUARD arm_stack_protect_guard /* Obstack for minipool constant handling. */ static struct obstack minipool_obstack; @@ -3176,6 +3179,26 @@ arm_option_override_internal (struct gcc_options *opts, if (TARGET_THUMB2_P (opts->x_target_flags)) opts->x_inline_asm_unified = true; + if (arm_stack_protector_guard == SSP_GLOBAL + && opts->x_arm_stack_protector_guard_offset_str) +{ + error ("incompatible options %'-mstack-protector-guard=global%' and" +"%'-mstack-protector-guard-offset=%qs%'", +arm_stack_protector_guard_offset_str); +} + + if (opts->x_arm_stack_protector_guard_offset_str) +{ + char *end; + const char *str = arm_stack_protector_guard_offset_str; + errno = 0; + long offs = strtol (arm_stack_protector_guard_offset_str, &end, 0); + if (!*str || *end || errno) + error ("%qs is not a valid offset in %qs", str, + "-mstack-protector-guard-offset="); + arm_stack_protector_guard_offset = offs; +} + #ifdef SUBTARGET_OVERRIDE_INTERNAL_OPTIONS SUBTARGET_OVERRIDE_INTERNAL_OPTIONS; #endif @@ -3843,6 +3866,9 @@ arm_option_reconfigure_globals (void)
[RFC PATCH 0/1] implement TLS register based stack canary for ARM
Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102352 In the Linux kernel, user processes calling into the kernel are essentially threads running in the same address space, of a program that never terminates. This means that using a global variable for the stack protector canary value is problematic on SMP systems, as we can never change it unless we reboot the system. (Processes that sleep for any reason will do so on a call into the kernel, which means that there will always be live kernel stack frames carrying copies of the canary taken when the function was entered) AArch64 implements -mstack-protector-guard=sysreg for this purpose, as this permits the kernel to use different memory addresses for the stack canary for each CPU, and context switch the chosen system register with the rest of the process, allowing each process to use its own unique value for the stack canary. This patch implements something similar, but for the 32-bit ARM kernel, which will start using the user space TLS register TPIDRURO to index per-process metadata while running in the kernel. This means we can just add an offset to TPIDRURO to obtain the address from which to load the canary value. The patch is a bit rough around the edges, but produces the correct results as far as I can tell. However, I couldn't quite figure out how to modify the patterns so that the offset will be moved into the immediate offset field of the LDR instructions, so currently, the ADD of the offset is always a distinct instruction. As for the spilling issues that have been fixed in this code in the past: I suppose a register carrying the TLS register value will never get spilled to begin with? How about a register that carries TLS+? Comments/suggestions welcome. Cc: thomas.preudho...@celest.fr Cc: adhemerval.zane...@linaro.org Cc: Qing Zhao Cc: Richard Sandiford Cc: gcc-patches@gcc.gnu.org Ard Biesheuvel (1): [ARM] Add support for TLS register based stack protector canary access gcc/config/arm/arm-opts.h | 6 +++ gcc/config/arm/arm.c | 39 + gcc/config/arm/arm.md | 44 ++-- gcc/config/arm/arm.opt| 22 ++ gcc/doc/invoke.texi | 9 5 files changed, 116 insertions(+), 4 deletions(-) -- 2.30.2 $ cat |arm-linux-gnueabihf-gcc -march=armv7-a -mstack-protector-guard=tls -mstack-protector-guard-offset=10 -mtp=cp15 -S -o - -xc - -fstack-protector-all -O3 int foo(void *); int bar(void) { return foo(__builtin_thread_pointer()) + 1; } .arch armv7-a .fpu softvfp .eabi_attribute 20, 1 .eabi_attribute 21, 1 .eabi_attribute 23, 3 .eabi_attribute 24, 1 .eabi_attribute 25, 1 .eabi_attribute 26, 2 .eabi_attribute 30, 2 .eabi_attribute 34, 1 .eabi_attribute 18, 4 .file "" .text .align 2 .global bar .syntax unified .arm .type bar, %function bar: @ args = 0, pretend = 0, frame = 8 @ frame_needed = 0, uses_anonymous_args = 0 push{r4, lr} mrc p15, 0, r4, c13, c0, 3 @ load_tp_hard add r3, r4, #10 sub sp, sp, #8 mov r0, r4 add r4, r4, #10 ldr r3, [r3] str r3, [sp, #4] mov r3, #0 bl foo ldr r3, [r4] ldr r4, [sp, #4] eorsr3, r4, r3 mov r4, #0 bne .L5 add r0, r0, #1 add sp, sp, #8 @ sp needed pop {r4, pc} .L5: bl __stack_chk_fail .size bar, .-bar .ident "GCC: (GNU) 12.0.0 20211019 (experimental)" .section.note.GNU-stack,"",%progbits
[RFC PATCH 1/1] [ARM] Add support for TLS register based stack protector canary access
Add support for accessing the stack canary value via the TLS register, so that multiple threads running in the same address space can use distinct canary values. This is intended for the Linux kernel running in SMP mode, where processes entering the kernel are essentially threads running the same program concurrently: using a global variable for the canary in that context is problematic because it can never be rotated, and so the OS is forced to use the same value as long as it remains up. Using the TLS register to index the stack canary helps with this, as it allows each CPU to context switch the TLS register along with the rest of the process, permitting each process to use its own value for the stack canary. 2021-10-20 Ard Biesheuvel * config/arm/arm-opts.h (enum stack_protector_guard): New * config/arm/arm.c (arm_option_override_internal): Handle and put in error checks for stack protector guard options. (arm_stack_protect_guard): New. (TARGET_STACK_PROTECT_GUARD): Define. * config/arm/arm.md (reg_stack_protect_address): New. (stack_protect_set): Adjust for SSP_GLOBAL. (stack_protect_test): Likewise. * config/arm/arm.opt (-mstack-protector-guard): New (-mstack-protector-guard-offset): New. Signed-off-by: Ard Biesheuvel --- gcc/config/arm/arm-opts.h | 6 +++ gcc/config/arm/arm.c | 39 + gcc/config/arm/arm.md | 44 ++-- gcc/config/arm/arm.opt| 22 ++ gcc/doc/invoke.texi | 9 5 files changed, 116 insertions(+), 4 deletions(-) diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h index 5c4b62f404f7..581ba3c4fbbb 100644 --- a/gcc/config/arm/arm-opts.h +++ b/gcc/config/arm/arm-opts.h @@ -69,4 +69,10 @@ enum arm_tls_type { TLS_GNU, TLS_GNU2 }; + +/* Where to get the canary for the stack protector. */ +enum stack_protector_guard { + SSP_TLSREG, /* per-thread canary in TLS register */ + SSP_GLOBAL /* global canary */ +}; #endif diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index e51f60a1841d..deccc88e006e 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -829,6 +829,9 @@ static const struct attribute_spec arm_attribute_table[] = #undef TARGET_MD_ASM_ADJUST #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust + +#undef TARGET_STACK_PROTECT_GUARD +#define TARGET_STACK_PROTECT_GUARD arm_stack_protect_guard /* Obstack for minipool constant handling. */ static struct obstack minipool_obstack; @@ -3157,6 +3160,26 @@ arm_option_override_internal (struct gcc_options *opts, if (TARGET_THUMB2_P (opts->x_target_flags)) opts->x_inline_asm_unified = true; + if (arm_stack_protector_guard == SSP_GLOBAL + && opts->x_arm_stack_protector_guard_offset_str) +{ + error ("incompatible options %'-mstack-protector-guard=global%' and" +"%'-mstack-protector-guard-offset=%qs%'", +arm_stack_protector_guard_offset_str); +} + + if (opts->x_arm_stack_protector_guard_offset_str) +{ + char *end; + const char *str = arm_stack_protector_guard_offset_str; + errno = 0; + long offs = strtol (arm_stack_protector_guard_offset_str, &end, 0); + if (!*str || *end || errno) + error ("%qs is not a valid offset in %qs", str, + "-mstack-protector-guard-offset="); + arm_stack_protector_guard_offset = offs; +} + #ifdef SUBTARGET_OVERRIDE_INTERNAL_OPTIONS SUBTARGET_OVERRIDE_INTERNAL_OPTIONS; #endif @@ -3824,6 +3847,10 @@ arm_option_reconfigure_globals (void) else target_thread_pointer = TP_SOFT; } + + if (arm_stack_protector_guard == SSP_TLSREG + && target_thread_pointer != TP_CP15) +error("%'-mstack-protector-guard=tls%' needs a hardware TLS register"); } /* Perform some validation between the desired architecture and the rest of the @@ -34052,6 +34079,18 @@ arm_run_selftests (void) } } /* Namespace selftest. */ +/* Implement TARGET_STACK_PROTECT_GUARD. In case of a + global variable based guard use the default else + return a null tree. */ +static tree +arm_stack_protect_guard (void) +{ + if (arm_stack_protector_guard == SSP_GLOBAL) +return default_stack_protect_guard (); + + return NULL_TREE; +} + #undef TARGET_RUN_TARGET_SELFTESTS #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests #endif /* CHECKING_P */ diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 4adc976b8b67..f57e1db07e6a 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -9183,8 +9183,21 @@ UNSPEC_SP_SET)) (clobber (match_scratch:SI 2 "")) (clobber (match_scratch:SI 3 ""))])] + "arm_stack_protector_guard == SSP_GLOBAL" "" - "" +) + +(define_ex
Re: [RFC PATCH 0/1] implement TLS register based stack canary for ARM
On Thu, 21 Oct 2021 at 12:23, Ard Biesheuvel wrote: > > Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102352 > > In the Linux kernel, user processes calling into the kernel are > essentially threads running in the same address space, of a program that > never terminates. This means that using a global variable for the stack > protector canary value is problematic on SMP systems, as we can never > change it unless we reboot the system. (Processes that sleep for any > reason will do so on a call into the kernel, which means that there will > always be live kernel stack frames carrying copies of the canary taken > when the function was entered) > > AArch64 implements -mstack-protector-guard=sysreg for this purpose, as > this permits the kernel to use different memory addresses for the stack > canary for each CPU, and context switch the chosen system register with > the rest of the process, allowing each process to use its own unique > value for the stack canary. > > This patch implements something similar, but for the 32-bit ARM kernel, > which will start using the user space TLS register TPIDRURO to index > per-process metadata while running in the kernel. This means we can just > add an offset to TPIDRURO to obtain the address from which to load the > canary value. > > The patch is a bit rough around the edges, but produces the correct > results as far as I can tell. This is a lie > However, I couldn't quite figure out how > to modify the patterns so that the offset will be moved into the > immediate offset field of the LDR instructions, so currently, the ADD of > the offset is always a distinct instruction. > ... and this is no longer true now that I fixed the correctness problem. I will be sending out a v2 shortly, so please disregard this one for now. > As for the spilling issues that have been fixed in this code in the > past: I suppose a register carrying the TLS register value will never > get spilled to begin with? How about a register that carries TLS+? > > Comments/suggestions welcome. > > Cc: thomas.preudho...@celest.fr > Cc: adhemerval.zane...@linaro.org > Cc: Qing Zhao > Cc: Richard Sandiford > Cc: gcc-patches@gcc.gnu.org > > Ard Biesheuvel (1): > [ARM] Add support for TLS register based stack protector canary access > > gcc/config/arm/arm-opts.h | 6 +++ > gcc/config/arm/arm.c | 39 + > gcc/config/arm/arm.md | 44 ++-- > gcc/config/arm/arm.opt| 22 ++ > gcc/doc/invoke.texi | 9 > 5 files changed, 116 insertions(+), 4 deletions(-) > > -- > 2.30.2 > > $ cat |arm-linux-gnueabihf-gcc -march=armv7-a -mstack-protector-guard=tls > -mstack-protector-guard-offset=10 -mtp=cp15 -S -o - -xc - > -fstack-protector-all -O3 > int foo(void *); > int bar(void) > { > > return foo(__builtin_thread_pointer()) + 1; > } > .arch armv7-a > .fpu softvfp > .eabi_attribute 20, 1 > .eabi_attribute 21, 1 > .eabi_attribute 23, 3 > .eabi_attribute 24, 1 > .eabi_attribute 25, 1 > .eabi_attribute 26, 2 > .eabi_attribute 30, 2 > .eabi_attribute 34, 1 > .eabi_attribute 18, 4 > .file "" > .text > .align 2 > .global bar > .syntax unified > .arm > .type bar, %function > bar: > @ args = 0, pretend = 0, frame = 8 > @ frame_needed = 0, uses_anonymous_args = 0 > push{r4, lr} > mrc p15, 0, r4, c13, c0, 3 @ load_tp_hard > add r3, r4, #10 > sub sp, sp, #8 > mov r0, r4 > add r4, r4, #10 > ldr r3, [r3] > str r3, [sp, #4] > mov r3, #0 > bl foo > ldr r3, [r4] > ldr r4, [sp, #4] > eorsr3, r4, r3 > mov r4, #0 > bne .L5 > add r0, r0, #1 > add sp, sp, #8 > @ sp needed > pop {r4, pc} > .L5: > bl __stack_chk_fail > .size bar, .-bar > .ident "GCC: (GNU) 12.0.0 20211019 (experimental)" > .section.note.GNU-stack,"",%progbits >
[RFC PATCH v2 0/1] implement TLS register based stack canary for ARM
Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102352 In the Linux kernel, user processes calling into the kernel are essentially threads running in the same address space, of a program that never terminates. This means that using a global variable for the stack protector canary value is problematic on SMP systems, as we can never change it unless we reboot the system. (Processes that sleep for any reason will do so on a call into the kernel, which means that there will always be live kernel stack frames carrying copies of the canary taken when the function was entered) AArch64 implements -mstack-protector-guard=sysreg for this purpose, as this permits the kernel to use different memory addresses for the stack canary for each CPU, and context switch the chosen system register with the rest of the process, allowing each process to use its own unique value for the stack canary. This patch implements something similar, but for the 32-bit ARM kernel, which will start using the user space TLS register TPIDRURO to index per-process metadata while running in the kernel. This means we can just add an offset to TPIDRURO to obtain the address from which to load the canary value. As for the spilling issues that have been fixed in this code in the past: I suppose a register carrying the TLS register value will never get spilled to begin with? Comments/suggestions welcome. Cc: Keith Packard Cc: thomas.preudho...@celest.fr Cc: adhemerval.zane...@linaro.org Cc: Qing Zhao Cc: Richard Sandiford Cc: gcc-patches@gcc.gnu.org Ard Biesheuvel (1): [ARM] Add support for TLS register based stack protector canary access gcc/config/arm/arm-opts.h | 6 ++ gcc/config/arm/arm-protos.h | 2 + gcc/config/arm/arm.c| 52 gcc/config/arm/arm.md | 62 +++- gcc/config/arm/arm.opt | 22 +++ gcc/doc/invoke.texi | 9 +++ 6 files changed, 151 insertions(+), 2 deletions(-) -- 2.30.2 $ cat|arm-linux-gnueabihf-gcc -march=armv7-a -mstack-protector-guard=tls -mstack-protector-guard-offset=1296 -mtp=cp15 -S -o - -xc - -fstack-protector-all -O3 int foo(void *); int bar(void) { return foo(__builtin_thread_pointer()) + 1; } .arch armv7-a .fpu softvfp .eabi_attribute 20, 1 .eabi_attribute 21, 1 .eabi_attribute 23, 3 .eabi_attribute 24, 1 .eabi_attribute 25, 1 .eabi_attribute 26, 2 .eabi_attribute 30, 2 .eabi_attribute 34, 1 .eabi_attribute 18, 4 .file "" .text .align 2 .global bar .syntax unified .arm .type bar, %function bar: @ args = 0, pretend = 0, frame = 8 @ frame_needed = 0, uses_anonymous_args = 0 push{r4, lr} mrc p15, 0, r4, c13, c0, 3 @ load_tp_hard mov r0, r4 sub sp, sp, #8 ldr r3, [r4, #1296] str r3, [sp, #4] mov r3, #0 bl foo ldr r2, [sp, #4] ldr r3, [r4, #1296] eorsr3, r2, r3 mov r2, #0 bne .L5 add r0, r0, #1 add sp, sp, #8 @ sp needed pop {r4, pc} .L5: bl __stack_chk_fail .size bar, .-bar .ident "GCC: (GNU) 12.0.0 20211021 (experimental)" .section.note.GNU-stack,"",%progbits
[RFC PATCH v2 1/1] [ARM] Add support for TLS register based stack protector canary access
Add support for accessing the stack canary value via the TLS register, so that multiple threads running in the same address space can use distinct canary values. This is intended for the Linux kernel running in SMP mode, where processes entering the kernel are essentially threads running the same program concurrently: using a global variable for the canary in that context is problematic because it can never be rotated, and so the OS is forced to use the same value as long as it remains up. Using the TLS register to index the stack canary helps with this, as it allows each CPU to context switch the TLS register along with the rest of the process, permitting each process to use its own value for the stack canary. 2021-10-21 Ard Biesheuvel * config/arm/arm-opts.h (enum stack_protector_guard): New * config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem): New * config/arm/arm.c (TARGET_STACK_PROTECT_GUARD): Define (arm_option_override_internal): Handle and put in error checks for stack protector guard options. (arm_option_reconfigure_globals): Likewise (arm_stack_protect_tls_canary_mem): New (arm_stack_protect_guard): New * config/arm/arm.md (stack_protect_set): New (stack_protect_set_tls): Likewise (stack_protect_test): Likewise (stack_protect_test_tls): Likewise * config/arm/arm.opt (-mstack-protector-guard): New (-mstack-protector-guard-offset): New. Signed-off-by: Ard Biesheuvel --- gcc/config/arm/arm-opts.h | 6 ++ gcc/config/arm/arm-protos.h | 2 + gcc/config/arm/arm.c| 52 gcc/config/arm/arm.md | 62 +++- gcc/config/arm/arm.opt | 22 +++ gcc/doc/invoke.texi | 9 +++ 6 files changed, 151 insertions(+), 2 deletions(-) diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h index 5c4b62f404f7..581ba3c4fbbb 100644 --- a/gcc/config/arm/arm-opts.h +++ b/gcc/config/arm/arm-opts.h @@ -69,4 +69,10 @@ enum arm_tls_type { TLS_GNU, TLS_GNU2 }; + +/* Where to get the canary for the stack protector. */ +enum stack_protector_guard { + SSP_TLSREG, /* per-thread canary in TLS register */ + SSP_GLOBAL /* global canary */ +}; #endif diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 9b1f61394ad7..37e80256a78d 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -195,6 +195,8 @@ extern void arm_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx); extern rtx arm_load_tp (rtx); extern bool arm_coproc_builtin_available (enum unspecv); extern bool arm_coproc_ldc_stc_legitimate_address (rtx); +extern rtx arm_stack_protect_tls_canary_mem (void); + #if defined TREE_CODE extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index c4ff06b087eb..0bf06e764dbb 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -829,6 +829,9 @@ static const struct attribute_spec arm_attribute_table[] = #undef TARGET_MD_ASM_ADJUST #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust + +#undef TARGET_STACK_PROTECT_GUARD +#define TARGET_STACK_PROTECT_GUARD arm_stack_protect_guard /* Obstack for minipool constant handling. */ static struct obstack minipool_obstack; @@ -3155,6 +3158,26 @@ arm_option_override_internal (struct gcc_options *opts, if (TARGET_THUMB2_P (opts->x_target_flags)) opts->x_inline_asm_unified = true; + if (arm_stack_protector_guard == SSP_GLOBAL + && opts->x_arm_stack_protector_guard_offset_str) +{ + error ("incompatible options %'-mstack-protector-guard=global%' and" +"%'-mstack-protector-guard-offset=%qs%'", +arm_stack_protector_guard_offset_str); +} + + if (opts->x_arm_stack_protector_guard_offset_str) +{ + char *end; + const char *str = arm_stack_protector_guard_offset_str; + errno = 0; + long offs = strtol (arm_stack_protector_guard_offset_str, &end, 0); + if (!*str || *end || errno) + error ("%qs is not a valid offset in %qs", str, + "-mstack-protector-guard-offset="); + arm_stack_protector_guard_offset = offs; +} + #ifdef SUBTARGET_OVERRIDE_INTERNAL_OPTIONS SUBTARGET_OVERRIDE_INTERNAL_OPTIONS; #endif @@ -3822,6 +3845,10 @@ arm_option_reconfigure_globals (void) else target_thread_pointer = TP_SOFT; } + + if (arm_stack_protector_guard == SSP_TLSREG + && target_thread_pointer != TP_CP15) +error("%'-mstack-protector-guard=tls%' needs a hardware TLS register"); } /* Perform some validation between the desired architecture and the rest of the @@ -8087,6 +8114,19 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg, } +rtx +arm_stack_p
Re: [RFC PATCH v2 1/1] [ARM] Add support for TLS register based stack protector canary access
On Thu, 21 Oct 2021 at 18:51, Ard Biesheuvel wrote: > > Add support for accessing the stack canary value via the TLS register, > so that multiple threads running in the same address space can use > distinct canary values. This is intended for the Linux kernel running in > SMP mode, where processes entering the kernel are essentially threads > running the same program concurrently: using a global variable for the > canary in that context is problematic because it can never be rotated, > and so the OS is forced to use the same value as long as it remains up. > > Using the TLS register to index the stack canary helps with this, as it > allows each CPU to context switch the TLS register along with the rest > of the process, permitting each process to use its own value for the > stack canary. > > 2021-10-21 Ard Biesheuvel > > * config/arm/arm-opts.h (enum stack_protector_guard): New > * config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem): > New > * config/arm/arm.c (TARGET_STACK_PROTECT_GUARD): Define > (arm_option_override_internal): Handle and put in error checks > for stack protector guard options. > (arm_option_reconfigure_globals): Likewise > (arm_stack_protect_tls_canary_mem): New > (arm_stack_protect_guard): New > * config/arm/arm.md (stack_protect_set): New > (stack_protect_set_tls): Likewise > (stack_protect_test): Likewise > (stack_protect_test_tls): Likewise > * config/arm/arm.opt (-mstack-protector-guard): New > (-mstack-protector-guard-offset): New. > > Signed-off-by: Ard Biesheuvel > --- > gcc/config/arm/arm-opts.h | 6 ++ > gcc/config/arm/arm-protos.h | 2 + > gcc/config/arm/arm.c| 52 > gcc/config/arm/arm.md | 62 +++- > gcc/config/arm/arm.opt | 22 +++ > gcc/doc/invoke.texi | 9 +++ > 6 files changed, 151 insertions(+), 2 deletions(-) > > diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h > index 5c4b62f404f7..581ba3c4fbbb 100644 > --- a/gcc/config/arm/arm-opts.h > +++ b/gcc/config/arm/arm-opts.h > @@ -69,4 +69,10 @@ enum arm_tls_type { >TLS_GNU, >TLS_GNU2 > }; > + > +/* Where to get the canary for the stack protector. */ > +enum stack_protector_guard { > + SSP_TLSREG, /* per-thread canary in TLS register */ > + SSP_GLOBAL /* global canary */ > +}; > #endif > diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h > index 9b1f61394ad7..37e80256a78d 100644 > --- a/gcc/config/arm/arm-protos.h > +++ b/gcc/config/arm/arm-protos.h > @@ -195,6 +195,8 @@ extern void arm_split_atomic_op (enum rtx_code, rtx, rtx, > rtx, rtx, rtx, rtx); > extern rtx arm_load_tp (rtx); > extern bool arm_coproc_builtin_available (enum unspecv); > extern bool arm_coproc_ldc_stc_legitimate_address (rtx); > +extern rtx arm_stack_protect_tls_canary_mem (void); > + > > #if defined TREE_CODE > extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree); > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index c4ff06b087eb..0bf06e764dbb 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -829,6 +829,9 @@ static const struct attribute_spec arm_attribute_table[] = > > #undef TARGET_MD_ASM_ADJUST > #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust > + > +#undef TARGET_STACK_PROTECT_GUARD > +#define TARGET_STACK_PROTECT_GUARD arm_stack_protect_guard > > /* Obstack for minipool constant handling. */ > static struct obstack minipool_obstack; > @@ -3155,6 +3158,26 @@ arm_option_override_internal (struct gcc_options *opts, >if (TARGET_THUMB2_P (opts->x_target_flags)) > opts->x_inline_asm_unified = true; > > + if (arm_stack_protector_guard == SSP_GLOBAL > + && opts->x_arm_stack_protector_guard_offset_str) > +{ > + error ("incompatible options %'-mstack-protector-guard=global%' and" > +"%'-mstack-protector-guard-offset=%qs%'", > +arm_stack_protector_guard_offset_str); > +} > + > + if (opts->x_arm_stack_protector_guard_offset_str) > +{ > + char *end; > + const char *str = arm_stack_protector_guard_offset_str; > + errno = 0; > + long offs = strtol (arm_stack_protector_guard_offset_str, &end, 0); > + if (!*str || *end || errno) > + error ("%qs is not a valid offset in %qs", str, > + "-mstack-protector-guard-offset="); > + arm_stack_protector_guard_offset = offs; > +} > + > #ifdef SUBTARGET_OVERRID
[PATCH v3 0/1] implement TLS register based stack canary for ARM
Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102352 In the Linux kernel, user processes calling into the kernel are essentially threads running in the same address space, of a program that never terminates. This means that using a global variable for the stack protector canary value is problematic on SMP systems, as we can never change it unless we reboot the system. (Processes that sleep for any reason will do so on a call into the kernel, which means that there will always be live kernel stack frames carrying copies of the canary taken when the function was entered) AArch64 implements -mstack-protector-guard=sysreg for this purpose, as this permits the kernel to use different memory addresses for the stack canary for each CPU, and context switch the chosen system register with the rest of the process, allowing each process to use its own unique value for the stack canary. This patch implements something similar, but for the 32-bit ARM kernel, which will start using the user space TLS register TPIDRURO to index per-process metadata while running in the kernel. This means we can just add an offset to TPIDRURO to obtain the address from which to load the canary value. As for the spilling issues that have been fixed in this code in the past: I suppose a register carrying the TLS register value will never get spilled to begin with? Changes since v2: - fix the template for stack_protect_test_tls so it correctly conveys the fact that it sets the Z flag Comments/suggestions welcome. Cc: Keith Packard Cc: thomas.preudho...@celest.fr Cc: adhemerval.zane...@linaro.org Cc: Qing Zhao Cc: Richard Sandiford Cc: gcc-patches@gcc.gnu.org Ard Biesheuvel (1): [ARM] Add support for TLS register based stack protector canary access gcc/config/arm/arm-opts.h | 6 ++ gcc/config/arm/arm-protos.h | 2 + gcc/config/arm/arm.c| 52 gcc/config/arm/arm.md | 62 +++- gcc/config/arm/arm.opt | 22 +++ gcc/doc/invoke.texi | 9 +++ 6 files changed, 151 insertions(+), 2 deletions(-) -- 2.30.2
[PATCH v3 1/1] [ARM] Add support for TLS register based stack protector canary access
Add support for accessing the stack canary value via the TLS register, so that multiple threads running in the same address space can use distinct canary values. This is intended for the Linux kernel running in SMP mode, where processes entering the kernel are essentially threads running the same program concurrently: using a global variable for the canary in that context is problematic because it can never be rotated, and so the OS is forced to use the same value as long as it remains up. Using the TLS register to index the stack canary helps with this, as it allows each CPU to context switch the TLS register along with the rest of the process, permitting each process to use its own value for the stack canary. 2021-10-21 Ard Biesheuvel * config/arm/arm-opts.h (enum stack_protector_guard): New * config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem): New * config/arm/arm.c (TARGET_STACK_PROTECT_GUARD): Define (arm_option_override_internal): Handle and put in error checks for stack protector guard options. (arm_option_reconfigure_globals): Likewise (arm_stack_protect_tls_canary_mem): New (arm_stack_protect_guard): New * config/arm/arm.md (stack_protect_set): New (stack_protect_set_tls): Likewise (stack_protect_test): Likewise (stack_protect_test_tls): Likewise * config/arm/arm.opt (-mstack-protector-guard): New (-mstack-protector-guard-offset): New. Signed-off-by: Ard Biesheuvel --- gcc/config/arm/arm-opts.h | 6 ++ gcc/config/arm/arm-protos.h | 2 + gcc/config/arm/arm.c| 52 gcc/config/arm/arm.md | 62 +++- gcc/config/arm/arm.opt | 22 +++ gcc/doc/invoke.texi | 9 +++ 6 files changed, 151 insertions(+), 2 deletions(-) diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h index 5c4b62f404f7..581ba3c4fbbb 100644 --- a/gcc/config/arm/arm-opts.h +++ b/gcc/config/arm/arm-opts.h @@ -69,4 +69,10 @@ enum arm_tls_type { TLS_GNU, TLS_GNU2 }; + +/* Where to get the canary for the stack protector. */ +enum stack_protector_guard { + SSP_TLSREG, /* per-thread canary in TLS register */ + SSP_GLOBAL /* global canary */ +}; #endif diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 9b1f61394ad7..37e80256a78d 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -195,6 +195,8 @@ extern void arm_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx); extern rtx arm_load_tp (rtx); extern bool arm_coproc_builtin_available (enum unspecv); extern bool arm_coproc_ldc_stc_legitimate_address (rtx); +extern rtx arm_stack_protect_tls_canary_mem (void); + #if defined TREE_CODE extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index c4ff06b087eb..0bf06e764dbb 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -829,6 +829,9 @@ static const struct attribute_spec arm_attribute_table[] = #undef TARGET_MD_ASM_ADJUST #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust + +#undef TARGET_STACK_PROTECT_GUARD +#define TARGET_STACK_PROTECT_GUARD arm_stack_protect_guard /* Obstack for minipool constant handling. */ static struct obstack minipool_obstack; @@ -3155,6 +3158,26 @@ arm_option_override_internal (struct gcc_options *opts, if (TARGET_THUMB2_P (opts->x_target_flags)) opts->x_inline_asm_unified = true; + if (arm_stack_protector_guard == SSP_GLOBAL + && opts->x_arm_stack_protector_guard_offset_str) +{ + error ("incompatible options %'-mstack-protector-guard=global%' and" +"%'-mstack-protector-guard-offset=%qs%'", +arm_stack_protector_guard_offset_str); +} + + if (opts->x_arm_stack_protector_guard_offset_str) +{ + char *end; + const char *str = arm_stack_protector_guard_offset_str; + errno = 0; + long offs = strtol (arm_stack_protector_guard_offset_str, &end, 0); + if (!*str || *end || errno) + error ("%qs is not a valid offset in %qs", str, + "-mstack-protector-guard-offset="); + arm_stack_protector_guard_offset = offs; +} + #ifdef SUBTARGET_OVERRIDE_INTERNAL_OPTIONS SUBTARGET_OVERRIDE_INTERNAL_OPTIONS; #endif @@ -3822,6 +3845,10 @@ arm_option_reconfigure_globals (void) else target_thread_pointer = TP_SOFT; } + + if (arm_stack_protector_guard == SSP_TLSREG + && target_thread_pointer != TP_CP15) +error("%'-mstack-protector-guard=tls%' needs a hardware TLS register"); } /* Perform some validation between the desired architecture and the rest of the @@ -8087,6 +8114,19 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg, } +rtx +arm_stack_p
[PATCH v4 0/1] implement TLS register based stack canary for ARM
Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102352 In the Linux kernel, user processes calling into the kernel are essentially threads running in the same address space, of a program that never terminates. This means that using a global variable for the stack protector canary value is problematic on SMP systems, as we can never change it unless we reboot the system. (Processes that sleep for any reason will do so on a call into the kernel, which means that there will always be live kernel stack frames carrying copies of the canary taken when the function was entered) AArch64 implements -mstack-protector-guard=sysreg for this purpose, as this permits the kernel to use different memory addresses for the stack canary for each CPU, and context switch the chosen system register with the rest of the process, allowing each process to use its own unique value for the stack canary. This patch implements something similar, but for the 32-bit ARM kernel, which will start using the user space TLS register TPIDRURO to index per-process metadata while running in the kernel. This means we can just add an offset to TPIDRURO to obtain the address from which to load the canary value. Changes since v3: - force a reload of the TLS register before performing the stack protector check, so that we never rely on the stack for the address of the canary Changes since v2: - fix the template for stack_protect_test_tls so it correctly conveys the fact that it sets the Z flag Comments/suggestions welcome. Cc: Keith Packard Cc: thomas.preudho...@celest.fr Cc: adhemerval.zane...@linaro.org Cc: Qing Zhao Cc: Richard Sandiford Cc: gcc-patches@gcc.gnu.org Ard Biesheuvel (1): [ARM] Add support for TLS register based stack protector canary access gcc/config/arm/arm-opts.h | 6 ++ gcc/config/arm/arm-protos.h | 2 + gcc/config/arm/arm.c| 55 +++ gcc/config/arm/arm.md | 71 +++- gcc/config/arm/arm.opt | 22 ++ gcc/doc/invoke.texi | 9 +++ 6 files changed, 163 insertions(+), 2 deletions(-) -- 2.30.2
[PATCH v4 1/1] [ARM] Add support for TLS register based stack protector canary access
Add support for accessing the stack canary value via the TLS register, so that multiple threads running in the same address space can use distinct canary values. This is intended for the Linux kernel running in SMP mode, where processes entering the kernel are essentially threads running the same program concurrently: using a global variable for the canary in that context is problematic because it can never be rotated, and so the OS is forced to use the same value as long as it remains up. Using the TLS register to index the stack canary helps with this, as it allows each CPU to context switch the TLS register along with the rest of the process, permitting each process to use its own value for the stack canary. 2021-10-28 Ard Biesheuvel * config/arm/arm-opts.h (enum stack_protector_guard): New * config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem): New * config/arm/arm.c (TARGET_STACK_PROTECT_GUARD): Define (arm_option_override_internal): Handle and put in error checks for stack protector guard options. (arm_option_reconfigure_globals): Likewise (arm_stack_protect_tls_canary_mem): New (arm_stack_protect_guard): New * config/arm/arm.md (stack_protect_set): New (stack_protect_set_tls): Likewise (stack_protect_test): Likewise (stack_protect_test_tls): Likewise (reload_tp_hard): Likewise * config/arm/arm.opt (-mstack-protector-guard): New (-mstack-protector-guard-offset): New. * doc/invoke.texi: Document new options Signed-off-by: Ard Biesheuvel --- gcc/config/arm/arm-opts.h | 6 ++ gcc/config/arm/arm-protos.h | 2 + gcc/config/arm/arm.c| 55 +++ gcc/config/arm/arm.md | 71 +++- gcc/config/arm/arm.opt | 22 ++ gcc/doc/invoke.texi | 9 +++ 6 files changed, 163 insertions(+), 2 deletions(-) diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h index 5c4b62f404f7..581ba3c4fbbb 100644 --- a/gcc/config/arm/arm-opts.h +++ b/gcc/config/arm/arm-opts.h @@ -69,4 +69,10 @@ enum arm_tls_type { TLS_GNU, TLS_GNU2 }; + +/* Where to get the canary for the stack protector. */ +enum stack_protector_guard { + SSP_TLSREG, /* per-thread canary in TLS register */ + SSP_GLOBAL /* global canary */ +}; #endif diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 9b1f61394ad7..d8d605920c97 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -195,6 +195,8 @@ extern void arm_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx); extern rtx arm_load_tp (rtx); extern bool arm_coproc_builtin_available (enum unspecv); extern bool arm_coproc_ldc_stc_legitimate_address (rtx); +extern rtx arm_stack_protect_tls_canary_mem (bool); + #if defined TREE_CODE extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index c4ff06b087eb..6a659d81a6fe 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -829,6 +829,9 @@ static const struct attribute_spec arm_attribute_table[] = #undef TARGET_MD_ASM_ADJUST #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust + +#undef TARGET_STACK_PROTECT_GUARD +#define TARGET_STACK_PROTECT_GUARD arm_stack_protect_guard /* Obstack for minipool constant handling. */ static struct obstack minipool_obstack; @@ -3155,6 +3158,26 @@ arm_option_override_internal (struct gcc_options *opts, if (TARGET_THUMB2_P (opts->x_target_flags)) opts->x_inline_asm_unified = true; + if (arm_stack_protector_guard == SSP_GLOBAL + && opts->x_arm_stack_protector_guard_offset_str) +{ + error ("incompatible options %'-mstack-protector-guard=global%' and" +"%'-mstack-protector-guard-offset=%qs%'", +arm_stack_protector_guard_offset_str); +} + + if (opts->x_arm_stack_protector_guard_offset_str) +{ + char *end; + const char *str = arm_stack_protector_guard_offset_str; + errno = 0; + long offs = strtol (arm_stack_protector_guard_offset_str, &end, 0); + if (!*str || *end || errno) + error ("%qs is not a valid offset in %qs", str, + "-mstack-protector-guard-offset="); + arm_stack_protector_guard_offset = offs; +} + #ifdef SUBTARGET_OVERRIDE_INTERNAL_OPTIONS SUBTARGET_OVERRIDE_INTERNAL_OPTIONS; #endif @@ -3822,6 +3845,10 @@ arm_option_reconfigure_globals (void) else target_thread_pointer = TP_SOFT; } + + if (arm_stack_protector_guard == SSP_TLSREG + && target_thread_pointer != TP_CP15) +error("%'-mstack-protector-guard=tls%' needs a hardware TLS register"); } /* Perform some validation between the desired architecture and the rest of the @@ -8087,6 +8114,22 @@ legitimize_pic_
[PATCH] i386: Honour -mdirect-extern-access when calling __fentry__
The small and medium PIC code models generate profiling calls that always load the address of __fentry__() via the GOT, even if -mdirect-extern-access is in effect. This deviates from the behavior with respect to other external references, and results in a longer opcode that relies on linker relaxation to eliminate the GOT load. In this particular case, the transformation replaces an indirect 'CALL *__fentry__@GOTPCREL(%rip)' with either 'CALL __fentry__; NOP' or 'NOP; CALL __fentry__', where the NOP is a 1 byte NOP that preserves the 6 byte length of the sequence. This is problematic for the Linux kernel, which generally relies on -mdirect-extern-access and hidden visibility to eliminate GOT based symbol references in code generated with -fpie/-fpic, without having to depend on linker relaxation. The Linux kernel relies on code patching to replace these opcodes with NOPs at runtime, and this is complicated code that we'd prefer not to complicate even more by adding support for patching both 5 and 6 byte sequences as well as parsing the instruction stream to decide which variant of CALL+NOP we are dealing with. So let's honour -mdirect-extern-access, and only load the address of __fentry__ via the GOT if direct references to external symbols are not permitted. Note that the GOT reference in question is in fact a data reference: we explicitly load the address of __fentry__ from the GOT, which amounts to eager binding, rather than emitting a PLT call that could bind eagerly, lazily or directly at link time. gcc/ChangeLog: * config/i386/i386.cc (x86_function_profiler): Take ix86_direct_extern_access into account when generating calls to __fentry__() Cc: H.J. Lu Cc: Jakub Jelinek Cc: Richard Biener Cc: Uros Bizjak Cc: Hou Wenlong --- gcc/config/i386/i386.cc | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index b1d08ecdb3d44729..69b183abb4318b0a 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -21836,8 +21836,12 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED) break; case CM_SMALL_PIC: case CM_MEDIUM_PIC: - fprintf (file, "1:\tcall\t*%s@GOTPCREL(%%rip)\n", mcount_name); - break; + if (!ix86_direct_extern_access) + { + fprintf (file, "1:\tcall\t*%s@GOTPCREL(%%rip)\n", mcount_name); + break; + } + /* fall through */ default: x86_print_call_or_nop (file, mcount_name); break; -- 2.39.2
Re: [PATCH v4 1/1] [ARM] Add support for TLS register based stack protector canary access
On Wed, 17 Nov 2021 at 10:09, Ramana Radhakrishnan wrote: > > Thanks Ard and Qing. > > > > I have been busy with other things in the last few weeks and I don’t work on > GCC as part of my day job : however I’ll try to find some time to review this > patch set in the coming days. > > Hello Ramana, Please let us know if you will be able to spend time on this. I fully understand if not, as you moved on quite a while ago, but in that case, let's just own up to it, and I will go and badger someone else to get this reviewed :-) -- Ard.
Re: [PATCH v5 1/1] [ARM] Add support for TLS register based stack protector canary access
On Wed, 17 Nov 2021 at 18:12, Ard Biesheuvel wrote: > > (+ Ramana) > Ping? > On Mon, 15 Nov 2021 at 19:04, Ard Biesheuvel wrote: > > > > Add support for accessing the stack canary value via the TLS register, > > so that multiple threads running in the same address space can use > > distinct canary values. This is intended for the Linux kernel running in > > SMP mode, where processes entering the kernel are essentially threads > > running the same program concurrently: using a global variable for the > > canary in that context is problematic because it can never be rotated, > > and so the OS is forced to use the same value as long as it remains up. > > > > Using the TLS register to index the stack canary helps with this, as it > > allows each CPU to context switch the TLS register along with the rest > > of the process, permitting each process to use its own value for the > > stack canary. > > > > 2021-11-15 Ard Biesheuvel > > > > * config/arm/arm-opts.h (enum stack_protector_guard): New > > * config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem): > > New > > * config/arm/arm.c (TARGET_STACK_PROTECT_GUARD): Define > > (arm_option_override_internal): Handle and put in error checks > > for stack protector guard options. > > (arm_option_reconfigure_globals): Likewise > > (arm_stack_protect_tls_canary_mem): New > > (arm_stack_protect_guard): New > > * config/arm/arm.md (stack_protect_set): New > > (stack_protect_set_tls): Likewise > > (stack_protect_test): Likewise > > (stack_protect_test_tls): Likewise > > (reload_tp_hard): Likewise > > * config/arm/arm.opt (-mstack-protector-guard): New > > (-mstack-protector-guard-offset): New. > > * doc/invoke.texi: Document new options > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/arm/stack-protector-7.c: New test. > > * gcc.target/arm/stack-protector-8.c: New test. > > > > Signed-off-by: Ard Biesheuvel > > --- > > gcc/config/arm/arm-opts.h| 6 ++ > > gcc/config/arm/arm-protos.h | 2 + > > gcc/config/arm/arm.c | 55 +++ > > gcc/config/arm/arm.md| 71 +++- > > gcc/config/arm/arm.opt | 22 ++ > > gcc/doc/invoke.texi | 11 +++ > > gcc/testsuite/gcc.target/arm/stack-protector-7.c | 10 +++ > > gcc/testsuite/gcc.target/arm/stack-protector-8.c | 5 ++ > > 8 files changed, 180 insertions(+), 2 deletions(-) > > > > diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h > > index 5c4b62f404f7..581ba3c4fbbb 100644 > > --- a/gcc/config/arm/arm-opts.h > > +++ b/gcc/config/arm/arm-opts.h > > @@ -69,4 +69,10 @@ enum arm_tls_type { > >TLS_GNU, > >TLS_GNU2 > > }; > > + > > +/* Where to get the canary for the stack protector. */ > > +enum stack_protector_guard { > > + SSP_TLSREG, /* per-thread canary in TLS register */ > > + SSP_GLOBAL /* global canary */ > > +}; > > #endif > > diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h > > index 9b1f61394ad7..d8d605920c97 100644 > > --- a/gcc/config/arm/arm-protos.h > > +++ b/gcc/config/arm/arm-protos.h > > @@ -195,6 +195,8 @@ extern void arm_split_atomic_op (enum rtx_code, rtx, > > rtx, rtx, rtx, rtx, rtx); > > extern rtx arm_load_tp (rtx); > > extern bool arm_coproc_builtin_available (enum unspecv); > > extern bool arm_coproc_ldc_stc_legitimate_address (rtx); > > +extern rtx arm_stack_protect_tls_canary_mem (bool); > > + > > > > #if defined TREE_CODE > > extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree); > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > > index a5b403eb3e49..e5077348ce07 100644 > > --- a/gcc/config/arm/arm.c > > +++ b/gcc/config/arm/arm.c > > @@ -829,6 +829,9 @@ static const struct attribute_spec > > arm_attribute_table[] = > > > > #undef TARGET_MD_ASM_ADJUST > > #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust > > + > > +#undef TARGET_STACK_PROTECT_GUARD > > +#define TARGET_STACK_PROTECT_GUARD arm_stack_protect_guard > > > > /* Obstack for minipool constant handling. */ > > static struct obstack minipool_obstack; > > @@ -3176,6 +3179,26 @@ arm_option_override_in
Re: [PATCH v5 1/1] [ARM] Add support for TLS register based stack protector canary access
(+ Richard Earnshaw) On Wed, 12 Jan 2022 at 19:29, Ard Biesheuvel wrote: > > On Wed, 17 Nov 2021 at 18:12, Ard Biesheuvel wrote: > > > > (+ Ramana) > > > > Ping? > > > On Mon, 15 Nov 2021 at 19:04, Ard Biesheuvel wrote: > > > > > > Add support for accessing the stack canary value via the TLS register, > > > so that multiple threads running in the same address space can use > > > distinct canary values. This is intended for the Linux kernel running in > > > SMP mode, where processes entering the kernel are essentially threads > > > running the same program concurrently: using a global variable for the > > > canary in that context is problematic because it can never be rotated, > > > and so the OS is forced to use the same value as long as it remains up. > > > > > > Using the TLS register to index the stack canary helps with this, as it > > > allows each CPU to context switch the TLS register along with the rest > > > of the process, permitting each process to use its own value for the > > > stack canary. > > > > > > 2021-11-15 Ard Biesheuvel > > > > > > * config/arm/arm-opts.h (enum stack_protector_guard): New > > > * config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem): > > > New > > > * config/arm/arm.c (TARGET_STACK_PROTECT_GUARD): Define > > > (arm_option_override_internal): Handle and put in error checks > > > for stack protector guard options. > > > (arm_option_reconfigure_globals): Likewise > > > (arm_stack_protect_tls_canary_mem): New > > > (arm_stack_protect_guard): New > > > * config/arm/arm.md (stack_protect_set): New > > > (stack_protect_set_tls): Likewise > > > (stack_protect_test): Likewise > > > (stack_protect_test_tls): Likewise > > > (reload_tp_hard): Likewise > > > * config/arm/arm.opt (-mstack-protector-guard): New > > > (-mstack-protector-guard-offset): New. > > > * doc/invoke.texi: Document new options > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.target/arm/stack-protector-7.c: New test. > > > * gcc.target/arm/stack-protector-8.c: New test. > > > > > > Signed-off-by: Ard Biesheuvel > > > --- > > > gcc/config/arm/arm-opts.h| 6 ++ > > > gcc/config/arm/arm-protos.h | 2 + > > > gcc/config/arm/arm.c | 55 +++ > > > gcc/config/arm/arm.md| 71 > > > +++- > > > gcc/config/arm/arm.opt | 22 ++ > > > gcc/doc/invoke.texi | 11 +++ > > > gcc/testsuite/gcc.target/arm/stack-protector-7.c | 10 +++ > > > gcc/testsuite/gcc.target/arm/stack-protector-8.c | 5 ++ > > > 8 files changed, 180 insertions(+), 2 deletions(-) > > > > > > diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h > > > index 5c4b62f404f7..581ba3c4fbbb 100644 > > > --- a/gcc/config/arm/arm-opts.h > > > +++ b/gcc/config/arm/arm-opts.h > > > @@ -69,4 +69,10 @@ enum arm_tls_type { > > >TLS_GNU, > > >TLS_GNU2 > > > }; > > > + > > > +/* Where to get the canary for the stack protector. */ > > > +enum stack_protector_guard { > > > + SSP_TLSREG, /* per-thread canary in TLS register */ > > > + SSP_GLOBAL /* global canary */ > > > +}; > > > #endif > > > diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h > > > index 9b1f61394ad7..d8d605920c97 100644 > > > --- a/gcc/config/arm/arm-protos.h > > > +++ b/gcc/config/arm/arm-protos.h > > > @@ -195,6 +195,8 @@ extern void arm_split_atomic_op (enum rtx_code, rtx, > > > rtx, rtx, rtx, rtx, rtx); > > > extern rtx arm_load_tp (rtx); > > > extern bool arm_coproc_builtin_available (enum unspecv); > > > extern bool arm_coproc_ldc_stc_legitimate_address (rtx); > > > +extern rtx arm_stack_protect_tls_canary_mem (bool); > > > + > > > > > > #if defined TREE_CODE > > > extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, > > > tree); > > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > > > index a5b403eb3e49..e5077348ce07 100644 > > > --- a/gc