Re: [RFC/RFT,V2] CFI: Add support for gcc CFI in aarch64

2023-12-13 Thread Dan Li
+ Likun

On Tue, 28 Mar 2023 at 06:18, Sami Tolvanen  wrote:
>
> On Mon, Mar 27, 2023 at 2:30 AM Peter Zijlstra  wrote:
> >
> > On Sat, Mar 25, 2023 at 01:54:16AM -0700, Dan Li wrote:
> >
> > > In the compiler part[4], most of the content is the same as Sami's
> > > implementation[3], except for some minor differences, mainly including:
> > >
> > > 1. The function typeid is calculated differently and it is difficult
> > > to be consistent.
> >
> > This means there is an effective ABI break between the compilers, which
> > is sad :-( Is there really nothing to be done about this?
>
> I agree, this would be unfortunate, and would also be a compatibility
> issue with rustc where there's ongoing work to support
> clang-compatible CFI type hashes:
>
> https://github.com/rust-lang/rust/pull/105452
>
> Sami


Re: [RFC/RFT, V2 0/3] Add compiler support for Kernel Control Flow Integrity

2023-07-19 Thread Dan Li via Gcc-patches
Hi Kees,

Sincerely sorry, I just saw this email.
Embarrassingly, due to another job change, my plan was postponed again :(.

I may not be able to attend this year's GCC meeting. Is there any other
way to let this get some traction in GCC? I really hope someone can help
with this topic.

BTW, I'm still looking at this and plan to finish it by the end of this
year, but it's taking too long and there's a lot of uncertainty, so
please just consider this only as a backup option.

Thanks,
Dan.

On Thu, 22 Jun 2023 at 05:54, Kees Cook  wrote:
>
> On Sat, Mar 25, 2023 at 01:11:14AM -0700, Dan Li wrote:
> > This series of patches is mainly used to support the control flow
> > integrity protection of the linux kernel [1], which is similar to
> > -fsanitize=kcfi in clang 16.0 [2,3].
> >
> > Any suggestion please let me know :).
>
> Hi Dan,
>
> It's been a couple months, and I didn't see any other feedback on this
> proposal. I was curious what the status of this work is. Are you able to
> attend GNU Cauldron[1] this year? I'd love to see this get some traction
> in GCC.
>
> Thanks!
>
> -Kees
>
> [1] https://gcc.gnu.org/wiki/cauldron2023
>
> --
> Kees Cook


Re: [RFC/RFT, V2 0/3] Add compiler support for Kernel Control Flow Integrity

2023-07-19 Thread Dan Li via Gcc-patches
Hi All,

Embarrassingly, due to personal reasons, I may not be able to complete
the series of patches on the forward side of GCC CFI for the time being.

Please forgive me for not realizing that I should have sent this help
email a long time ago :(

This topic has been delayed for a long time, and I would be very grateful
if someone can help complete this series of patches.

BTW, please let me know if there are more groups I can cc for help.

Thanks!
Dan.

On Sat, 25 Mar 2023 at 16:11, Dan Li  wrote:
>
> This series of patches is mainly used to support the control flow
> integrity protection of the linux kernel [1], which is similar to
> -fsanitize=kcfi in clang 16.0 [2,3].
>
> Any suggestion please let me know :).
>
> Thanks, Dan.
>
> [1] 
> https://lore.kernel.org/all/20220908215504.3686827-1-samitolva...@google.com/
> [2] https://clang.llvm.org/docs/ControlFlowIntegrity.html
> [3] https://reviews.llvm.org/D119296
>
> Signed-off-by: Dan Li 
>
> ---
> Dan Li (3):
>   [PR102768] flag-types.h (enum sanitize_code): Extend sanitize_code to
> 64 bits to support more features
>   [PR102768] Support CFI: Add basic support for Kernel Control Flow
> Integrity
>   [PR102768] aarch64: Add support for Kernel Control Flow Integrity
>
>  gcc/asan.h|   4 +-
>  gcc/c-family/c-attribs.cc |  10 +-
>  gcc/c-family/c-common.h   |   2 +-
>  gcc/c/c-parser.cc |   4 +-
>  gcc/cfgexpand.cc  |  26 ++
>  gcc/cgraphunit.cc |  34 +++
>  gcc/combine.cc|   1 +
>  gcc/common.opt|   4 +-
>  gcc/config/aarch64/aarch64.cc | 166 ++
>  gcc/cp/typeck.cc  |   2 +-
>  gcc/doc/invoke.texi   |  36 
>  gcc/doc/tm.texi   |  27 ++
>  gcc/doc/tm.texi.in|   8 ++
>  gcc/dwarf2asm.cc  |   2 +-
>  gcc/emit-rtl.cc   |   1 +
>  gcc/emit-rtl.h|   4 +
>  gcc/final.cc  |  24 -
>  gcc/flag-types.h  |  67 +++---
>  gcc/gimple.cc |  11 +++
>  gcc/gimple.h  |   5 +-
>  gcc/opt-suggestions.cc|   2 +-
>  gcc/opts.cc   |  26 +++---
>  gcc/opts.h|   8 +-
>  gcc/output.h  |   3 +
>  gcc/reg-notes.def |   1 +
>  gcc/target.def|  38 
>  gcc/toplev.cc |   4 +
>  gcc/tree-cfg.cc   |   2 +-
>  gcc/tree.cc   | 144 +
>  gcc/tree.h|   1 +
>  gcc/varasm.cc |  26 ++
>  31 files changed, 627 insertions(+), 66 deletions(-)
>
> --
> 2.17.1
>


[PING] AArch64: add R30_REGNUM into shrink-wrapping separate

2022-03-14 Thread Dan Li via Gcc-patches

Gentile ping for this :), thanks.

Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590906.html


R30_REGNUM could also be used as a component in shrink-wrapping
separate, this patch enables it in aarch64.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_get_separate_components):
Remove bitmap clear of R30_REGNUM.
(aarch64_components_for_bb): Support R30_REGNUM as a component.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shrink_wrap_separate_1.c: New test.


---
 gcc/config/aarch64/aarch64.cc   |  4 ++--
 .../gcc.target/aarch64/shrink_wrap_separate_1.c | 17 +
 2 files changed, 19 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shrink_wrap_separate_1.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 8bcee8be9eb..6e1589b0312 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8463,7 +8463,6 @@ aarch64_get_separate_components (void)
   if (reg1 != INVALID_REGNUM)
 bitmap_clear_bit (components, reg1);
 
-  bitmap_clear_bit (components, LR_REGNUM);

   bitmap_clear_bit (components, SP_REGNUM);
 
   return components;

@@ -8500,7 +8499,8 @@ aarch64_components_for_bb (basic_block bb)
   /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets.  */
   for (unsigned regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
 if (!fixed_regs[regno]
-   && !crtl->abi->clobbers_full_reg_p (regno)
+   && (regno == R30_REGNUM
+   || !crtl->abi->clobbers_full_reg_p (regno))
&& (TEST_HARD_REG_BIT (extra_caller_saves, regno)
|| bitmap_bit_p (in, regno)
|| bitmap_bit_p (gen, regno)
diff --git a/gcc/testsuite/gcc.target/aarch64/shrink_wrap_separate_1.c 
b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_separate_1.c
new file mode 100644
index 000..34002705ace
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_separate_1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fomit-frame-pointer -fdump-rtl-pro_and_epilogue" } */
+
+void f();
+
+int g(int x)
+{
+  if (x == 0)
+{
+  __asm__ ("":::"x19", "x20");
+  return 1;
+}
+  f();
+  return 2;
+}
+
+/* { dg-final { scan-rtl-dump {The components we wrap separately are \[sep 30\]} 
"pro_and_epilogue"  } } */
--
2.17.1


[PING^2] AArch64: add R30_REGNUM into shrink-wrapping separate

2022-03-28 Thread Dan Li via Gcc-patches

Gentile ping for this :), thanks.

Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590906.html


R30_REGNUM could also be used as a component in shrink-wrapping
separate, this patch enables it in aarch64.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_get_separate_components):
Remove bitmap clear of R30_REGNUM.
(aarch64_components_for_bb): Support R30_REGNUM as a component.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shrink_wrap_separate_1.c: New test.


---
 gcc/config/aarch64/aarch64.cc   |  4 ++--
 .../gcc.target/aarch64/shrink_wrap_separate_1.c | 17 +
 2 files changed, 19 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shrink_wrap_separate_1.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 8bcee8be9eb..6e1589b0312 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8463,7 +8463,6 @@ aarch64_get_separate_components (void)
   if (reg1 != INVALID_REGNUM)
 bitmap_clear_bit (components, reg1);
 
-  bitmap_clear_bit (components, LR_REGNUM);

   bitmap_clear_bit (components, SP_REGNUM);
 
   return components;

@@ -8500,7 +8499,8 @@ aarch64_components_for_bb (basic_block bb)
   /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets.  */
   for (unsigned regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
 if (!fixed_regs[regno]
-   && !crtl->abi->clobbers_full_reg_p (regno)
+   && (regno == R30_REGNUM
+   || !crtl->abi->clobbers_full_reg_p (regno))
&& (TEST_HARD_REG_BIT (extra_caller_saves, regno)
|| bitmap_bit_p (in, regno)
|| bitmap_bit_p (gen, regno)
diff --git a/gcc/testsuite/gcc.target/aarch64/shrink_wrap_separate_1.c 
b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_separate_1.c
new file mode 100644
index 000..34002705ace
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_separate_1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fomit-frame-pointer -fdump-rtl-pro_and_epilogue" } */
+
+void f();
+
+int g(int x)
+{
+  if (x == 0)
+{
+  __asm__ ("":::"x19", "x20");
+  return 1;
+}
+  f();
+  return 2;
+}
+
+/* { dg-final { scan-rtl-dump {The components we wrap separately are \[sep 30\]} 
"pro_and_epilogue"  } } */
--
2.17.1


[PING^3] AArch64: add R30 into shrink-wrapping separate

2022-04-05 Thread Dan Li via Gcc-patches

Gentile ping for this :), thanks.

Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590906.html


R30_REGNUM could also be used as a component in shrink-wrapping
separate, this patch enables it in aarch64.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_get_separate_components):
Remove bitmap clear of R30_REGNUM.
(aarch64_components_for_bb): Support R30_REGNUM as a component.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shrink_wrap_separate_1.c: New test.


---
 gcc/config/aarch64/aarch64.cc   |  4 ++--
 .../gcc.target/aarch64/shrink_wrap_separate_1.c | 17 +
 2 files changed, 19 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shrink_wrap_separate_1.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 8bcee8be9eb..6e1589b0312 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8463,7 +8463,6 @@ aarch64_get_separate_components (void)
   if (reg1 != INVALID_REGNUM)
 bitmap_clear_bit (components, reg1);
 
-  bitmap_clear_bit (components, LR_REGNUM);

   bitmap_clear_bit (components, SP_REGNUM);
 
   return components;

@@ -8500,7 +8499,8 @@ aarch64_components_for_bb (basic_block bb)
   /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets.  */
   for (unsigned regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
 if (!fixed_regs[regno]
-   && !crtl->abi->clobbers_full_reg_p (regno)
+   && (regno == R30_REGNUM
+   || !crtl->abi->clobbers_full_reg_p (regno))
&& (TEST_HARD_REG_BIT (extra_caller_saves, regno)
|| bitmap_bit_p (in, regno)
|| bitmap_bit_p (gen, regno)
diff --git a/gcc/testsuite/gcc.target/aarch64/shrink_wrap_separate_1.c 
b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_separate_1.c
new file mode 100644
index 000..34002705ace
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_separate_1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fomit-frame-pointer -fdump-rtl-pro_and_epilogue" } */
+
+void f();
+
+int g(int x)
+{
+  if (x == 0)
+{
+  __asm__ ("":::"x19", "x20");
+  return 1;
+}
+  f();
+  return 2;
+}
+
+/* { dg-final { scan-rtl-dump {The components we wrap separately are \[sep 30\]} 
"pro_and_epilogue"  } } */
--
2.17.1


Re: [PING] AArch64: add R30_REGNUM into shrink-wrapping separate

2022-04-15 Thread Dan Li via Gcc-patches




On 4/12/22 06:05, Richard Sandiford wrote:

Dan Li  writes:

Gentile ping for this :), thanks.

Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590906.html


Sorry, I should have realised this at the time, but I don't think
we can do this after all.  The ABI requires us to set up the frame
chain before assigning to the frame pointer.

Sorry again for the bogus suggestion.

Thanks,
Richard



OK, thanks Richard :)

But I think I still don't quite understand it, so could the x30 be
shrinked alone when we don't need the frame chain?

Thanks,
Dan.


[PING^3][PATCH,v2,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-01-18 Thread Dan Li via Gcc-patches

Gentile ping for this again.
Could someone please give me a quick reply first? :), thanks.

Link: https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586204.html


Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

To enable SCS in user mode, in addition to compiler, other support
is also required (as discussed in [2]). This patch only adds basic
support for SCS from the compiler side, and provides convenience
for users to enable SCS.

For linux kernel, only the support of the compiler is required.

[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768


Signed-off-by: Dan Li 

gcc/c-family/ChangeLog:

* c-attribs.c (handle_no_sanitize_shadow_call_stack_attribute):
New.

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_shadow_call_stack_enabled):
New decl.
* config/aarch64/aarch64.c (aarch64_shadow_call_stack_enabled):
New.
(aarch64_expand_prologue):  Push x30 onto SCS before it's
pushed onto stack.
(aarch64_expand_epilogue):  Pop x30 frome SCS.
* config/aarch64/aarch64.h (TARGET_SUPPORT_SHADOW_CALL_STACK):
New macro.
(TARGET_CHECK_SCS_RESERVED_REGISTER):   Likewise.
* config/aarch64/aarch64.md (scs_push): New template.
(scs_pop):  Likewise.
* defaults.h (TARGET_SUPPORT_SHADOW_CALL_STACK):New macro.
* doc/extend.texi:  Document -fsanitize=shadow-call-stack.
* doc/invoke.texi:  Document attribute.
* flag-types.h (enum sanitize_code):Add
SANITIZE_SHADOW_CALL_STACK.
* opts-global.c (handle_common_deferred_options):   Add SCS
compile option check.
* opts.c (finish_options):  Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shadow_call_stack_1.c: New test.
* gcc.target/aarch64/shadow_call_stack_2.c: New test.
* gcc.target/aarch64/shadow_call_stack_3.c: New test.
* gcc.target/aarch64/shadow_call_stack_4.c: New test.
---
 gcc/c-family/c-attribs.c  | 21 +
 gcc/config/aarch64/aarch64-protos.h   |  1 +
 gcc/config/aarch64/aarch64.c  | 27 +++
 gcc/config/aarch64/aarch64.h  | 11 +
 gcc/config/aarch64/aarch64.md | 18 
 gcc/defaults.h|  4 ++
 gcc/doc/extend.texi   |  7 +++
 gcc/doc/invoke.texi   | 29 
 gcc/flag-types.h  |  2 +
 gcc/opts-global.c |  6 +++
 gcc/opts.c| 12 +
 .../gcc.target/aarch64/shadow_call_stack_1.c  |  6 +++
 .../gcc.target/aarch64/shadow_call_stack_2.c  |  6 +++
 .../gcc.target/aarch64/shadow_call_stack_3.c  | 45 +++
 .../gcc.target/aarch64/shadow_call_stack_4.c  | 18 
 15 files changed, 213 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 007b928c54b..9b3a35c06bf 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -56,6 +56,8 @@ static tree handle_cold_attribute (tree *, tree, tree, int, 
bool *);
 static tree handle_no_sanitize_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_sanitize_address_attribute (tree *, tree, tree,
  int, bool *);
+static tree handle_no_sanitize_shadow_call_stack_attribute (tree *, tree,
+ tree, int, bool *);
 static tree handle_no_sanitize_thread_attribute (tree *, tree, tree,
 int, bool *);
 static tree handle_no_address_safety_analysis_attribute (tree *, tree, tree,
@@ -454,6 +456,10 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_no_sanitize_attribute, NULL },
   { "no_sanitize_address",0, 0, true, false, false, false,
  handle_no_sanitize_address_attribute, NULL },
+  { "no_sanitize_shadow_call_stack",
+ 0, 0, true, false, false, false,
+ handle_no_sanitize_shadow_call_stack_attribute,
+ NULL },
   { "no_sanitize_thread", 0, 0, true, false, false, false,
  handle_no_sanitize_thread_attribute, NULL },
   { "no_sanitize_undefined",  0, 0, true, false, false, false,
@@ -1175,6 +1181,21 @@ handle_no_sanitize_address

Re: [PING^3][PATCH,v2,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-01-23 Thread Dan Li via Gcc-patches




On 1/20/22 04:02, Richard Sandiford wrote:

Thanks for the patch and sorry for the (very) slow review.


Thanks for the review, Richard :).


+/* Handle a "no_sanitize_shadow_call_stack" attribute; arguments as in
+   struct attribute_spec.handler.  */
+static tree
+handle_no_sanitize_shadow_call_stack_attribute (tree *node, tree name,
+ tree, int, bool *no_add_attrs)
+{
+  *no_add_attrs = true;
+  if (TREE_CODE (*node) != FUNCTION_DECL)
+warning (OPT_Wattributes, "%qE attribute ignored", name);
+  else
+add_no_sanitize_value (*node, SANITIZE_SHADOW_CALL_STACK);
+
+  return NULL_TREE;
+}
+


Do we need this?  I think these days the preference is to use the
general "no_sanitize" attribute with an argument, which is also the
syntax documented on the clang page.

We have to support no_sanitize_foo attributes for some of the early
sanitiser features, to avoid breaking backwards compatibility, but it
doesn't look like clang ever supported no_sanitize_shadow_call_stack.


Oh, "no_sanitize" is enough, I will delete this in the next version.


+/* Return TRUE if shadow call stack should be enabled for the current
+   function, otherwise return FALSE.  */
+
+bool
+aarch64_shadow_call_stack_enabled (void)
+{
+  /* This function should only be called after frame laid out.  */
+  gcc_assert (cfun->machine->frame.laid_out);
+
+  if (crtl->calls_eh_return)
+return false;
+
+  /* We only deal with a function if its LR is pushed onto stack
+ and attribute no_sanitize_shadow_call_stack is not specified.  */


(This would need to be updated if we do drop the specific attribute.)


Ok.


+  /* Push return address to shadow call stack.  */
+  if (aarch64_shadow_call_stack_enabled ())
+   emit_insn (gen_scs_push ());


Formatting nit: should only be indented by two spaces more than the “if”.
Same below.


Got it.


I guess doing it like this means that we also continue to store x30 to the
frame in the traditional way.  And that's probably necessary, given that
the saved x30 forms part of link chain.


Yes, we just added an extra insn to the prologue like:
+   str x30, [x18], #8
stp x29, x30, [sp, #-16]!
...


+
 if (flag_stack_usage_info)
   current_function_static_stack_size = constant_lower_bound (frame_size);
   
@@ -9066,6 +9089,10 @@ aarch64_expand_epilogue (bool for_sibcall)

 RTX_FRAME_RELATED_P (insn) = 1;
   }
   
+  /* Pop return address from shadow call stack.  */

+  if (aarch64_shadow_call_stack_enabled ())
+   emit_insn (gen_scs_pop ());
+


This looks correct, but following on from the above, I guess we continue
to restore x30 from the frame in the traditional way first, and then
overwrite it with the SCS value.  I think the output code would be
slightly neater if we suppressed the first restore of x30.


Yes, the current epilogue is like:
...
ldp x29, x30, [sp], #16
+   ldr x30, [x18, #-8]!

I think may be we can keep the two x30 pops here, for two reasons:
1) The implementation of scs in clang is to pop x30 twice, it might be
better to be consistent with clang here[1].
2) If we keep the first restore of x30, it may provide some flexibility
for other scenarios. For example, we can directly patch the scs_push/pop
insns in the binary to turn it into a binary without scs;

[1] 
https://github.com/llvm/llvm-project/commit/f11eb3ebe77729426e562d7d4d7ebb1d5ff2e7c8


+/* This value represents whether the shadow call stack is implemented on
+   the target platform.  */
+#define TARGET_SUPPORT_SHADOW_CALL_STACK true
+
+#define TARGET_CHECK_SCS_RESERVED_REGISTER()   \
+  do { \
+if (!fixed_regs[R18_REGNUM])   \
+  error ("%<-fsanitize=shadow-call-stack%> only "  \
+"allowed with %<-ffixed-x18%>");   \
+  } while (0)
+


Please make these target hooks instead.  The first one can use DEFHOOKPOD.


Ok, I will add a field to gcc_target via DEFHOOKPOD.


+;; Save X30 in the X18-based POST_INC stack (consistent with clang).
+(define_insn "scs_push"
+  [(set (mem:DI (reg:DI R18_REGNUM)) (reg:DI R30_REGNUM))
+   (set (reg:DI R18_REGNUM) (plus:DI (reg:DI R18_REGNUM) (const_int 8)))]
+  ""
+  "str\\tx30, [x18], #8"
+  [(set_attr "type" "store_8")]
+)
+
+;; Load X30 form the X18-based PRE_DEC stack (consistent with clang).
+(define_insn "scs_pop"
+  [(set (reg:DI R18_REGNUM) (minus:DI (reg:DI R18_REGNUM) (const_int 8)))
+   (set (reg:DI R30_REGNUM) (mem:DI (reg:DI R18_REGNUM)))]
+  ""
+  "ldr\\tx30, [x18, #-8]!"
+  [(set_attr "type" "load_8")]
+)
+


The two SETs here work in parallel, with the define_insn as a whole
following a "read input operands, act, write output operands" sequence.
The (mem: …) address therefore sees the old value of r18 rather than the
new one.  Also, RTL rules say that subtractions need to be written as
additions of the negative.


Oh, sorry, I got it wrong here.


I think the pattern would therefore be something l

Re: [PING^3][PATCH,v2,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-01-25 Thread Dan Li via Gcc-patches




On 1/25/22 02:19, Richard Sandiford wrote:

Dan Li  writes:

+
  if (flag_stack_usage_info)
current_function_static_stack_size = constant_lower_bound (frame_size);

@@ -9066,6 +9089,10 @@ aarch64_expand_epilogue (bool for_sibcall)

  RTX_FRAME_RELATED_P (insn) = 1;
}

+  /* Pop return address from shadow call stack.  */

+  if (aarch64_shadow_call_stack_enabled ())
+   emit_insn (gen_scs_pop ());
+


This looks correct, but following on from the above, I guess we continue
to restore x30 from the frame in the traditional way first, and then
overwrite it with the SCS value.  I think the output code would be
slightly neater if we suppressed the first restore of x30.


Yes, the current epilogue is like:
  ...
  ldp x29, x30, [sp], #16
+   ldr x30, [x18, #-8]!

I think may be we can keep the two x30 pops here, for two reasons:
1) The implementation of scs in clang is to pop x30 twice, it might be
better to be consistent with clang here[1].


This doesn't seem a very compelling reason on its own, unless it's
explicitly meant to be observable behaviour.  (But then, observed how?)



Well, probably sticking to pop x30 twice is not a good idea.
AFAICT, there doesn't seem to be an explicit requirement that
this behavior must be followed.

BTW:
Do we still need to emit the .cfi_restore 30 directive here if we
want to save a pop x30? (Sorry I'm not familiar enough with DWARF.)

Since the aarch64 linux kernel always enables -fno-omit-frame-pointer,
the generated assembly code may be as follows:

str x30, [x18], 8
ldp x29, x30, [sp], 16
..
ldr x29, [sp], 16
## Do we still need a .cfi_restore 30 here
.cfi_restore 29
.cfi_def_cfa_offset 0
ldr x30, [x18, -8]!
## Or may be a non-CFA based directive here
ret


2) If we keep the first restore of x30, it may provide some flexibility
for other scenarios. For example, we can directly patch the scs_push/pop
insns in the binary to turn it into a binary without scs;


Is that a supported (and ideally documented) use case?  If it is,
then it becomes important for correctness that the compiler emits
exactly the opcodes in the original patch.  If it isn't, then the
compiler can emit other instructions that have the same effect.



Oh, no, this is just a little trick that can be used for compatibility.
(Maybe some scenarios such as our company sometimes need to be
compatible with some non-open source binaries from different
manufacturers, two pops could make life easier :). )

If this is not a consideration for our community, please ignore
this request.


I'd like a definitive ruling on this from the kernel folks before
the patch goes in.



Ok, I'll cc some kernel folks to make sure I didn't miss something.


If binary patching is supposed to be possible then scs_push and
scs_pop *do* need to be separate define_insns.  But they also need
to have some magic unspec that differentiates them from normal
pushes and pops, e.g.:

   (set ...mem...
(unspec:DI [...reg...] UNSPEC_SCS_PUSH))

so that there is no chance that the pattern would be treated as
a normal move and optimised accordingly.



Yeah, this template looks more appropriate if it is to be treated
as a special directive.

Thanks for your suggestions,
Dan


Re: [PING^3][PATCH,v2,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-01-25 Thread Dan Li via Gcc-patches

Hi, all,

Sorry for bothering.

I'm trying to commit aarch64 scs code to the gcc and there is an issue
that I'm not sure about, could someone give me some suggestions?
(To avoid noise, I did't cc PING^3 [1] to the kernel mail list :) )

When clang enables scs, the following instructions are usually generated:

str x30, [x18], 8
ldp x29, x30, [sp], 16
..
ldp x29, x30, [sp], 16  ## x30 pop
ldr x30, [x18, -8]! ## x30 pop again
ret

The x30 register is popped twice here, Richard suggested that we can
omit the first x30 pop here.

AFAICT, it seems fine and also safe for SCS. But I'm not sure if I'm
missing something with the kernel, could someone give some suggestions?

The previous discussion can be found here [1].

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589257.html

Thanks a lot!
Dan

On 1/25/22 22:51, Dan Li wrote:



On 1/25/22 02:19, Richard Sandiford wrote:

Dan Li  writes:

+
  if (flag_stack_usage_info)
    current_function_static_stack_size = constant_lower_bound (frame_size);
@@ -9066,6 +9089,10 @@ aarch64_expand_epilogue (bool for_sibcall)
  RTX_FRAME_RELATED_P (insn) = 1;
    }
+  /* Pop return address from shadow call stack.  */
+  if (aarch64_shadow_call_stack_enabled ())
+    emit_insn (gen_scs_pop ());
+


This looks correct, but following on from the above, I guess we continue
to restore x30 from the frame in the traditional way first, and then
overwrite it with the SCS value.  I think the output code would be
slightly neater if we suppressed the first restore of x30.


Yes, the current epilogue is like:
  ...
  ldp x29, x30, [sp], #16
+   ldr x30, [x18, #-8]!

I think may be we can keep the two x30 pops here, for two reasons:
1) The implementation of scs in clang is to pop x30 twice, it might be
better to be consistent with clang here[1].


This doesn't seem a very compelling reason on its own, unless it's
explicitly meant to be observable behaviour.  (But then, observed how?)



Well, probably sticking to pop x30 twice is not a good idea.
AFAICT, there doesn't seem to be an explicit requirement that
this behavior must be followed.

BTW:
Do we still need to emit the .cfi_restore 30 directive here if we
want to save a pop x30? (Sorry I'm not familiar enough with DWARF.)

Since the aarch64 linux kernel always enables -fno-omit-frame-pointer,
the generated assembly code may be as follows:

str x30, [x18], 8
ldp x29, x30, [sp], 16
..
ldr x29, [sp], 16
     ## Do we still need a .cfi_restore 30 here
.cfi_restore 29
.cfi_def_cfa_offset 0
ldr x30, [x18, -8]!
     ## Or may be a non-CFA based directive here
ret


2) If we keep the first restore of x30, it may provide some flexibility
for other scenarios. For example, we can directly patch the scs_push/pop
insns in the binary to turn it into a binary without scs;


Is that a supported (and ideally documented) use case?  If it is,
then it becomes important for correctness that the compiler emits
exactly the opcodes in the original patch.  If it isn't, then the
compiler can emit other instructions that have the same effect.



Oh, no, this is just a little trick that can be used for compatibility.
(Maybe some scenarios such as our company sometimes need to be
compatible with some non-open source binaries from different
manufacturers, two pops could make life easier :). )

If this is not a consideration for our community, please ignore
this request.


I'd like a definitive ruling on this from the kernel folks before
the patch goes in.



Ok, I'll cc some kernel folks to make sure I didn't miss something.


If binary patching is supposed to be possible then scs_push and
scs_pop *do* need to be separate define_insns.  But they also need
to have some magic unspec that differentiates them from normal
pushes and pops, e.g.:

   (set ...mem...
    (unspec:DI [...reg...] UNSPEC_SCS_PUSH))

so that there is no chance that the pattern would be treated as
a normal move and optimised accordingly.



Yeah, this template looks more appropriate if it is to be treated
as a special directive.

Thanks for your suggestions,
Dan


Re: [PING^3][PATCH,v2,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-01-26 Thread Dan Li via Gcc-patches

Thanks, Ard,

On 1/26/22 00:10, Ard Biesheuvel wrote:

On Wed, 26 Jan 2022 at 08:53, Dan Li  wrote:


Hi, all,

Sorry for bothering.

I'm trying to commit aarch64 scs code to the gcc and there is an issue
that I'm not sure about, could someone give me some suggestions?
(To avoid noise, I did't cc PING^3 [1] to the kernel mail list :) )

When clang enables scs, the following instructions are usually generated:

str x30, [x18], 8
ldp x29, x30, [sp], 16
..
ldp x29, x30, [sp], 16  ## x30 pop
ldr x30, [x18, -8]! ## x30 pop again
ret

The x30 register is popped twice here, Richard suggested that we can
omit the first x30 pop here.

AFAICT, it seems fine and also safe for SCS. But I'm not sure if I'm
missing something with the kernel, could someone give some suggestions?

The previous discussion can be found here [1].

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589257.html



As was pointed out in the discussion, binary patching is in fact a
concern for the Linux kernel. E.g., Android uses generic binary
builds, but we would like to be able to switch between pointer
authentication and shadow call stack at boot time, rather than always
support both, and take the SCS performance hit on systems that
implement PAC as well.

However, it seems more straight-forward to patch PACIASP and AUTIASP
instructions into SCS push/pop instructions rather than the other way
around, as we can force the use of these exact opcodes [in the NOP
space]), as well as rely on existing unwind annotations to locate any
such instruction in the binary.



Well, then I think I don't need to submit a kernel patch to
enable SCS for gcc :)

BTW:
Do we have a plan to submit patches of dynamic patch PAC into
the kernel recently?


So omitting the load of X30 from the ordinary stack seems fine to me.


On 1/25/22 22:51, Dan Li wrote:



On 1/25/22 02:19, Richard Sandiford wrote:

Well, probably sticking to pop x30 twice is not a good idea.
AFAICT, there doesn't seem to be an explicit requirement that






Ok, I'll cc some kernel folks to make sure I didn't miss something.


To Richard:

Sorry for my mistake.

Due to binary compatibility issues, SCS related code may not
be directly merged into libgcc/glibc, do we still need to
add this patch into GCC? (I'd like to finish it if that
makes sense).


Thanks all for your time!
Dan


If binary patching is supposed to be possible then scs_push and
scs_pop *do* need to be separate define_insns.  But they also need
to have some magic unspec that differentiates them from normal
pushes and pops, e.g.:

(set ...mem...
 (unspec:DI [...reg...] UNSPEC_SCS_PUSH))

so that there is no chance that the pattern would be treated as
a normal move and optimised accordingly.



Yeah, this template looks more appropriate if it is to be treated
as a special directive.

Thanks for your suggestions,
Dan


Re: [PING^3][PATCH,v2,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-01-26 Thread Dan Li via Gcc-patches




On 1/26/22 03:09, Ard Biesheuvel wrote:

On Wed, 26 Jan 2022 at 11:40, Dan Li  wrote:


Thanks, Ard,

On 1/26/22 00:10, Ard Biesheuvel wrote:

On Wed, 26 Jan 2022 at 08:53, Dan Li  wrote:


Hi, all,

Sorry for bothering.

I'm trying to commit aarch64 scs code to the gcc and there is an issue
that I'm not sure about, could someone give me some suggestions?
(To avoid noise, I did't cc PING^3 [1] to the kernel mail list :) )

When clang enables scs, the following instructions are usually generated:

str x30, [x18], 8
ldp x29, x30, [sp], 16
..
ldp x29, x30, [sp], 16  ## x30 pop
ldr x30, [x18, -8]! ## x30 pop again
ret

The x30 register is popped twice here, Richard suggested that we can
omit the first x30 pop here.

AFAICT, it seems fine and also safe for SCS. But I'm not sure if I'm
missing something with the kernel, could someone give some suggestions?

The previous discussion can be found here [1].

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589257.html



As was pointed out in the discussion, binary patching is in fact a
concern for the Linux kernel. E.g., Android uses generic binary
builds, but we would like to be able to switch between pointer
authentication and shadow call stack at boot time, rather than always
support both, and take the SCS performance hit on systems that
implement PAC as well.

However, it seems more straight-forward to patch PACIASP and AUTIASP
instructions into SCS push/pop instructions rather than the other way
around, as we can force the use of these exact opcodes [in the NOP
space]), as well as rely on existing unwind annotations to locate any
such instruction in the binary.



Well, then I think I don't need to submit a kernel patch to
enable SCS for gcc :)



Not entirely.


BTW:
Do we have a plan to submit patches of dynamic patch PAC into
the kernel recently?



At the moment, there are just some ideas floating around. I did
implement a proof of concept that uses unwind data, but it hit some
issues with cfi_negate_ra_state being emitted imprecisely (GCC) or not
at all (Clang) in some cases. Using objtool would be another
possibility.

So in summary, getting SCS support proper into GCC is definitely worth
the effort.


Got it!

And thanks for the suggestion, Ard :)


[PATCH] [PATCH, v3, 1/1, AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-01-29 Thread Dan Li via Gcc-patches
Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

To enable SCS in user mode, in addition to compiler, other support
is also required (as discussed in [2]). This patch only adds basic
support for SCS from the compiler side, and provides convenience
for users to enable SCS.

For linux kernel, only the support of the compiler is required.

[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768

Signed-off-by: Dan Li 

gcc/ChangeLog:

* config/aarch64/aarch64.c (aarch64_layout_frame):
Change callee_adjust when scs is enabled.
(aarch64_restore_callee_saves): Avoid pop x30 twice
when scs is enabled.
(aarch64_expand_prologue): Push x30 onto SCS before it's
pushed onto stack.
(aarch64_expand_epilogue): Pop x30 frome SCS, while
preventing it from being popped from the regular stack again.
(aarch64_override_options_internal): Add SCS compile option check.
(TARGET_HAVE_SHADOW_CALL_STACK): New hook.
* config/aarch64/aarch64.h (struct GTY): Add is_scs_enabled.
* config/aarch64/aarch64.md (scs_push):  New template.
(scs_pop): Likewise.
* doc/invoke.texi: Document -fsanitize=shadow-call-stack.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Add hook have_shadow_call_stack.
* flag-types.h (enum sanitize_code):
Add SANITIZE_SHADOW_CALL_STACK.
* opts.c: Add shadow-call-stack.
* target.def: New hook.
* toplev.c (process_options): Add SCS compile option check.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shadow_call_stack_1.c: New test.
* gcc.target/aarch64/shadow_call_stack_2.c: New test.
* gcc.target/aarch64/shadow_call_stack_3.c: New test.
* gcc.target/aarch64/shadow_call_stack_4.c: New test.
* gcc.target/aarch64/shadow_call_stack_5.c: New test.
* gcc.target/aarch64/shadow_call_stack_6.c: New test.
* gcc.target/aarch64/shadow_call_stack_7.c: New test.
* gcc.target/aarch64/shadow_call_stack_8.c: New test.
---
V3:
- Change scs_push/pop to standard move patterns.
- Optimize scs_pop to avoid pop x30 twice when shadow stack is enabled.

 gcc/config/aarch64/aarch64.c  | 66 +--
 gcc/config/aarch64/aarch64.h  |  4 ++
 gcc/config/aarch64/aarch64.md | 10 +++
 gcc/doc/invoke.texi   | 30 +
 gcc/doc/tm.texi   |  5 ++
 gcc/doc/tm.texi.in|  2 +
 gcc/flag-types.h  |  2 +
 gcc/opts.c|  1 +
 gcc/target.def|  8 +++
 .../gcc.target/aarch64/shadow_call_stack_1.c  |  6 ++
 .../gcc.target/aarch64/shadow_call_stack_2.c  |  6 ++
 .../gcc.target/aarch64/shadow_call_stack_3.c  | 45 +
 .../gcc.target/aarch64/shadow_call_stack_4.c  | 20 ++
 .../gcc.target/aarch64/shadow_call_stack_5.c  | 18 +
 .../gcc.target/aarch64/shadow_call_stack_6.c  | 18 +
 .../gcc.target/aarch64/shadow_call_stack_7.c  | 18 +
 .../gcc.target/aarch64/shadow_call_stack_8.c  | 24 +++
 gcc/toplev.c  | 10 +++
 18 files changed, 289 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_6.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_7.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_8.c

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 699c105a42a..461c205010e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -79,6 +79,7 @@
 #include "tree-ssa-loop-niter.h"
 #include "fractional-cost.h"
 #include "rtlanal.h"
+#include "asan.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -7478,10 +7479,31 @@ aarch64_layout_frame (void)
   frame.sve_callee_adjust = 0;
   frame.callee_offset = 0;
 
+  /* Shadow call stack only deal with functions where the LR is pushed
+ onto the stack and without specifying the "no_sanitize" attribute
+ with the argument "shadow-call-stack".  */
+  frame.is_scs_enabled
+= (!crtl->calls_eh_return
+   && (sanitize_flags_p (SANITIZE_SHADOW_CALL_STACK)
+  && known_ge (cfun->machine->frame.reg_offset[LR_REGNUM], 0)));
+
+  /* When sh

Re: [PING^3][PATCH,v2,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-01-29 Thread Dan Li via Gcc-patches

Hi, Richard,
I have sent out my v3[1], and (probably) fixed the previous issues,
please let me know if i got something wrong :)

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589471.html

Thanks,
Dan.

On 1/25/22 02:19, Richard Sandiford wrote:

Dan Li  writes:

+
  if (flag_stack_usage_info)
current_function_static_stack_size = constant_lower_bound (frame_size);

@@ -9066,6 +9089,10 @@ aarch64_expand_epilogue (bool for_sibcall)

  RTX_FRAME_RELATED_P (insn) = 1;
}

+  /* Pop return address from shadow call stack.  */

+  if (aarch64_shadow_call_stack_enabled ())
+   emit_insn (gen_scs_pop ());
+


This looks correct, but following on from the above, I guess we continue
to restore x30 from the frame in the traditional way first, and then
overwrite it with the SCS value.  I think the output code would be
slightly neater if we suppressed the first restore of x30.


Yes, the current epilogue is like:
  ...
  ldp x29, x30, [sp], #16
+   ldr x30, [x18, #-8]!

I think may be we can keep the two x30 pops here, for two reasons:
1) The implementation of scs in clang is to pop x30 twice, it might be
better to be consistent with clang here[1].


This doesn't seem a very compelling reason on its own, unless it's
explicitly meant to be observable behaviour.  (But then, observed how?)


2) If we keep the first restore of x30, it may provide some flexibility
for other scenarios. For example, we can directly patch the scs_push/pop
insns in the binary to turn it into a binary without scs;


Is that a supported (and ideally documented) use case?  If it is,
then it becomes important for correctness that the compiler emits
exactly the opcodes in the original patch.  If it isn't, then the
compiler can emit other instructions that have the same effect.

I'd like a definitive ruling on this from the kernel folks before
the patch goes in.

If binary patching is supposed to be possible then scs_push and
scs_pop *do* need to be separate define_insns.  But they also need
to have some magic unspec that differentiates them from normal
pushes and pops, e.g.:

   (set ...mem...
(unspec:DI [...reg...] UNSPEC_SCS_PUSH))

so that there is no chance that the pattern would be treated as
a normal move and optimised accordingly.


+;; Save X30 in the X18-based POST_INC stack (consistent with clang).
+(define_insn "scs_push"
+  [(set (mem:DI (reg:DI R18_REGNUM)) (reg:DI R30_REGNUM))
+   (set (reg:DI R18_REGNUM) (plus:DI (reg:DI R18_REGNUM) (const_int 8)))]
+  ""
+  "str\\tx30, [x18], #8"
+  [(set_attr "type" "store_8")]
+)
+
+;; Load X30 form the X18-based PRE_DEC stack (consistent with clang).
+(define_insn "scs_pop"
+  [(set (reg:DI R18_REGNUM) (minus:DI (reg:DI R18_REGNUM) (const_int 8)))
+   (set (reg:DI R30_REGNUM) (mem:DI (reg:DI R18_REGNUM)))]
+  ""
+  "ldr\\tx30, [x18, #-8]!"
+  [(set_attr "type" "load_8")]
+)
+


The two SETs here work in parallel, with the define_insn as a whole
following a "read input operands, act, write output operands" sequence.
The (mem: …) address therefore sees the old value of r18 rather than the
new one.  Also, RTL rules say that subtractions need to be written as
additions of the negative.


Oh, sorry, I got it wrong here.


I think the pattern would therefore be something like:

[(set (reg:DI R30_REGNUM) (mem:DI (plus:DI (reg:DI R18_REGNUM)
 (const_int -8]
 (set (reg:DI R18_REGNUM) (plus:DI (reg:DI R18_REGNUM) (const_int -8)))]

However, since these are normal moves, I think we should be able to use
the standard move patterns.  E.g. the push could be:

(define_expand "scs_push"
[(set (mem:DI (pre_dec:DI (reg:DI R18_REGNUM)))
  (reg:DI R30_REGNUM))])

and similarly for the pop.


Thanks, this template looks better.

Since the push/pop of SCS and normal stack in clang are different
(for example, scs push is post_inc, while normal stack is pre_dec),
in order to maintain consistency, I think it can be changed to the
following:

(define_expand "scs_push"
[(set (mem:DI (post_inc:DI (reg:DI R18_REGNUM)))
  (reg:DI R30_REGNUM))])

(define_expand "scs_pop"
[(set (reg:DI R30_REGNUM)
 (mem:DI (pre_dec:DI (reg:DI R18_REGNUM])


Yeah, I copied the wrong name above, sorry.

Thanks,
Richard


Re: [PING^3][PATCH,v2,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-02-02 Thread Dan Li via Gcc-patches




On 1/31/22 08:26, Richard Sandiford wrote:

Thanks for the discussion and sorry for the slow reply, was out most of
last week.

Dan Li  writes:

Thanks, Ard,

On 1/26/22 00:10, Ard Biesheuvel wrote:

On Wed, 26 Jan 2022 at 08:53, Dan Li  wrote:


Hi, all,

Sorry for bothering.

I'm trying to commit aarch64 scs code to the gcc and there is an issue
that I'm not sure about, could someone give me some suggestions?
(To avoid noise, I did't cc PING^3 [1] to the kernel mail list :) )


So omitting the load of X30 from the ordinary stack seems fine to me.


OK, thanks.  Let's go with that for now then.  There would still be
time to change our minds before GCC 12 is released, if anyone feels
that patching SCS code would be useful.

Reading it back, I think my previous message came across as sounding

like a complaint against binary patching, which wasn't the case at all.
I think it would be fine to support patching, even if it was just for a
single vendor rather than expected to be a common case.  It's just that,
if we did want to support it, we'd need to document it as a requirement
(at least within GCC) and change the implementation accordingly.


Got it, then I'll implement this feature as discussed above and see
if we could add additional options for SCS later.

Thanks,
Dan


Re: [PATCH] [PATCH,v3,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-02-02 Thread Dan Li via Gcc-patches



On 1/31/22 09:00, Richard Sandiford wrote:

Dan Li  writes:

Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

  
  /* This file should be included last.  */

  #include "target-def.h"
@@ -7478,10 +7479,31 @@ aarch64_layout_frame (void)
frame.sve_callee_adjust = 0;
frame.callee_offset = 0;
  
+  /* Shadow call stack only deal with functions where the LR is pushed


Typo: s/deal/deals/



Sorry for my non-standard English expression :)

+ onto the stack and without specifying the "no_sanitize" attribute
+ with the argument "shadow-call-stack".  */
+  frame.is_scs_enabled
+= (!crtl->calls_eh_return
+   && (sanitize_flags_p (SANITIZE_SHADOW_CALL_STACK)
+  && known_ge (cfun->machine->frame.reg_offset[LR_REGNUM], 0)));


Nit, but normal GCC style would be to use a single chain of &&s here:

   frame.is_scs_enabled
 = (!crtl->calls_eh_return
&& sanitize_flags_p (SANITIZE_SHADOW_CALL_STACK)
&& known_ge (cfun->machine->frame.reg_offset[LR_REGNUM], 0));



Got it.

+
+  /* When shadow call stack is enabled, the scs_pop in the epilogue will
+ restore x30, and we don't need to pop x30 again in the traditional
+ way.  At this time, if candidate2 is x30, we need to adjust
+ max_push_offset to 256 to ensure that the offset meets the requirements
+ of emit_move_insn.  Similarly, if candidate1 is x30, we need to set
+ max_push_offset to 0, because x30 is not popped up at this time, so
+ callee_adjust cannot be adjusted.  */
HOST_WIDE_INT max_push_offset = 0;
if (frame.wb_candidate2 != INVALID_REGNUM)
-max_push_offset = 512;
-  else if (frame.wb_candidate1 != INVALID_REGNUM)
+{
+  if (frame.is_scs_enabled && frame.wb_candidate2 == R30_REGNUM)
+   max_push_offset = 256;
+  else
+   max_push_offset = 512;
+}
+  else if ((frame.wb_candidate1 != INVALID_REGNUM)
+  && !(frame.is_scs_enabled && frame.wb_candidate1 == R30_REGNUM))
  max_push_offset = 256;
HOST_WIDE_INT const_size, const_outgoing_args_size, const_fp_offset;


Maybe we should instead add separate fields for wb_push_candidate[12] and
wb_pop_candidate[12].  The pop candidates would start out the same as the
push candidates, but R30_REGNUM would get replaced with INVALID_REGNUM
for SCS.



This looks more reasonable, I'll change it in the next version.

Admittedly, suppressing the restore of x30 is turning out to be a bit
more difficult than I'd realised :-/


[…]
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 2792bb29adb..1610a4fd74c 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -916,6 +916,10 @@ struct GTY (()) aarch64_frame
unsigned spare_pred_reg;
  
bool laid_out;

+
+  /* Nonzero if shadow call stack should be enabled for the current
+ function, otherwise return FALSE.  */


“True” seems better than “Nonzero” given that this is a bool.
(A lot of GCC bools were originally ints, which is why “nonzero”
still appears in non-obvious places.)

I think we can just drop “otherwise return FALSE”: “return” doesn't
seem appropriate here, given that it's a variable.



Got it, thanks for the explanation.

Looks great otherwise.  Thanks especially for testing the corner cases. :-)

One minor thing:


+/* { dg-final { scan-assembler-times "str\tx30, \\\[x18\\\], \[#|$\]?8" 2 } } 
*/
+/* { dg-final { scan-assembler-times "ldr\tx30, \\\[x18, \[#|$\]?-8\\\]!" 2 } 
} */


This sort of regexp can be easier to write if you quote them using {…}
rather than "…", since it reduces the number of backslashes needed.  E.g.:

/* { dg-final { scan-assembler-times {str\tx30, \[x18\], [#|$]?8} 2 } } */

The current version is fine too though, and is widely used.  Just mentioning
it in case it's useful in future.



Oh, thanks Richard, I didn't notice it before.

Also, [#|$]? can be written #?.



Ok.

Thanks,
Richard


[PATCH] [PATCH, v4, 1/1, AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-02-05 Thread Dan Li via Gcc-patches
Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

To enable SCS in user mode, in addition to compiler, other support
is also required (as discussed in [2]). This patch only adds basic
support for SCS from the compiler side, and provides convenience
for users to enable SCS.

For linux kernel, only the support of the compiler is required.

[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768

Signed-off-by: Dan Li 

gcc/ChangeLog:

* config/aarch64/aarch64.c (SLOT_REQUIRED):
Rename wb_candidate[12] to wb_push_candidate[12].
(aarch64_layout_frame): Likewise, and
change callee_adjust when scs is enabled.
(aarch64_save_callee_saves):
Rename wb_candidate[12] to wb_push_candidate[12].
(aarch64_restore_callee_saves): Likewise.
(aarch64_get_separate_components): Likewise.
(aarch64_expand_prologue): Push x30 onto SCS before it's
pushed onto stack.
(aarch64_expand_epilogue): Pop x30 frome SCS, while
preventing it from being popped from the regular stack again.
(aarch64_override_options_internal): Add SCS compile option check.
(TARGET_HAVE_SHADOW_CALL_STACK): New hook.
* config/aarch64/aarch64.h (struct GTY): Add is_scs_enabled,
wb_pop_candidate[12], and rename wb_candidate[12] to
wb_push_candidate[12].
* config/aarch64/aarch64.md (scs_push): New template.
(scs_pop): Likewise.
* doc/invoke.texi: Document -fsanitize=shadow-call-stack.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Add hook have_shadow_call_stack.
* flag-types.h (enum sanitize_code):
Add SANITIZE_SHADOW_CALL_STACK.
* opts.c: Add shadow-call-stack.
* target.def: New hook.
* toplev.c (process_options): Add SCS compile option check.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shadow_call_stack_1.c: New test.
* gcc.target/aarch64/shadow_call_stack_2.c: New test.
* gcc.target/aarch64/shadow_call_stack_3.c: New test.
* gcc.target/aarch64/shadow_call_stack_4.c: New test.
* gcc.target/aarch64/shadow_call_stack_5.c: New test.
* gcc.target/aarch64/shadow_call_stack_6.c: New test.
* gcc.target/aarch64/shadow_call_stack_7.c: New test.
* gcc.target/aarch64/shadow_call_stack_8.c: New test.
---
V4:
- Added wb_[push|pop]_candidates[12] to ensure push/pop can
emit different registers.

V3:
- Change scs_push/pop to standard move patterns.
- Optimize scs_pop to avoid pop x30 twice when shadow stack is enabled.

 gcc/config/aarch64/aarch64.c  | 121 +-
 gcc/config/aarch64/aarch64.h  |  21 ++-
 gcc/config/aarch64/aarch64.md |  10 ++
 gcc/doc/invoke.texi   |  30 +
 gcc/doc/tm.texi   |   5 +
 gcc/doc/tm.texi.in|   2 +
 gcc/flag-types.h  |   2 +
 gcc/opts.c|   1 +
 gcc/target.def|   8 ++
 .../gcc.target/aarch64/shadow_call_stack_1.c  |   6 +
 .../gcc.target/aarch64/shadow_call_stack_2.c  |   6 +
 .../gcc.target/aarch64/shadow_call_stack_3.c  |  45 +++
 .../gcc.target/aarch64/shadow_call_stack_4.c  |  20 +++
 .../gcc.target/aarch64/shadow_call_stack_5.c  |  18 +++
 .../gcc.target/aarch64/shadow_call_stack_6.c  |  18 +++
 .../gcc.target/aarch64/shadow_call_stack_7.c  |  18 +++
 .../gcc.target/aarch64/shadow_call_stack_8.c  |  24 
 gcc/toplev.c  |  10 ++
 18 files changed, 332 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_6.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_7.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_8.c

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 699c105a42a..f4d962917c4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -79,6 +79,7 @@
 #include "tree-ssa-loop-niter.h"
 #include "fractional-cost.h"
 #include "rtlanal.h"
+#include "asan.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -7291,8 +7292,8 @@ aarch64_layout_frame (void)
 #define SLOT_NOT_REQUIRED (-2)
 #define SLOT_REQUIRED (-1)
 
-  frame.wb_candidate1 = INVALID_REGNUM;
-  frame.wb_c

Re: [PATCH] [PATCH,v3,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-02-05 Thread Dan Li via Gcc-patches

Hi, Richard,
I have sent out my v4[1], please let me know if i got something wrong :).

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-February/589921.html

Thanks,
Dan.


On 1/31/22 09:00, Richard Sandiford wrote:

Dan Li  writes:

Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

To enable SCS in user mode, in addition to compiler, other support
is also required (as discussed in [2]). This patch only adds basic
support for SCS from the compiler side, and provides convenience
for users to enable SCS.

For linux kernel, only the support of the compiler is required.

[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768

Signed-off-by: Dan Li 

gcc/ChangeLog:

* config/aarch64/aarch64.c (aarch64_layout_frame):
Change callee_adjust when scs is enabled.
(aarch64_restore_callee_saves): Avoid pop x30 twice
when scs is enabled.
(aarch64_expand_prologue): Push x30 onto SCS before it's
pushed onto stack.
(aarch64_expand_epilogue): Pop x30 frome SCS, while
preventing it from being popped from the regular stack again.
(aarch64_override_options_internal): Add SCS compile option check.
(TARGET_HAVE_SHADOW_CALL_STACK): New hook.
* config/aarch64/aarch64.h (struct GTY): Add is_scs_enabled.
* config/aarch64/aarch64.md (scs_push):  New template.
(scs_pop): Likewise.
* doc/invoke.texi: Document -fsanitize=shadow-call-stack.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Add hook have_shadow_call_stack.
* flag-types.h (enum sanitize_code):
Add SANITIZE_SHADOW_CALL_STACK.
* opts.c: Add shadow-call-stack.
* target.def: New hook.
* toplev.c (process_options): Add SCS compile option check.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shadow_call_stack_1.c: New test.
* gcc.target/aarch64/shadow_call_stack_2.c: New test.
* gcc.target/aarch64/shadow_call_stack_3.c: New test.
* gcc.target/aarch64/shadow_call_stack_4.c: New test.
* gcc.target/aarch64/shadow_call_stack_5.c: New test.
* gcc.target/aarch64/shadow_call_stack_6.c: New test.
* gcc.target/aarch64/shadow_call_stack_7.c: New test.
* gcc.target/aarch64/shadow_call_stack_8.c: New test.
---
V3:
- Change scs_push/pop to standard move patterns.
- Optimize scs_pop to avoid pop x30 twice when shadow stack is enabled.

  gcc/config/aarch64/aarch64.c  | 66 +--
  gcc/config/aarch64/aarch64.h  |  4 ++
  gcc/config/aarch64/aarch64.md | 10 +++
  gcc/doc/invoke.texi   | 30 +
  gcc/doc/tm.texi   |  5 ++
  gcc/doc/tm.texi.in|  2 +
  gcc/flag-types.h  |  2 +
  gcc/opts.c|  1 +
  gcc/target.def|  8 +++
  .../gcc.target/aarch64/shadow_call_stack_1.c  |  6 ++
  .../gcc.target/aarch64/shadow_call_stack_2.c  |  6 ++
  .../gcc.target/aarch64/shadow_call_stack_3.c  | 45 +
  .../gcc.target/aarch64/shadow_call_stack_4.c  | 20 ++
  .../gcc.target/aarch64/shadow_call_stack_5.c  | 18 +
  .../gcc.target/aarch64/shadow_call_stack_6.c  | 18 +
  .../gcc.target/aarch64/shadow_call_stack_7.c  | 18 +
  .../gcc.target/aarch64/shadow_call_stack_8.c  | 24 +++
  gcc/toplev.c  | 10 +++
  18 files changed, 289 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_5.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_6.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_7.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_8.c

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 699c105a42a..461c205010e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -79,6 +79,7 @@
  #include "tree-ssa-loop-niter.h"
  #include "fractional-cost.h"
  #include "rtlanal.h"
+#include "asan.h"
  
  /* This file should be included last.  */

  #include "target-def.h"
@@ -7478,10 +7479,31 @@ aarch64_layout_frame (void)
frame.sve_callee_adjust = 0;
frame.callee_offset = 0;
  
+  /* Shadow call stack only deal with functions where the LR is pushed


Typo: s/deal/deals/


+ onto the stack and without specifying the "no_sa

[AArch64] Question about the condition of calls_alloca in aarch64_layout_frame

2022-02-07 Thread Dan Li via Gcc-patches

There is the following code in aarch64_layout_frame:

else if (crtl->outgoing_args_size.is_constant (&const_outgoing_args_size)
 && frame.saved_regs_size.is_constant (&const_saved_regs_size)
 && const_outgoing_args_size + const_saved_regs_size < 512
 && (!saves_below_hard_fp_p || const_outgoing_args_size == 0)
// 1)
 && !(cfun->calls_alloca 
  && frame.hard_fp_offset.is_constant (&const_fp_offset)
  && const_fp_offset < max_push_offset))
  {
/* Frame with small outgoing arguments:

   sub sp, sp, frame_size
   stp reg1, reg2, [sp, outgoing_args_size]
   stp reg3, reg4, [sp, outgoing_args_size + 16]  */
frame.initial_adjust = frame.frame_size;
frame.callee_offset = const_outgoing_args_size;
  }
..
else if (frame.hard_fp_offset.is_constant (&const_fp_offset)
 && const_fp_offset < max_push_offset)
  {
// 2)
/* Frame with large outgoing arguments or SVE saves, but with
   a small local area:

   stp reg1, reg2, [sp, -hard_fp_offset]!
   stp reg3, reg4, [sp, 16]
   [sub sp, sp, below_hard_fp_saved_regs_size]
   [save SVE registers relative to SP]
   sub sp, sp, outgoing_args_size  */


As described in 2), "Frame with large outgoing arguments or SVE saves,
but with a small local area".

But due to the condition at 1), the following code (small outgoing with
a small local area) also uses the insns in 2), which is slightly different
from the description:

//aarch64-linux-gnu-gcc   main.c -O0 -S main.s -g -w -fomit-frame-pointer

#include 
#include 
#define REP9(X) X,X,X,X,X,X,X,X,X

int alloc_size;

int main(void)
{
outgoing (REP9(1));
char * y = alloca(alloc_size);
return 0;
}


Could the condition in 1) be removed, or am I missing something?

Thanks,
Dan.


Re: [PATCH] [PATCH,v4,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-02-09 Thread Dan Li via Gcc-patches




On 2/9/22 08:08, Richard Sandiford wrote:

Dan Li  writes:

+
+  /* When shadow call stack is enabled, the scs_pop in the epilogue will
+ restore x30, and we don't need to pop x30 again in the traditional
+ way.  Pop candidates record the registers that need to be popped
+ eventually.  */
+  if (frame.is_scs_enabled)
+{
+  if (frame.wb_push_candidate2 == R30_REGNUM)
+   frame.wb_pop_candidate2 = INVALID_REGNUM;
+  else if (frame.wb_push_candidate1 == R30_REGNUM)
+   frame.wb_pop_candidate1 = INVALID_REGNUM;


Although it makes no difference to the behaviour, I think it would be
clearer to use pop rather than push in the checks here.



Got it.

@@ -7885,8 +7914,8 @@ aarch64_save_callee_saves (poly_int64 start_offset,
bool frame_related_p = aarch64_emit_cfi_for_reg_p (regno);
  
if (skip_wb

- && (regno == cfun->machine->frame.wb_candidate1
- || regno == cfun->machine->frame.wb_candidate2))
+ && (regno == cfun->machine->frame.wb_push_candidate1
+ || regno == cfun->machine->frame.wb_push_candidate2))
continue;
  
if (cfun->machine->reg_is_wrapped_separately[regno])

@@ -7996,8 +8025,8 @@ aarch64_restore_callee_saves (poly_int64 start_offset, 
unsigned start,
rtx reg, mem;
  
if (skip_wb

- && (regno == cfun->machine->frame.wb_candidate1
- || regno == cfun->machine->frame.wb_candidate2))
+ && (regno == cfun->machine->frame.wb_push_candidate1
+ || regno == cfun->machine->frame.wb_push_candidate2))


Shouldn't this be using pop rather than push?



There might be a little difference:

- Using push candidates means that a register to be ignored in pop
candidates will not be emitted again during the "restore" (pop_candidates
should always be a subset of push_candidates, since popping a register
without a push might not make sense).

- Using pop candidates means that a registers to be ignored in pop
candidates will be re-emitted during the "restore". For example,
if we specify to ignore the x20 register in pop:

--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7502,6 +7502,8 @@ aarch64_layout_frame (void)
frame.wb_pop_candidate1 = INVALID_REGNUM;
 }
 
+  if (frame.wb_pop_candidate2 == R20_REGNUM)

+   frame.wb_pop_candidate2 = INVALID_REGNUM;
   /* If candidate2 is INVALID_REGNUM, we need to adjust max_push_offset to
  256 to ensure that the offset meets the requirements of emit_move_insn.
  Similarly, if candidate1 is INVALID_REGNUM, we need to set

With the test case:

int main(void)
{
__asm__ ("":::"x19", "x20");
return 0;
}

When we use "pop_candidate[12]", one more insn is emitted:

00400604 :
   400604:   a9bf53f3stp x19, x20, [sp, #-16]!
   400608:   5280mov w0, #0x0
+  40060c:   f94007f4ldr x20, [sp, #8]
   400610:   f84107f3ldr x19, [sp], #16
   400614:   d65f03c0ret

But in the case of ignoring a specific register (like scs ignores x30),
there is no difference between the two (because we always need
to explicitly specify which registers to ignore in the parameter of
aarch64_restore_callee_saves).

If pop looks better here, I'd like to change it to pop in the
next version :).


+  /* When shadow call stack is enabled, the scs_pop in the epilogue will
+ restore x30, we don't need to restore x30 again in the traditional
+ way.  */
+  if (cfun->machine->frame.is_scs_enabled)
+aarch64_restore_callee_saves (callee_offset - sve_callee_adjust,
+ R0_REGNUM, R29_REGNUM,
+ callee_adjust != 0, &cfi_ops);
+  else
+aarch64_restore_callee_saves (callee_offset - sve_callee_adjust,
+ R0_REGNUM, R30_REGNUM,
+ callee_adjust != 0, &cfi_ops);
+


Very minor, but I think it would be better to have:

   unsigned int last_gpr = (cfun->machine->frame.is_scs_enabled
   ? R29_REGNUM : R30_REGNUM);

so that we don't need to repeat the other arguments.  There's then
less risk of the two versions getting out of sync.



Got it.

  
if (need_barrier_p)

  emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
@@ -9066,6 +9109,17 @@ aarch64_expand_epilogue (bool for_sibcall)
RTX_FRAME_RELATED_P (insn) = 1;
  }
  
+  /* Pop return address from shadow call stack.  */

+  if (cfun->machine->frame.is_scs_enabled)
+{
+  machine_mode mode = aarch64_reg_save_mode (R30_REGNUM);
+  rtx reg = gen_rtx_REG (mode, R30_REGNUM);
+
+  insn = emit_insn (gen_scs_pop ());
+  add_reg_note (insn, REG_CFA_RESTORE, reg);
+

Re: [PATCH] [PATCH,v4,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-02-11 Thread Dan Li via Gcc-patches




On 2/10/22 01:55, Richard Sandiford wrote:


There might be a little difference:

- Using push candidates means that a register to be ignored in pop
candidates will not be emitted again during the "restore" (pop_candidates
should always be a subset of push_candidates, since popping a register
without a push might not make sense).


The push candidates are simply a subset of the saved registers though.
Similarly, the pop candidates are simply a subset of the restored registers.
So I think the requirement operates at that level: the restored registers
must be a subset of the saved registers.

In other circumstances it could have been the other way around:
there might have been a change that stopped us from saving two
registers during the allocation, but we wanted to carry on restoring
two registers during the deallocation.  I don't think there's a
reason that the push candidates *have* to be a superset of the
pop candidates (even though they are with the current change).



Oh yeah, that sounds more reasonable.


When we use "pop_candidate[12]", one more insn is emitted:

00400604 :
 400604:   a9bf53f3stp x19, x20, [sp, #-16]!
 400608:   5280mov w0, #0x0
+  40060c:   f94007f4ldr x20, [sp, #8]
 400610:   f84107f3ldr x19, [sp], #16
 400614:   d65f03c0ret

But in the case of ignoring a specific register (like scs ignores x30),
there is no difference between the two (because we always need
to explicitly specify which registers to ignore in the parameter of
aarch64_restore_callee_saves).


I think this is the correct behaviour.  If we don't want to restore
a register at all then it should be excluded from the restore list
somehow.  In your case you're doing that be using a limit of
X29_REGNUM instead of X30_REGNUM.



Got it, I'll use pop candidates in the next version.


FWIW, I did wonder whether aarch64_restore_callee_saves should be
doing the scs pop, rather than aarch64_expand_epilogue, and in an
earlier draft of the previous review I'd asked for that.  It does
seem conceptually cleaner, but in practice, it would probably have
been awkward to implement.  E.g. we'd need to explicitly stop an
LDP being formed with X30 as the second register.



Well, then I think I should keep it the same here :).


But treating scs push and scs pop as part of the register save and
restore sequences would have one advantage: it would allow the
scs push and scs pop to be shrink-wrapped.



Sorry for my limited knowledge of shrink warping, I don't think I get
it here (I tried to find a case when compiling the kernel and some
gcc test cases but I still don't have a clue.).

I see that the bitmap of LR_REGNUM is cleared in
aarch64_get_separate_components and scs push/pop are x18 based operations.

If we handle them in aarch64_restore/save_callee_saves,
could scs push/pop be shrink-wrapped in some cases?

Thanks,
Dan


Re: [PATCH] [PATCH,v4,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-02-11 Thread Dan Li via Gcc-patches




On 2/11/22 01:53, Richard Sandiford wrote:

Dan Li  writes:

On 2/10/22 01:55, Richard Sandiford wrote:



But treating scs push and scs pop as part of the register save and
restore sequences would have one advantage: it would allow the
scs push and scs pop to be shrink-wrapped.



Sorry for my limited knowledge of shrink warping, I don't think I get
it here (I tried to find a case when compiling the kernel and some
gcc test cases but I still don't have a clue.).

I see that the bitmap of LR_REGNUM is cleared in
aarch64_get_separate_components and scs push/pop are x18 based operations.

If we handle them in aarch64_restore/save_callee_saves,
could scs push/pop be shrink-wrapped in some cases?


Yeah, I think so.  E.g. for:

void f();
int g(int x) {
 if (x == 0)
 return 1;
 f();
 return 2;
}

shrink wrapping would allow the scs push and pop to move along with the
x30 save:

g:
 cbnzw0, .L9
 mov w0, 1
 ret
.L9:
 stp x29, x30, [sp, -16]!
 mov x29, sp
 bl  f
 mov w0, 2
 ldp x29, x30, [sp], 16
 ret



Thanks Richard, (to make sure I understand correctly :)) I think
it means that the current patch could do a "shrink-wapping", but
the X30 could not be treat as a "component", now it could gen code
like:

g:
cbnzw0, .L9
mov w0, 1
ret
.L9:
str x30, [x18], 8
stp x29, x30, [sp, -16]!
mov x29, sp
bl  f
ldr x30, [x18, -8]!
mov w0, 2
ldr x29, [sp], 16
ret


The idea is that aarch64_save_callee_saves would treat the scs push
as part of saving x30 (along with the normal store to the frame chain,
when used).  aarch64_restore_callee_saves would similarly treat the scs
pop as the way of restoring x30 (instead of loading from the frame chain).
This is in contrast to the current patch, where the scs push and pop are
treated as fixed parts of the prologue and epilogue instead, and where
aarch64_restore_callee_saves tries to avoid doing anything for x30.

If shrink-wrapping decides to treat x30 as a separate “component”, as it
does in the example above, then the scs push and pop would be emitted
by aarch64_process_components instead.

It would be more complex, but it would give better code.



Following your idea, I made a poc to add x30 in component bitmap:

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 35f6f64f5b2..fc9b5e7af54 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8359,7 +8359,7 @@ aarch64_get_separate_components (void)
   if (reg1 != INVALID_REGNUM)
 bitmap_clear_bit (components, reg1);
 
-  bitmap_clear_bit (components, LR_REGNUM);

   bitmap_clear_bit (components, SP_REGNUM);
 
   return components;

@@ -8396,7 +8396,7 @@ aarch64_components_for_bb (basic_block bb)
   /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets.  */
   for (unsigned regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
 if (!fixed_regs[regno]
-   && !crtl->abi->clobbers_full_reg_p (regno)
+   && (!crtl->abi->clobbers_full_reg_p (regno) || regno == R30_REGNUM)
&& (TEST_HARD_REG_BIT (extra_caller_saves, regno)
|| bitmap_bit_p (in, regno)
|| bitmap_bit_p (gen, regno)

And with a test code compiled with -fno-omit-frame-pointer:

void f();
int g(int x) {
if (x == 0) {
__asm__ ("":::"x19", "x20");
return 1;
}
f();
return 2;
}

Then it seems X30 is treat as a "component" (the test
result of aarch64.exp also seems fine).

g:
stp x19, x20, [sp, -32]!
cbnzw0, .L2
mov w0, 1
ldp x19, x20, [sp], 32
ret
.L2:
str x30, [sp, 16]
bl  f
ldr x30, [sp, 16]
mov w0, 2
ldp x19, x20, [sp], 32
ret

And I think maybe we could handle this through three patches:
1.Keep current patch (a V5) unchanged for scs.
2.Add shrink-warpping for X30:
logically this might be a separate topic, and I think more testing
might be needed here (Well, I'm a little worried about if there might
be other effects, since I just read this part of the code roughly
yesterday).
3.Add scs push/pop to shrink-wrapping (and maybe we can do the same for
the PAC code in pro/epilogue, since it's also the operation of the X30).

Thanks,
Dan


Re: [PATCH] [PATCH,v4,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-02-11 Thread Dan Li via Gcc-patches




On 2/11/22 07:35, Richard Sandiford wrote:

Dan Li  writes:

On 2/11/22 01:53, Richard Sandiford wrote:

Dan Li  writes:

On 2/10/22 01:55, Richard Sandiford wrote:



And I think maybe we could handle this through three patches:
1.Keep current patch (a V5) unchanged for scs.
2.Add shrink-warpping for X30:
logically this might be a separate topic, and I think more testing
might be needed here (Well, I'm a little worried about if there might
be other effects, since I just read this part of the code roughly
yesterday).
3.Add scs push/pop to shrink-wrapping (and maybe we can do the same for
the PAC code in pro/epilogue, since it's also the operation of the X30).


Yeah, that's fair.

(Like I said earlier, I wasn't asking for the shrink-wrapping change.
It was just a note in passing.  But as you point out, the individual
shrink-wrapping support would be even more work than I'd imagined.)



Got it!

Thanks,
Dan


[PATCH] [PATCH, v5, 1/1, AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-02-12 Thread Dan Li via Gcc-patches
Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

To enable SCS in user mode, in addition to compiler, other support
is also required (as discussed in [2]). This patch only adds basic
support for SCS from the compiler side, and provides convenience
for users to enable SCS.

For linux kernel, only the support of the compiler is required.

[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768

Signed-off-by: Dan Li 

gcc/ChangeLog:

* config/aarch64/aarch64.cc (SLOT_REQUIRED):
Change wb_candidate[12] to wb_push_candidate[12].
(aarch64_layout_frame): Likewise, and
change callee_adjust when scs is enabled.
(aarch64_save_callee_saves):
Change wb_candidate[12] to wb_push_candidate[12].
(aarch64_restore_callee_saves):
Change wb_candidate[12] to wb_pop_candidate[12].
(aarch64_get_separate_components):
Change wb_candidate[12] to wb_push_candidate[12].
(aarch64_expand_prologue): Push x30 onto SCS before it's
pushed onto stack.
(aarch64_expand_epilogue): Pop x30 frome SCS, while
preventing it from being popped from the regular stack again.
(aarch64_override_options_internal): Add SCS compile option check.
(TARGET_HAVE_SHADOW_CALL_STACK): New hook.
* config/aarch64/aarch64.h (struct GTY): Add is_scs_enabled,
wb_pop_candidate[12], and rename wb_candidate[12] to
wb_push_candidate[12].
* config/aarch64/aarch64.md (scs_push): New template.
(scs_pop): Likewise.
* doc/invoke.texi: Document -fsanitize=shadow-call-stack.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Add hook have_shadow_call_stack.
* flag-types.h (enum sanitize_code):
Add SANITIZE_SHADOW_CALL_STACK.
* opts.cc: Add shadow-call-stack.
* target.def: New hook.
* toplev.cc (process_options): Add SCS compile option check.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shadow_call_stack_1.c: New test.
* gcc.target/aarch64/shadow_call_stack_2.c: New test.
* gcc.target/aarch64/shadow_call_stack_3.c: New test.
* gcc.target/aarch64/shadow_call_stack_4.c: New test.
* gcc.target/aarch64/shadow_call_stack_5.c: New test.
* gcc.target/aarch64/shadow_call_stack_6.c: New test.
* gcc.target/aarch64/shadow_call_stack_7.c: New test.
* gcc.target/aarch64/shadow_call_stack_8.c: New test.
---
V5:
- Modify part of wb_push_candidates to wb_pop_candidates.
- Rebase to the mainline (20220210).
V4:
- Added wb_[push|pop]_candidates[12] to ensure push/pop can
emit different registers.

V3:
- Change scs_push/pop to standard move patterns.
- Optimize scs_pop to avoid pop x30 twice when shadow stack is enabled.

 gcc/config/aarch64/aarch64.cc | 113 +-
 gcc/config/aarch64/aarch64.h  |  21 +++-
 gcc/config/aarch64/aarch64.md |  10 ++
 gcc/doc/invoke.texi   |  30 +
 gcc/doc/tm.texi   |   5 +
 gcc/doc/tm.texi.in|   2 +
 gcc/flag-types.h  |   2 +
 gcc/opts.cc   |   1 +
 gcc/target.def|   8 ++
 .../gcc.target/aarch64/shadow_call_stack_1.c  |   6 +
 .../gcc.target/aarch64/shadow_call_stack_2.c  |   6 +
 .../gcc.target/aarch64/shadow_call_stack_3.c  |  45 +++
 .../gcc.target/aarch64/shadow_call_stack_4.c  |  20 
 .../gcc.target/aarch64/shadow_call_stack_5.c  |  18 +++
 .../gcc.target/aarch64/shadow_call_stack_6.c  |  18 +++
 .../gcc.target/aarch64/shadow_call_stack_7.c  |  18 +++
 .../gcc.target/aarch64/shadow_call_stack_8.c  |  24 
 gcc/toplev.cc |  10 ++
 18 files changed, 326 insertions(+), 31 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_6.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_7.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_8.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index e3f18fbe7da..35f6f64f5b2 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -80,6 +80,7 @@
 #include "fractional-cost.h"
 #include "rtlanal.h"
 #include "tree-dfa.h"
+#include "asan.h"
 
 /* This file should be included last.  *

Re: [PATCH] [PATCH,v4,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-02-12 Thread Dan Li via Gcc-patches




On 2/11/22 07:35, Richard Sandiford wrote:

Dan Li  writes:

On 2/11/22 01:53, Richard Sandiford wrote:

Dan Li  writes:

On 2/10/22 01:55, Richard Sandiford wrote:



And I think maybe we could handle this through three patches:
1.Keep current patch (a V5) unchanged for scs.
2.Add shrink-warpping for X30:
logically this might be a separate topic, and I think more testing
might be needed here (Well, I'm a little worried about if there might
be other effects, since I just read this part of the code roughly
yesterday).
3.Add scs push/pop to shrink-wrapping (and maybe we can do the same for
the PAC code in pro/epilogue, since it's also the operation of the X30).


Yeah, that's fair.

(Like I said earlier, I wasn't asking for the shrink-wrapping change.
It was just a note in passing.  But as you point out, the individual
shrink-wrapping support would be even more work than I'd imagined.)


Hi, Richard,
I have sent out the v5[1] and rebased it to mainline at the same time,
please let me know if there is anything else I need to do :)

[1].https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590311.html

Thanks,
Dan


Re: [PATCH] [PATCH,v5,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-02-16 Thread Dan Li via Gcc-patches




On 2/15/22 10:02, Richard Sandiford wrote:

Dan Li  writes:

Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].


Looks good, thanks.  However, when I bootstrap it on
aarch64-linux-gnu I get:

.../gcc/ubsan.cc: In function ‘bool 
ubsan_expand_null_ifn(gimple_stmt_iterator*)’:
.../gcc/ubsan.cc:835:50: error: enumerated and non-enumerated type in 
conditional expression [-Werror=extra]
   835 | = (flag_sanitize_recover & ((check_align ? SANITIZE_ALIGNMENT 
: 0)
   |  
^~~~
.../gcc/ubsan.cc:836:51: error: enumerated and non-enumerated type in 
conditional expression [-Werror=extra]
   836 | | (check_null ? SANITIZE_NULL : 
0)))
   |~~~^~~

I think this is because you're taking the last available bit
of the enum :-)

A hacky fix is to add "+ 0" to SANITIZE_ALIGNMENT and SANITIZE_NULL
in the code quoted above (i.e. the code in the error messages).
That seems slightly more robust than a cast to unsigned int (say),
since "+ 0" will work even if the values become 64-bit quantities
in future.

Richard



Ah, apologize for my mistake! I specified --disable-werror in
./configure from the beginning, I didn't see this problem before.

As you said, I use the following patch:

diff --git a/gcc/ubsan.cc b/gcc/ubsan.cc
index 5641d3cc3be..a858994c841 100644
--- a/gcc/ubsan.cc
+++ b/gcc/ubsan.cc
@@ -832,8 +832,8 @@ ubsan_expand_null_ifn (gimple_stmt_iterator *gsip)
   else
 {
   enum built_in_function bcode
-   = (flag_sanitize_recover & ((check_align ? SANITIZE_ALIGNMENT : 0)
-   | (check_null ? SANITIZE_NULL : 0)))
+   = (flag_sanitize_recover & ((check_align ? SANITIZE_ALIGNMENT + 0 : 0)
+   | (check_null ? SANITIZE_NULL + 0 : 0)))
  ? BUILT_IN_UBSAN_HANDLE_TYPE_MISMATCH_V1
  : BUILT_IN_UBSAN_HANDLE_TYPE_MISMATCH_V1_ABORT;
   tree fn = builtin_decl_implicit (bcode);


And tested fine in native compiling for x86_64 , I will change it in the
next version.

BTW:
The platform I'm using is x86-64, so I'm trying to find a way to
reproduce this issue when cross-compiling with aarch64, which I
haven't found so far, the issue only seems to happen with native
compilation.

But most of the code changes are for the aarch64 platform,
is it enough for me to do the following tests before submitting
the patch?
1) A full compile of gcc under x86_64 platform
(make; make install; make bootstrap;)
2) Test all testsuites in aarch64 cross-compile environment
(make -k check)


Thanks,
Dan


[PATCH] [PATCH, v6, 1/1, AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-02-19 Thread Dan Li via Gcc-patches
Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

To enable SCS in user mode, in addition to compiler, other support
is also required (as discussed in [2]). This patch only adds basic
support for SCS from the compiler side, and provides convenience
for users to enable SCS.

For linux kernel, only the support of the compiler is required.

[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768

Signed-off-by: Dan Li 

gcc/ChangeLog:

* config/aarch64/aarch64.cc (SLOT_REQUIRED):
Change wb_candidate[12] to wb_push_candidate[12].
(aarch64_layout_frame): Likewise, and
change callee_adjust when scs is enabled.
(aarch64_save_callee_saves):
Change wb_candidate[12] to wb_push_candidate[12].
(aarch64_restore_callee_saves):
Change wb_candidate[12] to wb_pop_candidate[12].
(aarch64_get_separate_components):
Change wb_candidate[12] to wb_push_candidate[12].
(aarch64_expand_prologue): Push x30 onto SCS before it's
pushed onto stack.
(aarch64_expand_epilogue): Pop x30 frome SCS, while
preventing it from being popped from the regular stack again.
(aarch64_override_options_internal): Add SCS compile option check.
(TARGET_HAVE_SHADOW_CALL_STACK): New hook.
* config/aarch64/aarch64.h (struct GTY): Add is_scs_enabled,
wb_pop_candidate[12], and rename wb_candidate[12] to
wb_push_candidate[12].
* config/aarch64/aarch64.md (scs_push): New template.
(scs_pop): Likewise.
* doc/invoke.texi: Document -fsanitize=shadow-call-stack.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Add hook have_shadow_call_stack.
* flag-types.h (enum sanitize_code):
Add SANITIZE_SHADOW_CALL_STACK.
* opts.cc (parse_sanitizer_options): Add shadow-call-stack
and exclude SANITIZE_SHADOW_CALL_STACK.
* target.def: New hook.
* toplev.cc (process_options): Add SCS compile option check.
* ubsan.cc (ubsan_expand_null_ifn): Enum type conversion.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shadow_call_stack_1.c: New test.
* gcc.target/aarch64/shadow_call_stack_2.c: New test.
* gcc.target/aarch64/shadow_call_stack_3.c: New test.
* gcc.target/aarch64/shadow_call_stack_4.c: New test.
* gcc.target/aarch64/shadow_call_stack_5.c: New test.
* gcc.target/aarch64/shadow_call_stack_6.c: New test.
* gcc.target/aarch64/shadow_call_stack_7.c: New test.
* gcc.target/aarch64/shadow_call_stack_8.c: New test.
---
V6:
- Exclude SANITIZE_SHADOW_CALL_STACK.
- Fix enum type conversion error.
- Rebase to the mainline (20220216).

V5:
- Modify part of wb_push_candidates to wb_pop_candidates.
- Rebase to the mainline (20220210).

V4:
- Added wb_[push|pop]_candidates[12] to ensure push/pop can
emit different registers.

V3:
- Change scs_push/pop to standard move patterns.
- Optimize scs_pop to avoid pop x30 twice when shadow stack is enabled.

 gcc/config/aarch64/aarch64.cc | 113 +-
 gcc/config/aarch64/aarch64.h  |  21 +++-
 gcc/config/aarch64/aarch64.md |  10 ++
 gcc/doc/invoke.texi   |  30 +
 gcc/doc/tm.texi   |   5 +
 gcc/doc/tm.texi.in|   2 +
 gcc/flag-types.h  |   2 +
 gcc/opts.cc   |   4 +-
 gcc/target.def|   8 ++
 .../gcc.target/aarch64/shadow_call_stack_1.c  |   6 +
 .../gcc.target/aarch64/shadow_call_stack_2.c  |   6 +
 .../gcc.target/aarch64/shadow_call_stack_3.c  |  45 +++
 .../gcc.target/aarch64/shadow_call_stack_4.c  |  20 
 .../gcc.target/aarch64/shadow_call_stack_5.c  |  18 +++
 .../gcc.target/aarch64/shadow_call_stack_6.c  |  18 +++
 .../gcc.target/aarch64/shadow_call_stack_7.c  |  18 +++
 .../gcc.target/aarch64/shadow_call_stack_8.c  |  24 
 gcc/toplev.cc |  10 ++
 gcc/ubsan.cc  |   4 +-
 19 files changed, 330 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_6.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_7.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_8.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 37ed22

Re: [PATCH] [PATCH,v5,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-02-19 Thread Dan Li via Gcc-patches




On 2/15/22 10:02, Richard Sandiford wrote:

Dan Li  writes:

Shadow Call Stack can be used to protect the return address of a


Looks good, thanks.  However, when I bootstrap it on
aarch64-linux-gnu I get:

.../gcc/ubsan.cc: In function ‘bool 
ubsan_expand_null_ifn(gimple_stmt_iterator*)’:
.../gcc/ubsan.cc:835:50: error: enumerated and non-enumerated type in 
conditional expression [-Werror=extra]
   835 | = (flag_sanitize_recover & ((check_align ? SANITIZE_ALIGNMENT 
: 0)
   |  
^~~~
.../gcc/ubsan.cc:836:51: error: enumerated and non-enumerated type in 
conditional expression [-Werror=extra]
   836 | | (check_null ? SANITIZE_NULL : 
0)))
   |~~~^~~

I think this is because you're taking the last available bit
of the enum :-)

A hacky fix is to add "+ 0" to SANITIZE_ALIGNMENT and SANITIZE_NULL
in the code quoted above (i.e. the code in the error messages).
That seems slightly more robust than a cast to unsigned int (say),
since "+ 0" will work even if the values become 64-bit quantities
in future.

Richard


Hi, Richard,
I have sent out the v6[1]. I re-tested native/cross-compile, and tested
all test cases in both cases (make -k check).

The test results of this patch are consistent with the current mainline
(but in my test environment there are some test cases that fail in both).

The current version seems fine to me, please let me know if I'm missing
something :)

[1]https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590602.html

Thanks,
Dan


[PATCH] [PATCH] AArch64: add R30_REGNUM into shrink-wrapping separate

2022-02-24 Thread Dan Li via Gcc-patches
R30_REGNUM could also be used as a component in shrink-wrapping
separate, this patch enables it in aarch64.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_get_separate_components):
Remove bitmap clear of R30_REGNUM.
(aarch64_components_for_bb): Support R30_REGNUM as a component.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shrink_wrap_separate_1.c: New test.
---
 gcc/config/aarch64/aarch64.cc   |  4 ++--
 .../gcc.target/aarch64/shrink_wrap_separate_1.c | 17 +
 2 files changed, 19 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shrink_wrap_separate_1.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 8bcee8be9eb..6e1589b0312 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8463,7 +8463,6 @@ aarch64_get_separate_components (void)
   if (reg1 != INVALID_REGNUM)
 bitmap_clear_bit (components, reg1);
 
-  bitmap_clear_bit (components, LR_REGNUM);
   bitmap_clear_bit (components, SP_REGNUM);
 
   return components;
@@ -8500,7 +8499,8 @@ aarch64_components_for_bb (basic_block bb)
   /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets.  */
   for (unsigned regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
 if (!fixed_regs[regno]
-   && !crtl->abi->clobbers_full_reg_p (regno)
+   && (regno == R30_REGNUM
+   || !crtl->abi->clobbers_full_reg_p (regno))
&& (TEST_HARD_REG_BIT (extra_caller_saves, regno)
|| bitmap_bit_p (in, regno)
|| bitmap_bit_p (gen, regno)
diff --git a/gcc/testsuite/gcc.target/aarch64/shrink_wrap_separate_1.c 
b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_separate_1.c
new file mode 100644
index 000..34002705ace
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_separate_1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fomit-frame-pointer -fdump-rtl-pro_and_epilogue" } */
+
+void f();
+
+int g(int x)
+{
+  if (x == 0)
+{
+  __asm__ ("":::"x19", "x20");
+  return 1;
+}
+  f();
+  return 2;
+}
+
+/* { dg-final { scan-rtl-dump {The components we wrap separately are \[sep 
30\]} "pro_and_epilogue"  } } */
-- 
2.17.1



Re: [PATCH] [PATCH,v4,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-02-25 Thread Dan Li via Gcc-patches




On 2/11/22 07:35, Richard Sandiford wrote:

Dan Li  writes:

On 2/11/22 01:53, Richard Sandiford wrote:

Dan Li  writes:

On 2/10/22 01:55, Richard Sandiford wrote:



void f();
int g(int x) {
  if (x == 0) {
  __asm__ ("":::"x19", "x20");
  return 1;
  }
  f();
  return 2;
}

Then it seems X30 is treat as a "component" (the test
result of aarch64.exp also seems fine).

g:
  stp x19, x20, [sp, -32]!
  cbnzw0, .L2
  mov w0, 1
  ldp x19, x20, [sp], 32
  ret
.L2:
  str x30, [sp, 16]
  bl  f
  ldr x30, [sp, 16]
  mov w0, 2
  ldp x19, x20, [sp], 32
  ret

And I think maybe we could handle this through three patches:
1.Keep current patch (a V5) unchanged for scs.
2.Add shrink-warpping for X30:
logically this might be a separate topic, and I think more testing
might be needed here (Well, I'm a little worried about if there might
be other effects, since I just read this part of the code roughly
yesterday).
3.Add scs push/pop to shrink-wrapping (and maybe we can do the same for
the PAC code in pro/epilogue, since it's also the operation of the X30).


Yeah, that's fair.

(Like I said earlier, I wasn't asking for the shrink-wrapping change.
It was just a note in passing.  But as you point out, the individual
shrink-wrapping support would be even more work than I'd imagined.)



Hi, Richard,

As said before, I try to enable R30 as component[1] in
shrink-wrapping separate.

The test results of this patch in x86-64 native compile/aarch64
cross-compile are consistent with the mainline (but there are
still few use cases not working properly in my test environment).

please let me know if I'm missing something :)

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590906.html



Re: [PATCH] [RFC][PR102768] aarch64: Add compiler support for Shadow Call Stack

2021-11-23 Thread Dan Li via Gcc-patches

Hi Szabolcs,

First of all, apologies for my late reply (since I just had a new baby,
I'm quite busy recently and also because I'm not familiar with C++
exception handling, it takes me some time to learn this part).

On 11/3/21 8:00 PM, Szabolcs Nagy wrote:

The 11/03/2021 00:24, Dan Li wrote:

On 11/2/21 9:04 PM, Szabolcs Nagy wrote:

The 11/02/2021 00:06, Dan Li via Gcc-patches wrote:

Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

To enable SCS in user mode, in addition to compiler, other support
is also required (as described in [2]). This patch only adds basic
support for SCS from the compiler side, and provides convenience
for users to enable SCS.

For linux kernel, only the support of the compiler is required.

[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768


i'm not a gcc maintainer, but i prefer such feature
to be in upstream gcc instead of in a plugin.

it will require update to the documentation:

which should mention that it depends on -ffixed-x18
(probably that should be enforced too) which is an
important abi issue: functions following the normal
pcs can clobber x18 and break scs.


Thanks Szabolcs, I will update the documentation in next version.

It sounds reasonable to enforced -ffixed-x18 with scs, but I see
that clang doesn’t do that. Maybe it is better to be consistent
with clang here?


i mean gcc can issue a diagnostic if -ffixed-x18 is not passed.
(it seems clang rejects scs too without -ffixed-x18)


Oh, yes, you are right. Clang rejects scs without -ffixed-x18[1],
I should add a similar check in next version.

and that there is no unwinder support.


Ok, let me try to add a support for this.


i assume exception handling info has to change for scs to
work (to pop the shadow stack when transferring control),
so either scs must require -fno-exceptions or the eh info
changes must be implemented.

i think the kernel does not require exceptions and does
not depend on the unwinder runtime in libgcc, so this
is optional for the linux kernel use-case.


I recompiled a glibc and gcc runtime library with -ffixed-x18 enabled.
As you said, the scs stack needs to be popped at the same time during
exception handling.

I saw that Clang is processed by adding
".cfi_escape 0x16, 0x12, 0x02, 0x82, 0x78"
directive (x18 -= 8;) after each emit of scs push[2].

But this directive has problems when executed in libgcc:
1)context->reg[x] in uw_init_context_1 are all based on cfa, most
  registers have no initial values by default.
2)Address of shadow call stack (x18) cannot(and should not) be calculated
  based on cfa, and I did not yet find a way to assign hardware register
  x18 to context->reg[18].
3)This causes libgcc to crash when parsing .cfi_escape exp because of 0
  address dereference (* x18)
  (execute_stack_op => case DW_OP_breg18: _Unwind_GetGR)
4)uw_install_context_1 does not restore all hardware registers by default
  before eh return, so context->reg[18] can't write directly to hw x18.
  (In clang, __unw_getcontext/__unw_resume will save/restore all hardware
  registers, so this directive works fine in my libunwind test.)

I tried to fix this problem through a patch[3], the exception handling
works fine in my test environment, but I'm not sure if this fix is
ppropriate for two reasons:
1)libgcc does not push/pop all registers by default during exception
  handling. Is this change appropriate?
2)The test case may not be able to test this patch, because the test
  environment requires at least on glibc/gcc runtime compiled with
  -ffixed-x18.

May be it's better to rely on -fno-exceptions for this patch first? and If
the glibc/gcc runtime also supports SCS later, the problem can be fixed
at the same time.

PS:
I'm still not familiar enough with exception handling in libgcc/libunwind,
please correct me if there are any mistakes :)

[1] 
https://github.com/llvm/llvm-project/commit/f11eb3ebe77729426e562d7d4d7ebb1d5ff2e7c8
[2] https://reviews.llvm.org/D54609
[3] https://gcc.gnu.org/bugzilla/attachment.cgi?id=51854&action=diff



Re: [PATCH] [RFC][PR102768] aarch64: Add compiler support for Shadow Call Stack

2021-11-23 Thread Dan Li via Gcc-patches




On 11/23/21 6:51 PM, Szabolcs Nagy wrote:

The 11/23/2021 16:32, Dan Li wrote:

On 11/3/21 8:00 PM, Szabolcs Nagy wrote:

i assume exception handling info has to change for scs to
work (to pop the shadow stack when transferring control),
so either scs must require -fno-exceptions or the eh info
changes must be implemented.

i think the kernel does not require exceptions and does
not depend on the unwinder runtime in libgcc, so this
is optional for the linux kernel use-case.


I recompiled a glibc and gcc runtime library with -ffixed-x18 enabled.
As you said, the scs stack needs to be popped at the same time during
exception handling.

I saw that Clang is processed by adding
".cfi_escape 0x16, 0x12, 0x02, 0x82, 0x78"
directive (x18 -= 8;) after each emit of scs push[2].

But this directive has problems when executed in libgcc:
1)context->reg[x] in uw_init_context_1 are all based on cfa, most
   registers have no initial values by default.
2)Address of shadow call stack (x18) cannot(and should not) be calculated
   based on cfa, and I did not yet find a way to assign hardware register
   x18 to context->reg[18].
3)This causes libgcc to crash when parsing .cfi_escape exp because of 0
   address dereference (* x18)
   (execute_stack_op => case DW_OP_breg18: _Unwind_GetGR)
4)uw_install_context_1 does not restore all hardware registers by default
   before eh return, so context->reg[18] can't write directly to hw x18.
   (In clang, __unw_getcontext/__unw_resume will save/restore all hardware
   registers, so this directive works fine in my libunwind test.)

I tried to fix this problem through a patch[3], the exception handling
works fine in my test environment, but I'm not sure if this fix is
ppropriate for two reasons:
1)libgcc does not push/pop all registers by default during exception
   handling. Is this change appropriate?
2)The test case may not be able to test this patch, because the test
   environment requires at least on glibc/gcc runtime compiled with
   -ffixed-x18.

May be it's better to rely on -fno-exceptions for this patch first? and If
the glibc/gcc runtime also supports SCS later, the problem can be fixed
at the same time.


i did not look at the exception handling in detail (that's
difficult to understand for me too).

to use scs, non-default abi is required anyway, so not
supporting exceptions sounds fine to me. however it should
be documented and ideally enforced (-fexceptions should
be rejected, just like -fno-fixed-x18).

Thanks Szabolcs,

This sounds reasonable to me, and I'll fix it in the next version.


i assume the linux kernel does not require -fexceptions.


AFAIK, -fexceptions are not used in the linux kernel.


PS:
I'm still not familiar enough with exception handling in libgcc/libunwind,
please correct me if there are any mistakes :)

[1] 
https://github.com/llvm/llvm-project/commit/f11eb3ebe77729426e562d7d4d7ebb1d5ff2e7c8
[2] https://reviews.llvm.org/D54609
[3] https://gcc.gnu.org/bugzilla/attachment.cgi?id=51854&action=diff



[PATCH] [RFC, v2, 1/1, AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2021-11-25 Thread Dan Li via Gcc-patches
Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

To enable SCS in user mode, in addition to compiler, other support
is also required (as discussed in [2]). This patch only adds basic
support for SCS from the compiler side, and provides convenience
for users to enable SCS.

For linux kernel, only the support of the compiler is required.

[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768

Signed-off-by: Dan Li 

gcc/c-family/ChangeLog:

* c-attribs.c (handle_no_sanitize_shadow_call_stack_attribute):
New.

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_shadow_call_stack_enabled):
New decl.
* config/aarch64/aarch64.c (aarch64_shadow_call_stack_enabled):
New.
(aarch64_expand_prologue):  Push x30 onto SCS before it's
pushed onto stack.
(aarch64_expand_epilogue):  Pop x30 frome SCS.
* config/aarch64/aarch64.h (TARGET_SUPPORT_SHADOW_CALL_STACK):
New macro.
(TARGET_CHECK_SCS_RESERVED_REGISTER):   Likewise.
* config/aarch64/aarch64.md (scs_push): New template.
(scs_pop):  Likewise.
* defaults.h (TARGET_SUPPORT_SHADOW_CALL_STACK):New macro.
* doc/extend.texi:  Document -fsanitize=shadow-call-stack.
* doc/invoke.texi:  Document attribute.
* flag-types.h (enum sanitize_code):Add
SANITIZE_SHADOW_CALL_STACK.
* opts-global.c (handle_common_deferred_options):   Add SCS
compile option check.
* opts.c (finish_options):  Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shadow_call_stack_1.c: New test.
* gcc.target/aarch64/shadow_call_stack_2.c: New test.
* gcc.target/aarch64/shadow_call_stack_3.c: New test.
* gcc.target/aarch64/shadow_call_stack_4.c: New test.
---
 gcc/c-family/c-attribs.c  | 21 +
 gcc/config/aarch64/aarch64-protos.h   |  1 +
 gcc/config/aarch64/aarch64.c  | 27 +++
 gcc/config/aarch64/aarch64.h  | 11 +
 gcc/config/aarch64/aarch64.md | 18 
 gcc/defaults.h|  4 ++
 gcc/doc/extend.texi   |  7 +++
 gcc/doc/invoke.texi   | 29 
 gcc/flag-types.h  |  2 +
 gcc/opts-global.c |  6 +++
 gcc/opts.c| 12 +
 .../gcc.target/aarch64/shadow_call_stack_1.c  |  6 +++
 .../gcc.target/aarch64/shadow_call_stack_2.c  |  6 +++
 .../gcc.target/aarch64/shadow_call_stack_3.c  | 45 +++
 .../gcc.target/aarch64/shadow_call_stack_4.c  | 18 
 15 files changed, 213 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 007b928c54b..9b3a35c06bf 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -56,6 +56,8 @@ static tree handle_cold_attribute (tree *, tree, tree, int, 
bool *);
 static tree handle_no_sanitize_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_sanitize_address_attribute (tree *, tree, tree,
  int, bool *);
+static tree handle_no_sanitize_shadow_call_stack_attribute (tree *, tree,
+ tree, int, bool *);
 static tree handle_no_sanitize_thread_attribute (tree *, tree, tree,
 int, bool *);
 static tree handle_no_address_safety_analysis_attribute (tree *, tree, tree,
@@ -454,6 +456,10 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_no_sanitize_attribute, NULL },
   { "no_sanitize_address",0, 0, true, false, false, false,
  handle_no_sanitize_address_attribute, NULL },
+  { "no_sanitize_shadow_call_stack",
+ 0, 0, true, false, false, false,
+ handle_no_sanitize_shadow_call_stack_attribute,
+ NULL },
   { "no_sanitize_thread", 0, 0, true, false, false, false,
  handle_no_sanitize_thread_attribute, NULL },
   { "no_sanitize_undefined",  0, 0, true, false, false, false,
@@ -1175,6 +1181,21 @@ handle_no_sanitize_address_attribute (tree *node, tree 
name, tree, int,
   return NULL_TREE;
 }
 
+/* Handle a "no_sanitize_shadow_call_stack" attribute; arguments as

[PATCH] [PATCH, v2, 1/1, AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2021-12-05 Thread Dan Li via Gcc-patches
Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

To enable SCS in user mode, in addition to compiler, other support
is also required (as discussed in [2]). This patch only adds basic
support for SCS from the compiler side, and provides convenience
for users to enable SCS.

For linux kernel, only the support of the compiler is required.

[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768

Signed-off-by: Dan Li 

gcc/c-family/ChangeLog:

* c-attribs.c (handle_no_sanitize_shadow_call_stack_attribute):
New.

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_shadow_call_stack_enabled):
New decl.
* config/aarch64/aarch64.c (aarch64_shadow_call_stack_enabled):
New.
(aarch64_expand_prologue):  Push x30 onto SCS before it's
pushed onto stack.
(aarch64_expand_epilogue):  Pop x30 frome SCS.
* config/aarch64/aarch64.h (TARGET_SUPPORT_SHADOW_CALL_STACK):
New macro.
(TARGET_CHECK_SCS_RESERVED_REGISTER):   Likewise.
* config/aarch64/aarch64.md (scs_push): New template.
(scs_pop):  Likewise.
* defaults.h (TARGET_SUPPORT_SHADOW_CALL_STACK):New macro.
* doc/extend.texi:  Document -fsanitize=shadow-call-stack.
* doc/invoke.texi:  Document attribute.
* flag-types.h (enum sanitize_code):Add
SANITIZE_SHADOW_CALL_STACK.
* opts-global.c (handle_common_deferred_options):   Add SCS
compile option check.
* opts.c (finish_options):  Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shadow_call_stack_1.c: New test.
* gcc.target/aarch64/shadow_call_stack_2.c: New test.
* gcc.target/aarch64/shadow_call_stack_3.c: New test.
* gcc.target/aarch64/shadow_call_stack_4.c: New test.
---
 gcc/c-family/c-attribs.c  | 21 +
 gcc/config/aarch64/aarch64-protos.h   |  1 +
 gcc/config/aarch64/aarch64.c  | 27 +++
 gcc/config/aarch64/aarch64.h  | 11 +
 gcc/config/aarch64/aarch64.md | 18 
 gcc/defaults.h|  4 ++
 gcc/doc/extend.texi   |  7 +++
 gcc/doc/invoke.texi   | 29 
 gcc/flag-types.h  |  2 +
 gcc/opts-global.c |  6 +++
 gcc/opts.c| 12 +
 .../gcc.target/aarch64/shadow_call_stack_1.c  |  6 +++
 .../gcc.target/aarch64/shadow_call_stack_2.c  |  6 +++
 .../gcc.target/aarch64/shadow_call_stack_3.c  | 45 +++
 .../gcc.target/aarch64/shadow_call_stack_4.c  | 18 
 15 files changed, 213 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 007b928c54b..9b3a35c06bf 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -56,6 +56,8 @@ static tree handle_cold_attribute (tree *, tree, tree, int, 
bool *);
 static tree handle_no_sanitize_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_sanitize_address_attribute (tree *, tree, tree,
  int, bool *);
+static tree handle_no_sanitize_shadow_call_stack_attribute (tree *, tree,
+ tree, int, bool *);
 static tree handle_no_sanitize_thread_attribute (tree *, tree, tree,
 int, bool *);
 static tree handle_no_address_safety_analysis_attribute (tree *, tree, tree,
@@ -454,6 +456,10 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_no_sanitize_attribute, NULL },
   { "no_sanitize_address",0, 0, true, false, false, false,
  handle_no_sanitize_address_attribute, NULL },
+  { "no_sanitize_shadow_call_stack",
+ 0, 0, true, false, false, false,
+ handle_no_sanitize_shadow_call_stack_attribute,
+ NULL },
   { "no_sanitize_thread", 0, 0, true, false, false, false,
  handle_no_sanitize_thread_attribute, NULL },
   { "no_sanitize_undefined",  0, 0, true, false, false, false,
@@ -1175,6 +1181,21 @@ handle_no_sanitize_address_attribute (tree *node, tree 
name, tree, int,
   return NULL_TREE;
 }
 
+/* Handle a "no_sanitize_shadow_call_stack" attribute; arguments as

Re: [PATCH] [PATCH,v2,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2021-12-05 Thread Dan Li via Gcc-patches




On 12/6/21 10:41 AM, Dan Li wrote:

Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

To enable SCS in user mode, in addition to compiler, other support
is also required (as discussed in [2]). This patch only adds basic
support for SCS from the compiler side, and provides convenience
for users to enable SCS.

For linux kernel, only the support of the compiler is required.

[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768

Signed-off-by: Dan Li 


Add more reviewers :)


[PATCH] [PATCH] aarch64:fix redundant check in aut insn generation [PR103017] During the generation of the epilogue of aarch64(aarch64_expand_epilogue), the value of crtl->calls_eh_return does not nee

2021-11-01 Thread Dan Li via Gcc-patches
gcc/ChangeLog:

* config/aarch64/aarch64.c (aarch64_expand_epilogue):
* config/aarch64/aarch64.md:

Signed-off-by: Dan Li 
---
 gcc/config/aarch64/aarch64.c  | 6 +-
 gcc/config/aarch64/aarch64.md | 3 +--
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 699c105a42a..8448e56443c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9076,13 +9076,9 @@ aarch64_expand_epilogue (bool for_sibcall)
2) The RETAA instruction is not available before ARMv8.3-A, so if we are
   generating code for !TARGET_ARMV8_3 we can't use it and must
   explicitly authenticate.
-
-   3) On an eh_return path we make extra stack adjustments to update the
-  canonical frame address to be the exception handler's CFA.  We want
-  to authenticate using the CFA of the function which calls eh_return.
 */
   if (aarch64_return_address_signing_enabled ()
-  && (for_sibcall || !TARGET_ARMV8_3 || crtl->calls_eh_return))
+  && (for_sibcall || !TARGET_ARMV8_3))
 {
   switch (aarch64_ra_sign_key)
{
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1a39470a1fe..65ee6159d73 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -879,8 +879,7 @@ (define_insn "*do_return"
   {
 const char *ret = NULL;
 if (aarch64_return_address_signing_enabled ()
-   && (TARGET_PAUTH)
-   && !crtl->calls_eh_return)
+   && (TARGET_PAUTH))
   {
if (aarch64_ra_sign_key == AARCH64_KEY_B)
  ret = "retab";
-- 
2.17.1



[PATCH] [PR103017] aarch64:fix redundant check in aut insn generation

2021-11-01 Thread Dan Li via Gcc-patches
During the generation of the epilogue of aarch64(aarch64_expand_epilogue),
the value of crtl->calls_eh_return does not need to be checked again.
This value has been checked during aarch64_return_address_signing_enabled.

gcc/ChangeLog:

* config/aarch64/aarch64.c (aarch64_expand_epilogue):
* config/aarch64/aarch64.md:

Signed-off-by: Dan Li 
---
 gcc/config/aarch64/aarch64.c  | 6 +-
 gcc/config/aarch64/aarch64.md | 3 +--
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 699c105a42a..8448e56443c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9076,13 +9076,9 @@ aarch64_expand_epilogue (bool for_sibcall)
2) The RETAA instruction is not available before ARMv8.3-A, so if we are
   generating code for !TARGET_ARMV8_3 we can't use it and must
   explicitly authenticate.
-
-   3) On an eh_return path we make extra stack adjustments to update the
-  canonical frame address to be the exception handler's CFA.  We want
-  to authenticate using the CFA of the function which calls eh_return.
 */
   if (aarch64_return_address_signing_enabled ()
-  && (for_sibcall || !TARGET_ARMV8_3 || crtl->calls_eh_return))
+  && (for_sibcall || !TARGET_ARMV8_3))
 {
   switch (aarch64_ra_sign_key)
{
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1a39470a1fe..65ee6159d73 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -879,8 +879,7 @@ (define_insn "*do_return"
   {
 const char *ret = NULL;
 if (aarch64_return_address_signing_enabled ()
-   && (TARGET_PAUTH)
-   && !crtl->calls_eh_return)
+   && (TARGET_PAUTH))
   {
if (aarch64_ra_sign_key == AARCH64_KEY_B)
  ret = "retab";
-- 
2.17.1



[PATCH] [RFC][PR102768] aarch64: Add compiler support for Shadow Call Stack

2021-11-02 Thread Dan Li via Gcc-patches
Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

To enable SCS in user mode, in addition to compiler, other support
is also required (as described in [2]). This patch only adds basic
support for SCS from the compiler side, and provides convenience
for users to enable SCS.

For linux kernel, only the support of the compiler is required.

[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768

gcc/c-family/ChangeLog:

* c-attribs.c (handle_no_sanitize_shadow_call_stack_attribute):

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_shadow_call_stack_enabled):
* config/aarch64/aarch64.c (aarch64_shadow_call_stack_enabled):
(aarch64_expand_prologue):
(aarch64_expand_epilogue):
* config/aarch64/aarch64.h (TARGET_SUPPORT_SHADOW_CALL_STACK):
* config/aarch64/aarch64.md (scs_push):
(scs_pop):
* defaults.h (TARGET_SUPPORT_SHADOW_CALL_STACK):
* flag-types.h (enum sanitize_code):
* opts.c (finish_options):

Signed-off-by: Dan Li 
---
 gcc/c-family/c-attribs.c| 21 +
 gcc/config/aarch64/aarch64-protos.h |  1 +
 gcc/config/aarch64/aarch64.c| 27 +++
 gcc/config/aarch64/aarch64.h|  4 
 gcc/config/aarch64/aarch64.md   | 18 ++
 gcc/defaults.h  |  4 
 gcc/flag-types.h|  2 ++
 gcc/opts.c  |  6 ++
 8 files changed, 83 insertions(+)

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 007b928c54b..9b3a35c06bf 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -56,6 +56,8 @@ static tree handle_cold_attribute (tree *, tree, tree, int, 
bool *);
 static tree handle_no_sanitize_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_sanitize_address_attribute (tree *, tree, tree,
  int, bool *);
+static tree handle_no_sanitize_shadow_call_stack_attribute (tree *, tree,
+ tree, int, bool *);
 static tree handle_no_sanitize_thread_attribute (tree *, tree, tree,
 int, bool *);
 static tree handle_no_address_safety_analysis_attribute (tree *, tree, tree,
@@ -454,6 +456,10 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_no_sanitize_attribute, NULL },
   { "no_sanitize_address",0, 0, true, false, false, false,
  handle_no_sanitize_address_attribute, NULL },
+  { "no_sanitize_shadow_call_stack",
+ 0, 0, true, false, false, false,
+ handle_no_sanitize_shadow_call_stack_attribute,
+ NULL },
   { "no_sanitize_thread", 0, 0, true, false, false, false,
  handle_no_sanitize_thread_attribute, NULL },
   { "no_sanitize_undefined",  0, 0, true, false, false, false,
@@ -1175,6 +1181,21 @@ handle_no_sanitize_address_attribute (tree *node, tree 
name, tree, int,
   return NULL_TREE;
 }
 
+/* Handle a "no_sanitize_shadow_call_stack" attribute; arguments as in
+   struct attribute_spec.handler.  */
+static tree
+handle_no_sanitize_shadow_call_stack_attribute (tree *node, tree name,
+ tree, int, bool *no_add_attrs)
+{
+  *no_add_attrs = true;
+  if (TREE_CODE (*node) != FUNCTION_DECL)
+warning (OPT_Wattributes, "%qE attribute ignored", name);
+  else
+add_no_sanitize_value (*node, SANITIZE_SHADOW_CALL_STACK);
+
+  return NULL_TREE;
+}
+
 /* Handle a "no_sanitize_thread" attribute; arguments as in
struct attribute_spec.handler.  */
 
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 768e8fae136..150c015df21 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -893,6 +893,7 @@ void aarch64_register_pragmas (void);
 void aarch64_relayout_simd_types (void);
 void aarch64_reset_previous_fndecl (void);
 bool aarch64_return_address_signing_enabled (void);
+bool aarch64_shadow_call_stack_enabled (void);
 bool aarch64_bti_enabled (void);
 void aarch64_save_restore_target_globals (tree);
 void aarch64_addti_scratch_regs (rtx, rtx, rtx *,
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 699c105a42a..5a36a459f4e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -79,6 +79,7 @@
 #include "tree-ssa-loop-niter.h"
 #include "fractional-cost.h"
 #include "rtlanal.h"
+#include "asan.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -7799,6 +7800,24 @

Re: [PATCH] [RFC][PR102768] aarch64: Add compiler support for Shadow Call Stack

2021-11-02 Thread Dan Li via Gcc-patches




On 11/2/21 9:04 PM, Szabolcs Nagy wrote:

The 11/02/2021 00:06, Dan Li via Gcc-patches wrote:

Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

To enable SCS in user mode, in addition to compiler, other support
is also required (as described in [2]). This patch only adds basic
support for SCS from the compiler side, and provides convenience
for users to enable SCS.

For linux kernel, only the support of the compiler is required.

[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768


i'm not a gcc maintainer, but i prefer such feature
to be in upstream gcc instead of in a plugin.

it will require update to the documentation:

which should mention that it depends on -ffixed-x18
(probably that should be enforced too) which is an
important abi issue: functions following the normal
pcs can clobber x18 and break scs.


Thanks Szabolcs, I will update the documentation in next version.

It sounds reasonable to enforced -ffixed-x18 with scs, but I see
that clang doesn’t do that. Maybe it is better to be consistent
with clang here?


and that there is no unwinder support.


Ok, let me try to add a support for this.


the abi issue means it is unlikely to be useful in
linux user space (even if libc and unwinder support
is implemented), but it can be still useful in
freestanding code such as the linux kernel.

thanks.



[RFC/RFT 0/3] Add compiler support for Control Flow Integrity

2022-12-18 Thread Dan Li via Gcc-patches
This series of patches is mainly used to support the control flow
integrity protection of the linux kernel [1], which is similar to
-fsanitize=kcfi in clang 16.0 [2,3].

I hope that this feature will also support user-mode CFI in the
future (at least for developers who can recompile the runtime),
so I use -fsanitize=cfi as a compilation option here.

Any suggestion please let me know :).

Thanks, Dan.

[1] 
https://lore.kernel.org/all/20220908215504.3686827-1-samitolva...@google.com/
[2] https://clang.llvm.org/docs/ControlFlowIntegrity.html
[3] https://reviews.llvm.org/D119296

Dan Li (3):
  [PR102768] flag-types.h (enum sanitize_code): Extend sanitize_code to
64 bits to support more features
  [PR102768] Support CFI: Add new pass for Control Flow Integrity
  [PR102768] aarch64: Add support for Control Flow Integrity

Signed-off-by: Dan Li 

---
 gcc/Makefile.in   |   1 +
 gcc/asan.h|   4 +-
 gcc/c-family/c-attribs.cc |  10 +-
 gcc/c-family/c-common.h   |   2 +-
 gcc/c/c-parser.cc |   4 +-
 gcc/cgraphunit.cc |  34 +++
 gcc/common.opt|   4 +-
 gcc/config/aarch64/aarch64.cc | 106 
 gcc/cp/typeck.cc  |   2 +-
 gcc/doc/invoke.texi   |  35 +++
 gcc/doc/passes.texi   |  10 +
 gcc/doc/tm.texi   |  27 +++
 gcc/doc/tm.texi.in|   8 +
 gcc/dwarf2asm.cc  |   2 +-
 gcc/flag-types.h  |  67 ++---
 gcc/opt-suggestions.cc|   2 +-
 gcc/opts.cc   |  26 +-
 gcc/opts.h|   8 +-
 gcc/output.h  |   3 +
 gcc/passes.def|   1 +
 gcc/target.def|  39 +++
 .../aarch64/control_flow_integrity_1.c|  14 ++
 .../aarch64/control_flow_integrity_2.c|  25 ++
 .../aarch64/control_flow_integrity_3.c|  23 ++
 gcc/toplev.cc |   4 +
 gcc/tree-cfg.cc   |   2 +-
 gcc/tree-cfi.cc   | 229 ++
 gcc/tree-pass.h   |   1 +
 gcc/tree.cc   | 144 +++
 gcc/tree.h|   1 +
 gcc/varasm.cc |  29 +++
 31 files changed, 803 insertions(+), 64 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/control_flow_integrity_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/control_flow_integrity_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/control_flow_integrity_3.c
 create mode 100644 gcc/tree-cfi.cc

-- 
2.17.1



[RFC/RFT 3/3] [PR102768] aarch64: Add support for Control Flow Integrity

2022-12-19 Thread Dan Li via Gcc-patches
In the AArch64 platform, typeid can be directly inserted in front
of the function header (offset is -4).

For all functions that will not be called indirectly, insert the
reserved RESERVED_CFI_TYPEID (0x0) as typeid in front of them. If
not, the attacker may use the instruction/data before the function
as typeid to bypass CFI.

All typeids ignore some bits (& AARCH64_UNALLOCATED_INSN_MASK) to
avoid conflicts with the AArch64 instruction set.

Signed-off-by: Dan Li 

gcc/ChangeLog:

PR c/102768
* config/aarch64/aarch64.cc (RESERVED_CFI_TYPEID): Macro definition.
(DEFAULT_CFI_TYPEID): Likewise.
(AARCH64_UNALLOCATED_INSN_MASK): Likewise.
(aarch64_gimple_get_func_cfi_typeid): Platform-dependent
CFI function.
(aarch64_calc_func_cfi_typeid): Likewise.
(cgraph_indirectly_callable): Determine whether a funtion may
be called indirectly.
(aarch64_output_func_cfi_typeid): Platform-dependent CFI function.
(TARGET_HAVE_CFI): New hook.
(TARGET_CALC_FUNC_CFI_TYPEID): Likewise.
(TARGET_ASM_OUTPUT_FUNC_CFI_TYPEID): Likewise.
(TARGET_GIMPLE_GET_FUNC_CFI_TYPEID): Likewise.
* doc/invoke.texi: Document -fsanitize=cfi.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/control_flow_integrity_1.c: New test.
* gcc.target/aarch64/control_flow_integrity_2.c: New test.
* gcc.target/aarch64/control_flow_integrity_3.c: New test.
---
 gcc/config/aarch64/aarch64.cc | 106 ++
 gcc/doc/invoke.texi   |  35 ++
 .../aarch64/control_flow_integrity_1.c|  14 +++
 .../aarch64/control_flow_integrity_2.c|  25 +
 .../aarch64/control_flow_integrity_3.c|  23 
 5 files changed, 203 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/control_flow_integrity_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/control_flow_integrity_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/control_flow_integrity_3.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 5c9e7791a12..2796df0cdf3 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -81,6 +81,7 @@
 #include "rtlanal.h"
 #include "tree-dfa.h"
 #include "asan.h"
+#include "ssa.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -5450,6 +5451,99 @@ aarch64_output_sve_addvl_addpl (rtx offset)
   return buffer;
 }
 
+/* Reserved for all functions that cannot be called indirectly.  */
+#define RESERVED_CFI_TYPEID 0x0U
+
+/* If the typeid of a function that can be called indirectly is equal to
+   RESERVED_CFI_TYPEID, change it to DEFAULT_CFI_TYPEID.  */
+#define DEFAULT_CFI_TYPEID 0x0ADAU
+
+/* Mask of reserved and unallocated instructions in AArch64 platform.  */
+#define AARCH64_UNALLOCATED_INSN_MASK 0xE7FFU
+
+/* Generate gimple insns to return the callee's typeid to a tmp var,
+   for aarch64, like:
+   __cfi_tmp = *(fptr - 4);  */
+
+static tree
+aarch64_gimple_get_func_cfi_typeid (gimple_seq *stmts,
+   location_t loc, tree fptr)
+{
+  gimple *stmt;
+  tree result, rhs;
+
+  result = create_tmp_var (integer_type_node, "__cfi_tmp");
+  result = make_ssa_name (result, NULL);
+
+  rhs = build_pointer_type (integer_type_node);
+  rhs = build_int_cst_type (rhs, -4);
+  rhs = build2 (MEM_REF, integer_type_node, fptr, rhs);
+
+  stmt = gimple_build_assign (result, rhs);
+  gimple_set_location (stmt, loc);
+
+  SSA_NAME_DEF_STMT (result) = stmt;
+
+  gimple_seq_add_stmt (stmts, stmt);
+
+  return result;
+}
+
+static unsigned int
+aarch64_calc_func_cfi_typeid (const_tree fntype)
+{
+  unsigned int hash;
+
+  /* The value of typeid has a probability of being the same as the encoding
+ of an instruction.  If the attacker can find the same encoding as the
+ typeid in the assembly code, then he has found a usable jump location.
+ So here, a platform-related mask is used when generating a typeid to
+ avoid such conflicts as much as possible.  */
+  hash = unified_type_hash (fntype) & AARCH64_UNALLOCATED_INSN_MASK;
+
+  /* RESERVED_CFI_TYPEID is reserved for functions that cannot
+ be called indirectly.  */
+  if (hash == RESERVED_CFI_TYPEID)
+hash = DEFAULT_CFI_TYPEID;
+
+  return hash;
+}
+
+static bool
+cgraph_indirectly_callable (struct cgraph_node *node,
+   void *data ATTRIBUTE_UNUSED)
+{
+  if (node->externally_visible || node->address_taken)
+return true;
+
+  return false;
+}
+
+static void
+aarch64_output_func_cfi_typeid (FILE * stream, tree decl)
+{
+  struct cgraph_node *node;
+  unsigned int cur_func_typeid;
+
+  node = cgraph_node::get (decl);
+
+  if (!node->call_for_symbol_thunks_and_aliases (cgraph_indirectly_callable,
+  NULL, true))
+/* CF

[RFC/RFT 1/3] [PR102768] flag-types.h (enum sanitize_code): Extend sanitize_code to 64 bits to support more features

2022-12-19 Thread Dan Li via Gcc-patches
32-bit sanitize_code can no longer accommodate new options,
extending it to 64-bit.

Signed-off-by: Dan Li 

gcc/ChangeLog:

PR c/102768
* asan.h (sanitize_flags_p): Promote to uint64_t.
* common.opt: Likewise.
* dwarf2asm.cc (dw2_output_indirect_constant_1): Likewise.
* flag-types.h (enum sanitize_code): Likewise.
* opt-suggestions.cc (option_proposer::build_option_suggestions):
Likewise.
* opts.cc (find_sanitizer_argument): Likewise.
(report_conflicting_sanitizer_options): Likewise.
(get_closest_sanitizer_option): Likewise.
(parse_sanitizer_options): Likewise.
(parse_no_sanitize_attribute): Likewise.
* opts.h (parse_sanitizer_options): Likewise.
(parse_no_sanitize_attribute): Likewise.
* tree-cfg.cc (print_no_sanitize_attr_value): Likewise.

gcc/c-family/ChangeLog:

* c-attribs.cc (add_no_sanitize_value): Likewise.
(handle_no_sanitize_attribute): Likewise.
* c-common.h (add_no_sanitize_value): Likewise.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_declaration_or_fndef): Likewise.

gcc/cp/ChangeLog:

* typeck.cc (get_member_function_from_ptrfunc): Likewise.
---
 gcc/asan.h|  4 +--
 gcc/c-family/c-attribs.cc | 10 +++---
 gcc/c-family/c-common.h   |  2 +-
 gcc/c/c-parser.cc |  4 +--
 gcc/common.opt|  4 +--
 gcc/cp/typeck.cc  |  2 +-
 gcc/dwarf2asm.cc  |  2 +-
 gcc/flag-types.h  | 65 ---
 gcc/opt-suggestions.cc|  2 +-
 gcc/opts.cc   | 22 ++---
 gcc/opts.h|  8 ++---
 gcc/tree-cfg.cc   |  2 +-
 12 files changed, 64 insertions(+), 63 deletions(-)

diff --git a/gcc/asan.h b/gcc/asan.h
index d4ea49cb240..5b98172549b 100644
--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -233,9 +233,9 @@ asan_protect_stack_decl (tree decl)
remove all flags mentioned in "no_sanitize" of DECL_ATTRIBUTES.  */
 
 static inline bool
-sanitize_flags_p (unsigned int flag, const_tree fn = current_function_decl)
+sanitize_flags_p (uint64_t flag, const_tree fn = current_function_decl)
 {
-  unsigned int result_flags = flag_sanitize & flag;
+  uint64_t result_flags = flag_sanitize & flag;
   if (result_flags == 0)
 return false;
 
diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 111a33f405a..a73e2364525 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -1118,23 +1118,23 @@ handle_cold_attribute (tree *node, tree name, tree 
ARG_UNUSED (args),
 /* Add FLAGS for a function NODE to no_sanitize_flags in DECL_ATTRIBUTES.  */
 
 void
-add_no_sanitize_value (tree node, unsigned int flags)
+add_no_sanitize_value (tree node, uint64_t flags)
 {
   tree attr = lookup_attribute ("no_sanitize", DECL_ATTRIBUTES (node));
   if (attr)
 {
-  unsigned int old_value = tree_to_uhwi (TREE_VALUE (attr));
+  uint64_t old_value = tree_to_uhwi (TREE_VALUE (attr));
   flags |= old_value;
 
   if (flags == old_value)
return;
 
-  TREE_VALUE (attr) = build_int_cst (unsigned_type_node, flags);
+  TREE_VALUE (attr) = build_int_cst (long_long_unsigned_type_node, flags);
 }
   else
 DECL_ATTRIBUTES (node)
   = tree_cons (get_identifier ("no_sanitize"),
-  build_int_cst (unsigned_type_node, flags),
+  build_int_cst (long_long_unsigned_type_node, flags),
   DECL_ATTRIBUTES (node));
 }
 
@@ -1145,7 +1145,7 @@ static tree
 handle_no_sanitize_attribute (tree *node, tree name, tree args, int,
  bool *no_add_attrs)
 {
-  unsigned int flags = 0;
+  uint64_t flags = 0;
   *no_add_attrs = true;
   if (TREE_CODE (*node) != FUNCTION_DECL)
 {
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 52a85bfb783..eb91b9703db 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1500,7 +1500,7 @@ extern enum flt_eval_method
 excess_precision_mode_join (enum flt_eval_method, enum flt_eval_method);
 
 extern int c_flt_eval_method (bool ts18661_p);
-extern void add_no_sanitize_value (tree node, unsigned int flags);
+extern void add_no_sanitize_value (tree node, uint64_t flags);
 
 extern void maybe_add_include_fixit (rich_location *, const char *, bool);
 extern void maybe_suggest_missing_token_insertion (rich_location *richloc,
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index f679d53706a..9d55ea55fa6 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -2217,7 +2217,7 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
fndef_ok,
  start_init (NULL_TREE, asm_name, global_bindings_p (), 
&richloc);
  /* A parameter is initialized, which is invalid.  Don't
 attempt to instrument the initializer.  */
- int flag_sanitize_save = flag_sanitize;
+ 

[RFC/RFT 2/3] [PR102768] Support CFI: Add new pass for Control Flow Integrity

2022-12-19 Thread Dan Li via Gcc-patches
The CFI sanitizer enabled with -fsanitize=cfi implements a forward
edge control flow integrity scheme for indirect calls, roughly
similar to -fsanitize=kcfi [1] in llvm.

At compile time, it appends a uniform type identifier before the
first instruction of each function and inserts check code before
each indirect call in a function with protection enabled.

At runtime, according to the code order, the check code for each
indirect call will be executed first, and it will:
1. Dynamically obtain the typeid before the callee function.
2. Compare it to the expected typeid of the current call site (caller).
3. If the two match, continue to execute the indirect call, if not,
call the user-defined callback function cfi_check_failed.

A typeid (type identifier) is a 32-bit constant on all platforms,
whose value depends on the function's prototype, and is invariant
across compilation units. However, different platforms may ignore
some of the bits to avoid conflicts with instructions.

If a program contains indirect calls to assembly functions, they
must be manually annotated with the expected type identifiers to
prevent errors. To make this easier, gcc generates a weak SHN_ABS
__cfi_typeid_ symbol for each address-taken function
declaration, which can be used to annotate functions in assembly
as long as at least one translation unit linked into the program
takes the function address.

It should be noted that on different platforms, the location of
typeid insertion (the offset between it and the function header)
may be different, such as [1], and this patch only implements
the platform-independent part.

[1]: https://reviews.llvm.org/D119296

Signed-off-by: Dan Li 

gcc/ChangeLog:

PR c/102768
* Makefile.in: Add tree-cfi.o.
* cgraphunit.cc (output_decl_cfi_typeid_symbol): Output the
CFI typeid corresponding to each external declaration when necessary.
(output_decl_cfi_typeid_symbols): Likewise.
* doc/passes.texi: Document it.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: New hooks.
* flag-types.h (enum sanitize_code):
Add SANITIZE_CONTROL_FLOW_INTEGRITY.
* opts.cc (parse_sanitizer_options): Add cfi and exclude
SANITIZE_CONTROL_FLOW_INTEGRITY.
* output.h (default_output_func_cfi_typeid): Declare.
(default_calc_func_cfi_typeid): Declare.
(default_gimple_get_func_cfi_typeid): Declare.
* passes.def: Add pass_cfi.
* target.def: Add new hooks.
* toplev.cc (process_options): Add CFI compile option check.
* tree-pass.h (make_pass_cfi): Declare.
* tree.cc (tree_node_sizes[): Add the unified tree type hash
calculation functions.
(append_unified_type_hash): Likewise.
(initialize_unified_tree_type_hash_table): Likewise.
(append_unified_type_name_hash): Likewise.
(append_unified_type_precision_hash): Likewise.
(append_unified_function_ret_and_args_hash): Likewise.
(unified_type_hash): Likewise.
(init_ttree): Likewise.
* tree.h (unified_type_hash): Declare.
* varasm.cc (assemble_start_function): Output the CFI typeid
of each function.
(default_output_func_cfi_typeid): New.
(default_gimple_get_func_cfi_typeid): New.
(default_calc_func_cfi_typeid): New.
* tree-cfi.cc: New file.
---
 gcc/Makefile.in |   1 +
 gcc/cgraphunit.cc   |  34 +++
 gcc/doc/passes.texi |  10 ++
 gcc/doc/tm.texi |  27 ++
 gcc/doc/tm.texi.in  |   8 ++
 gcc/flag-types.h|   2 +
 gcc/opts.cc |   4 +-
 gcc/output.h|   3 +
 gcc/passes.def  |   1 +
 gcc/target.def  |  39 
 gcc/toplev.cc   |   4 +
 gcc/tree-cfi.cc | 229 
 gcc/tree-pass.h |   1 +
 gcc/tree.cc | 144 
 gcc/tree.h  |   1 +
 gcc/varasm.cc   |  29 ++
 16 files changed, 536 insertions(+), 1 deletion(-)
 create mode 100644 gcc/tree-cfi.cc

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 31ff95500c9..0d23bad6b63 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1610,6 +1610,7 @@ OBJS = \
tree-call-cdce.o \
tree-cfg.o \
tree-cfgcleanup.o \
+   tree-cfi.o \
tree-chrec.o \
tree-complex.o \
tree-data-ref.o \
diff --git a/gcc/cgraphunit.cc b/gcc/cgraphunit.cc
index 76d541755b8..fb4999559ae 100644
--- a/gcc/cgraphunit.cc
+++ b/gcc/cgraphunit.cc
@@ -,6 +,37 @@ ipa_passes (void)
   bitmap_obstack_release (NULL);
 }
 
+/* Output a weak symbol value of a decl's typeid (hash) to the
+   assembly file, like:
+   .weak __cfi_typeid_A
+   .set __cfi_typeid_A, 0x0ADA
+   typeid is platform-dependent, because the bits in typeid that conflicts
+   with the instruction set of the current platform needs to be ignored.  */
+
+static void
+output_decl_cfi_typeid_symbol (FILE *stream, tree fndecl)
+{
+  unsigne

Re: [RFC/RFT 0/3] Add compiler support for Control Flow Integrity

2023-02-10 Thread Dan Li via Gcc-patches
On 02/09, Hongtao Liu wrote:
> On Mon, Dec 19, 2022 at 3:59 PM Dan Li via Gcc-patches
>  wrote:
> >
> > This series of patches is mainly used to support the control flow
> > integrity protection of the linux kernel [1], which is similar to
> > -fsanitize=kcfi in clang 16.0 [2,3].
> >
> > I hope that this feature will also support user-mode CFI in the
> > future (at least for developers who can recompile the runtime),
> > so I use -fsanitize=cfi as a compilation option here.
> >
> > Any suggestion please let me know :).
> Do you have this series as a branch somewhere that we could also try for x86?

Hi Hongtao,

I haven't tried this feature on the x86 platform, if possible, I will try it in
the next version.

Thanks,
Dan.

> --
> BR,
> Hongtao


Re: [RFC/RFT 0/3] Add compiler support for Control Flow Integrity

2023-02-10 Thread Dan Li via Gcc-patches
On 02/08, Peter Collingbourne wrote:
> On Sun, Dec 18, 2022 at 10:06 PM Dan Li  wrote:
> >
> > This series of patches is mainly used to support the control flow
> > integrity protection of the linux kernel [1], which is similar to
> > -fsanitize=kcfi in clang 16.0 [2,3].
> >
> > I hope that this feature will also support user-mode CFI in the
> > future (at least for developers who can recompile the runtime),
> > so I use -fsanitize=cfi as a compilation option here.
> 
> Please don't. The various CFI-related build flags are confusing enough
> without also having this inconsistency between Clang and GCC.

Hi Peter,

Got it, as discussed before[1], in the next version I will use the same
compile option.

[1]. 
https://patchwork.kernel.org/project/linux-arm-kernel/patch/20221219061758.23321-1-ashimida.1...@gmail.com/

Thanks,
Dan.

> 
> Peter


[PING][PATCH, v2, 1/1, AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2021-12-20 Thread Dan Li via Gcc-patches

Gentile ping for this :), thanks.
Link: https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586204.html


Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

To enable SCS in user mode, in addition to compiler, other support
is also required (as discussed in [2]). This patch only adds basic
support for SCS from the compiler side, and provides convenience
for users to enable SCS.

For linux kernel, only the support of the compiler is required.

[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768


Signed-off-by: Dan Li 

gcc/c-family/ChangeLog:

* c-attribs.c (handle_no_sanitize_shadow_call_stack_attribute):
New.

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_shadow_call_stack_enabled):
New decl.
* config/aarch64/aarch64.c (aarch64_shadow_call_stack_enabled):
New.
(aarch64_expand_prologue):  Push x30 onto SCS before it's
pushed onto stack.
(aarch64_expand_epilogue):  Pop x30 frome SCS.
* config/aarch64/aarch64.h (TARGET_SUPPORT_SHADOW_CALL_STACK):
New macro.
(TARGET_CHECK_SCS_RESERVED_REGISTER):   Likewise.
* config/aarch64/aarch64.md (scs_push): New template.
(scs_pop):  Likewise.
* defaults.h (TARGET_SUPPORT_SHADOW_CALL_STACK):New macro.
* doc/extend.texi:  Document -fsanitize=shadow-call-stack.
* doc/invoke.texi:  Document attribute.
* flag-types.h (enum sanitize_code):Add
SANITIZE_SHADOW_CALL_STACK.
* opts-global.c (handle_common_deferred_options):   Add SCS
compile option check.
* opts.c (finish_options):  Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shadow_call_stack_1.c: New test.
* gcc.target/aarch64/shadow_call_stack_2.c: New test.
* gcc.target/aarch64/shadow_call_stack_3.c: New test.
* gcc.target/aarch64/shadow_call_stack_4.c: New test.
---
 gcc/c-family/c-attribs.c  | 21 +
 gcc/config/aarch64/aarch64-protos.h   |  1 +
 gcc/config/aarch64/aarch64.c  | 27 +++
 gcc/config/aarch64/aarch64.h  | 11 +
 gcc/config/aarch64/aarch64.md | 18 
 gcc/defaults.h|  4 ++
 gcc/doc/extend.texi   |  7 +++
 gcc/doc/invoke.texi   | 29 
 gcc/flag-types.h  |  2 +
 gcc/opts-global.c |  6 +++
 gcc/opts.c| 12 +
 .../gcc.target/aarch64/shadow_call_stack_1.c  |  6 +++
 .../gcc.target/aarch64/shadow_call_stack_2.c  |  6 +++
 .../gcc.target/aarch64/shadow_call_stack_3.c  | 45 +++
 .../gcc.target/aarch64/shadow_call_stack_4.c  | 18 
 15 files changed, 213 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 007b928c54b..9b3a35c06bf 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -56,6 +56,8 @@ static tree handle_cold_attribute (tree *, tree, tree, int, 
bool *);
 static tree handle_no_sanitize_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_sanitize_address_attribute (tree *, tree, tree,
  int, bool *);
+static tree handle_no_sanitize_shadow_call_stack_attribute (tree *, tree,
+ tree, int, bool *);
 static tree handle_no_sanitize_thread_attribute (tree *, tree, tree,
 int, bool *);
 static tree handle_no_address_safety_analysis_attribute (tree *, tree, tree,
@@ -454,6 +456,10 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_no_sanitize_attribute, NULL },
   { "no_sanitize_address",0, 0, true, false, false, false,
  handle_no_sanitize_address_attribute, NULL },
+  { "no_sanitize_shadow_call_stack",
+ 0, 0, true, false, false, false,
+ handle_no_sanitize_shadow_call_stack_attribute,
+ NULL },
   { "no_sanitize_thread", 0, 0, true, false, false, false,
  handle_no_sanitize_thread_attribute, NULL },
   { "no_sanitize_undefined",  0, 0, true, false, false, false,
@@ -1175,6 +1181,21 @@ handle_no_sanitize_address_attribute (tree *node, tree 
name, tree

[PING^2][PATCH,v2,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-01-04 Thread Dan Li via Gcc-patches

Gentile ping for this again :), thanks.
Link: https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586204.html


Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

To enable SCS in user mode, in addition to compiler, other support
is also required (as discussed in [2]). This patch only adds basic
support for SCS from the compiler side, and provides convenience
for users to enable SCS.

For linux kernel, only the support of the compiler is required.

[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768


Signed-off-by: Dan Li 

gcc/c-family/ChangeLog:

* c-attribs.c (handle_no_sanitize_shadow_call_stack_attribute):
New.

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_shadow_call_stack_enabled):
New decl.
* config/aarch64/aarch64.c (aarch64_shadow_call_stack_enabled):
New.
(aarch64_expand_prologue):  Push x30 onto SCS before it's
pushed onto stack.
(aarch64_expand_epilogue):  Pop x30 frome SCS.
* config/aarch64/aarch64.h (TARGET_SUPPORT_SHADOW_CALL_STACK):
New macro.
(TARGET_CHECK_SCS_RESERVED_REGISTER):   Likewise.
* config/aarch64/aarch64.md (scs_push): New template.
(scs_pop):  Likewise.
* defaults.h (TARGET_SUPPORT_SHADOW_CALL_STACK):New macro.
* doc/extend.texi:  Document -fsanitize=shadow-call-stack.
* doc/invoke.texi:  Document attribute.
* flag-types.h (enum sanitize_code):Add
SANITIZE_SHADOW_CALL_STACK.
* opts-global.c (handle_common_deferred_options):   Add SCS
compile option check.
* opts.c (finish_options):  Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shadow_call_stack_1.c: New test.
* gcc.target/aarch64/shadow_call_stack_2.c: New test.
* gcc.target/aarch64/shadow_call_stack_3.c: New test.
* gcc.target/aarch64/shadow_call_stack_4.c: New test.
---
 gcc/c-family/c-attribs.c  | 21 +
 gcc/config/aarch64/aarch64-protos.h   |  1 +
 gcc/config/aarch64/aarch64.c  | 27 +++
 gcc/config/aarch64/aarch64.h  | 11 +
 gcc/config/aarch64/aarch64.md | 18 
 gcc/defaults.h|  4 ++
 gcc/doc/extend.texi   |  7 +++
 gcc/doc/invoke.texi   | 29 
 gcc/flag-types.h  |  2 +
 gcc/opts-global.c |  6 +++
 gcc/opts.c| 12 +
 .../gcc.target/aarch64/shadow_call_stack_1.c  |  6 +++
 .../gcc.target/aarch64/shadow_call_stack_2.c  |  6 +++
 .../gcc.target/aarch64/shadow_call_stack_3.c  | 45 +++
 .../gcc.target/aarch64/shadow_call_stack_4.c  | 18 
 15 files changed, 213 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 007b928c54b..9b3a35c06bf 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -56,6 +56,8 @@ static tree handle_cold_attribute (tree *, tree, tree, int, 
bool *);
 static tree handle_no_sanitize_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_sanitize_address_attribute (tree *, tree, tree,
  int, bool *);
+static tree handle_no_sanitize_shadow_call_stack_attribute (tree *, tree,
+ tree, int, bool *);
 static tree handle_no_sanitize_thread_attribute (tree *, tree, tree,
 int, bool *);
 static tree handle_no_address_safety_analysis_attribute (tree *, tree, tree,
@@ -454,6 +456,10 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_no_sanitize_attribute, NULL },
   { "no_sanitize_address",0, 0, true, false, false, false,
  handle_no_sanitize_address_attribute, NULL },
+  { "no_sanitize_shadow_call_stack",
+ 0, 0, true, false, false, false,
+ handle_no_sanitize_shadow_call_stack_attribute,
+ NULL },
   { "no_sanitize_thread", 0, 0, true, false, false, false,
  handle_no_sanitize_thread_attribute, NULL },
   { "no_sanitize_undefined",  0, 0, true, false, false, false,
@@ -1175,6 +1181,21 @@ handle_no_sanitize_address_attribute (tree *node, tree 
name, tree

[RFC/RFT, V2 1/3] [PR102768] flag-types.h (enum sanitize_code): Extend sanitize_code to 64 bits to support more features

2023-03-25 Thread Dan Li via Gcc-patches
32-bit sanitize_code can no longer accommodate new options,
extending it to 64-bit.

Signed-off-by: Dan Li 

gcc/ChangeLog:

PR c/102768
* asan.h (sanitize_flags_p): Promote to uint64_t.
* common.opt: Likewise.
* dwarf2asm.cc (dw2_output_indirect_constant_1): Likewise.
* flag-types.h (enum sanitize_code): Likewise.
* opt-suggestions.cc (option_proposer::build_option_suggestions):
Likewise.
* opts.cc (find_sanitizer_argument): Likewise.
(report_conflicting_sanitizer_options): Likewise.
(get_closest_sanitizer_option): Likewise.
(parse_sanitizer_options): Likewise.
(parse_no_sanitize_attribute): Likewise.
* opts.h (parse_sanitizer_options): Likewise.
(parse_no_sanitize_attribute): Likewise.
* tree-cfg.cc (print_no_sanitize_attr_value): Likewise.

gcc/c-family/ChangeLog:

* c-attribs.cc (add_no_sanitize_value): Likewise.
(handle_no_sanitize_attribute): Likewise.
* c-common.h (add_no_sanitize_value): Likewise.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_declaration_or_fndef): Likewise.

gcc/cp/ChangeLog:

* typeck.cc (get_member_function_from_ptrfunc): Likewise.
---
 gcc/asan.h|  4 +--
 gcc/c-family/c-attribs.cc | 10 +++---
 gcc/c-family/c-common.h   |  2 +-
 gcc/c/c-parser.cc |  4 +--
 gcc/common.opt|  4 +--
 gcc/cp/typeck.cc  |  2 +-
 gcc/dwarf2asm.cc  |  2 +-
 gcc/flag-types.h  | 65 ---
 gcc/opt-suggestions.cc|  2 +-
 gcc/opts.cc   | 22 ++---
 gcc/opts.h|  8 ++---
 gcc/tree-cfg.cc   |  2 +-
 12 files changed, 64 insertions(+), 63 deletions(-)

diff --git a/gcc/asan.h b/gcc/asan.h
index d4ea49cb240..5b98172549b 100644
--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -233,9 +233,9 @@ asan_protect_stack_decl (tree decl)
remove all flags mentioned in "no_sanitize" of DECL_ATTRIBUTES.  */
 
 static inline bool
-sanitize_flags_p (unsigned int flag, const_tree fn = current_function_decl)
+sanitize_flags_p (uint64_t flag, const_tree fn = current_function_decl)
 {
-  unsigned int result_flags = flag_sanitize & flag;
+  uint64_t result_flags = flag_sanitize & flag;
   if (result_flags == 0)
 return false;
 
diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 111a33f405a..a73e2364525 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -1118,23 +1118,23 @@ handle_cold_attribute (tree *node, tree name, tree 
ARG_UNUSED (args),
 /* Add FLAGS for a function NODE to no_sanitize_flags in DECL_ATTRIBUTES.  */
 
 void
-add_no_sanitize_value (tree node, unsigned int flags)
+add_no_sanitize_value (tree node, uint64_t flags)
 {
   tree attr = lookup_attribute ("no_sanitize", DECL_ATTRIBUTES (node));
   if (attr)
 {
-  unsigned int old_value = tree_to_uhwi (TREE_VALUE (attr));
+  uint64_t old_value = tree_to_uhwi (TREE_VALUE (attr));
   flags |= old_value;
 
   if (flags == old_value)
return;
 
-  TREE_VALUE (attr) = build_int_cst (unsigned_type_node, flags);
+  TREE_VALUE (attr) = build_int_cst (long_long_unsigned_type_node, flags);
 }
   else
 DECL_ATTRIBUTES (node)
   = tree_cons (get_identifier ("no_sanitize"),
-  build_int_cst (unsigned_type_node, flags),
+  build_int_cst (long_long_unsigned_type_node, flags),
   DECL_ATTRIBUTES (node));
 }
 
@@ -1145,7 +1145,7 @@ static tree
 handle_no_sanitize_attribute (tree *node, tree name, tree args, int,
  bool *no_add_attrs)
 {
-  unsigned int flags = 0;
+  uint64_t flags = 0;
   *no_add_attrs = true;
   if (TREE_CODE (*node) != FUNCTION_DECL)
 {
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 52a85bfb783..eb91b9703db 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1500,7 +1500,7 @@ extern enum flt_eval_method
 excess_precision_mode_join (enum flt_eval_method, enum flt_eval_method);
 
 extern int c_flt_eval_method (bool ts18661_p);
-extern void add_no_sanitize_value (tree node, unsigned int flags);
+extern void add_no_sanitize_value (tree node, uint64_t flags);
 
 extern void maybe_add_include_fixit (rich_location *, const char *, bool);
 extern void maybe_suggest_missing_token_insertion (rich_location *richloc,
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index f679d53706a..9d55ea55fa6 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -2217,7 +2217,7 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
fndef_ok,
  start_init (NULL_TREE, asm_name, global_bindings_p (), 
&richloc);
  /* A parameter is initialized, which is invalid.  Don't
 attempt to instrument the initializer.  */
- int flag_sanitize_save = flag_sanitize;
+ 

[RFC/RFT, V2 3/3] [PR102768] aarch64: Add support for Kernel Control Flow Integrity

2023-03-25 Thread Dan Li via Gcc-patches
In the AArch64 platform, typeid can be directly inserted in front
of the function header (offset is patch_area_entry + 4), it should
be assumed that patch_area_entry is the same for all functions.

For all functions that will not be called indirectly, insert the
reserved RESERVED_CFI_TYPEID (0x0) as typeid in front of them. If
not, the attacker may use the instruction/data before the function
as typeid to bypass CFI.

All typeids ignore some bits (& AARCH64_UNALLOCATED_INSN_MASK) to
avoid conflicts with the AArch64 instruction set (see AAPCS64 for
details).

Signed-off-by: Dan Li 

gcc/ChangeLog:

* config/aarch64/aarch64.cc (RESERVED_CFI_TYPEID): Macro definition.
(DEFAULT_CFI_TYPEID): Likewise.
(AARCH64_UNALLOCATED_INSN_MASK): Likewise.
(aarch64_calc_func_cfi_typeid): Platform-dependent CFI function.
(cgraph_indirectly_callable): Determine whether a funtion may
be called indirectly.
(aarch64_output_func_kcfi_typeid): Platform-dependent CFI function.
(aarch64_output_icall_kcfi_check): Likewise.
(TARGET_HAVE_KCFI): New hook.
(TARGET_CALC_FUNC_CFI_TYPEID): Likewise.
(TARGET_ASM_OUTPUT_FUNC_KCFI_TYPEID): Likewise.
(TARGET_ASM_OUTPUT_ICALL_KCFI_CHECK): Likewise.
* doc/invoke.texi: Document -fsanitize=kcfi.
---
 gcc/config/aarch64/aarch64.cc | 166 ++
 gcc/doc/invoke.texi   |  36 
 2 files changed, 202 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 5c9e7791a12..5b55541d437 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -5450,6 +5450,160 @@ aarch64_output_sve_addvl_addpl (rtx offset)
   return buffer;
 }
 
+/* Reserved for all functions that cannot be called indirectly.  */
+#define RESERVED_CFI_TYPEID 0x0U
+
+/* If the typeid of a function that can be called indirectly is equal to
+   RESERVED_CFI_TYPEID, change it to DEFAULT_CFI_TYPEID.  */
+#define DEFAULT_CFI_TYPEID 0x0ADAU
+
+/* Mask of reserved and unallocated instructions in AArch64 platform.  */
+#define AARCH64_UNALLOCATED_INSN_MASK 0xE7FFU
+
+static unsigned int
+aarch64_calc_func_cfi_typeid (const_tree fntype)
+{
+  unsigned int hash;
+
+  /* The value of typeid has a probability of being the same as the encoding
+ of an instruction.  If the attacker can find the same encoding as the
+ typeid in the assembly code, then he has found a usable jump location.
+ So here, a platform-related mask is used when generating a typeid to
+ avoid such conflicts as much as possible.  */
+  hash = unified_type_hash (fntype) & AARCH64_UNALLOCATED_INSN_MASK;
+
+  /* RESERVED_CFI_TYPEID is reserved for functions that cannot
+ be called indirectly.  */
+  if (hash == RESERVED_CFI_TYPEID)
+hash = DEFAULT_CFI_TYPEID;
+
+  return hash;
+}
+
+static bool
+cgraph_indirectly_callable (struct cgraph_node *node,
+   void *data ATTRIBUTE_UNUSED)
+{
+  if (node->externally_visible || node->address_taken)
+return true;
+
+  return false;
+}
+
+static void
+aarch64_output_func_kcfi_typeid (FILE * stream, tree decl)
+{
+  struct cgraph_node *node;
+  unsigned int cur_func_typeid;
+
+  node = cgraph_node::get (decl);
+
+  if (!node->call_for_symbol_thunks_and_aliases (cgraph_indirectly_callable,
+NULL, true))
+/* CFI's typeid check always considers that there is a typeid before the
+   target function, so it is also necessary to output typeid for functions
+   that cannot be called indirectly to prevent attackers from bypassing
+   CFI by using instructions/data before those functions.
+   The typeid inserted before such a function is RESERVED_CFI_TYPEID,
+   and the calculation of the typeid must ensure that this value is always
+   reserved.  */
+cur_func_typeid = RESERVED_CFI_TYPEID;
+  else
+cur_func_typeid = aarch64_calc_func_cfi_typeid (TREE_TYPE (decl));
+
+  fprintf (stream, "__kcfi_%s:\n", get_name (decl));
+  fprintf (stream, "\t.4byte %#010x\n", cur_func_typeid);
+}
+
+/* This function outputs assembly instructions to check cfi typeid before
+   indirect call (blr Xn), which may destroy x16, x17, x9 registers (according
+   to the AAPCS64 specification, these registers do not need to be restored
+   after the function call).
+   The assembly code output by this function is as follows:
+   ldurw16, [x1, #-4]
+   movkw17, #13570
+   movkw17, #17309, lsl #16
+   cmp w16, w17
+   b.eq.Lkcfi8
+   brk #0x8221
+.Lkcfi8:
+   blr x1
+ */
+
+static void
+aarch64_output_icall_kcfi_check (rtx reg, unsigned int value)
+{
+  unsigned int addr_reg, scratch_reg1, scratch_reg2;
+  unsigned int esr, addr_index, type_index;
+  char label_buf[256];
+  const char *label_ptr;
+  unsigned HOST_WIDE_INT patch_area_entry =

[RFC/RFT,V2] CFI: Add support for gcc CFI in aarch64

2023-03-25 Thread Dan Li via Gcc-patches
Based on Sami's patch[1], this patch makes the corresponding kernel
configuration of CFI available when compiling the kernel with the gcc[2].

The code after enabling cfi (with -fsanitize=kcfi,
-fpatchable-function-entry=3,1) is as follows:

int (*p)(void);
int func (int)
{
p();
}

__kcfi_func:
.4byte 0x439d3502
.global func
.text
.LPFE1:
nop
.type   func, %function
func:
nop
nop
.LFB0:
..
adrpx0, p
add x0, x0, :lo12:p
ldr x0, [x0]
ldurw16, [x0, #-8]
movkw17, #23592
movkw17, #17921, lsl #16
cmp w16, w17
b.eq.Lkcfi2
brk #0x8220
.Lkcfi2:
blr x0
..

In the compiler part[4], most of the content is the same as Sami's
implementation[3], except for some minor differences, mainly including:

1. The function typeid is calculated differently and it is difficult
to be consistent.

2. A reserved typeid (such as 0x0U on the aarch64 platform) is always
inserted in front of functions that should not be called indirectly.
Functions that can be called indirectly will not use this hash value,
which prevents instructions/data before the function from being used
as a typeid by an attacker.

3. Some bits are ignored in the typeid to avoid conflicts between the
typeid and the instruction set of a specific platform, thereby preventing
an attacker from bypassing the CFI check by using the instruction as a
typeid, such as on the aarch64 platform:
* If the following instruction sequence exists:
400620:   a9be7bfdstp x29, x30, [sp, #-32]!
400624:   910003fdmov x29, sp
400628:   f9000bf3str x19, [sp, #16]
* If the expected typeid of the indirect call is exactly 0x910003fd,
  the attacker can jump to the next instruction position of any
  "mov x29,sp" instruction (such as 0x400628 here).

4. Insert a symbol __kcfi_ before each function's typeid,
which may be helpful for fine-grained KASLR implementations.

I'm not sure if these distinctions need to be removed, they're kept
in this version for now, and I'm happy to remove them in the next
version if necessary (except for the first one).

The current implementation of gcc only supports the aarch64 platform,
the x86 version is still in progress.

This produces the following oops on CFI failure (generated using lkdtm):

/ # echo CFI_FORWARD_PROTO > /sys/kernel/debug/provoke-crash/DIRECT
[   17.153364] lkdtm: Performing direct entry CFI_FORWARD_PROTO
[   17.153675] lkdtm: Calling matched prototype ...
[   17.154188] lkdtm: Calling mismatched prototype ...
[   17.154553] CFI failure at lkdtm_indirect_call+0x38/0x54 (target: 
lkdtm_increment_int+0x0/0x2c; expected type: 0xc59c68f1)
[   17.155627] Internal error: Oops - CFI:  [#1] PREEMPT SMP
[   17.156025] Modules linked in:
[   17.156580] CPU: 1 PID: 110 Comm: sh Not tainted 
6.1.0-1-g7ead10685f57-dirty #2
[   17.156975] Hardware name: linux,dummy-virt (DT)
[   17.157290] pstate: 8045 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   17.157621] pc : lkdtm_indirect_call+0x38/0x54
[   17.157902] lr : lkdtm_CFI_FORWARD_PROTO+0x48/0x7c
[   17.158142] sp : 8b083c80
[   17.158316] x29: 8b083c80 x28: 068ccc40 x27: 
[   17.158820] x26:  x25:  x24: 0bd069a0
[   17.159239] x23: 053493c0 x22: 8b083df0 x21: 0012
[   17.159673] x20: 8aa29150 x19: 06909000 x18: 8b035068
[   17.160110] x17: c59c68f1 x16: a2d40772 x15: 0006
[   17.160530] x14:  x13: 2e2e2e2065707974 x12: 6f746f7270206465
[   17.160947] x11: 686374616d73696d x10: 8a7254b8 x9 : 89268330
[   17.161376] x8 : efff x7 : 8a7254b8 x6 : 8000f000
[   17.161789] x5 : bff4 x4 :  x3 : 
[   17.162264] x2 :  x1 : 88a4e248 x0 : 8abca500
[   17.162865] Call trace:
[   17.163099]  lkdtm_indirect_call+0x38/0x54
[   17.163411]  lkdtm_CFI_FORWARD_PROTO+0x48/0x7c
[   17.163663]  lkdtm_do_action+0x48/0x5c
[   17.163876]  direct_entry+0x144/0x160
[   17.164084]  full_proxy_write+0x84/0xe4
[   17.164298]  vfs_write+0xe8/0x32c

The gcc-related patches[4] are based on tag: releases/gcc-12.2.0.
This patch is based on tag: v6.2.

Sorry for the long delay, the next version should not be that slow.
Any suggestion please let me know :).

Thanks,
Dan.

[1] 
https://lore.kernel.org/all/20220908215504.3686827-1-samitolva...@google.com/
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107048
[3] https://reviews.llvm.org/D119296
[4] https://lore.kernel.org/all/20230325081117.93245-1-ashimida.1...@gmail.com/

Signed-off-by: Dan Li 

---
RFC/RFT V2:
- The CFI typeid check is changed from the f

[RFC/RFT, V2 2/3] [PR102768] Support CFI: Add basic support for Kernel Control Flow Integrity

2023-03-25 Thread Dan Li via Gcc-patches
The KCFI sanitizer enabled with -fsanitize=kcfi implements a forward
edge control flow integrity scheme for indirect calls, similar to
-fsanitize=kcfi [1] in llvm.

At compile time, it appends a uniform type identifier before the
first instruction of each function and inserts check code before
each indirect call in a function with protection enabled.

At runtime, according to the code order, the check code for each
indirect call will be executed first, and it will:
1. Dynamically obtain the typeid before the callee function.
2. Compare it to the expected typeid of the current call site (caller).
3. If the two match, continue to execute the indirect call, if not,
an exception will be generated, and its user (usually the low-level
code such as the OS kernel) needs to support for the exception handling.

A typeid (type identifier) is a 32-bit constant on all platforms,
whose value depends on the function's prototype, and is invariant
across compilation units. However, different platforms may ignore
some of the bits to avoid conflicts with instructions.

If a program contains indirect calls to assembly functions, they
must be manually annotated with the expected type identifiers to
prevent errors. To make this easier, gcc generates a weak SHN_ABS
__kcfi_typeid_ symbol for each address-taken function
declaration, which can be used to annotate functions in assembly
as long as at least one translation unit linked into the program
takes the function address. It should be noted that on different
platforms, the location of typeid insertion (the offset between
it and the function header) may be different, such as [1], and
this patch only implements the platform-independent part.

[1]: https://reviews.llvm.org/D119296

Signed-off-by: Dan Li 

gcc/ChangeLog:

PR c/102768
* cfgexpand.cc (expand_call_stmt): Add CFI_TYPEID reg note
for call insn.
(pass_expand::execute): Check whether enable KCFI for current
function according to the attribute.
* cgraphunit.cc (output_decl_kcfi_typeid_symbol): Output the
CFI typeid corresponding to each external declaration when necessary.
(output_decl_kcfi_typeid_symbols): Likewise.
* combine.cc (distribute_notes): Add REG_CALL_CFI_TYPEID.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: New hooks.
* emit-rtl.cc (try_split): Add REG_CALL_CFI_TYPEID.
* emit-rtl.h (struct rtl_data): Add is_kcfi_enabled.
* final.cc (final_scan_insn_1): Output kcfi check for indirect calls.
* flag-types.h (enum sanitize_code):
Add SANITIZE_CONTROL_FLOW_INTEGRITY.
* gimple.cc (gimple_call_set_cfi_typeid): New.
(gimple_build_call_1): Calculate cfi_typeid when gcall is created.
* gimple.h (struct GTY): Add new member cfi_typeid for gcall.
* opts.cc (parse_sanitizer_options): Add cfi and exclude
SANITIZE_CONTROL_FLOW_INTEGRITY.
* output.h (default_output_func_kcfi_typeid): Declare.
(default_output_icall_kcfi_check): Declare.
(default_calc_func_cfi_typeid): Declare.
* reg-notes.def (REG_NOTE): Add new REG_NOTE CALL_CFI_TYPEID.
* target.def: Add new hooks.
* toplev.cc (process_options): Add CFI compile option check.
* tree.cc (tree_node_sizes[): Add the unified tree type hash
calculation functions.
(append_unified_type_hash): Likewise.
(initialize_unified_tree_type_hash_table): Likewise.
(append_unified_type_name_hash): Likewise.
(append_unified_type_precision_hash): Likewise.
(append_unified_function_ret_and_args_hash): Likewise.
(unified_type_hash): Likewise.
(init_ttree): Likewise.
* tree.h (unified_type_hash): Declare.
* varasm.cc (assemble_start_function): Output the CFI typeid
of each function.
(default_output_func_kcfi_typeid): New.
(default_output_icall_kcfi_check): New.
(default_calc_func_cfi_typeid): New.
---
 gcc/cfgexpand.cc   |  26 
 gcc/cgraphunit.cc  |  34 +++
 gcc/combine.cc |   1 +
 gcc/doc/tm.texi|  27 +
 gcc/doc/tm.texi.in |   8 +++
 gcc/emit-rtl.cc|   1 +
 gcc/emit-rtl.h |   4 ++
 gcc/final.cc   |  24 +++-
 gcc/flag-types.h   |   2 +
 gcc/gimple.cc  |  11 
 gcc/gimple.h   |   5 +-
 gcc/opts.cc|   4 +-
 gcc/output.h   |   3 +
 gcc/reg-notes.def  |   1 +
 gcc/target.def |  38 
 gcc/toplev.cc  |   4 ++
 gcc/tree.cc| 144 +
 gcc/tree.h |   1 +
 gcc/varasm.cc  |  26 
 19 files changed, 361 insertions(+), 3 deletions(-)

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index d3cc77d2ca9..69c5fa30c7e 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -2845,6 +2845,18 @@ expand_call_stmt (gcall *stmt)
add_reg_note (last, REG_CALL_NOCF_CHECK, const0_rtx);
 }
 
+  if (flag_san

[RFC/RFT, V2 0/3] Add compiler support for Kernel Control Flow Integrity

2023-03-25 Thread Dan Li via Gcc-patches
This series of patches is mainly used to support the control flow
integrity protection of the linux kernel [1], which is similar to
-fsanitize=kcfi in clang 16.0 [2,3].

Any suggestion please let me know :).

Thanks, Dan.

[1] 
https://lore.kernel.org/all/20220908215504.3686827-1-samitolva...@google.com/
[2] https://clang.llvm.org/docs/ControlFlowIntegrity.html
[3] https://reviews.llvm.org/D119296

Signed-off-by: Dan Li 

---
Dan Li (3):
  [PR102768] flag-types.h (enum sanitize_code): Extend sanitize_code to
64 bits to support more features
  [PR102768] Support CFI: Add basic support for Kernel Control Flow
Integrity
  [PR102768] aarch64: Add support for Kernel Control Flow Integrity

 gcc/asan.h|   4 +-
 gcc/c-family/c-attribs.cc |  10 +-
 gcc/c-family/c-common.h   |   2 +-
 gcc/c/c-parser.cc |   4 +-
 gcc/cfgexpand.cc  |  26 ++
 gcc/cgraphunit.cc |  34 +++
 gcc/combine.cc|   1 +
 gcc/common.opt|   4 +-
 gcc/config/aarch64/aarch64.cc | 166 ++
 gcc/cp/typeck.cc  |   2 +-
 gcc/doc/invoke.texi   |  36 
 gcc/doc/tm.texi   |  27 ++
 gcc/doc/tm.texi.in|   8 ++
 gcc/dwarf2asm.cc  |   2 +-
 gcc/emit-rtl.cc   |   1 +
 gcc/emit-rtl.h|   4 +
 gcc/final.cc  |  24 -
 gcc/flag-types.h  |  67 +++---
 gcc/gimple.cc |  11 +++
 gcc/gimple.h  |   5 +-
 gcc/opt-suggestions.cc|   2 +-
 gcc/opts.cc   |  26 +++---
 gcc/opts.h|   8 +-
 gcc/output.h  |   3 +
 gcc/reg-notes.def |   1 +
 gcc/target.def|  38 
 gcc/toplev.cc |   4 +
 gcc/tree-cfg.cc   |   2 +-
 gcc/tree.cc   | 144 +
 gcc/tree.h|   1 +
 gcc/varasm.cc |  26 ++
 31 files changed, 627 insertions(+), 66 deletions(-)

-- 
2.17.1



Re: [RFC/RFT,V2] CFI: Add support for gcc CFI in aarch64

2023-04-05 Thread Dan Li via Gcc-patches
On 03/27, Peter Zijlstra wrote:
> On Sat, Mar 25, 2023 at 01:54:16AM -0700, Dan Li wrote:
> 
> > In the compiler part[4], most of the content is the same as Sami's
> > implementation[3], except for some minor differences, mainly including:
> > 
> > 1. The function typeid is calculated differently and it is difficult
> > to be consistent.
> 
> This means there is an effective ABI break between the compilers, which
> is sad :-( Is there really nothing to be done about this?

Hi Peter,

I'm not so sure right now. According to Sami's tip, I need to learn more about
related patches. I'll make it ABI compatible in the next version if possible :)

Thanks,
Dan


Re: [RFC/RFT,V2] CFI: Add support for gcc CFI in aarch64

2023-04-05 Thread Dan Li via Gcc-patches
On 03/27, Sami Tolvanen wrote:
> On Mon, Mar 27, 2023 at 2:30 AM Peter Zijlstra  wrote:
> >
> > On Sat, Mar 25, 2023 at 01:54:16AM -0700, Dan Li wrote:
> >
> > > In the compiler part[4], most of the content is the same as Sami's
> > > implementation[3], except for some minor differences, mainly including:
> > >
> > > 1. The function typeid is calculated differently and it is difficult
> > > to be consistent.
> >
> > This means there is an effective ABI break between the compilers, which
> > is sad :-( Is there really nothing to be done about this?
> 
> I agree, this would be unfortunate, and would also be a compatibility
> issue with rustc where there's ongoing work to support
> clang-compatible CFI type hashes:
> 
> https://github.com/rust-lang/rust/pull/105452
> 
> Sami

Hi Sami,

Thanks for the info, I need to learn about it :)
Is there anything else that needs to be improved?

Thanks,
Dan