from:"scott constable via cfe\-commits"

Fix for Paths with Whitespace

2015-12-07 Thread scott constable via cfe-commits

Dear Reviewer,

The current doxygen.cfg.in will produce paths that cannot be understood by
Doxygen/Graphviz, when any generated paths contain white space. This patch
simply adds quotations to all generated paths so that white space cannot be
misinterpreted by Doxygen/Graphviz.

~Scott Constable


doxygen.cfg.in
Description: Binary data
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-25 Thread Scott Constable via cfe-commits



@@ -208,10 +209,34 @@ void llvm::setKCFIType(Module &M, Function &F, StringRef 
MangledType) {
   std::string Type = MangledType.str();
   if (M.getModuleFlag("cfi-normalize-integers"))
 Type += ".normalized";
+
+  uint32_t OutHash = static_cast(llvm::xxHash64(Type));
+  auto T = Triple(Twine(M.getTargetTriple()));
+  if (T.isX86() && T.isArch64Bit() && T.isOSLinux()) {
+// Estimate the function's arity (i.e., the number of arguments) at the ABI
+// level by counting the number of parameters that are likely to be passed
+// as registers, such as pointers and 64-bit (or smaller) integers. The
+// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs.
+// Additional parameters or parameters larger than 64 bits may be passed on
+// the stack, in which case the arity is denoted as 7.
+size_t NumParams = F.arg_size();
+bool MayHaveStackArgs = NumParams > 6;
+
+for (unsigned int i = 0; !MayHaveStackArgs && i < NumParams; ++i) {
+  const llvm::Type *PT = F.getArg(i)->getType();
+  if (!(PT->isPointerTy() || PT->getIntegerBitWidth() <= 64))

scottconstable wrote:

Hi @phoebewang, KCFI only computes hashes for indirect calls, not direct ones. 
The example you cited uses `CallBase::getCalledFunction()`, whose documentation 
reads "Returns the function called, or null if this is an indirect function 
invocation or the function signature does not match the call signature."

https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-27 Thread Scott Constable via cfe-commits


scottconstable wrote:

This PR is intended to improve security for X86 kernels, as the title suggests. 
I think that we can and should incorporate feedback from the other 
participants, several of whom have recommended that this new behavior should be 
explicitly enabled with a flag, rather than implicitly enabled by examining the 
target triple. So, for example, I could update the PR to introduce a new flag 
such as `-fsanitize-cfi-icall-experimental-arity-x86`. If the community would 
like to adopt a similar approach that is tuned to benefit Arm64 kernels, then 
that approach could be enabled with a different flag.

Flag-based enabling would also help to address the concerns about compiler 
compatibility. For instance, if an x86 kernel is being built with both C and 
Rust code, then Kconfig could check whether the local clang and rustc compilers 
support the KCFI arity enhancement; if one or the other does not support KCFI 
arity then Kconfig would fall back to the default 32-bit hash.

@rcvalle I don't see any evidence that this approach will require a plethora of 
implementations for different arch/ABI combinations. As far as I am aware, the 
vast majority of x86-64 Linux kernel code (including eBPF code) uses the 
standard ABI. Even if some kernel code does not conform to the standard ABI, 
this should not break compatibility with the arity enhancement because the 
arity enhancement's implementation derives the arity tag from the function 
type, not from the ABI. Hence, a valid function call that uses a non-standard 
ABI would still experience a matching KCFI type ID at both the call site and 
the call target. The downside is that call sites/targets with a non-standard 
ABI might be tagged with an inaccurate arity, which could matter if the 
function type's 29-bit hash collides with another function type's 29-bit hash, 
and that other function's arity tag is the same but the two functions' actual 
arities differ. IMO this seems very unlikely.

https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-27 Thread Scott Constable via cfe-commits


https://github.com/scottconstable edited 
https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-27 Thread Scott Constable via cfe-commits


https://github.com/scottconstable edited 
https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-27 Thread Scott Constable via cfe-commits

scottconstable wrote:

> This is not a Rust concern, but re-reading the initial post, it _looks_ like 
> your own statistics suggest that consuming 3 bits for arity costs more than 
> it buys you. As stated, (didn't check your math, just going off what you 
> said) prior to your change, we expect 0.01383765 collisions in your sample 
> environment.

If you had checked my math, you would have noticed that the values in the 
right-hand column were not exactly correct! I found and fixed a but in my Excel 
code and updated that chart, but the numbers didn't change by much.

> After your change, we expect to have the _sum_ of your right hand column in 
> collisions, which comes out to 0.0266774 - nearly double the rate of 
> collisions we have with the basic implementation. In fact, I think that any 
> scheme like this will always going to increase the number of overall 
> collisions, given that the arity is implicitly hashed into the representation 
> already.

Your analysis is correct that the total number of expected collisions would 
increase from 0.01375771 to 0.02660650 (with the updated statistics).

> The main reason I could see to consider this is if for some reason a 
> cross-arity collision is more dangerous than a same-arity collision in terms 
> of exploitability, which I can't immediately argue, but perhaps you have 
> something for this that was just assumed in the initial post?

I think that a cross-arity collision is clearly more dangerous than a 
same-arity collision. This is what I had written in the PR description:

> > The current 32-bit kCFI hash does not prevent, for example, a 2-arity fptr 
> > from calling a 3-arity target if the kCFI hashes collide. If this were to 
> > happen, then potentially malicious stale/dead data in RDX at the call site 
> > could suddenly become live as the third parameter at the call target.

Including the arity in the hash does prevent or even reduce the likelihood that 
a cross-arity collision will occur, if the hash function is uniform. Suppose 
that two function types' hashes collide (under the current 32-bit hash scheme). 
The probability that the arities differ is equal to 1 minus the probability 
that the arities agree:

1 - ((32/10903)*(31/10902) + (2492/10903)*(2491/10902) + ... + 
(148/10903)*(147/10902)) = 0.75901

Thus, if there is a hash collision, there is roughly a 76% chance that it will 
be a cross-arity collision.

https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-21 Thread Scott Constable via cfe-commits


scottconstable wrote:

@phoebewang @lvwr Please review this PR.

https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-20 Thread Scott Constable via cfe-commits


https://github.com/scottconstable created 
https://github.com/llvm/llvm-project/pull/117121

Kernel Control Flow Integrity (kCFI) is a feature that hardens indirect calls 
by comparing a 32-bit hash of the function pointer's type against a hash of the 
target function's type. If the hashes do not match, the kernel may panic (or 
log the hash check failure, depending on the kernel's configuration). These 
hashes are computed at compile time by applying the xxHash64 algorithm to each 
mangled canonical function (or function pointer) type, then truncating the 
result to 32 bits.

Like any hashing scheme, hash collisions are possible. For example, a commodity 
Linux kernel configured for Ubuntu 24.04 server has 141,617 total indirect call 
targets, with 10,903 unique function types. With a 32-bit kCFI hash, the 
expected number of collisions is 10,903-2^32+2^32*(1-1/(2^32))^10,903 = 
0.01383765 (see https://courses.cs.duke.edu/cps102/spring09/Lectures/L-18.pdf 
for the formula). This number can balloon with the addition of drivers and 
kernel modules.

This patch reduces both the expected number of collisions and the potential 
impact of a collision by augmenting the hash with an arity value that indicates 
how many parameters the function has at the ABI level. Specifically, the patch 
further truncates the kCFI hash down to 29 bits, then concatenates a 3-bit 
arity indicator as follows:

| Arity Indicator | Description |
| --- | --- |
| 0 | 0 parameters |
| 1 | 1 parameter in RDI |
| 2 | 2 parameters in RDI and RSI |
| 3 | 3 parameters in RDI, RSI, and RDX |
| 4 | 4 parameters in RDI, RSI, RDX, and RCX |
| 5 | 5 parameters in RDI, RSI, RDX, RCX, and R8 |
| 6 | 6 parameters in RDI, RSI, RDX, RCX, R8, and R9 |
| 7 | At least one parameter may be passed on the stack |

This scheme enhances security in two ways. First, it prevents a j-arity 
function pointer from being used to call a k-arity function, unless j=k. The 
current 32-bit kCFI hash does not prevent, for example, a 2-arity fptr from 
calling a 3-arity target if the kCFI hashes collide. If this were to happen, 
then potentially malicious stale/dead data in RDX at the call site could 
suddenly become live as the third parameter at the call target.

Second, this scheme reduces the expected number of hash collisions within each 
arity, compared against the expected number of collisions (0.01383765) for the 
32-bit hashing scheme that includes all arities. The table below shows the 
expected number of collisions for each arity, given the number of unique 
indirect callable function types within that arity in the same Ubuntu 24.04 
server kernel discussed above.

| Arity | Unique Indirect Callable Function Types | Number of Expected 
Collisions |
| - | --- | - |
| 0 | 32 |  0.0092 |
| 1 | 2492 |  0.00578125 |
| 2 | 3775 | 0.01326841 |
| 3 | 2547 | 0.00603931 |
| 4 | 1169 | 0.00127162 |
| 5 | 519 | 0.00025038 |
| 6 | 221 | 0.4528 |
| 7 | 148 | 0.2026 |

One additional benefit of this patch is that it can benefit other CFI 
approaches that build on kCFI, such as FineIBT. For example, this proposed 
enhancement to FineIBT must be able to infer (at kernel init time) which 
registers are live at an indirect call target: 
https://lkml.org/lkml/2024/9/27/982. If the arity bits are available in the 
kCFI type ID, then this information is trivial to infer.

>From 5b144089b7cde125e93a6e147009cbab0983d71a Mon Sep 17 00:00:00 2001
From: Scott D Constable 
Date: Tue, 19 Nov 2024 15:51:05 -0800
Subject: [PATCH] Enhance KCFI type IDs with a 3-bit arity indicator.

---
 clang/lib/CodeGen/CodeGenModule.cpp | 31 ++---
 clang/test/CodeGen/kcfi-normalize.c | 18 +++--
 clang/test/CodeGen/kcfi.c   | 22 +---
 3 files changed, 59 insertions(+), 12 deletions(-)

diff --git a/clang/lib/CodeGen/CodeGenModule.cpp 
b/clang/lib/CodeGen/CodeGenModule.cpp
index b854eeb62a80ce..7cc6f120ec39a9 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -2183,7 +2183,8 @@ llvm::ConstantInt 
*CodeGenModule::CreateCrossDsoCfiTypeId(llvm::Metadata *MD) {
 }
 
 llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) {
-  if (auto *FnType = T->getAs())
+  auto *FnType = T->getAs();
+  if (FnType)
 T = getContext().getFunctionType(
 FnType->getReturnType(), FnType->getParamTypes(),
 FnType->getExtProtoInfo().withExceptionSpec(EST_None));
@@ -2196,8 +2197,32 @@ llvm::ConstantInt 
*CodeGenModule::CreateKCFITypeId(QualType T) {
   if (getCodeGenOpts().SanitizeCfiICallNormalizeIntegers)
 Out << ".normalized";
 
-  return llvm::ConstantInt::get(Int32Ty,
-
static_cast(llvm::xxHash64(OutName)));
+  uint32_t OutHash = static_cast(llvm::xxHash64(OutName));
+  const auto &Triple = getTarget().getTriple();
+  if (Triple.isX86() && Triple.isArch64Bit() && Triple

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-26 Thread Scott Constable via cfe-commits

scottconstable wrote:

> Sorry, what I meant was we should be looking only at the number of return 
> (with void being zero, and everything else being one) and parameters from the 
> function declaration/definition and shouldn't be looking at the parameters' 
> type layout (i.e., size), and how and what registers are used to pass them 
> (i.e., we shouldn't mention or reference registers or assembly code at all in 
> the implementation and documentation). The implementation should be 
> completely ABI and architecture agnostic. Does it make sense?

I think that the aspiration to be "completely ABI and architecture agnostic" is 
irreconcilable with the security objective to prevent a register that is dead 
at a call site from becoming live at a hash-collided call target.

Your point about returns is interesting. AFAIK most x86 Linux kernels 
(including Ubuntu kernels) are being compiled with 
`-fzero-call-used-regs=used-gpr`, which in most cases should prevent a dead 
return register (`RAX`) from becoming live at a call site when the call site 
expects a return value. I can think of one corner case, but I'm not sure how 
much of a concern it would be in practice. Suppose that `int (*)(int)` collides 
with `void (*)(int)` and this allows a function `f` of the former type to call 
a function `g` of the latter type. Suppose further that `RAX` dies when `f` 
calls `g` and `g` does not touch `RAX` at all, which means that 
`-fzero-call-used-regs=used-gpr` will have no effect on `RAX` in `g`. Then when 
`g` returns to `f`, the stale value that `f` had left in `RAX` will become live 
and might be used for something malicious.

https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-26 Thread Scott Constable via cfe-commits

scottconstable wrote:

> > Flag guarding this feature seems like it would also be good for any 
> > existing C users - for example, if trying to build a kernel module intended 
> > to load against a kernel image built with an older `clang`, you need to 
> > select the same type ID projection that the kernel did.
> 
> +1 to this. Adding support for this to the Rust compiler shouldn't be a 
> problem and @maurer or I could take a look at it. However, I wonder if the 
> arity information should be from the high level information (i.e., from the 
> function declarations/definitions) instead of from the lower level calling 
> convention used/code generated (and possibly also affected by/after 
> optimizations). This would ensure compatibility with any calling convention 
> and any other possible differences that may come up later when using 
> cross-language KCFI. (The KCFI encoding is based on high level type 
> information from the function declarations/definitions.)

The implementation in this PR does use high-level type information (but now 
that I look back at what I had written, I think my original PR description was 
a bit mis-leading). Here is an extended version of the arity table, with an 
additional column to show what the implementation is intended to do:

| Arity Indicator | Description | Implementation |
| --- | --- | --- |
| 0 | 0 parameters | 0 parameters |
| 1 | 1 parameter in RDI | 1 address-width (or smaller) parameter and no other 
parameters |
| 2 | 2 parameters in RDI and RSI | 2 address-width (or smaller) parameters and 
no other parameters |
| 3 | 3 parameters in RDI, RSI, and RDX | 3 address-width (or smaller) 
parameters and no other parameters |
| 4 | 4 parameters in RDI, RSI, RDX, and RCX | 4 address-width (or smaller) 
parameters and no other parameters |
| 5 | 5 parameters in RDI, RSI, RDX, RCX, and R8 | 5 address-width (or smaller) 
parameters and no other parameters |
| 6 | 6 parameters in RDI, RSI, RDX, RCX, R8, and R9 | 6 address-width (or 
smaller) parameters and no other parameters |
| 7 | At least one parameter may be passed on the stack | The function type 
does not qualify as arity 0-6 |

Hence, this implementation uses high-level type information from clang 
(`CodeGenModule.cpp`) or LLVM (`ModuleUtils.cpp`) to infer a better 
approximation of the default x86-64 calling convention. For example, if the 
implementation were to instead use the number of parameters as a proxy for the 
arity, then this could permit a scenario where a register that is dead at the 
call site becomes live at the call target, e.g.:
```C
// test.c
struct S {
int *p1;
int *p2;
};

int foo(struct S s) {  // 1 parameter
return *s.p1 + *s.p2;
}
```
But then the struct gets decomposed into two separate registers by the compiler:
```
clang -O1 test.c -S -o test.S; cat test.S
.type   foo,@function
foo:# @foo
.cfi_startproc
# %bb.0:# %entry
movl(%rsi), %eax
addl(%rdi), %eax
retq
```
Thus, if an arity-1 function type's hash collides with `foo`'s hash, then a 
dead `RSI` register at an arity-1 caller could become live at the target `foo`.

https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-23 Thread Scott Constable via cfe-commits

scottconstable wrote:

> > > > @phoebewang and @lvwr I also noticed that there is this code in LLVM:
> > > > https://github.com/llvm/llvm-project/blob/9ba6672b9f0e82a1f6d4100dc832c84447ea545c/llvm/lib/Transforms/Utils/ModuleUtils.cpp#L202-L214
> > > > 
> > > > . As far as I can tell, this code is not triggered when I build the 
> > > > Linux kernel with `-fsanitize=kcfi`.
> > > > When is this code triggered? And do you think it is necessary to 
> > > > additionally implement the arity-enhancement to this code?
> > > 
> > > 
> > > I'm not familar with KCFI. I find it's added by @samitolvanen in 
> > > [e1c36bd](https://github.com/llvm/llvm-project/commit/e1c36bde0551977d4b2efae032af6dfc4b2b3936).
> > >  I think you should triger it with attached test case.
> > 
> > 
> > It looks to me like this code might be triggered in some LTO 
> > configurations, and/or when linking code compiled from multiple source 
> > languages with the expectation that the KCFI type IDs will be compatible. 
> > Is my understanding correct?
> 
> Looks like the latter, see 
> [71c7313](https://github.com/llvm/llvm-project/commit/71c7313f42d2b6063fea09854cf4fc46fd0627e1)

Actually, I think this code was introduced to address a compatibility issue 
with KASAN, which apparently must generate KCFI-enabled code without clang. I 
found this explanation at 
https://github.com/llvm/llvm-project/commit/3b14862f0a968dc079530acbce4f2ca4aa7c1492
 and https://github.com/ClangBuiltLinux/linux/issues/1742.

Regardless, it looks like `llvm::setKCFIType` is intended to always produce the 
same KCFI type ID as `CodeGenModule::CreateKCFITypeId` for equivalent function 
types. For this PR, this implies that `llvm::setKCFIType` and 
`CodeGenModule::CreateKCFITypeId` must always infer the same arity for the same 
function type.

https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-23 Thread Scott Constable via cfe-commits



@@ -208,10 +209,34 @@ void llvm::setKCFIType(Module &M, Function &F, StringRef 
MangledType) {
   std::string Type = MangledType.str();
   if (M.getModuleFlag("cfi-normalize-integers"))
 Type += ".normalized";
+
+  uint32_t OutHash = static_cast(llvm::xxHash64(Type));
+  auto T = Triple(Twine(M.getTargetTriple()));
+  if (T.isX86() && T.isArch64Bit() && T.isOSLinux()) {
+// Estimate the function's arity (i.e., the number of arguments) at the ABI
+// level by counting the number of parameters that are likely to be passed
+// as registers, such as pointers and 64-bit (or smaller) integers. The
+// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs.
+// Additional parameters or parameters larger than 64 bits may be passed on
+// the stack, in which case the arity is denoted as 7.
+size_t NumParams = F.arg_size();
+bool MayHaveStackArgs = NumParams > 6;
+
+for (unsigned int i = 0; !MayHaveStackArgs && i < NumParams; ++i) {
+  const llvm::Type *PT = F.getArg(i)->getType();
+  if (!(PT->isPointerTy() || PT->getIntegerBitWidth() <= 64))

scottconstable wrote:

It appears that clang does not reserve stack for large arguments and instead 
this is done later by the LLVM X86 backend. For example:
```C
struct S {
int *p1;
int *p2;
int array[8];
};

int foo(struct S s, struct S *sp) {
return *s.p1 + *s.p2 + *sp->p1 + *sp->p2;
}
```
Then when I compile to LLVM IR I see:
```
define dso_local i32 @foo(ptr noundef byval(%struct.S) align 8 %s, ptr noundef 
%sp) #0 {
```
Which suggests an arity of 2. But the X86 backend transforms `foo` to pass `s` 
on the stack, and then `sp` becomes the sole argument and is passed in `rdi`. 
Hence, by the chart in the PR description, this should be treated as an arity-7 
function:

| Arity Indicator | Description |
| --- | --- |
| 0 | 0 parameters |
| 1 | 1 parameter in RDI |
| 2 | 2 parameters in RDI and RSI |
| 3 | 3 parameters in RDI, RSI, and RDX |
| 4 | 4 parameters in RDI, RSI, RDX, and RCX |
| 5 | 5 parameters in RDI, RSI, RDX, RCX, and R8 |
| 6 | 6 parameters in RDI, RSI, RDX, RCX, R8, and R9 |
| 7 | At least one parameter may be passed on the stack |

This predicate:
```
  // typeof(*PT) = llvm::Type
  if (!(PT->isPointerTy() || PT->getIntegerBitWidth() <= 64))
MayHaveStackArgs = true;
```
should prevent `s` from being counted as a register argument and correctly set 
the arity field to 7.

https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-23 Thread Scott Constable via cfe-commits


https://github.com/scottconstable updated 
https://github.com/llvm/llvm-project/pull/117121

>From b787d6d7a9c0904c5e47e32556103b8a5220a7d1 Mon Sep 17 00:00:00 2001
From: Scott D Constable 
Date: Tue, 19 Nov 2024 15:51:05 -0800
Subject: [PATCH] Enhance KCFI type IDs with a 3-bit arity indicator.

---
 clang/lib/CodeGen/CodeGenModule.cpp   | 31 +--
 clang/test/CodeGen/kcfi-normalize.c   | 18 +++
 clang/test/CodeGen/kcfi.c | 22 +++--
 llvm/lib/Transforms/Utils/ModuleUtils.cpp | 31 +--
 .../GCOVProfiling/kcfi-normalize.ll   |  7 +++--
 llvm/test/Transforms/GCOVProfiling/kcfi.ll|  7 +++--
 6 files changed, 95 insertions(+), 21 deletions(-)

diff --git a/clang/lib/CodeGen/CodeGenModule.cpp 
b/clang/lib/CodeGen/CodeGenModule.cpp
index b854eeb62a80ce..f9c01edf4f0953 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -2183,7 +2183,8 @@ llvm::ConstantInt 
*CodeGenModule::CreateCrossDsoCfiTypeId(llvm::Metadata *MD) {
 }
 
 llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) {
-  if (auto *FnType = T->getAs())
+  auto *FnType = T->getAs();
+  if (FnType)
 T = getContext().getFunctionType(
 FnType->getReturnType(), FnType->getParamTypes(),
 FnType->getExtProtoInfo().withExceptionSpec(EST_None));
@@ -2196,8 +2197,32 @@ llvm::ConstantInt 
*CodeGenModule::CreateKCFITypeId(QualType T) {
   if (getCodeGenOpts().SanitizeCfiICallNormalizeIntegers)
 Out << ".normalized";
 
-  return llvm::ConstantInt::get(Int32Ty,
-
static_cast(llvm::xxHash64(OutName)));
+  uint32_t OutHash = static_cast(llvm::xxHash64(OutName));
+  const auto &Triple = getTarget().getTriple();
+  if (FnType && Triple.isX86() && Triple.isArch64Bit() && Triple.isOSLinux()) {
+// Estimate the function's arity (i.e., the number of arguments) at the ABI
+// level by counting the number of parameters that are likely to be passed
+// as registers, such as pointers and 64-bit (or smaller) integers. The
+// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs.
+// Additional parameters or parameters larger than 64 bits may be passed on
+// the stack, in which case the arity is denoted as 7.
+bool MayHaveStackArgs = FnType->getNumParams() > 6;
+
+for (unsigned int i = 0; !MayHaveStackArgs && i < FnType->getNumParams();
+ ++i) {
+  const Type *PT = FnType->getParamType(i).getTypePtr();
+  if (!(PT->isPointerType() || (PT->isIntegralOrEnumerationType() &&
+getContext().getTypeSize(PT) <= 64)))
+MayHaveStackArgs = true;
+}
+
+// The 3-bit arity is concatenated with the lower 29 bits of the KCFI hash
+// to form an enhanced KCFI type ID. This can prevent, for example, a
+// 3-arity function's ID from ever colliding with a 2-arity function's ID.
+OutHash = (OutHash << 3) | (MayHaveStackArgs ? 7 : FnType->getNumParams());
+  }
+
+  return llvm::ConstantInt::get(Int32Ty, OutHash);
 }
 
 void CodeGenModule::SetLLVMFunctionAttributes(GlobalDecl GD,
diff --git a/clang/test/CodeGen/kcfi-normalize.c 
b/clang/test/CodeGen/kcfi-normalize.c
index b9150e88f6ab5f..8b7445fc85e490 100644
--- a/clang/test/CodeGen/kcfi-normalize.c
+++ b/clang/test/CodeGen/kcfi-normalize.c
@@ -10,25 +10,31 @@
 void foo(void (*fn)(int), int arg) {
 // CHECK-LABEL: define{{.*}}foo
 // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE1:[0-9]+]]
-// CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 1162514891) ]
+// KCFI ID = 0x2A548E59
+// CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 710184537) ]
 fn(arg);
 }
 
 void bar(void (*fn)(int, int), int arg1, int arg2) {
 // CHECK-LABEL: define{{.*}}bar
 // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE2:[0-9]+]]
-// CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 
448046469) ]
+// KCFI ID = 0xD5A52C2A
+// CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 
-710595542) ]
 fn(arg1, arg2);
 }
 
 void baz(void (*fn)(int, int, int), int arg1, int arg2, int arg3) {
 // CHECK-LABEL: define{{.*}}baz
 // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE3:[0-9]+]]
-// CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef 
%3){{.*}}[ "kcfi"(i32 -2049681433) ]
+// KCFI ID = 0x2EA2BF3B
+// CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef 
%3){{.*}}[ "kcfi"(i32 782417723) ]
 fn(arg1, arg2, arg3);
 }
 
 // CHECK: ![[#]] = !{i32 4, !"cfi-normalize-integers", i32 1}
-// CHECK: ![[TYPE1]] = !{i32 -1143117868}
-// CHECK: ![[TYPE2]] = !{i32 -460921415}
-// CHECK: ![[TYPE3]] = !{i32 -333839615}
+// KCFI ID = DEEB3EA2
+// CHECK: ![[TYPE1]] = !{i32 -555008350}
+// KCFI ID = 24372DCB
+// CHECK: ![[TYPE2]] = !{i32 607595979}
+// KCFI ID = 0x60D0180C
+// CHECK: ![[TYPE3]] = !{i32 1624250380}
diff --git a/clang/test/CodeGen/kcfi.c b/clang/test/CodeGen

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-23 Thread Scott Constable via cfe-commits



@@ -208,10 +209,34 @@ void llvm::setKCFIType(Module &M, Function &F, StringRef 
MangledType) {
   std::string Type = MangledType.str();
   if (M.getModuleFlag("cfi-normalize-integers"))
 Type += ".normalized";
+
+  uint32_t OutHash = static_cast(llvm::xxHash64(Type));
+  auto T = Triple(Twine(M.getTargetTriple()));

scottconstable wrote:

Thank you for the suggestion! This does look tidier than what I had written. I 
updated the PR.

https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-24 Thread Scott Constable via cfe-commits



@@ -208,10 +209,34 @@ void llvm::setKCFIType(Module &M, Function &F, StringRef 
MangledType) {
   std::string Type = MangledType.str();
   if (M.getModuleFlag("cfi-normalize-integers"))
 Type += ".normalized";
+
+  uint32_t OutHash = static_cast(llvm::xxHash64(Type));
+  auto T = Triple(Twine(M.getTargetTriple()));
+  if (T.isX86() && T.isArch64Bit() && T.isOSLinux()) {
+// Estimate the function's arity (i.e., the number of arguments) at the ABI
+// level by counting the number of parameters that are likely to be passed
+// as registers, such as pointers and 64-bit (or smaller) integers. The
+// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs.
+// Additional parameters or parameters larger than 64 bits may be passed on
+// the stack, in which case the arity is denoted as 7.
+size_t NumParams = F.arg_size();
+bool MayHaveStackArgs = NumParams > 6;
+
+for (unsigned int i = 0; !MayHaveStackArgs && i < NumParams; ++i) {
+  const llvm::Type *PT = F.getArg(i)->getType();
+  if (!(PT->isPointerTy() || PT->getIntegerBitWidth() <= 64))

scottconstable wrote:

Thank you for the suggestion. I looked at 
`llvm::Argument::hasPassPointeeByValueCopyAttr()`, but it looks like it is only 
available where a function is being defined. It does not appear to be available 
where a call is made through a function pointer. Therefore, I'm not sure that 
`llvm::Argument::hasPassPointeeByValueCopyAttr()` will be helpful since KCFI 
requires the ID to be computed identically at both the call site and the call 
target.

Or, do you think I am overlooking something, and that there is a way to use 
`llvm::Argument::hasPassPointeeByValueCopyAttr()` or something similar at an 
indirect call site? As far as I can tell, the only information that is 
available at an indirect call site is the function pointer type, which does 
contain the number of arguments and their types, but does not appear to contain 
an indication as to whether an argument may be passed on the stack.

https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-24 Thread Scott Constable via cfe-commits

scottconstable wrote:

> Although the default calling convention uses 6 registers, others like RegCall 
> uses more. Do you want to check calling convention as well?

AFAIK the use case for KCFI is very narrow: the x86-64 Linux kernel. And I 
don't believe that the kernel uses (or even allows?) any calling convention 
other than the default. The kernel documentation also says that the eBPF 
calling convention "maps directly to ABIs used by the kernel on 64-bit 
architectures." But I admit I am not an expert on the Linux ABI nor am I an 
expert on the full scope of KCFI use cases.

Maybe @lvwr can weigh in?

https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-12-04 Thread Scott Constable via cfe-commits

scottconstable wrote:

> I think Scott's second point may be the relevant one - this may be similar 
> strength or slightly weaker, but having an indicator stating which registers 
> are live is potentially needed to allow FineIBT to poison them during 
> speculative execution.
> 
> Given how much padding is used in the X86 kernel around function headers, 
> have you considered just stealing another byte to encode the data for your 
> arity stuff, and considering it to be a separate mitigation from KCFI tags?

Yes, and this is what our first prototype does. There is a brief writeup here: 
https://lkml.org/lkml/2024/9/27/982.

When I started to refine the prototype into a PR, I noticed that stealing 
another byte in the padding region causes problems for either LLVM or for 
Linux. This is because LLVM emits function headers like this:
```
NOP
NOP
...
MOV $01234567, %eax
:
```
But when the Linux image is loaded, it moves the type information to the 
*beginning* of the padding, to allow the pre-function padding to be used for 
hot patching:
```
MOV $01234567, %eax
...
NOP
NOP
:
```
Thus, when new metadata is added to the padding region, it introduces new 
challenges for either LLVM or for Linux:
- If the metadata is added *before* the hash, then Linux needs to know that it 
might need to look for the hash in a different location, i.e., in a different 
offset relative to the start of the function.
- If the metadata is added *after* the hash, then LLVM's KCFI code generator 
needs to correctly adjust the offsets for the call site KCFI checks, depending 
on whether the new metadata is being generated.

The discussions on LKML also led us to simplify the proposal, which originally 
had called for an 8-bit field to denote which registers are pointers, so that 
only pointers would be poisoned after a misprediction. The new proposal poisons 
all live registers and therefore requires a 3-bit field to record the arity 
(defined as the number of live registers). It occurred to me that arity might 
also be useful to harden the hash checks for KCFI, but that may require a 
consensus among KCFI stakeholders.

One other alternative could be to use the 3-bit reg field in the MOV to encode 
the arity. For example:
```
MOV $01234567, %eax# arity 0
MOV $01234567, %ecx# arity 1
MOV $01234567, %edx# arity 2
MOV $01234567, %ebx# arity 3
MOV $01234567, %esp# arity 4
MOV $01234567, %ebp# arity 5
MOV $01234567, %esi# arity 6
MOV $01234567, %edi# arity 7+
```

I believe that this alternative implementation would also meet the requirements 
for the FineIBT register poisoning enhancement.

> The rest of the CFI type ID scheme is arch-independent, but in order to know 
> which _registers_ are in use, you need arch dependent information, because 
> you care about the calling convention, packing rules, etc. This is part of 
> why Ramon thought your design choices were odd above - this isn't really a 
> CFI enhancement or modification, this is another piece of information you 
> need for speculation defenses that occur at a different abstraction level.

I agree with your interpretation. And thank you and @rcvalle again for the 
ongoing feedback and discussion.

https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-12-04 Thread Scott Constable via cfe-commits

scottconstable wrote:

> If you are changing the hash scheme, please migrate from xxHash64 to xxh3. We 
> want to remove the legacy xxHash64 from LLVM.

It seems from the discussion above that this change would require a new flag. 
Suppose, for instance, that a user builds a kernel with a version of Clang that 
uses xxh3, and a version of rustc that still uses xxHash64. Then the hashes 
could be incompatible and the kernel will crash.

This PR is specific to X86-64 targets. Your suggestion seems like a general 
enhancement that should apply to all targets; I think it should be a separate 
PR.

https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-12-09 Thread Scott Constable via cfe-commits


scottconstable wrote:

@rcvalle @maurer Do you have any feedback on the alternative proposal, 
https://github.com/llvm/llvm-project/pull/117121#issuecomment-2518240999?

https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)

2024-12-26 Thread Scott Constable via cfe-commits



@@ -181,8 +181,26 @@ void X86AsmPrinter::emitKCFITypeId(const MachineFunction 
&MF) {
   // Embed the type hash in the X86::MOV32ri instruction to avoid special
   // casing object file parsers.
   EmitKCFITypePadding(MF);
+
+  Register MovReg = X86::EAX;
+  const auto &Triple = MF.getTarget().getTargetTriple();
+  if (Triple.isArch64Bit() && Triple.isOSLinux()) {
+// Determine the function's arity (i.e., the number of arguments) at the 
ABI
+// level by counting the number of parameters that are passed
+// as registers, such as pointers and 64-bit (or smaller) integers. The
+// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs.
+// Additional parameters or parameters larger than 64 bits may be passed on
+// the stack, in which case the arity is denoted as 7.
+const unsigned ArityToRegMap[8] = {X86::EAX, X86::ECX, X86::EDX, X86::EBX,
+   X86::ESP, X86::EBP, X86::ESI, X86::EDI};

scottconstable wrote:

Hi @phoebewang, based on my understanding of kCFI, the `MOVri` instruction 
emitted by the compiler in each function header will never be executed. The 
purpose of this instruction is simply to insert the function's kCFI type ID 
into the header so that it can be verified at the call site before making the 
call. For example, this is how a kCFI-enforced indirect call looks when the 
kernel is configured with kCFI:
```
0010 :# foo loads the target address into rax
  ...
  25:   mov$0x70e5d6ba,%r10d # 0x70e5d6ba is the kCFI type ID that must 
be matched at the target
  2b:   add-0x4(%rax),%r10d # -0x4(%rax) refers to the immediate 
operand in the `MOVri` instruction, which holds the kCFI type ID
  2f:   je 33 # If the two type IDs match, then jump to the 
call; otherwise #UD
  31:   ud2
  33:   call   *%rax  # Note that this will jump directly to the target 
function, and will not execute the `MOVri`
```
Does this address your concern?

https://github.com/llvm/llvm-project/pull/121070
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)

2024-12-26 Thread Scott Constable via cfe-commits


scottconstable wrote:

> And you cannot use ESI when it's used to pass arguments.

@phoebewang I think that my 
https://github.com/llvm/llvm-project/pull/121070#discussion_r1898074522 also 
applies to this concern.

https://github.com/llvm/llvm-project/pull/121070
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)

2024-12-26 Thread Scott Constable via cfe-commits


https://github.com/scottconstable updated 
https://github.com/llvm/llvm-project/pull/121070

>From 73b6f6332cf699b7fa1bcf7cae90ef828cfa9158 Mon Sep 17 00:00:00 2001
From: Scott D Constable 
Date: Mon, 23 Dec 2024 13:48:48 -0800
Subject: [PATCH] Implement a new kcfi_x86_arity feature that encodes an
 indirect call target's arity (i.e., the number of live-in registers) in the
 function's __cfi header.

---
 clang/include/clang/Basic/Features.def|  1 +
 llvm/lib/Target/X86/X86AsmPrinter.cpp | 19 +-
 .../X86/kcfi-patchable-function-prefix.ll |  4 +-
 llvm/test/CodeGen/X86/kcfi.ll | 63 ++-
 4 files changed, 81 insertions(+), 6 deletions(-)

diff --git a/clang/include/clang/Basic/Features.def 
b/clang/include/clang/Basic/Features.def
index c82b6d9b5f6c10..dca8f4dc0fbf76 100644
--- a/clang/include/clang/Basic/Features.def
+++ b/clang/include/clang/Basic/Features.def
@@ -254,6 +254,7 @@ FEATURE(is_trivially_constructible, LangOpts.CPlusPlus)
 FEATURE(is_trivially_copyable, LangOpts.CPlusPlus)
 FEATURE(is_union, LangOpts.CPlusPlus)
 FEATURE(kcfi, LangOpts.Sanitize.has(SanitizerKind::KCFI))
+FEATURE(kcfi_x86_arity, LangOpts.Sanitize.has(SanitizerKind::KCFI))
 FEATURE(modules, LangOpts.Modules)
 FEATURE(safe_stack, LangOpts.Sanitize.has(SanitizerKind::SafeStack))
 FEATURE(shadow_call_stack,
diff --git a/llvm/lib/Target/X86/X86AsmPrinter.cpp 
b/llvm/lib/Target/X86/X86AsmPrinter.cpp
index f01e47b41cf5e4..fe2ce305dcf6e5 100644
--- a/llvm/lib/Target/X86/X86AsmPrinter.cpp
+++ b/llvm/lib/Target/X86/X86AsmPrinter.cpp
@@ -181,8 +181,25 @@ void X86AsmPrinter::emitKCFITypeId(const MachineFunction 
&MF) {
   // Embed the type hash in the X86::MOV32ri instruction to avoid special
   // casing object file parsers.
   EmitKCFITypePadding(MF);
+
+  // The ArityToRegMap assumes the 64-bit Linux kernel ABI, so verify the 
target
+  const auto &Triple = MF.getTarget().getTargetTriple();
+  assert(Triple.isArch64Bit() && Triple.isOSLinux());
+
+  // Determine the function's arity (i.e., the number of arguments) at the ABI
+  // level by counting the number of parameters that are passed
+  // as registers, such as pointers and 64-bit (or smaller) integers. The
+  // Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs.
+  // Additional parameters or parameters larger than 64 bits may be passed on
+  // the stack, in which case the arity is denoted as 7.
+  const unsigned ArityToRegMap[8] = {X86::EAX, X86::ECX, X86::EDX, X86::EBX,
+ X86::ESP, X86::EBP, X86::ESI, X86::EDI};
+  int Arity = MF.getInfo()->getArgumentStackSize() > 0
+  ? 7
+  : MF.getRegInfo().liveins().size();
+
   EmitAndCountInstruction(MCInstBuilder(X86::MOV32ri)
-  .addReg(X86::EAX)
+  .addReg(ArityToRegMap[Arity])
   .addImm(MaskKCFIType(Type->getZExtValue(;
 
   if (MAI->hasDotTypeDotSizeDirective()) {
diff --git a/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll 
b/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll
index 1b7bd7835e890c..deababc7fd5379 100644
--- a/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll
+++ b/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll
@@ -3,7 +3,7 @@
 ; CHECK:  .p2align 4
 ; CHECK-LABEL:__cfi_f1:
 ; CHECK-COUNT-11:   nop
-; CHECK-NEXT:   movl $12345678, %eax
+; CHECK-NEXT:   movl $12345678, %ecx
 ; CHECK-LABEL:.Lcfi_func_end0:
 ; CHECK-NEXT: .size   __cfi_f1, .Lcfi_func_end0-__cfi_f1
 ; CHECK-LABEL:f1:
@@ -26,7 +26,7 @@ define void @f2(ptr noundef %x) {
 ; CHECK:  .p2align 4
 ; CHECK-LABEL:__cfi_f3:
 ; CHECK-NOT:nop
-; CHECK-NEXT:   movl $12345678, %eax
+; CHECK-NEXT:   movl $12345678, %ecx
 ; CHECK-COUNT-11:   nop
 ; CHECK-LABEL:f3:
 define void @f3(ptr noundef %x) #0 !kcfi_type !1 {
diff --git a/llvm/test/CodeGen/X86/kcfi.ll b/llvm/test/CodeGen/X86/kcfi.ll
index 059efcc71b0eb8..91d706796b8e99 100644
--- a/llvm/test/CodeGen/X86/kcfi.ll
+++ b/llvm/test/CodeGen/X86/kcfi.ll
@@ -16,7 +16,7 @@
 ; ASM-NEXT:nop
 ; ASM-NEXT:nop
 ; ASM-NEXT:nop
-; ASM-NEXT:movl $12345678, %eax
+; ASM-NEXT:movl $12345678, %ecx
 ; ASM-LABEL: .Lcfi_func_end0:
 ; ASM-NEXT:  .size   __cfi_f1, .Lcfi_func_end0-__cfi_f1
 define void @f1(ptr noundef %x) !kcfi_type !1 {
@@ -90,7 +90,7 @@ define void @f4(ptr noundef %x) #0 {
 
 ;; Ensure we emit Value + 1 for unwanted values (e.g. endbr64 == 4196274163).
 ; ASM-LABEL: __cfi_f5:
-; ASM: movl $4196274164, %eax # imm = 0xFA1E0FF4
+; ASM: movl $4196274164, %ecx # imm = 0xFA1E0FF4
 define void @f5(ptr noundef %x) !kcfi_type !2 {
 ; ASM-LABEL: f5:
 ; ASM: movl $98693132, %r10d # imm = 0x5E1F00C
@@ -100,7 +100,7 @@ define void @f5(ptr noundef %x) !kcfi_type !2 {
 
 ;; Ensure we emit Value + 1 for unwanted values (e.g. -endbr64 == 98693133).
 ; ASM-LABEL: __cfi_f6:
-; ASM: movl $98693134, %eax # imm =

[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)

2024-12-26 Thread Scott Constable via cfe-commits



@@ -181,8 +181,26 @@ void X86AsmPrinter::emitKCFITypeId(const MachineFunction 
&MF) {
   // Embed the type hash in the X86::MOV32ri instruction to avoid special
   // casing object file parsers.
   EmitKCFITypePadding(MF);
+
+  Register MovReg = X86::EAX;
+  const auto &Triple = MF.getTarget().getTargetTriple();
+  if (Triple.isArch64Bit() && Triple.isOSLinux()) {

scottconstable wrote:

@phoebewang Good suggestion! I updated the PR to use an assert.

https://github.com/llvm/llvm-project/pull/121070
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)

2025-02-04 Thread Scott Constable via cfe-commits


scottconstable wrote:

@lvwr @maurer @rcvalle A gentle reminder to please review this PR.

https://github.com/llvm/llvm-project/pull/121070
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-12-20 Thread Scott Constable via cfe-commits


https://github.com/scottconstable updated 
https://github.com/llvm/llvm-project/pull/117121

>From b787d6d7a9c0904c5e47e32556103b8a5220a7d1 Mon Sep 17 00:00:00 2001
From: Scott D Constable 
Date: Tue, 19 Nov 2024 15:51:05 -0800
Subject: [PATCH] Enhance KCFI type IDs with a 3-bit arity indicator.

---
 clang/lib/CodeGen/CodeGenModule.cpp   | 31 +--
 clang/test/CodeGen/kcfi-normalize.c   | 18 +++
 clang/test/CodeGen/kcfi.c | 22 +++--
 llvm/lib/Transforms/Utils/ModuleUtils.cpp | 31 +--
 .../GCOVProfiling/kcfi-normalize.ll   |  7 +++--
 llvm/test/Transforms/GCOVProfiling/kcfi.ll|  7 +++--
 6 files changed, 95 insertions(+), 21 deletions(-)

diff --git a/clang/lib/CodeGen/CodeGenModule.cpp 
b/clang/lib/CodeGen/CodeGenModule.cpp
index b854eeb62a80ce..f9c01edf4f0953 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -2183,7 +2183,8 @@ llvm::ConstantInt 
*CodeGenModule::CreateCrossDsoCfiTypeId(llvm::Metadata *MD) {
 }
 
 llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) {
-  if (auto *FnType = T->getAs())
+  auto *FnType = T->getAs();
+  if (FnType)
 T = getContext().getFunctionType(
 FnType->getReturnType(), FnType->getParamTypes(),
 FnType->getExtProtoInfo().withExceptionSpec(EST_None));
@@ -2196,8 +2197,32 @@ llvm::ConstantInt 
*CodeGenModule::CreateKCFITypeId(QualType T) {
   if (getCodeGenOpts().SanitizeCfiICallNormalizeIntegers)
 Out << ".normalized";
 
-  return llvm::ConstantInt::get(Int32Ty,
-
static_cast(llvm::xxHash64(OutName)));
+  uint32_t OutHash = static_cast(llvm::xxHash64(OutName));
+  const auto &Triple = getTarget().getTriple();
+  if (FnType && Triple.isX86() && Triple.isArch64Bit() && Triple.isOSLinux()) {
+// Estimate the function's arity (i.e., the number of arguments) at the ABI
+// level by counting the number of parameters that are likely to be passed
+// as registers, such as pointers and 64-bit (or smaller) integers. The
+// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs.
+// Additional parameters or parameters larger than 64 bits may be passed on
+// the stack, in which case the arity is denoted as 7.
+bool MayHaveStackArgs = FnType->getNumParams() > 6;
+
+for (unsigned int i = 0; !MayHaveStackArgs && i < FnType->getNumParams();
+ ++i) {
+  const Type *PT = FnType->getParamType(i).getTypePtr();
+  if (!(PT->isPointerType() || (PT->isIntegralOrEnumerationType() &&
+getContext().getTypeSize(PT) <= 64)))
+MayHaveStackArgs = true;
+}
+
+// The 3-bit arity is concatenated with the lower 29 bits of the KCFI hash
+// to form an enhanced KCFI type ID. This can prevent, for example, a
+// 3-arity function's ID from ever colliding with a 2-arity function's ID.
+OutHash = (OutHash << 3) | (MayHaveStackArgs ? 7 : FnType->getNumParams());
+  }
+
+  return llvm::ConstantInt::get(Int32Ty, OutHash);
 }
 
 void CodeGenModule::SetLLVMFunctionAttributes(GlobalDecl GD,
diff --git a/clang/test/CodeGen/kcfi-normalize.c 
b/clang/test/CodeGen/kcfi-normalize.c
index b9150e88f6ab5f..8b7445fc85e490 100644
--- a/clang/test/CodeGen/kcfi-normalize.c
+++ b/clang/test/CodeGen/kcfi-normalize.c
@@ -10,25 +10,31 @@
 void foo(void (*fn)(int), int arg) {
 // CHECK-LABEL: define{{.*}}foo
 // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE1:[0-9]+]]
-// CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 1162514891) ]
+// KCFI ID = 0x2A548E59
+// CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 710184537) ]
 fn(arg);
 }
 
 void bar(void (*fn)(int, int), int arg1, int arg2) {
 // CHECK-LABEL: define{{.*}}bar
 // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE2:[0-9]+]]
-// CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 
448046469) ]
+// KCFI ID = 0xD5A52C2A
+// CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 
-710595542) ]
 fn(arg1, arg2);
 }
 
 void baz(void (*fn)(int, int, int), int arg1, int arg2, int arg3) {
 // CHECK-LABEL: define{{.*}}baz
 // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE3:[0-9]+]]
-// CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef 
%3){{.*}}[ "kcfi"(i32 -2049681433) ]
+// KCFI ID = 0x2EA2BF3B
+// CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef 
%3){{.*}}[ "kcfi"(i32 782417723) ]
 fn(arg1, arg2, arg3);
 }
 
 // CHECK: ![[#]] = !{i32 4, !"cfi-normalize-integers", i32 1}
-// CHECK: ![[TYPE1]] = !{i32 -1143117868}
-// CHECK: ![[TYPE2]] = !{i32 -460921415}
-// CHECK: ![[TYPE3]] = !{i32 -333839615}
+// KCFI ID = DEEB3EA2
+// CHECK: ![[TYPE1]] = !{i32 -555008350}
+// KCFI ID = 24372DCB
+// CHECK: ![[TYPE2]] = !{i32 607595979}
+// KCFI ID = 0x60D0180C
+// CHECK: ![[TYPE3]] = !{i32 1624250380}
diff --git a/clang/test/CodeGen/kcfi.c b/clang/test/CodeGen

[clang] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-21 Thread Scott Constable via cfe-commits


https://github.com/scottconstable edited 
https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-21 Thread Scott Constable via cfe-commits


https://github.com/scottconstable edited 
https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-21 Thread Scott Constable via cfe-commits


scottconstable wrote:

@phoebewang and @lvwr I also noticed that there is this code in LLVM: 
https://github.com/llvm/llvm-project/blob/9ba6672b9f0e82a1f6d4100dc832c84447ea545c/llvm/lib/Transforms/Utils/ModuleUtils.cpp#L202-L214.
 As far as I can tell, this code is not triggered when I build the Linux kernel 
with `-fsanitize=kcfi`.

When is this code triggered? And do you think it is necessary to additionally 
implement the arity-enhancement to this code?

https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-22 Thread Scott Constable via cfe-commits


https://github.com/scottconstable updated 
https://github.com/llvm/llvm-project/pull/117121

>From ccb50df487043cde9414943f08988bc9e6ee Mon Sep 17 00:00:00 2001
From: Scott D Constable 
Date: Tue, 19 Nov 2024 15:51:05 -0800
Subject: [PATCH] Enhance KCFI type IDs with a 3-bit arity indicator.

---
 clang/lib/CodeGen/CodeGenModule.cpp   | 31 +--
 clang/test/CodeGen/kcfi-normalize.c   | 18 +++
 clang/test/CodeGen/kcfi.c | 22 +++--
 llvm/lib/Transforms/Utils/ModuleUtils.cpp | 29 +++--
 .../GCOVProfiling/kcfi-normalize.ll   |  7 +++--
 llvm/test/Transforms/GCOVProfiling/kcfi.ll|  7 +++--
 6 files changed, 94 insertions(+), 20 deletions(-)

diff --git a/clang/lib/CodeGen/CodeGenModule.cpp 
b/clang/lib/CodeGen/CodeGenModule.cpp
index b854eeb62a80ce..f9c01edf4f0953 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -2183,7 +2183,8 @@ llvm::ConstantInt 
*CodeGenModule::CreateCrossDsoCfiTypeId(llvm::Metadata *MD) {
 }
 
 llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) {
-  if (auto *FnType = T->getAs())
+  auto *FnType = T->getAs();
+  if (FnType)
 T = getContext().getFunctionType(
 FnType->getReturnType(), FnType->getParamTypes(),
 FnType->getExtProtoInfo().withExceptionSpec(EST_None));
@@ -2196,8 +2197,32 @@ llvm::ConstantInt 
*CodeGenModule::CreateKCFITypeId(QualType T) {
   if (getCodeGenOpts().SanitizeCfiICallNormalizeIntegers)
 Out << ".normalized";
 
-  return llvm::ConstantInt::get(Int32Ty,
-
static_cast(llvm::xxHash64(OutName)));
+  uint32_t OutHash = static_cast(llvm::xxHash64(OutName));
+  const auto &Triple = getTarget().getTriple();
+  if (FnType && Triple.isX86() && Triple.isArch64Bit() && Triple.isOSLinux()) {
+// Estimate the function's arity (i.e., the number of arguments) at the ABI
+// level by counting the number of parameters that are likely to be passed
+// as registers, such as pointers and 64-bit (or smaller) integers. The
+// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs.
+// Additional parameters or parameters larger than 64 bits may be passed on
+// the stack, in which case the arity is denoted as 7.
+bool MayHaveStackArgs = FnType->getNumParams() > 6;
+
+for (unsigned int i = 0; !MayHaveStackArgs && i < FnType->getNumParams();
+ ++i) {
+  const Type *PT = FnType->getParamType(i).getTypePtr();
+  if (!(PT->isPointerType() || (PT->isIntegralOrEnumerationType() &&
+getContext().getTypeSize(PT) <= 64)))
+MayHaveStackArgs = true;
+}
+
+// The 3-bit arity is concatenated with the lower 29 bits of the KCFI hash
+// to form an enhanced KCFI type ID. This can prevent, for example, a
+// 3-arity function's ID from ever colliding with a 2-arity function's ID.
+OutHash = (OutHash << 3) | (MayHaveStackArgs ? 7 : FnType->getNumParams());
+  }
+
+  return llvm::ConstantInt::get(Int32Ty, OutHash);
 }
 
 void CodeGenModule::SetLLVMFunctionAttributes(GlobalDecl GD,
diff --git a/clang/test/CodeGen/kcfi-normalize.c 
b/clang/test/CodeGen/kcfi-normalize.c
index b9150e88f6ab5f..8b7445fc85e490 100644
--- a/clang/test/CodeGen/kcfi-normalize.c
+++ b/clang/test/CodeGen/kcfi-normalize.c
@@ -10,25 +10,31 @@
 void foo(void (*fn)(int), int arg) {
 // CHECK-LABEL: define{{.*}}foo
 // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE1:[0-9]+]]
-// CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 1162514891) ]
+// KCFI ID = 0x2A548E59
+// CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 710184537) ]
 fn(arg);
 }
 
 void bar(void (*fn)(int, int), int arg1, int arg2) {
 // CHECK-LABEL: define{{.*}}bar
 // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE2:[0-9]+]]
-// CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 
448046469) ]
+// KCFI ID = 0xD5A52C2A
+// CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 
-710595542) ]
 fn(arg1, arg2);
 }
 
 void baz(void (*fn)(int, int, int), int arg1, int arg2, int arg3) {
 // CHECK-LABEL: define{{.*}}baz
 // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE3:[0-9]+]]
-// CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef 
%3){{.*}}[ "kcfi"(i32 -2049681433) ]
+// KCFI ID = 0x2EA2BF3B
+// CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef 
%3){{.*}}[ "kcfi"(i32 782417723) ]
 fn(arg1, arg2, arg3);
 }
 
 // CHECK: ![[#]] = !{i32 4, !"cfi-normalize-integers", i32 1}
-// CHECK: ![[TYPE1]] = !{i32 -1143117868}
-// CHECK: ![[TYPE2]] = !{i32 -460921415}
-// CHECK: ![[TYPE3]] = !{i32 -333839615}
+// KCFI ID = DEEB3EA2
+// CHECK: ![[TYPE1]] = !{i32 -555008350}
+// KCFI ID = 24372DCB
+// CHECK: ![[TYPE2]] = !{i32 607595979}
+// KCFI ID = 0x60D0180C
+// CHECK: ![[TYPE3]] = !{i32 1624250380}
diff --git a/clang/test/CodeGen/kcfi.c b/clang/test/CodeGen/k

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-22 Thread Scott Constable via cfe-commits

scottconstable wrote:

> > @phoebewang and @lvwr I also noticed that there is this code in LLVM:
> > https://github.com/llvm/llvm-project/blob/9ba6672b9f0e82a1f6d4100dc832c84447ea545c/llvm/lib/Transforms/Utils/ModuleUtils.cpp#L202-L214
> > 
> > . As far as I can tell, this code is not triggered when I build the Linux 
> > kernel with `-fsanitize=kcfi`.
> > When is this code triggered? And do you think it is necessary to 
> > additionally implement the arity-enhancement to this code?
> 
> I'm not familar with KCFI. I find it's added by @samitolvanen in 
> [e1c36bd](https://github.com/llvm/llvm-project/commit/e1c36bde0551977d4b2efae032af6dfc4b2b3936).
>  I think you should triger it with attached test case.

It looks to me like this code might be triggered in some LTO configurations, 
and/or when linking code compiled from multiple source languages with the 
expectation that the KCFI type IDs will be compatible. Is my understanding 
correct?

The comment in the code says "Matches CodeGenModule::CreateKCFITypeId in 
Clang," which I interpret to mean that this code should produce identical KCFI 
type IDs for identical function types, which might be tricky if the target 
binary is compiled from different languages. I added some code to 
`llvm::setKCFIType` that I hope will produce consistent output, but admittedly 
I'm not sure that my treatment of `clang::Type` and `llvm::Type` is consistent.

https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-22 Thread Scott Constable via cfe-commits



@@ -208,10 +209,34 @@ void llvm::setKCFIType(Module &M, Function &F, StringRef 
MangledType) {
   std::string Type = MangledType.str();
   if (M.getModuleFlag("cfi-normalize-integers"))
 Type += ".normalized";
+
+  uint32_t OutHash = static_cast(llvm::xxHash64(Type));
+  auto T = Triple(Twine(M.getTargetTriple()));
+  if (T.isX86() && T.isArch64Bit() && T.isOSLinux()) {
+// Estimate the function's arity (i.e., the number of arguments) at the ABI
+// level by counting the number of parameters that are likely to be passed
+// as registers, such as pointers and 64-bit (or smaller) integers. The
+// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs.
+// Additional parameters or parameters larger than 64 bits may be passed on
+// the stack, in which case the arity is denoted as 7.
+size_t NumParams = F.arg_size();
+bool MayHaveStackArgs = NumParams > 6;
+
+for (unsigned int i = 0; !MayHaveStackArgs && i < NumParams; ++i) {
+  const llvm::Type *PT = F.getArg(i)->getType();
+  if (!(PT->isPointerTy() || PT->getIntegerBitWidth() <= 64))

scottconstable wrote:

Is this condition equivalent to what I wrote in 
`CodeGenModule::CreateKCFITypeId` with `typeof(*PT) = clang::Type`? 
Specifically:
```
  if (!(PT->isPointerType() || (PT->isIntegralOrEnumerationType() &&
getContext().getTypeSize(PT) <= 64)))
```

https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-22 Thread Scott Constable via cfe-commits


https://github.com/scottconstable edited 
https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-22 Thread Scott Constable via cfe-commits


https://github.com/scottconstable edited 
https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-22 Thread Scott Constable via cfe-commits



@@ -208,10 +209,34 @@ void llvm::setKCFIType(Module &M, Function &F, StringRef 
MangledType) {
   std::string Type = MangledType.str();
   if (M.getModuleFlag("cfi-normalize-integers"))
 Type += ".normalized";
+
+  uint32_t OutHash = static_cast(llvm::xxHash64(Type));
+  auto T = Triple(Twine(M.getTargetTriple()));

scottconstable wrote:

This looks awkward and I regret that I needed to include another header to 
enable this. Maybe there is a workable API within the existing headers, but I 
couldn't find one.

https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-22 Thread Scott Constable via cfe-commits


https://github.com/scottconstable updated 
https://github.com/llvm/llvm-project/pull/117121

>From 4f21a0c817d221398bb93e25452518d265794265 Mon Sep 17 00:00:00 2001
From: Scott D Constable 
Date: Tue, 19 Nov 2024 15:51:05 -0800
Subject: [PATCH] Enhance KCFI type IDs with a 3-bit arity indicator.

---
 clang/lib/CodeGen/CodeGenModule.cpp   | 31 +--
 clang/test/CodeGen/kcfi-normalize.c   | 18 +++
 clang/test/CodeGen/kcfi.c | 22 +++--
 llvm/lib/Transforms/Utils/ModuleUtils.cpp | 31 +--
 .../GCOVProfiling/kcfi-normalize.ll   |  7 +++--
 llvm/test/Transforms/GCOVProfiling/kcfi.ll|  7 +++--
 6 files changed, 95 insertions(+), 21 deletions(-)

diff --git a/clang/lib/CodeGen/CodeGenModule.cpp 
b/clang/lib/CodeGen/CodeGenModule.cpp
index b854eeb62a80ce..f9c01edf4f0953 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -2183,7 +2183,8 @@ llvm::ConstantInt 
*CodeGenModule::CreateCrossDsoCfiTypeId(llvm::Metadata *MD) {
 }
 
 llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) {
-  if (auto *FnType = T->getAs())
+  auto *FnType = T->getAs();
+  if (FnType)
 T = getContext().getFunctionType(
 FnType->getReturnType(), FnType->getParamTypes(),
 FnType->getExtProtoInfo().withExceptionSpec(EST_None));
@@ -2196,8 +2197,32 @@ llvm::ConstantInt 
*CodeGenModule::CreateKCFITypeId(QualType T) {
   if (getCodeGenOpts().SanitizeCfiICallNormalizeIntegers)
 Out << ".normalized";
 
-  return llvm::ConstantInt::get(Int32Ty,
-
static_cast(llvm::xxHash64(OutName)));
+  uint32_t OutHash = static_cast(llvm::xxHash64(OutName));
+  const auto &Triple = getTarget().getTriple();
+  if (FnType && Triple.isX86() && Triple.isArch64Bit() && Triple.isOSLinux()) {
+// Estimate the function's arity (i.e., the number of arguments) at the ABI
+// level by counting the number of parameters that are likely to be passed
+// as registers, such as pointers and 64-bit (or smaller) integers. The
+// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs.
+// Additional parameters or parameters larger than 64 bits may be passed on
+// the stack, in which case the arity is denoted as 7.
+bool MayHaveStackArgs = FnType->getNumParams() > 6;
+
+for (unsigned int i = 0; !MayHaveStackArgs && i < FnType->getNumParams();
+ ++i) {
+  const Type *PT = FnType->getParamType(i).getTypePtr();
+  if (!(PT->isPointerType() || (PT->isIntegralOrEnumerationType() &&
+getContext().getTypeSize(PT) <= 64)))
+MayHaveStackArgs = true;
+}
+
+// The 3-bit arity is concatenated with the lower 29 bits of the KCFI hash
+// to form an enhanced KCFI type ID. This can prevent, for example, a
+// 3-arity function's ID from ever colliding with a 2-arity function's ID.
+OutHash = (OutHash << 3) | (MayHaveStackArgs ? 7 : FnType->getNumParams());
+  }
+
+  return llvm::ConstantInt::get(Int32Ty, OutHash);
 }
 
 void CodeGenModule::SetLLVMFunctionAttributes(GlobalDecl GD,
diff --git a/clang/test/CodeGen/kcfi-normalize.c 
b/clang/test/CodeGen/kcfi-normalize.c
index b9150e88f6ab5f..8b7445fc85e490 100644
--- a/clang/test/CodeGen/kcfi-normalize.c
+++ b/clang/test/CodeGen/kcfi-normalize.c
@@ -10,25 +10,31 @@
 void foo(void (*fn)(int), int arg) {
 // CHECK-LABEL: define{{.*}}foo
 // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE1:[0-9]+]]
-// CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 1162514891) ]
+// KCFI ID = 0x2A548E59
+// CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 710184537) ]
 fn(arg);
 }
 
 void bar(void (*fn)(int, int), int arg1, int arg2) {
 // CHECK-LABEL: define{{.*}}bar
 // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE2:[0-9]+]]
-// CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 
448046469) ]
+// KCFI ID = 0xD5A52C2A
+// CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 
-710595542) ]
 fn(arg1, arg2);
 }
 
 void baz(void (*fn)(int, int, int), int arg1, int arg2, int arg3) {
 // CHECK-LABEL: define{{.*}}baz
 // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE3:[0-9]+]]
-// CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef 
%3){{.*}}[ "kcfi"(i32 -2049681433) ]
+// KCFI ID = 0x2EA2BF3B
+// CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef 
%3){{.*}}[ "kcfi"(i32 782417723) ]
 fn(arg1, arg2, arg3);
 }
 
 // CHECK: ![[#]] = !{i32 4, !"cfi-normalize-integers", i32 1}
-// CHECK: ![[TYPE1]] = !{i32 -1143117868}
-// CHECK: ![[TYPE2]] = !{i32 -460921415}
-// CHECK: ![[TYPE3]] = !{i32 -333839615}
+// KCFI ID = DEEB3EA2
+// CHECK: ![[TYPE1]] = !{i32 -555008350}
+// KCFI ID = 24372DCB
+// CHECK: ![[TYPE2]] = !{i32 607595979}
+// KCFI ID = 0x60D0180C
+// CHECK: ![[TYPE3]] = !{i32 1624250380}
diff --git a/clang/test/CodeGen/kcfi.c b/clang/test/CodeGen

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-22 Thread Scott Constable via cfe-commits


https://github.com/scottconstable edited 
https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-22 Thread Scott Constable via cfe-commits

scottconstable wrote:

> > Second, this scheme reduces the expected number of hash collisions within 
> > each arity, compared against the expected number of collisions (0.01383765) 
> > for the 32-bit hashing scheme that includes all arities. The table below 
> > shows the expected number of collisions for each arity, given the number of 
> > unique indirect callable function types within that arity in the same 
> > Ubuntu 24.04 server kernel discussed above.
> 
> The collisions vary a lot with different number of function types. It looks 
> to me more smooth if we use 2 bits to distinguish 4 cases: 1, 2, 3 and 0 or 
> others.

I re-ran the numbers with a 30-bit hash and 2-bit arity, and you are correct 
that the distribution of expected collisions is more smooth:

| Arity | Unique Indirect Callable Function Types | Number of Expected 
Collisions |
| - | --- | - |
| 0 or >3 | 2089 |  0.00201654 |
| 1 | 2492 |  0.00287330 |
| 2 | 3775 | 0.00660789 |
| 3 | 2547 | 0.00300181 |

However, a 2-bit arity would undermine what is arguably the more desirable 
property:

> This scheme enhances security in two ways. First, it prevents a j-arity 
> function pointer from being used to call a k-arity function, unless j=k. The 
> current 32-bit kCFI hash does not prevent, for example, a 2-arity fptr from 
> calling a 3-arity target if the kCFI hashes collide. If this were to happen, 
> then potentially malicious stale/dead data in RDX at the call site could 
> suddenly become live as the third parameter at the call target.

For example, if the 30-bit hash of a 0-arity function type collides with the 
30-bit hash of the type of a 4-arity function type, then the `RDI`, `RSI`, 
`RDX`, and `RCX` registers that die when calling a function of the 0-arity type 
will unexpectedly become live if a COP attack redirects the call to a function 
of the 4-arity type.

Therefore, I believe that the 29-bit hash and 3-bit arity offers a more 
favorable security posture.

https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-11-22 Thread Scott Constable via cfe-commits


https://github.com/scottconstable updated 
https://github.com/llvm/llvm-project/pull/117121

>From d5ec228a1d3e059e2a64ac1a150bec6f65d573e7 Mon Sep 17 00:00:00 2001
From: Scott D Constable 
Date: Tue, 19 Nov 2024 15:51:05 -0800
Subject: [PATCH] Enhance KCFI type IDs with a 3-bit arity indicator.

---
 clang/lib/CodeGen/CodeGenModule.cpp   | 31 +--
 clang/test/CodeGen/kcfi-normalize.c   | 18 +++
 clang/test/CodeGen/kcfi.c | 22 +++--
 llvm/lib/Transforms/Utils/ModuleUtils.cpp | 29 +++--
 .../GCOVProfiling/kcfi-normalize.ll   |  7 +++--
 llvm/test/Transforms/GCOVProfiling/kcfi.ll|  7 +++--
 6 files changed, 94 insertions(+), 20 deletions(-)

diff --git a/clang/lib/CodeGen/CodeGenModule.cpp 
b/clang/lib/CodeGen/CodeGenModule.cpp
index b854eeb62a80ce..f9c01edf4f0953 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -2183,7 +2183,8 @@ llvm::ConstantInt 
*CodeGenModule::CreateCrossDsoCfiTypeId(llvm::Metadata *MD) {
 }
 
 llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) {
-  if (auto *FnType = T->getAs())
+  auto *FnType = T->getAs();
+  if (FnType)
 T = getContext().getFunctionType(
 FnType->getReturnType(), FnType->getParamTypes(),
 FnType->getExtProtoInfo().withExceptionSpec(EST_None));
@@ -2196,8 +2197,32 @@ llvm::ConstantInt 
*CodeGenModule::CreateKCFITypeId(QualType T) {
   if (getCodeGenOpts().SanitizeCfiICallNormalizeIntegers)
 Out << ".normalized";
 
-  return llvm::ConstantInt::get(Int32Ty,
-
static_cast(llvm::xxHash64(OutName)));
+  uint32_t OutHash = static_cast(llvm::xxHash64(OutName));
+  const auto &Triple = getTarget().getTriple();
+  if (FnType && Triple.isX86() && Triple.isArch64Bit() && Triple.isOSLinux()) {
+// Estimate the function's arity (i.e., the number of arguments) at the ABI
+// level by counting the number of parameters that are likely to be passed
+// as registers, such as pointers and 64-bit (or smaller) integers. The
+// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs.
+// Additional parameters or parameters larger than 64 bits may be passed on
+// the stack, in which case the arity is denoted as 7.
+bool MayHaveStackArgs = FnType->getNumParams() > 6;
+
+for (unsigned int i = 0; !MayHaveStackArgs && i < FnType->getNumParams();
+ ++i) {
+  const Type *PT = FnType->getParamType(i).getTypePtr();
+  if (!(PT->isPointerType() || (PT->isIntegralOrEnumerationType() &&
+getContext().getTypeSize(PT) <= 64)))
+MayHaveStackArgs = true;
+}
+
+// The 3-bit arity is concatenated with the lower 29 bits of the KCFI hash
+// to form an enhanced KCFI type ID. This can prevent, for example, a
+// 3-arity function's ID from ever colliding with a 2-arity function's ID.
+OutHash = (OutHash << 3) | (MayHaveStackArgs ? 7 : FnType->getNumParams());
+  }
+
+  return llvm::ConstantInt::get(Int32Ty, OutHash);
 }
 
 void CodeGenModule::SetLLVMFunctionAttributes(GlobalDecl GD,
diff --git a/clang/test/CodeGen/kcfi-normalize.c 
b/clang/test/CodeGen/kcfi-normalize.c
index b9150e88f6ab5f..8b7445fc85e490 100644
--- a/clang/test/CodeGen/kcfi-normalize.c
+++ b/clang/test/CodeGen/kcfi-normalize.c
@@ -10,25 +10,31 @@
 void foo(void (*fn)(int), int arg) {
 // CHECK-LABEL: define{{.*}}foo
 // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE1:[0-9]+]]
-// CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 1162514891) ]
+// KCFI ID = 0x2A548E59
+// CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 710184537) ]
 fn(arg);
 }
 
 void bar(void (*fn)(int, int), int arg1, int arg2) {
 // CHECK-LABEL: define{{.*}}bar
 // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE2:[0-9]+]]
-// CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 
448046469) ]
+// KCFI ID = 0xD5A52C2A
+// CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 
-710595542) ]
 fn(arg1, arg2);
 }
 
 void baz(void (*fn)(int, int, int), int arg1, int arg2, int arg3) {
 // CHECK-LABEL: define{{.*}}baz
 // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE3:[0-9]+]]
-// CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef 
%3){{.*}}[ "kcfi"(i32 -2049681433) ]
+// KCFI ID = 0x2EA2BF3B
+// CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef 
%3){{.*}}[ "kcfi"(i32 782417723) ]
 fn(arg1, arg2, arg3);
 }
 
 // CHECK: ![[#]] = !{i32 4, !"cfi-normalize-integers", i32 1}
-// CHECK: ![[TYPE1]] = !{i32 -1143117868}
-// CHECK: ![[TYPE2]] = !{i32 -460921415}
-// CHECK: ![[TYPE3]] = !{i32 -333839615}
+// KCFI ID = DEEB3EA2
+// CHECK: ![[TYPE1]] = !{i32 -555008350}
+// KCFI ID = 24372DCB
+// CHECK: ![[TYPE2]] = !{i32 607595979}
+// KCFI ID = 0x60D0180C
+// CHECK: ![[TYPE3]] = !{i32 1624250380}
diff --git a/clang/test/CodeGen/kcfi.c b/clang/test/CodeGen/k

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-12-03 Thread Scott Constable via cfe-commits


scottconstable wrote:

Hi @maurer, I can honestly say that I appreciate your attention to detail. 
While your observation that "attackers often have direct control of 
intentionally passed arguments" is true in general, I do not see evidence of 
this among indirect calls in the Linux kernel. 77% of all indirect call 
parameters are pointers, and AFAIK user-mode code cannot exert arbitrary 
control over pointers within the kernel (especially given that x86 Linux 
kernels enable SMAP and SMEP when it is available on the CPU). That leaves the 
other 23% of indirect call parameters to consider, though I admit I don't have 
a concrete breakdown of what fraction of these *might* be attacker controllable.

But let's assume for the sake of discussion that all of the non-pointer 
indirect call parameters are attacker controllable. And because we don't know 
which dead registers at a call site *might* be attacker-controllable, let's 
similarly assume for the sake of discussion that all of these dead registers 
are attacker-controllable. I built the table below to estimate for the 
arity-enhanced scheme a kind of exploitability score, i.e., the expected number 
of attacker-controlled parameters times the frequency of function types of that 
arity. For example, suppose that a hash collision between two function types 
occurs within the arity-1 partition. The expected number of attacker-controlled 
parameters at the first function is 0.23, and likewise for the second. So the 
total number of attacker-controlled parameters is 0.46, times the relative 
frequency of arity-1 function types (roughly 26%) gives a weighted 
exploitability of 0.105138035.

| Arity | Expected # of attacker-controlled args | Weighted exploitability |
| --- | --- | |
| 0 | 0 | 0 |
| 1 | 0.46 | 0.105138035 |
| 2 | 0.92 | 0.318536183 |
| 3 | 1.38 | 0.322375493 |
| 4 | 1.84 | 0.197281482 |
| 5 | 2.3 | 0.109483628 |
| 6+ | 2.76 | 0.093409153 |

Total exploitability score for the arity enhancement: **1.146223975**

A similar analysis can be applied to the existing 32-bit hash scheme. For 
example, suppose a 1-arity function collides with a 2-arity function. Then the 
expected number of attacker-controlled parameters that would be live at the 
mis-matched target would be 0.23 if the 2-arity function calls the 1-arity 
function, or 1.23 if the 1-arity function calls the 2-arity function (because 
the second parameter at the target would be a dead register at the caller). 
This data is captured in the table below, where the columns and rows refer to 
the respective arities of the colliding function types.

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6+ |
|  |  |  |  |  |  |  |  |
| **0** | 0 |   |   |   |   |   | 
| **1** | 1 | 0.46  |   |   |   |   | 
| **2** | 2 | 1.46  | 0.92  |   |   |   |
| **3** | 3 | 2.46  | 1.92  | 1.38  |   |   | 
| **4** | 4 | 3.46  | 2.92  | 2.38  | 1.84  |   | 
| **5** | 5 | 4.46  | 3.92  | 3.38  | 2.84  |  2.3  | 
| **6+**| 6 |  5.46 | 4.92  | 4.38  | 3.84  | 3.3   | 2.76

Assuming that a collision occurs, this table shows the probability density of 
the arities (note that the probabilities do sum to 1):

| | 0   | 1   | 2   | 3   | 4   
| 5   | 6+  |
|-|-|-|-|-|-|-|-|
| **0**   | 8.61406E-06 | | | | 
| | |
| **1**   | 0.00134164  | 0.052240106 | | | 
| | |
| **2**   | 0.00203238  | 0.15827159  | 0.119878662 | | 
| | |
| **3**   | 0.001371251 | 0.106786156 | 0.161764743 | 0.054571497 | 
| | |
| **4**   | 0.000629365 | 0.049011785 | 0.074245381 | 0.050093506 | 0.011495742 
| | |
| **5**   | 0.000279419 | 0.021759723 | 0.032962663 | 0.022239974 | 0.010207511 
| 0.00226591  | |
| **6+**  | 0.000198662 | 0.015470786 | 0.023435882 | 0.015812236 | 0.007257363 
| 0.003222046 | 0.001145409 |

Then the weighted exploitability for each collision is:

| | 0   | 1   | 2   | 3   | 4   
| 5   | 6+ |
|-|-|-|-|-|-|-||
| **0**   | 0   | 0   | 0   | 0   | 0   
| 0   | 0  |
| **1**   | 0.00134164  | 0.024030449 | 0   | 0   | 0   
| 0   | 0  |
| **2**   | 0.00406476  | 0.231076521 | 0.110288369 | 0   | 0   
| 0   | 0  |
| **3**   | 0.004113752 | 0.262693944 | 0.3

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-12-03 Thread Scott Constable via cfe-commits

scottconstable wrote:

I also do not want to lose sight of one of the other obvious advantages that 
was mentioned in the PR description:

> One additional benefit of this patch is that it can benefit other CFI 
> approaches that build on kCFI, such as FineIBT. For example, this proposed 
> enhancement to FineIBT must be able to infer (at kernel init time) which 
> registers are live at an indirect call target: 
> https://lkml.org/lkml/2024/9/27/982. If the arity bits are available in the 
> kCFI type ID, then this information is trivial to infer.

To elaborate, we are concurrently working on a Linux kernel patch to enhance 
FineIBT (which is a KCFI-like solution that utilizes x86 Indirect Branch 
Tracking). The goal is to extend FineIBT to poison live argument registers if a 
hash check fails after a branch mis-prediction. This enhancement can help to 
mitigate a variety of Spectre attacks in the Linux kernel.

https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)

2025-01-09 Thread Scott Constable via cfe-commits


scottconstable wrote:

@sirmc @samitolvanen @Darksonn @lvwr @maurer @rcvalle @MaskRay A gentle 
reminder to please review this PR.

https://github.com/llvm/llvm-project/pull/121070
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)

2025-01-28 Thread Scott Constable via cfe-commits


https://github.com/scottconstable updated 
https://github.com/llvm/llvm-project/pull/121070

>From 54296a3991e186629889611eff9e7cdd2c1cadca Mon Sep 17 00:00:00 2001
From: Scott D Constable 
Date: Mon, 23 Dec 2024 13:48:48 -0800
Subject: [PATCH] Implement a new kcfi_arity feature that encodes an indirect
 call target's arity (i.e., the number of live-in registers) in the function's
 __cfi header.

---
 clang/docs/ControlFlowIntegrity.rst   |   9 +
 clang/docs/UsersManual.rst|   6 +
 clang/include/clang/Basic/CodeGenOptions.def  |   1 +
 .../clang/Basic/DiagnosticDriverKinds.td  |   2 +
 clang/include/clang/Basic/Features.def|   1 +
 clang/include/clang/Driver/Options.td |   4 +
 clang/include/clang/Driver/SanitizerArgs.h|   1 +
 clang/lib/CodeGen/CodeGenModule.cpp   |   2 +
 clang/lib/Driver/SanitizerArgs.cpp|  10 +
 clang/test/CodeGen/kcfi-arity.c   |   7 +
 llvm/lib/Target/X86/X86AsmPrinter.cpp |  23 +-
 llvm/test/CodeGen/X86/kcfi-arity.ll   | 205 ++
 12 files changed, 270 insertions(+), 1 deletion(-)
 create mode 100644 clang/test/CodeGen/kcfi-arity.c
 create mode 100644 llvm/test/CodeGen/X86/kcfi-arity.ll

diff --git a/clang/docs/ControlFlowIntegrity.rst 
b/clang/docs/ControlFlowIntegrity.rst
index 7de805e2154db8..3c97f7da3d7d99 100644
--- a/clang/docs/ControlFlowIntegrity.rst
+++ b/clang/docs/ControlFlowIntegrity.rst
@@ -336,6 +336,15 @@ cross-DSO function address equality. These properties make 
KCFI easier to
 adopt in low-level software. KCFI is limited to checking only function
 pointers, and isn't compatible with executable-only memory.
 
+``-fsanitize-kcfi-arity``
+-
+
+For supported targets, this feature extends kCFI by telling the compiler to
+record information about each indirect-callable function's arity (i.e., the
+number of arguments passed in registers) into the binary. Some kernel CFI
+techniques, such as FineIBT, may be able to use this information to provide
+enhanced security.
+
 Member Function Pointer Call Checking
 =
 
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index 260e84910c6f78..e958e2fb353ef8 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2220,6 +2220,12 @@ are listed below.
 
This option is currently experimental.
 
+.. option:: -fsanitize-kcfi-arity
+
+   Extends kernel indirect call forward-edge control flow integrity with
+   additional function arity information (for supported targets). See
+   :doc:`ControlFlowIntegrity` for more details.
+
 .. option:: -fstrict-vtable-pointers
 
Enable optimizations based on the strict rules for overwriting polymorphic
diff --git a/clang/include/clang/Basic/CodeGenOptions.def 
b/clang/include/clang/Basic/CodeGenOptions.def
index 1ab8c7fb4d3c33..beb43b3ad8f121 100644
--- a/clang/include/clang/Basic/CodeGenOptions.def
+++ b/clang/include/clang/Basic/CodeGenOptions.def
@@ -277,6 +277,7 @@ CODEGENOPT(SanitizeCfiICallNormalizeIntegers, 1, 0) ///< 
Normalize integer types
 ///< CFI icall function 
signatures
 CODEGENOPT(SanitizeCfiCanonicalJumpTables, 1, 0) ///< Make jump table symbols 
canonical
  ///< instead of creating a 
local jump table.
+CODEGENOPT(SanitizeKcfiArity, 1, 0) ///< Embed arity in KCFI patchable 
function prefix
 CODEGENOPT(SanitizeCoverageType, 2, 0) ///< Type of sanitizer coverage
///< instrumentation.
 CODEGENOPT(SanitizeCoverageIndirectCalls, 1, 0) ///< Enable sanitizer coverage
diff --git a/clang/include/clang/Basic/DiagnosticDriverKinds.td 
b/clang/include/clang/Basic/DiagnosticDriverKinds.td
index 288786b8ce9399..df40c1070376e6 100644
--- a/clang/include/clang/Basic/DiagnosticDriverKinds.td
+++ b/clang/include/clang/Basic/DiagnosticDriverKinds.td
@@ -209,6 +209,8 @@ def err_drv_cannot_open_randomize_layout_seed_file : Error<
   "cannot read randomize layout seed file '%0'">;
 def err_drv_invalid_version_number : Error<
   "invalid version number in '%0'">;
+def err_drv_kcfi_arity_unsupported_target : Error<
+  "target '%0' is unsupported by -fsanitize-kcfi-arity">;
 def err_drv_no_linker_llvm_support : Error<
   "'%0': unable to pass LLVM bit-code files to linker">;
 def err_drv_no_ast_support : Error<
diff --git a/clang/include/clang/Basic/Features.def 
b/clang/include/clang/Basic/Features.def
index c82b6d9b5f6c10..e736b46411ed1e 100644
--- a/clang/include/clang/Basic/Features.def
+++ b/clang/include/clang/Basic/Features.def
@@ -254,6 +254,7 @@ FEATURE(is_trivially_constructible, LangOpts.CPlusPlus)
 FEATURE(is_trivially_copyable, LangOpts.CPlusPlus)
 FEATURE(is_union, LangOpts.CPlusPlus)
 FEATURE(kcfi, LangOpts.Sanitize.has(SanitizerKind::KCFI))
+FEATURE(kcfi_arity, LangOpts.Sanitize.has(SanitizerKind::KCFI))
 FEATURE(modules, LangOpts.M

[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)

2025-01-21 Thread Scott Constable via cfe-commits


scottconstable wrote:

@sirmc @samitolvanen @Darksonn @lvwr @maurer @rcvalle A gentle reminder to 
please review this PR.

https://github.com/llvm/llvm-project/pull/121070
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)

2025-01-13 Thread Scott Constable via cfe-commits

scottconstable wrote:

> > maurer: I think their point is that even if you are not changing the hash 
> > scheme, you are proposing breaking compatibility of the identifier with 
> > existing code. Since we don't want to do this many times, if we are 
> > breaking compatibility with existing code, they would like to batch it with 
> > another breaking update so that it doesn't need to be done again.
> > (This isn't me reviewing this PR, just trying to clear up some confusion.)
> 
> This:) Thanks for the explanation.
> 
> After my pending lld/MachO change, kcfi will be the only user of the legacy 
> `xxHash64`. I want to remove `xxHash64` from llvm-project.

Going back to @maurer 's comment about compatibility, I do not believe that 
this PR would break any compatibility with other compilers, or with other 
compiler versions. If a kernel is built with some of its functions having the 
3-bit arity extension and the rest of its functions not having the 3-bit arity 
extension (e.g., because the user's clang compiler has the extension, and their 
rust compiler does not), then the kernel will behave the same as it would if 
all functions or no functions have the 3-bit arity extension. If the kernel is 
hot-patched with the register hardening scheme in 
https://lkml.org/lkml/2024/9/27/982, then only the functions that have the 
3-bit arity extension will benefit from the register hardening scheme.

https://github.com/llvm/llvm-project/pull/121070
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)

2024-12-24 Thread Scott Constable via cfe-commits


https://github.com/scottconstable created 
https://github.com/llvm/llvm-project/pull/121070

Kernel Control Flow Integrity (kCFI) is a feature that hardens indirect calls 
by comparing a 32-bit hash of the function pointer's type against a hash of the 
target function's type. If the hashes do not match, the kernel may panic (or 
log the hash check failure, depending on the kernel's configuration). These 
hashes are computed at compile time by applying the xxHash64 algorithm to each 
mangled canonical function (or function pointer) type, then truncating the 
result to 32 bits. This hash is written into each indirect-callable function 
header by encoding it as the 32-bit immediate operand to a `MOVri` instruction, 
e.g.:
```
__cfi_foo:
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
movl$199571451, %eax# hash of foo's type = 0xBE537FB
foo:
...
```

This PR extends x86-based kCFI with a 3-bit arity indicator encoded in the 
`MOVri` instruction's register (reg) field as follows:

| Arity Indicator | Description | Encoding in reg field |
| --- | --- | --- |
| 0 | 0 parameters | EAX |
| 1 | 1 parameter in RDI | ECX |
| 2 | 2 parameters in RDI and RSI | EDX |
| 3 | 3 parameters in RDI, RSI, and RDX | EBX |
| 4 | 4 parameters in RDI, RSI, RDX, and RCX | ESP |
| 5 | 5 parameters in RDI, RSI, RDX, RCX, and R8 | EBP |
| 6 | 6 parameters in RDI, RSI, RDX, RCX, R8, and R9 | ESI |
| 7 | At least one parameter may be passed on the stack | EDI |

For example, if `foo` takes 3 register arguments and no stack arguments then 
the `MOVri` instruction in its kCFI header would instead be written as:
```
movl$199571451, %ebx# hash of foo's type = 0xBE537FB
```

This PR will benefit other CFI approaches that build on kCFI, such as FineIBT. 
For example, this proposed enhancement to FineIBT must be able to infer (at 
kernel init time) which registers are live at an indirect call target: 
https://lkml.org/lkml/2024/9/27/982. If the arity bits are available in the 
kCFI function header, then this information is trivial to infer.

Note that there is another existing PR proposal that includes the 3-bit arity 
within the existing 32-bit immediate field, which introduces different security 
properties: https://github.com/llvm/llvm-project/pull/117121.

>From 81ff927ed34e9e8b61a432822f946544b411b3b4 Mon Sep 17 00:00:00 2001
From: Scott D Constable 
Date: Mon, 23 Dec 2024 13:48:48 -0800
Subject: [PATCH] Implement a new kcfi_x86_arity feature that encodes an
 indirect call target's arity (i.e., the number of live-in registers) in the
 function's __cfi header.

---
 clang/include/clang/Basic/Features.def|  1 +
 llvm/lib/Target/X86/X86AsmPrinter.cpp | 20 +-
 .../X86/kcfi-patchable-function-prefix.ll |  4 +-
 llvm/test/CodeGen/X86/kcfi.ll | 63 ++-
 4 files changed, 82 insertions(+), 6 deletions(-)

diff --git a/clang/include/clang/Basic/Features.def 
b/clang/include/clang/Basic/Features.def
index c82b6d9b5f6c10..dca8f4dc0fbf76 100644
--- a/clang/include/clang/Basic/Features.def
+++ b/clang/include/clang/Basic/Features.def
@@ -254,6 +254,7 @@ FEATURE(is_trivially_constructible, LangOpts.CPlusPlus)
 FEATURE(is_trivially_copyable, LangOpts.CPlusPlus)
 FEATURE(is_union, LangOpts.CPlusPlus)
 FEATURE(kcfi, LangOpts.Sanitize.has(SanitizerKind::KCFI))
+FEATURE(kcfi_x86_arity, LangOpts.Sanitize.has(SanitizerKind::KCFI))
 FEATURE(modules, LangOpts.Modules)
 FEATURE(safe_stack, LangOpts.Sanitize.has(SanitizerKind::SafeStack))
 FEATURE(shadow_call_stack,
diff --git a/llvm/lib/Target/X86/X86AsmPrinter.cpp 
b/llvm/lib/Target/X86/X86AsmPrinter.cpp
index f01e47b41cf5e4..cb21d9bb5f2879 100644
--- a/llvm/lib/Target/X86/X86AsmPrinter.cpp
+++ b/llvm/lib/Target/X86/X86AsmPrinter.cpp
@@ -181,8 +181,26 @@ void X86AsmPrinter::emitKCFITypeId(const MachineFunction 
&MF) {
   // Embed the type hash in the X86::MOV32ri instruction to avoid special
   // casing object file parsers.
   EmitKCFITypePadding(MF);
+
+  Register MovReg = X86::EAX;
+  const auto &Triple = MF.getTarget().getTargetTriple();
+  if (Triple.isArch64Bit() && Triple.isOSLinux()) {
+// Determine the function's arity (i.e., the number of arguments) at the 
ABI
+// level by counting the number of parameters that are passed
+// as registers, such as pointers and 64-bit (or smaller) integers. The
+// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs.
+// Additional parameters or parameters larger than 64 bits may be passed on
+// the stack, in which case the arity is denoted as 7.
+const unsigned ArityToRegMap[8] = {X86::EAX, X86::ECX, X86::EDX, X86::EBX,
+  X86::ESP, X86::EBP, X86::ESI, X86::EDI};
+int Arity = MF.getInfo()->getArgumentStackSize() > 0
+? 7
+

[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)

2024-12-24 Thread Scott Constable via cfe-commits


https://github.com/scottconstable updated 
https://github.com/llvm/llvm-project/pull/121070

>From 70f1743e23ff8e218799e94b03a5bf033715666a Mon Sep 17 00:00:00 2001
From: Scott D Constable 
Date: Mon, 23 Dec 2024 13:48:48 -0800
Subject: [PATCH] Implement a new kcfi_x86_arity feature that encodes an
 indirect call target's arity (i.e., the number of live-in registers) in the
 function's __cfi header.

---
 clang/include/clang/Basic/Features.def|  1 +
 llvm/lib/Target/X86/X86AsmPrinter.cpp | 20 +-
 .../X86/kcfi-patchable-function-prefix.ll |  4 +-
 llvm/test/CodeGen/X86/kcfi.ll | 63 ++-
 4 files changed, 82 insertions(+), 6 deletions(-)

diff --git a/clang/include/clang/Basic/Features.def 
b/clang/include/clang/Basic/Features.def
index c82b6d9b5f6c10..dca8f4dc0fbf76 100644
--- a/clang/include/clang/Basic/Features.def
+++ b/clang/include/clang/Basic/Features.def
@@ -254,6 +254,7 @@ FEATURE(is_trivially_constructible, LangOpts.CPlusPlus)
 FEATURE(is_trivially_copyable, LangOpts.CPlusPlus)
 FEATURE(is_union, LangOpts.CPlusPlus)
 FEATURE(kcfi, LangOpts.Sanitize.has(SanitizerKind::KCFI))
+FEATURE(kcfi_x86_arity, LangOpts.Sanitize.has(SanitizerKind::KCFI))
 FEATURE(modules, LangOpts.Modules)
 FEATURE(safe_stack, LangOpts.Sanitize.has(SanitizerKind::SafeStack))
 FEATURE(shadow_call_stack,
diff --git a/llvm/lib/Target/X86/X86AsmPrinter.cpp 
b/llvm/lib/Target/X86/X86AsmPrinter.cpp
index f01e47b41cf5e4..aa5f1916d46ac1 100644
--- a/llvm/lib/Target/X86/X86AsmPrinter.cpp
+++ b/llvm/lib/Target/X86/X86AsmPrinter.cpp
@@ -181,8 +181,26 @@ void X86AsmPrinter::emitKCFITypeId(const MachineFunction 
&MF) {
   // Embed the type hash in the X86::MOV32ri instruction to avoid special
   // casing object file parsers.
   EmitKCFITypePadding(MF);
+
+  Register MovReg = X86::EAX;
+  const auto &Triple = MF.getTarget().getTargetTriple();
+  if (Triple.isArch64Bit() && Triple.isOSLinux()) {
+// Determine the function's arity (i.e., the number of arguments) at the 
ABI
+// level by counting the number of parameters that are passed
+// as registers, such as pointers and 64-bit (or smaller) integers. The
+// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs.
+// Additional parameters or parameters larger than 64 bits may be passed on
+// the stack, in which case the arity is denoted as 7.
+const unsigned ArityToRegMap[8] = {X86::EAX, X86::ECX, X86::EDX, X86::EBX,
+   X86::ESP, X86::EBP, X86::ESI, X86::EDI};
+int Arity = MF.getInfo()->getArgumentStackSize() > 0
+? 7
+: MF.getRegInfo().liveins().size();
+MovReg = ArityToRegMap[Arity];
+  }
+
   EmitAndCountInstruction(MCInstBuilder(X86::MOV32ri)
-  .addReg(X86::EAX)
+  .addReg(MovReg)
   .addImm(MaskKCFIType(Type->getZExtValue(;
 
   if (MAI->hasDotTypeDotSizeDirective()) {
diff --git a/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll 
b/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll
index 1b7bd7835e890c..deababc7fd5379 100644
--- a/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll
+++ b/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll
@@ -3,7 +3,7 @@
 ; CHECK:  .p2align 4
 ; CHECK-LABEL:__cfi_f1:
 ; CHECK-COUNT-11:   nop
-; CHECK-NEXT:   movl $12345678, %eax
+; CHECK-NEXT:   movl $12345678, %ecx
 ; CHECK-LABEL:.Lcfi_func_end0:
 ; CHECK-NEXT: .size   __cfi_f1, .Lcfi_func_end0-__cfi_f1
 ; CHECK-LABEL:f1:
@@ -26,7 +26,7 @@ define void @f2(ptr noundef %x) {
 ; CHECK:  .p2align 4
 ; CHECK-LABEL:__cfi_f3:
 ; CHECK-NOT:nop
-; CHECK-NEXT:   movl $12345678, %eax
+; CHECK-NEXT:   movl $12345678, %ecx
 ; CHECK-COUNT-11:   nop
 ; CHECK-LABEL:f3:
 define void @f3(ptr noundef %x) #0 !kcfi_type !1 {
diff --git a/llvm/test/CodeGen/X86/kcfi.ll b/llvm/test/CodeGen/X86/kcfi.ll
index 059efcc71b0eb8..91d706796b8e99 100644
--- a/llvm/test/CodeGen/X86/kcfi.ll
+++ b/llvm/test/CodeGen/X86/kcfi.ll
@@ -16,7 +16,7 @@
 ; ASM-NEXT:nop
 ; ASM-NEXT:nop
 ; ASM-NEXT:nop
-; ASM-NEXT:movl $12345678, %eax
+; ASM-NEXT:movl $12345678, %ecx
 ; ASM-LABEL: .Lcfi_func_end0:
 ; ASM-NEXT:  .size   __cfi_f1, .Lcfi_func_end0-__cfi_f1
 define void @f1(ptr noundef %x) !kcfi_type !1 {
@@ -90,7 +90,7 @@ define void @f4(ptr noundef %x) #0 {
 
 ;; Ensure we emit Value + 1 for unwanted values (e.g. endbr64 == 4196274163).
 ; ASM-LABEL: __cfi_f5:
-; ASM: movl $4196274164, %eax # imm = 0xFA1E0FF4
+; ASM: movl $4196274164, %ecx # imm = 0xFA1E0FF4
 define void @f5(ptr noundef %x) !kcfi_type !2 {
 ; ASM-LABEL: f5:
 ; ASM: movl $98693132, %r10d # imm = 0x5E1F00C
@@ -100,7 +100,7 @@ define void @f5(ptr noundef %x) !kcfi_type !2 {
 
 ;; Ensure we emit Value + 1 for unwanted values (e.g. -endbr64 == 98693133).
 ; ASM-LABEL: __cfi_f6:
-; ASM: movl $98693134, %eax # imm = 0x5E1F

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-12-24 Thread Scott Constable via cfe-commits


scottconstable wrote:

@phoebewang @sirmc @samitolvanen @Darksonn @lvwr @ojeda @maurer @rcvalle 
@MaskRay

I have created https://github.com/llvm/llvm-project/pull/121070 to implement 
the alternate proposal summarized in this comment above: 
https://github.com/llvm/llvm-project/pull/117121#issuecomment-2518240999.

https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)

2024-12-24 Thread Scott Constable via cfe-commits



@@ -181,8 +181,26 @@ void X86AsmPrinter::emitKCFITypeId(const MachineFunction 
&MF) {
   // Embed the type hash in the X86::MOV32ri instruction to avoid special
   // casing object file parsers.
   EmitKCFITypePadding(MF);
+
+  Register MovReg = X86::EAX;
+  const auto &Triple = MF.getTarget().getTargetTriple();
+  if (Triple.isArch64Bit() && Triple.isOSLinux()) {

scottconstable wrote:

Is this check necessary? I notice that all of the x86-based kCFI tests use the 
`x86_64-unknown-linux-gnu` triple. If 64-bit Linux is the only supported x86 
subtarget then I think this check is redundant. But I'm not 100% certain.

https://github.com/llvm/llvm-project/pull/121070
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

2024-12-24 Thread Scott Constable via cfe-commits

scottconstable wrote:

> I haven't received a reply for my concerns I commented at [#117121 
> (comment)](https://github.com/llvm/llvm-project/pull/117121#issuecomment-2502346476)
>  and [#117121 
> (comment)](https://github.com/llvm/llvm-project/pull/117121#issuecomment-2516251353),
>  and they still remain for the alternative proposal.
> 
> I'd recommend working on better defining the problem, and then working on a 
> well-defined solution for the problem that doesn't affect or conflict with a 
> currently well-defined solution to a different problem (i.e., KCFI).

@rcvalle I think that 
https://github.com/llvm/llvm-project/pull/117121#issuecomment-2516251353 isn't 
relevant for the alternative proposal in 
https://github.com/llvm/llvm-project/pull/121070 because that proposal does not 
attempt to prevent cross-arity collisions. Do you agree? Or have I 
mis-understood your concern?

I'm not sure whether 
https://github.com/llvm/llvm-project/pull/117121#issuecomment-2502346476 is 
relevant for the alternative proposal. Please feel free to discuss at 
https://github.com/llvm/llvm-project/pull/121070.

https://github.com/llvm/llvm-project/pull/117121
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)

2024-12-24 Thread Scott Constable via cfe-commits



@@ -254,6 +254,7 @@ FEATURE(is_trivially_constructible, LangOpts.CPlusPlus)
 FEATURE(is_trivially_copyable, LangOpts.CPlusPlus)
 FEATURE(is_union, LangOpts.CPlusPlus)
 FEATURE(kcfi, LangOpts.Sanitize.has(SanitizerKind::KCFI))
+FEATURE(kcfi_x86_arity, LangOpts.Sanitize.has(SanitizerKind::KCFI))

scottconstable wrote:

Peter Zijlstra, one of the Linux FineIBT maintainers, suggested to me that 
`__has_feature()` might be the best interface for Kconfig to discover that 
clang supports this new kCFI arity extension. If this feature were to be added 
to rustc, then I think rustc would need a similar interface to expose the 
feature.

https://github.com/llvm/llvm-project/pull/121070
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)

2025-01-10 Thread Scott Constable via cfe-commits


scottconstable wrote:

> If hashing is changed, consider replacing xxhash64 with xxh3+_64bits

@MaskRay This PR does not change the hashing scheme at all.

https://github.com/llvm/llvm-project/pull/121070
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)

2025-01-10 Thread Scott Constable via cfe-commits



@@ -254,6 +254,7 @@ FEATURE(is_trivially_constructible, LangOpts.CPlusPlus)
 FEATURE(is_trivially_copyable, LangOpts.CPlusPlus)
 FEATURE(is_union, LangOpts.CPlusPlus)
 FEATURE(kcfi, LangOpts.Sanitize.has(SanitizerKind::KCFI))
+FEATURE(kcfi_x86_arity, LangOpts.Sanitize.has(SanitizerKind::KCFI))

scottconstable wrote:

@Darksonn This PR differs from https://github.com/llvm/llvm-project/pull/117121 
in that this PR does not change the hash type. Instead, this PR uses the reg 
field in the `MOVri32` instruction to encode some additional meta data about 
the function's arity.

https://github.com/llvm/llvm-project/pull/121070
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

52 matches

Mail list logo