Fix for Paths with Whitespace
Dear Reviewer, The current doxygen.cfg.in will produce paths that cannot be understood by Doxygen/Graphviz, when any generated paths contain white space. This patch simply adds quotations to all generated paths so that white space cannot be misinterpreted by Doxygen/Graphviz. ~Scott Constable doxygen.cfg.in Description: Binary data ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
@@ -208,10 +209,34 @@ void llvm::setKCFIType(Module &M, Function &F, StringRef MangledType) { std::string Type = MangledType.str(); if (M.getModuleFlag("cfi-normalize-integers")) Type += ".normalized"; + + uint32_t OutHash = static_cast(llvm::xxHash64(Type)); + auto T = Triple(Twine(M.getTargetTriple())); + if (T.isX86() && T.isArch64Bit() && T.isOSLinux()) { +// Estimate the function's arity (i.e., the number of arguments) at the ABI +// level by counting the number of parameters that are likely to be passed +// as registers, such as pointers and 64-bit (or smaller) integers. The +// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs. +// Additional parameters or parameters larger than 64 bits may be passed on +// the stack, in which case the arity is denoted as 7. +size_t NumParams = F.arg_size(); +bool MayHaveStackArgs = NumParams > 6; + +for (unsigned int i = 0; !MayHaveStackArgs && i < NumParams; ++i) { + const llvm::Type *PT = F.getArg(i)->getType(); + if (!(PT->isPointerTy() || PT->getIntegerBitWidth() <= 64)) scottconstable wrote: Hi @phoebewang, KCFI only computes hashes for indirect calls, not direct ones. The example you cited uses `CallBase::getCalledFunction()`, whose documentation reads "Returns the function called, or null if this is an indirect function invocation or the function signature does not match the call signature." https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
scottconstable wrote: This PR is intended to improve security for X86 kernels, as the title suggests. I think that we can and should incorporate feedback from the other participants, several of whom have recommended that this new behavior should be explicitly enabled with a flag, rather than implicitly enabled by examining the target triple. So, for example, I could update the PR to introduce a new flag such as `-fsanitize-cfi-icall-experimental-arity-x86`. If the community would like to adopt a similar approach that is tuned to benefit Arm64 kernels, then that approach could be enabled with a different flag. Flag-based enabling would also help to address the concerns about compiler compatibility. For instance, if an x86 kernel is being built with both C and Rust code, then Kconfig could check whether the local clang and rustc compilers support the KCFI arity enhancement; if one or the other does not support KCFI arity then Kconfig would fall back to the default 32-bit hash. @rcvalle I don't see any evidence that this approach will require a plethora of implementations for different arch/ABI combinations. As far as I am aware, the vast majority of x86-64 Linux kernel code (including eBPF code) uses the standard ABI. Even if some kernel code does not conform to the standard ABI, this should not break compatibility with the arity enhancement because the arity enhancement's implementation derives the arity tag from the function type, not from the ABI. Hence, a valid function call that uses a non-standard ABI would still experience a matching KCFI type ID at both the call site and the call target. The downside is that call sites/targets with a non-standard ABI might be tagged with an inaccurate arity, which could matter if the function type's 29-bit hash collides with another function type's 29-bit hash, and that other function's arity tag is the same but the two functions' actual arities differ. IMO this seems very unlikely. https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
https://github.com/scottconstable edited https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
https://github.com/scottconstable edited https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
scottconstable wrote: > This is not a Rust concern, but re-reading the initial post, it _looks_ like > your own statistics suggest that consuming 3 bits for arity costs more than > it buys you. As stated, (didn't check your math, just going off what you > said) prior to your change, we expect 0.01383765 collisions in your sample > environment. If you had checked my math, you would have noticed that the values in the right-hand column were not exactly correct! I found and fixed a but in my Excel code and updated that chart, but the numbers didn't change by much. > After your change, we expect to have the _sum_ of your right hand column in > collisions, which comes out to 0.0266774 - nearly double the rate of > collisions we have with the basic implementation. In fact, I think that any > scheme like this will always going to increase the number of overall > collisions, given that the arity is implicitly hashed into the representation > already. Your analysis is correct that the total number of expected collisions would increase from 0.01375771 to 0.02660650 (with the updated statistics). > The main reason I could see to consider this is if for some reason a > cross-arity collision is more dangerous than a same-arity collision in terms > of exploitability, which I can't immediately argue, but perhaps you have > something for this that was just assumed in the initial post? I think that a cross-arity collision is clearly more dangerous than a same-arity collision. This is what I had written in the PR description: > > The current 32-bit kCFI hash does not prevent, for example, a 2-arity fptr > > from calling a 3-arity target if the kCFI hashes collide. If this were to > > happen, then potentially malicious stale/dead data in RDX at the call site > > could suddenly become live as the third parameter at the call target. Including the arity in the hash does prevent or even reduce the likelihood that a cross-arity collision will occur, if the hash function is uniform. Suppose that two function types' hashes collide (under the current 32-bit hash scheme). The probability that the arities differ is equal to 1 minus the probability that the arities agree: 1 - ((32/10903)*(31/10902) + (2492/10903)*(2491/10902) + ... + (148/10903)*(147/10902)) = 0.75901 Thus, if there is a hash collision, there is roughly a 76% chance that it will be a cross-arity collision. https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
scottconstable wrote: @phoebewang @lvwr Please review this PR. https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
https://github.com/scottconstable created https://github.com/llvm/llvm-project/pull/117121 Kernel Control Flow Integrity (kCFI) is a feature that hardens indirect calls by comparing a 32-bit hash of the function pointer's type against a hash of the target function's type. If the hashes do not match, the kernel may panic (or log the hash check failure, depending on the kernel's configuration). These hashes are computed at compile time by applying the xxHash64 algorithm to each mangled canonical function (or function pointer) type, then truncating the result to 32 bits. Like any hashing scheme, hash collisions are possible. For example, a commodity Linux kernel configured for Ubuntu 24.04 server has 141,617 total indirect call targets, with 10,903 unique function types. With a 32-bit kCFI hash, the expected number of collisions is 10,903-2^32+2^32*(1-1/(2^32))^10,903 = 0.01383765 (see https://courses.cs.duke.edu/cps102/spring09/Lectures/L-18.pdf for the formula). This number can balloon with the addition of drivers and kernel modules. This patch reduces both the expected number of collisions and the potential impact of a collision by augmenting the hash with an arity value that indicates how many parameters the function has at the ABI level. Specifically, the patch further truncates the kCFI hash down to 29 bits, then concatenates a 3-bit arity indicator as follows: | Arity Indicator | Description | | --- | --- | | 0 | 0 parameters | | 1 | 1 parameter in RDI | | 2 | 2 parameters in RDI and RSI | | 3 | 3 parameters in RDI, RSI, and RDX | | 4 | 4 parameters in RDI, RSI, RDX, and RCX | | 5 | 5 parameters in RDI, RSI, RDX, RCX, and R8 | | 6 | 6 parameters in RDI, RSI, RDX, RCX, R8, and R9 | | 7 | At least one parameter may be passed on the stack | This scheme enhances security in two ways. First, it prevents a j-arity function pointer from being used to call a k-arity function, unless j=k. The current 32-bit kCFI hash does not prevent, for example, a 2-arity fptr from calling a 3-arity target if the kCFI hashes collide. If this were to happen, then potentially malicious stale/dead data in RDX at the call site could suddenly become live as the third parameter at the call target. Second, this scheme reduces the expected number of hash collisions within each arity, compared against the expected number of collisions (0.01383765) for the 32-bit hashing scheme that includes all arities. The table below shows the expected number of collisions for each arity, given the number of unique indirect callable function types within that arity in the same Ubuntu 24.04 server kernel discussed above. | Arity | Unique Indirect Callable Function Types | Number of Expected Collisions | | - | --- | - | | 0 | 32 | 0.0092 | | 1 | 2492 | 0.00578125 | | 2 | 3775 | 0.01326841 | | 3 | 2547 | 0.00603931 | | 4 | 1169 | 0.00127162 | | 5 | 519 | 0.00025038 | | 6 | 221 | 0.4528 | | 7 | 148 | 0.2026 | One additional benefit of this patch is that it can benefit other CFI approaches that build on kCFI, such as FineIBT. For example, this proposed enhancement to FineIBT must be able to infer (at kernel init time) which registers are live at an indirect call target: https://lkml.org/lkml/2024/9/27/982. If the arity bits are available in the kCFI type ID, then this information is trivial to infer. >From 5b144089b7cde125e93a6e147009cbab0983d71a Mon Sep 17 00:00:00 2001 From: Scott D Constable Date: Tue, 19 Nov 2024 15:51:05 -0800 Subject: [PATCH] Enhance KCFI type IDs with a 3-bit arity indicator. --- clang/lib/CodeGen/CodeGenModule.cpp | 31 ++--- clang/test/CodeGen/kcfi-normalize.c | 18 +++-- clang/test/CodeGen/kcfi.c | 22 +--- 3 files changed, 59 insertions(+), 12 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp index b854eeb62a80ce..7cc6f120ec39a9 100644 --- a/clang/lib/CodeGen/CodeGenModule.cpp +++ b/clang/lib/CodeGen/CodeGenModule.cpp @@ -2183,7 +2183,8 @@ llvm::ConstantInt *CodeGenModule::CreateCrossDsoCfiTypeId(llvm::Metadata *MD) { } llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) { - if (auto *FnType = T->getAs()) + auto *FnType = T->getAs(); + if (FnType) T = getContext().getFunctionType( FnType->getReturnType(), FnType->getParamTypes(), FnType->getExtProtoInfo().withExceptionSpec(EST_None)); @@ -2196,8 +2197,32 @@ llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) { if (getCodeGenOpts().SanitizeCfiICallNormalizeIntegers) Out << ".normalized"; - return llvm::ConstantInt::get(Int32Ty, - static_cast(llvm::xxHash64(OutName))); + uint32_t OutHash = static_cast(llvm::xxHash64(OutName)); + const auto &Triple = getTarget().getTriple(); + if (Triple.isX86() && Triple.isArch64Bit() && Triple
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
scottconstable wrote: > Sorry, what I meant was we should be looking only at the number of return > (with void being zero, and everything else being one) and parameters from the > function declaration/definition and shouldn't be looking at the parameters' > type layout (i.e., size), and how and what registers are used to pass them > (i.e., we shouldn't mention or reference registers or assembly code at all in > the implementation and documentation). The implementation should be > completely ABI and architecture agnostic. Does it make sense? I think that the aspiration to be "completely ABI and architecture agnostic" is irreconcilable with the security objective to prevent a register that is dead at a call site from becoming live at a hash-collided call target. Your point about returns is interesting. AFAIK most x86 Linux kernels (including Ubuntu kernels) are being compiled with `-fzero-call-used-regs=used-gpr`, which in most cases should prevent a dead return register (`RAX`) from becoming live at a call site when the call site expects a return value. I can think of one corner case, but I'm not sure how much of a concern it would be in practice. Suppose that `int (*)(int)` collides with `void (*)(int)` and this allows a function `f` of the former type to call a function `g` of the latter type. Suppose further that `RAX` dies when `f` calls `g` and `g` does not touch `RAX` at all, which means that `-fzero-call-used-regs=used-gpr` will have no effect on `RAX` in `g`. Then when `g` returns to `f`, the stale value that `f` had left in `RAX` will become live and might be used for something malicious. https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
scottconstable wrote: > > Flag guarding this feature seems like it would also be good for any > > existing C users - for example, if trying to build a kernel module intended > > to load against a kernel image built with an older `clang`, you need to > > select the same type ID projection that the kernel did. > > +1 to this. Adding support for this to the Rust compiler shouldn't be a > problem and @maurer or I could take a look at it. However, I wonder if the > arity information should be from the high level information (i.e., from the > function declarations/definitions) instead of from the lower level calling > convention used/code generated (and possibly also affected by/after > optimizations). This would ensure compatibility with any calling convention > and any other possible differences that may come up later when using > cross-language KCFI. (The KCFI encoding is based on high level type > information from the function declarations/definitions.) The implementation in this PR does use high-level type information (but now that I look back at what I had written, I think my original PR description was a bit mis-leading). Here is an extended version of the arity table, with an additional column to show what the implementation is intended to do: | Arity Indicator | Description | Implementation | | --- | --- | --- | | 0 | 0 parameters | 0 parameters | | 1 | 1 parameter in RDI | 1 address-width (or smaller) parameter and no other parameters | | 2 | 2 parameters in RDI and RSI | 2 address-width (or smaller) parameters and no other parameters | | 3 | 3 parameters in RDI, RSI, and RDX | 3 address-width (or smaller) parameters and no other parameters | | 4 | 4 parameters in RDI, RSI, RDX, and RCX | 4 address-width (or smaller) parameters and no other parameters | | 5 | 5 parameters in RDI, RSI, RDX, RCX, and R8 | 5 address-width (or smaller) parameters and no other parameters | | 6 | 6 parameters in RDI, RSI, RDX, RCX, R8, and R9 | 6 address-width (or smaller) parameters and no other parameters | | 7 | At least one parameter may be passed on the stack | The function type does not qualify as arity 0-6 | Hence, this implementation uses high-level type information from clang (`CodeGenModule.cpp`) or LLVM (`ModuleUtils.cpp`) to infer a better approximation of the default x86-64 calling convention. For example, if the implementation were to instead use the number of parameters as a proxy for the arity, then this could permit a scenario where a register that is dead at the call site becomes live at the call target, e.g.: ```C // test.c struct S { int *p1; int *p2; }; int foo(struct S s) { // 1 parameter return *s.p1 + *s.p2; } ``` But then the struct gets decomposed into two separate registers by the compiler: ``` clang -O1 test.c -S -o test.S; cat test.S .type foo,@function foo:# @foo .cfi_startproc # %bb.0:# %entry movl(%rsi), %eax addl(%rdi), %eax retq ``` Thus, if an arity-1 function type's hash collides with `foo`'s hash, then a dead `RSI` register at an arity-1 caller could become live at the target `foo`. https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
scottconstable wrote: > > > > @phoebewang and @lvwr I also noticed that there is this code in LLVM: > > > > https://github.com/llvm/llvm-project/blob/9ba6672b9f0e82a1f6d4100dc832c84447ea545c/llvm/lib/Transforms/Utils/ModuleUtils.cpp#L202-L214 > > > > > > > > . As far as I can tell, this code is not triggered when I build the > > > > Linux kernel with `-fsanitize=kcfi`. > > > > When is this code triggered? And do you think it is necessary to > > > > additionally implement the arity-enhancement to this code? > > > > > > > > > I'm not familar with KCFI. I find it's added by @samitolvanen in > > > [e1c36bd](https://github.com/llvm/llvm-project/commit/e1c36bde0551977d4b2efae032af6dfc4b2b3936). > > > I think you should triger it with attached test case. > > > > > > It looks to me like this code might be triggered in some LTO > > configurations, and/or when linking code compiled from multiple source > > languages with the expectation that the KCFI type IDs will be compatible. > > Is my understanding correct? > > Looks like the latter, see > [71c7313](https://github.com/llvm/llvm-project/commit/71c7313f42d2b6063fea09854cf4fc46fd0627e1) Actually, I think this code was introduced to address a compatibility issue with KASAN, which apparently must generate KCFI-enabled code without clang. I found this explanation at https://github.com/llvm/llvm-project/commit/3b14862f0a968dc079530acbce4f2ca4aa7c1492 and https://github.com/ClangBuiltLinux/linux/issues/1742. Regardless, it looks like `llvm::setKCFIType` is intended to always produce the same KCFI type ID as `CodeGenModule::CreateKCFITypeId` for equivalent function types. For this PR, this implies that `llvm::setKCFIType` and `CodeGenModule::CreateKCFITypeId` must always infer the same arity for the same function type. https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
@@ -208,10 +209,34 @@ void llvm::setKCFIType(Module &M, Function &F, StringRef MangledType) { std::string Type = MangledType.str(); if (M.getModuleFlag("cfi-normalize-integers")) Type += ".normalized"; + + uint32_t OutHash = static_cast(llvm::xxHash64(Type)); + auto T = Triple(Twine(M.getTargetTriple())); + if (T.isX86() && T.isArch64Bit() && T.isOSLinux()) { +// Estimate the function's arity (i.e., the number of arguments) at the ABI +// level by counting the number of parameters that are likely to be passed +// as registers, such as pointers and 64-bit (or smaller) integers. The +// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs. +// Additional parameters or parameters larger than 64 bits may be passed on +// the stack, in which case the arity is denoted as 7. +size_t NumParams = F.arg_size(); +bool MayHaveStackArgs = NumParams > 6; + +for (unsigned int i = 0; !MayHaveStackArgs && i < NumParams; ++i) { + const llvm::Type *PT = F.getArg(i)->getType(); + if (!(PT->isPointerTy() || PT->getIntegerBitWidth() <= 64)) scottconstable wrote: It appears that clang does not reserve stack for large arguments and instead this is done later by the LLVM X86 backend. For example: ```C struct S { int *p1; int *p2; int array[8]; }; int foo(struct S s, struct S *sp) { return *s.p1 + *s.p2 + *sp->p1 + *sp->p2; } ``` Then when I compile to LLVM IR I see: ``` define dso_local i32 @foo(ptr noundef byval(%struct.S) align 8 %s, ptr noundef %sp) #0 { ``` Which suggests an arity of 2. But the X86 backend transforms `foo` to pass `s` on the stack, and then `sp` becomes the sole argument and is passed in `rdi`. Hence, by the chart in the PR description, this should be treated as an arity-7 function: | Arity Indicator | Description | | --- | --- | | 0 | 0 parameters | | 1 | 1 parameter in RDI | | 2 | 2 parameters in RDI and RSI | | 3 | 3 parameters in RDI, RSI, and RDX | | 4 | 4 parameters in RDI, RSI, RDX, and RCX | | 5 | 5 parameters in RDI, RSI, RDX, RCX, and R8 | | 6 | 6 parameters in RDI, RSI, RDX, RCX, R8, and R9 | | 7 | At least one parameter may be passed on the stack | This predicate: ``` // typeof(*PT) = llvm::Type if (!(PT->isPointerTy() || PT->getIntegerBitWidth() <= 64)) MayHaveStackArgs = true; ``` should prevent `s` from being counted as a register argument and correctly set the arity field to 7. https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
https://github.com/scottconstable updated https://github.com/llvm/llvm-project/pull/117121 >From b787d6d7a9c0904c5e47e32556103b8a5220a7d1 Mon Sep 17 00:00:00 2001 From: Scott D Constable Date: Tue, 19 Nov 2024 15:51:05 -0800 Subject: [PATCH] Enhance KCFI type IDs with a 3-bit arity indicator. --- clang/lib/CodeGen/CodeGenModule.cpp | 31 +-- clang/test/CodeGen/kcfi-normalize.c | 18 +++ clang/test/CodeGen/kcfi.c | 22 +++-- llvm/lib/Transforms/Utils/ModuleUtils.cpp | 31 +-- .../GCOVProfiling/kcfi-normalize.ll | 7 +++-- llvm/test/Transforms/GCOVProfiling/kcfi.ll| 7 +++-- 6 files changed, 95 insertions(+), 21 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp index b854eeb62a80ce..f9c01edf4f0953 100644 --- a/clang/lib/CodeGen/CodeGenModule.cpp +++ b/clang/lib/CodeGen/CodeGenModule.cpp @@ -2183,7 +2183,8 @@ llvm::ConstantInt *CodeGenModule::CreateCrossDsoCfiTypeId(llvm::Metadata *MD) { } llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) { - if (auto *FnType = T->getAs()) + auto *FnType = T->getAs(); + if (FnType) T = getContext().getFunctionType( FnType->getReturnType(), FnType->getParamTypes(), FnType->getExtProtoInfo().withExceptionSpec(EST_None)); @@ -2196,8 +2197,32 @@ llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) { if (getCodeGenOpts().SanitizeCfiICallNormalizeIntegers) Out << ".normalized"; - return llvm::ConstantInt::get(Int32Ty, - static_cast(llvm::xxHash64(OutName))); + uint32_t OutHash = static_cast(llvm::xxHash64(OutName)); + const auto &Triple = getTarget().getTriple(); + if (FnType && Triple.isX86() && Triple.isArch64Bit() && Triple.isOSLinux()) { +// Estimate the function's arity (i.e., the number of arguments) at the ABI +// level by counting the number of parameters that are likely to be passed +// as registers, such as pointers and 64-bit (or smaller) integers. The +// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs. +// Additional parameters or parameters larger than 64 bits may be passed on +// the stack, in which case the arity is denoted as 7. +bool MayHaveStackArgs = FnType->getNumParams() > 6; + +for (unsigned int i = 0; !MayHaveStackArgs && i < FnType->getNumParams(); + ++i) { + const Type *PT = FnType->getParamType(i).getTypePtr(); + if (!(PT->isPointerType() || (PT->isIntegralOrEnumerationType() && +getContext().getTypeSize(PT) <= 64))) +MayHaveStackArgs = true; +} + +// The 3-bit arity is concatenated with the lower 29 bits of the KCFI hash +// to form an enhanced KCFI type ID. This can prevent, for example, a +// 3-arity function's ID from ever colliding with a 2-arity function's ID. +OutHash = (OutHash << 3) | (MayHaveStackArgs ? 7 : FnType->getNumParams()); + } + + return llvm::ConstantInt::get(Int32Ty, OutHash); } void CodeGenModule::SetLLVMFunctionAttributes(GlobalDecl GD, diff --git a/clang/test/CodeGen/kcfi-normalize.c b/clang/test/CodeGen/kcfi-normalize.c index b9150e88f6ab5f..8b7445fc85e490 100644 --- a/clang/test/CodeGen/kcfi-normalize.c +++ b/clang/test/CodeGen/kcfi-normalize.c @@ -10,25 +10,31 @@ void foo(void (*fn)(int), int arg) { // CHECK-LABEL: define{{.*}}foo // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE1:[0-9]+]] -// CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 1162514891) ] +// KCFI ID = 0x2A548E59 +// CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 710184537) ] fn(arg); } void bar(void (*fn)(int, int), int arg1, int arg2) { // CHECK-LABEL: define{{.*}}bar // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE2:[0-9]+]] -// CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 448046469) ] +// KCFI ID = 0xD5A52C2A +// CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 -710595542) ] fn(arg1, arg2); } void baz(void (*fn)(int, int, int), int arg1, int arg2, int arg3) { // CHECK-LABEL: define{{.*}}baz // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE3:[0-9]+]] -// CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef %3){{.*}}[ "kcfi"(i32 -2049681433) ] +// KCFI ID = 0x2EA2BF3B +// CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef %3){{.*}}[ "kcfi"(i32 782417723) ] fn(arg1, arg2, arg3); } // CHECK: ![[#]] = !{i32 4, !"cfi-normalize-integers", i32 1} -// CHECK: ![[TYPE1]] = !{i32 -1143117868} -// CHECK: ![[TYPE2]] = !{i32 -460921415} -// CHECK: ![[TYPE3]] = !{i32 -333839615} +// KCFI ID = DEEB3EA2 +// CHECK: ![[TYPE1]] = !{i32 -555008350} +// KCFI ID = 24372DCB +// CHECK: ![[TYPE2]] = !{i32 607595979} +// KCFI ID = 0x60D0180C +// CHECK: ![[TYPE3]] = !{i32 1624250380} diff --git a/clang/test/CodeGen/kcfi.c b/clang/test/CodeGen
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
@@ -208,10 +209,34 @@ void llvm::setKCFIType(Module &M, Function &F, StringRef MangledType) { std::string Type = MangledType.str(); if (M.getModuleFlag("cfi-normalize-integers")) Type += ".normalized"; + + uint32_t OutHash = static_cast(llvm::xxHash64(Type)); + auto T = Triple(Twine(M.getTargetTriple())); scottconstable wrote: Thank you for the suggestion! This does look tidier than what I had written. I updated the PR. https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
@@ -208,10 +209,34 @@ void llvm::setKCFIType(Module &M, Function &F, StringRef MangledType) { std::string Type = MangledType.str(); if (M.getModuleFlag("cfi-normalize-integers")) Type += ".normalized"; + + uint32_t OutHash = static_cast(llvm::xxHash64(Type)); + auto T = Triple(Twine(M.getTargetTriple())); + if (T.isX86() && T.isArch64Bit() && T.isOSLinux()) { +// Estimate the function's arity (i.e., the number of arguments) at the ABI +// level by counting the number of parameters that are likely to be passed +// as registers, such as pointers and 64-bit (or smaller) integers. The +// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs. +// Additional parameters or parameters larger than 64 bits may be passed on +// the stack, in which case the arity is denoted as 7. +size_t NumParams = F.arg_size(); +bool MayHaveStackArgs = NumParams > 6; + +for (unsigned int i = 0; !MayHaveStackArgs && i < NumParams; ++i) { + const llvm::Type *PT = F.getArg(i)->getType(); + if (!(PT->isPointerTy() || PT->getIntegerBitWidth() <= 64)) scottconstable wrote: Thank you for the suggestion. I looked at `llvm::Argument::hasPassPointeeByValueCopyAttr()`, but it looks like it is only available where a function is being defined. It does not appear to be available where a call is made through a function pointer. Therefore, I'm not sure that `llvm::Argument::hasPassPointeeByValueCopyAttr()` will be helpful since KCFI requires the ID to be computed identically at both the call site and the call target. Or, do you think I am overlooking something, and that there is a way to use `llvm::Argument::hasPassPointeeByValueCopyAttr()` or something similar at an indirect call site? As far as I can tell, the only information that is available at an indirect call site is the function pointer type, which does contain the number of arguments and their types, but does not appear to contain an indication as to whether an argument may be passed on the stack. https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
scottconstable wrote: > Although the default calling convention uses 6 registers, others like RegCall > uses more. Do you want to check calling convention as well? AFAIK the use case for KCFI is very narrow: the x86-64 Linux kernel. And I don't believe that the kernel uses (or even allows?) any calling convention other than the default. The kernel documentation also says that the eBPF calling convention "maps directly to ABIs used by the kernel on 64-bit architectures." But I admit I am not an expert on the Linux ABI nor am I an expert on the full scope of KCFI use cases. Maybe @lvwr can weigh in? https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
scottconstable wrote: > I think Scott's second point may be the relevant one - this may be similar > strength or slightly weaker, but having an indicator stating which registers > are live is potentially needed to allow FineIBT to poison them during > speculative execution. > > Given how much padding is used in the X86 kernel around function headers, > have you considered just stealing another byte to encode the data for your > arity stuff, and considering it to be a separate mitigation from KCFI tags? Yes, and this is what our first prototype does. There is a brief writeup here: https://lkml.org/lkml/2024/9/27/982. When I started to refine the prototype into a PR, I noticed that stealing another byte in the padding region causes problems for either LLVM or for Linux. This is because LLVM emits function headers like this: ``` NOP NOP ... MOV $01234567, %eax : ``` But when the Linux image is loaded, it moves the type information to the *beginning* of the padding, to allow the pre-function padding to be used for hot patching: ``` MOV $01234567, %eax ... NOP NOP : ``` Thus, when new metadata is added to the padding region, it introduces new challenges for either LLVM or for Linux: - If the metadata is added *before* the hash, then Linux needs to know that it might need to look for the hash in a different location, i.e., in a different offset relative to the start of the function. - If the metadata is added *after* the hash, then LLVM's KCFI code generator needs to correctly adjust the offsets for the call site KCFI checks, depending on whether the new metadata is being generated. The discussions on LKML also led us to simplify the proposal, which originally had called for an 8-bit field to denote which registers are pointers, so that only pointers would be poisoned after a misprediction. The new proposal poisons all live registers and therefore requires a 3-bit field to record the arity (defined as the number of live registers). It occurred to me that arity might also be useful to harden the hash checks for KCFI, but that may require a consensus among KCFI stakeholders. One other alternative could be to use the 3-bit reg field in the MOV to encode the arity. For example: ``` MOV $01234567, %eax# arity 0 MOV $01234567, %ecx# arity 1 MOV $01234567, %edx# arity 2 MOV $01234567, %ebx# arity 3 MOV $01234567, %esp# arity 4 MOV $01234567, %ebp# arity 5 MOV $01234567, %esi# arity 6 MOV $01234567, %edi# arity 7+ ``` I believe that this alternative implementation would also meet the requirements for the FineIBT register poisoning enhancement. > The rest of the CFI type ID scheme is arch-independent, but in order to know > which _registers_ are in use, you need arch dependent information, because > you care about the calling convention, packing rules, etc. This is part of > why Ramon thought your design choices were odd above - this isn't really a > CFI enhancement or modification, this is another piece of information you > need for speculation defenses that occur at a different abstraction level. I agree with your interpretation. And thank you and @rcvalle again for the ongoing feedback and discussion. https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
scottconstable wrote: > If you are changing the hash scheme, please migrate from xxHash64 to xxh3. We > want to remove the legacy xxHash64 from LLVM. It seems from the discussion above that this change would require a new flag. Suppose, for instance, that a user builds a kernel with a version of Clang that uses xxh3, and a version of rustc that still uses xxHash64. Then the hashes could be incompatible and the kernel will crash. This PR is specific to X86-64 targets. Your suggestion seems like a general enhancement that should apply to all targets; I think it should be a separate PR. https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
scottconstable wrote: @rcvalle @maurer Do you have any feedback on the alternative proposal, https://github.com/llvm/llvm-project/pull/117121#issuecomment-2518240999? https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)
@@ -181,8 +181,26 @@ void X86AsmPrinter::emitKCFITypeId(const MachineFunction &MF) { // Embed the type hash in the X86::MOV32ri instruction to avoid special // casing object file parsers. EmitKCFITypePadding(MF); + + Register MovReg = X86::EAX; + const auto &Triple = MF.getTarget().getTargetTriple(); + if (Triple.isArch64Bit() && Triple.isOSLinux()) { +// Determine the function's arity (i.e., the number of arguments) at the ABI +// level by counting the number of parameters that are passed +// as registers, such as pointers and 64-bit (or smaller) integers. The +// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs. +// Additional parameters or parameters larger than 64 bits may be passed on +// the stack, in which case the arity is denoted as 7. +const unsigned ArityToRegMap[8] = {X86::EAX, X86::ECX, X86::EDX, X86::EBX, + X86::ESP, X86::EBP, X86::ESI, X86::EDI}; scottconstable wrote: Hi @phoebewang, based on my understanding of kCFI, the `MOVri` instruction emitted by the compiler in each function header will never be executed. The purpose of this instruction is simply to insert the function's kCFI type ID into the header so that it can be verified at the call site before making the call. For example, this is how a kCFI-enforced indirect call looks when the kernel is configured with kCFI: ``` 0010 :# foo loads the target address into rax ... 25: mov$0x70e5d6ba,%r10d # 0x70e5d6ba is the kCFI type ID that must be matched at the target 2b: add-0x4(%rax),%r10d # -0x4(%rax) refers to the immediate operand in the `MOVri` instruction, which holds the kCFI type ID 2f: je 33 # If the two type IDs match, then jump to the call; otherwise #UD 31: ud2 33: call *%rax # Note that this will jump directly to the target function, and will not execute the `MOVri` ``` Does this address your concern? https://github.com/llvm/llvm-project/pull/121070 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)
scottconstable wrote: > And you cannot use ESI when it's used to pass arguments. @phoebewang I think that my https://github.com/llvm/llvm-project/pull/121070#discussion_r1898074522 also applies to this concern. https://github.com/llvm/llvm-project/pull/121070 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)
https://github.com/scottconstable updated https://github.com/llvm/llvm-project/pull/121070 >From 73b6f6332cf699b7fa1bcf7cae90ef828cfa9158 Mon Sep 17 00:00:00 2001 From: Scott D Constable Date: Mon, 23 Dec 2024 13:48:48 -0800 Subject: [PATCH] Implement a new kcfi_x86_arity feature that encodes an indirect call target's arity (i.e., the number of live-in registers) in the function's __cfi header. --- clang/include/clang/Basic/Features.def| 1 + llvm/lib/Target/X86/X86AsmPrinter.cpp | 19 +- .../X86/kcfi-patchable-function-prefix.ll | 4 +- llvm/test/CodeGen/X86/kcfi.ll | 63 ++- 4 files changed, 81 insertions(+), 6 deletions(-) diff --git a/clang/include/clang/Basic/Features.def b/clang/include/clang/Basic/Features.def index c82b6d9b5f6c10..dca8f4dc0fbf76 100644 --- a/clang/include/clang/Basic/Features.def +++ b/clang/include/clang/Basic/Features.def @@ -254,6 +254,7 @@ FEATURE(is_trivially_constructible, LangOpts.CPlusPlus) FEATURE(is_trivially_copyable, LangOpts.CPlusPlus) FEATURE(is_union, LangOpts.CPlusPlus) FEATURE(kcfi, LangOpts.Sanitize.has(SanitizerKind::KCFI)) +FEATURE(kcfi_x86_arity, LangOpts.Sanitize.has(SanitizerKind::KCFI)) FEATURE(modules, LangOpts.Modules) FEATURE(safe_stack, LangOpts.Sanitize.has(SanitizerKind::SafeStack)) FEATURE(shadow_call_stack, diff --git a/llvm/lib/Target/X86/X86AsmPrinter.cpp b/llvm/lib/Target/X86/X86AsmPrinter.cpp index f01e47b41cf5e4..fe2ce305dcf6e5 100644 --- a/llvm/lib/Target/X86/X86AsmPrinter.cpp +++ b/llvm/lib/Target/X86/X86AsmPrinter.cpp @@ -181,8 +181,25 @@ void X86AsmPrinter::emitKCFITypeId(const MachineFunction &MF) { // Embed the type hash in the X86::MOV32ri instruction to avoid special // casing object file parsers. EmitKCFITypePadding(MF); + + // The ArityToRegMap assumes the 64-bit Linux kernel ABI, so verify the target + const auto &Triple = MF.getTarget().getTargetTriple(); + assert(Triple.isArch64Bit() && Triple.isOSLinux()); + + // Determine the function's arity (i.e., the number of arguments) at the ABI + // level by counting the number of parameters that are passed + // as registers, such as pointers and 64-bit (or smaller) integers. The + // Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs. + // Additional parameters or parameters larger than 64 bits may be passed on + // the stack, in which case the arity is denoted as 7. + const unsigned ArityToRegMap[8] = {X86::EAX, X86::ECX, X86::EDX, X86::EBX, + X86::ESP, X86::EBP, X86::ESI, X86::EDI}; + int Arity = MF.getInfo()->getArgumentStackSize() > 0 + ? 7 + : MF.getRegInfo().liveins().size(); + EmitAndCountInstruction(MCInstBuilder(X86::MOV32ri) - .addReg(X86::EAX) + .addReg(ArityToRegMap[Arity]) .addImm(MaskKCFIType(Type->getZExtValue(; if (MAI->hasDotTypeDotSizeDirective()) { diff --git a/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll b/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll index 1b7bd7835e890c..deababc7fd5379 100644 --- a/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll +++ b/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll @@ -3,7 +3,7 @@ ; CHECK: .p2align 4 ; CHECK-LABEL:__cfi_f1: ; CHECK-COUNT-11: nop -; CHECK-NEXT: movl $12345678, %eax +; CHECK-NEXT: movl $12345678, %ecx ; CHECK-LABEL:.Lcfi_func_end0: ; CHECK-NEXT: .size __cfi_f1, .Lcfi_func_end0-__cfi_f1 ; CHECK-LABEL:f1: @@ -26,7 +26,7 @@ define void @f2(ptr noundef %x) { ; CHECK: .p2align 4 ; CHECK-LABEL:__cfi_f3: ; CHECK-NOT:nop -; CHECK-NEXT: movl $12345678, %eax +; CHECK-NEXT: movl $12345678, %ecx ; CHECK-COUNT-11: nop ; CHECK-LABEL:f3: define void @f3(ptr noundef %x) #0 !kcfi_type !1 { diff --git a/llvm/test/CodeGen/X86/kcfi.ll b/llvm/test/CodeGen/X86/kcfi.ll index 059efcc71b0eb8..91d706796b8e99 100644 --- a/llvm/test/CodeGen/X86/kcfi.ll +++ b/llvm/test/CodeGen/X86/kcfi.ll @@ -16,7 +16,7 @@ ; ASM-NEXT:nop ; ASM-NEXT:nop ; ASM-NEXT:nop -; ASM-NEXT:movl $12345678, %eax +; ASM-NEXT:movl $12345678, %ecx ; ASM-LABEL: .Lcfi_func_end0: ; ASM-NEXT: .size __cfi_f1, .Lcfi_func_end0-__cfi_f1 define void @f1(ptr noundef %x) !kcfi_type !1 { @@ -90,7 +90,7 @@ define void @f4(ptr noundef %x) #0 { ;; Ensure we emit Value + 1 for unwanted values (e.g. endbr64 == 4196274163). ; ASM-LABEL: __cfi_f5: -; ASM: movl $4196274164, %eax # imm = 0xFA1E0FF4 +; ASM: movl $4196274164, %ecx # imm = 0xFA1E0FF4 define void @f5(ptr noundef %x) !kcfi_type !2 { ; ASM-LABEL: f5: ; ASM: movl $98693132, %r10d # imm = 0x5E1F00C @@ -100,7 +100,7 @@ define void @f5(ptr noundef %x) !kcfi_type !2 { ;; Ensure we emit Value + 1 for unwanted values (e.g. -endbr64 == 98693133). ; ASM-LABEL: __cfi_f6: -; ASM: movl $98693134, %eax # imm =
[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)
@@ -181,8 +181,26 @@ void X86AsmPrinter::emitKCFITypeId(const MachineFunction &MF) { // Embed the type hash in the X86::MOV32ri instruction to avoid special // casing object file parsers. EmitKCFITypePadding(MF); + + Register MovReg = X86::EAX; + const auto &Triple = MF.getTarget().getTargetTriple(); + if (Triple.isArch64Bit() && Triple.isOSLinux()) { scottconstable wrote: @phoebewang Good suggestion! I updated the PR to use an assert. https://github.com/llvm/llvm-project/pull/121070 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)
scottconstable wrote: @lvwr @maurer @rcvalle A gentle reminder to please review this PR. https://github.com/llvm/llvm-project/pull/121070 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
https://github.com/scottconstable updated https://github.com/llvm/llvm-project/pull/117121 >From b787d6d7a9c0904c5e47e32556103b8a5220a7d1 Mon Sep 17 00:00:00 2001 From: Scott D Constable Date: Tue, 19 Nov 2024 15:51:05 -0800 Subject: [PATCH] Enhance KCFI type IDs with a 3-bit arity indicator. --- clang/lib/CodeGen/CodeGenModule.cpp | 31 +-- clang/test/CodeGen/kcfi-normalize.c | 18 +++ clang/test/CodeGen/kcfi.c | 22 +++-- llvm/lib/Transforms/Utils/ModuleUtils.cpp | 31 +-- .../GCOVProfiling/kcfi-normalize.ll | 7 +++-- llvm/test/Transforms/GCOVProfiling/kcfi.ll| 7 +++-- 6 files changed, 95 insertions(+), 21 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp index b854eeb62a80ce..f9c01edf4f0953 100644 --- a/clang/lib/CodeGen/CodeGenModule.cpp +++ b/clang/lib/CodeGen/CodeGenModule.cpp @@ -2183,7 +2183,8 @@ llvm::ConstantInt *CodeGenModule::CreateCrossDsoCfiTypeId(llvm::Metadata *MD) { } llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) { - if (auto *FnType = T->getAs()) + auto *FnType = T->getAs(); + if (FnType) T = getContext().getFunctionType( FnType->getReturnType(), FnType->getParamTypes(), FnType->getExtProtoInfo().withExceptionSpec(EST_None)); @@ -2196,8 +2197,32 @@ llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) { if (getCodeGenOpts().SanitizeCfiICallNormalizeIntegers) Out << ".normalized"; - return llvm::ConstantInt::get(Int32Ty, - static_cast(llvm::xxHash64(OutName))); + uint32_t OutHash = static_cast(llvm::xxHash64(OutName)); + const auto &Triple = getTarget().getTriple(); + if (FnType && Triple.isX86() && Triple.isArch64Bit() && Triple.isOSLinux()) { +// Estimate the function's arity (i.e., the number of arguments) at the ABI +// level by counting the number of parameters that are likely to be passed +// as registers, such as pointers and 64-bit (or smaller) integers. The +// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs. +// Additional parameters or parameters larger than 64 bits may be passed on +// the stack, in which case the arity is denoted as 7. +bool MayHaveStackArgs = FnType->getNumParams() > 6; + +for (unsigned int i = 0; !MayHaveStackArgs && i < FnType->getNumParams(); + ++i) { + const Type *PT = FnType->getParamType(i).getTypePtr(); + if (!(PT->isPointerType() || (PT->isIntegralOrEnumerationType() && +getContext().getTypeSize(PT) <= 64))) +MayHaveStackArgs = true; +} + +// The 3-bit arity is concatenated with the lower 29 bits of the KCFI hash +// to form an enhanced KCFI type ID. This can prevent, for example, a +// 3-arity function's ID from ever colliding with a 2-arity function's ID. +OutHash = (OutHash << 3) | (MayHaveStackArgs ? 7 : FnType->getNumParams()); + } + + return llvm::ConstantInt::get(Int32Ty, OutHash); } void CodeGenModule::SetLLVMFunctionAttributes(GlobalDecl GD, diff --git a/clang/test/CodeGen/kcfi-normalize.c b/clang/test/CodeGen/kcfi-normalize.c index b9150e88f6ab5f..8b7445fc85e490 100644 --- a/clang/test/CodeGen/kcfi-normalize.c +++ b/clang/test/CodeGen/kcfi-normalize.c @@ -10,25 +10,31 @@ void foo(void (*fn)(int), int arg) { // CHECK-LABEL: define{{.*}}foo // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE1:[0-9]+]] -// CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 1162514891) ] +// KCFI ID = 0x2A548E59 +// CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 710184537) ] fn(arg); } void bar(void (*fn)(int, int), int arg1, int arg2) { // CHECK-LABEL: define{{.*}}bar // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE2:[0-9]+]] -// CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 448046469) ] +// KCFI ID = 0xD5A52C2A +// CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 -710595542) ] fn(arg1, arg2); } void baz(void (*fn)(int, int, int), int arg1, int arg2, int arg3) { // CHECK-LABEL: define{{.*}}baz // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE3:[0-9]+]] -// CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef %3){{.*}}[ "kcfi"(i32 -2049681433) ] +// KCFI ID = 0x2EA2BF3B +// CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef %3){{.*}}[ "kcfi"(i32 782417723) ] fn(arg1, arg2, arg3); } // CHECK: ![[#]] = !{i32 4, !"cfi-normalize-integers", i32 1} -// CHECK: ![[TYPE1]] = !{i32 -1143117868} -// CHECK: ![[TYPE2]] = !{i32 -460921415} -// CHECK: ![[TYPE3]] = !{i32 -333839615} +// KCFI ID = DEEB3EA2 +// CHECK: ![[TYPE1]] = !{i32 -555008350} +// KCFI ID = 24372DCB +// CHECK: ![[TYPE2]] = !{i32 607595979} +// KCFI ID = 0x60D0180C +// CHECK: ![[TYPE3]] = !{i32 1624250380} diff --git a/clang/test/CodeGen/kcfi.c b/clang/test/CodeGen
[clang] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
https://github.com/scottconstable edited https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
https://github.com/scottconstable edited https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
scottconstable wrote: @phoebewang and @lvwr I also noticed that there is this code in LLVM: https://github.com/llvm/llvm-project/blob/9ba6672b9f0e82a1f6d4100dc832c84447ea545c/llvm/lib/Transforms/Utils/ModuleUtils.cpp#L202-L214. As far as I can tell, this code is not triggered when I build the Linux kernel with `-fsanitize=kcfi`. When is this code triggered? And do you think it is necessary to additionally implement the arity-enhancement to this code? https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
https://github.com/scottconstable updated https://github.com/llvm/llvm-project/pull/117121 >From ccb50df487043cde9414943f08988bc9e6ee Mon Sep 17 00:00:00 2001 From: Scott D Constable Date: Tue, 19 Nov 2024 15:51:05 -0800 Subject: [PATCH] Enhance KCFI type IDs with a 3-bit arity indicator. --- clang/lib/CodeGen/CodeGenModule.cpp | 31 +-- clang/test/CodeGen/kcfi-normalize.c | 18 +++ clang/test/CodeGen/kcfi.c | 22 +++-- llvm/lib/Transforms/Utils/ModuleUtils.cpp | 29 +++-- .../GCOVProfiling/kcfi-normalize.ll | 7 +++-- llvm/test/Transforms/GCOVProfiling/kcfi.ll| 7 +++-- 6 files changed, 94 insertions(+), 20 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp index b854eeb62a80ce..f9c01edf4f0953 100644 --- a/clang/lib/CodeGen/CodeGenModule.cpp +++ b/clang/lib/CodeGen/CodeGenModule.cpp @@ -2183,7 +2183,8 @@ llvm::ConstantInt *CodeGenModule::CreateCrossDsoCfiTypeId(llvm::Metadata *MD) { } llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) { - if (auto *FnType = T->getAs()) + auto *FnType = T->getAs(); + if (FnType) T = getContext().getFunctionType( FnType->getReturnType(), FnType->getParamTypes(), FnType->getExtProtoInfo().withExceptionSpec(EST_None)); @@ -2196,8 +2197,32 @@ llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) { if (getCodeGenOpts().SanitizeCfiICallNormalizeIntegers) Out << ".normalized"; - return llvm::ConstantInt::get(Int32Ty, - static_cast(llvm::xxHash64(OutName))); + uint32_t OutHash = static_cast(llvm::xxHash64(OutName)); + const auto &Triple = getTarget().getTriple(); + if (FnType && Triple.isX86() && Triple.isArch64Bit() && Triple.isOSLinux()) { +// Estimate the function's arity (i.e., the number of arguments) at the ABI +// level by counting the number of parameters that are likely to be passed +// as registers, such as pointers and 64-bit (or smaller) integers. The +// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs. +// Additional parameters or parameters larger than 64 bits may be passed on +// the stack, in which case the arity is denoted as 7. +bool MayHaveStackArgs = FnType->getNumParams() > 6; + +for (unsigned int i = 0; !MayHaveStackArgs && i < FnType->getNumParams(); + ++i) { + const Type *PT = FnType->getParamType(i).getTypePtr(); + if (!(PT->isPointerType() || (PT->isIntegralOrEnumerationType() && +getContext().getTypeSize(PT) <= 64))) +MayHaveStackArgs = true; +} + +// The 3-bit arity is concatenated with the lower 29 bits of the KCFI hash +// to form an enhanced KCFI type ID. This can prevent, for example, a +// 3-arity function's ID from ever colliding with a 2-arity function's ID. +OutHash = (OutHash << 3) | (MayHaveStackArgs ? 7 : FnType->getNumParams()); + } + + return llvm::ConstantInt::get(Int32Ty, OutHash); } void CodeGenModule::SetLLVMFunctionAttributes(GlobalDecl GD, diff --git a/clang/test/CodeGen/kcfi-normalize.c b/clang/test/CodeGen/kcfi-normalize.c index b9150e88f6ab5f..8b7445fc85e490 100644 --- a/clang/test/CodeGen/kcfi-normalize.c +++ b/clang/test/CodeGen/kcfi-normalize.c @@ -10,25 +10,31 @@ void foo(void (*fn)(int), int arg) { // CHECK-LABEL: define{{.*}}foo // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE1:[0-9]+]] -// CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 1162514891) ] +// KCFI ID = 0x2A548E59 +// CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 710184537) ] fn(arg); } void bar(void (*fn)(int, int), int arg1, int arg2) { // CHECK-LABEL: define{{.*}}bar // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE2:[0-9]+]] -// CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 448046469) ] +// KCFI ID = 0xD5A52C2A +// CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 -710595542) ] fn(arg1, arg2); } void baz(void (*fn)(int, int, int), int arg1, int arg2, int arg3) { // CHECK-LABEL: define{{.*}}baz // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE3:[0-9]+]] -// CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef %3){{.*}}[ "kcfi"(i32 -2049681433) ] +// KCFI ID = 0x2EA2BF3B +// CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef %3){{.*}}[ "kcfi"(i32 782417723) ] fn(arg1, arg2, arg3); } // CHECK: ![[#]] = !{i32 4, !"cfi-normalize-integers", i32 1} -// CHECK: ![[TYPE1]] = !{i32 -1143117868} -// CHECK: ![[TYPE2]] = !{i32 -460921415} -// CHECK: ![[TYPE3]] = !{i32 -333839615} +// KCFI ID = DEEB3EA2 +// CHECK: ![[TYPE1]] = !{i32 -555008350} +// KCFI ID = 24372DCB +// CHECK: ![[TYPE2]] = !{i32 607595979} +// KCFI ID = 0x60D0180C +// CHECK: ![[TYPE3]] = !{i32 1624250380} diff --git a/clang/test/CodeGen/kcfi.c b/clang/test/CodeGen/k
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
scottconstable wrote: > > @phoebewang and @lvwr I also noticed that there is this code in LLVM: > > https://github.com/llvm/llvm-project/blob/9ba6672b9f0e82a1f6d4100dc832c84447ea545c/llvm/lib/Transforms/Utils/ModuleUtils.cpp#L202-L214 > > > > . As far as I can tell, this code is not triggered when I build the Linux > > kernel with `-fsanitize=kcfi`. > > When is this code triggered? And do you think it is necessary to > > additionally implement the arity-enhancement to this code? > > I'm not familar with KCFI. I find it's added by @samitolvanen in > [e1c36bd](https://github.com/llvm/llvm-project/commit/e1c36bde0551977d4b2efae032af6dfc4b2b3936). > I think you should triger it with attached test case. It looks to me like this code might be triggered in some LTO configurations, and/or when linking code compiled from multiple source languages with the expectation that the KCFI type IDs will be compatible. Is my understanding correct? The comment in the code says "Matches CodeGenModule::CreateKCFITypeId in Clang," which I interpret to mean that this code should produce identical KCFI type IDs for identical function types, which might be tricky if the target binary is compiled from different languages. I added some code to `llvm::setKCFIType` that I hope will produce consistent output, but admittedly I'm not sure that my treatment of `clang::Type` and `llvm::Type` is consistent. https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
@@ -208,10 +209,34 @@ void llvm::setKCFIType(Module &M, Function &F, StringRef MangledType) { std::string Type = MangledType.str(); if (M.getModuleFlag("cfi-normalize-integers")) Type += ".normalized"; + + uint32_t OutHash = static_cast(llvm::xxHash64(Type)); + auto T = Triple(Twine(M.getTargetTriple())); + if (T.isX86() && T.isArch64Bit() && T.isOSLinux()) { +// Estimate the function's arity (i.e., the number of arguments) at the ABI +// level by counting the number of parameters that are likely to be passed +// as registers, such as pointers and 64-bit (or smaller) integers. The +// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs. +// Additional parameters or parameters larger than 64 bits may be passed on +// the stack, in which case the arity is denoted as 7. +size_t NumParams = F.arg_size(); +bool MayHaveStackArgs = NumParams > 6; + +for (unsigned int i = 0; !MayHaveStackArgs && i < NumParams; ++i) { + const llvm::Type *PT = F.getArg(i)->getType(); + if (!(PT->isPointerTy() || PT->getIntegerBitWidth() <= 64)) scottconstable wrote: Is this condition equivalent to what I wrote in `CodeGenModule::CreateKCFITypeId` with `typeof(*PT) = clang::Type`? Specifically: ``` if (!(PT->isPointerType() || (PT->isIntegralOrEnumerationType() && getContext().getTypeSize(PT) <= 64))) ``` https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
https://github.com/scottconstable edited https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
https://github.com/scottconstable edited https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
@@ -208,10 +209,34 @@ void llvm::setKCFIType(Module &M, Function &F, StringRef MangledType) { std::string Type = MangledType.str(); if (M.getModuleFlag("cfi-normalize-integers")) Type += ".normalized"; + + uint32_t OutHash = static_cast(llvm::xxHash64(Type)); + auto T = Triple(Twine(M.getTargetTriple())); scottconstable wrote: This looks awkward and I regret that I needed to include another header to enable this. Maybe there is a workable API within the existing headers, but I couldn't find one. https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
https://github.com/scottconstable updated https://github.com/llvm/llvm-project/pull/117121 >From 4f21a0c817d221398bb93e25452518d265794265 Mon Sep 17 00:00:00 2001 From: Scott D Constable Date: Tue, 19 Nov 2024 15:51:05 -0800 Subject: [PATCH] Enhance KCFI type IDs with a 3-bit arity indicator. --- clang/lib/CodeGen/CodeGenModule.cpp | 31 +-- clang/test/CodeGen/kcfi-normalize.c | 18 +++ clang/test/CodeGen/kcfi.c | 22 +++-- llvm/lib/Transforms/Utils/ModuleUtils.cpp | 31 +-- .../GCOVProfiling/kcfi-normalize.ll | 7 +++-- llvm/test/Transforms/GCOVProfiling/kcfi.ll| 7 +++-- 6 files changed, 95 insertions(+), 21 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp index b854eeb62a80ce..f9c01edf4f0953 100644 --- a/clang/lib/CodeGen/CodeGenModule.cpp +++ b/clang/lib/CodeGen/CodeGenModule.cpp @@ -2183,7 +2183,8 @@ llvm::ConstantInt *CodeGenModule::CreateCrossDsoCfiTypeId(llvm::Metadata *MD) { } llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) { - if (auto *FnType = T->getAs()) + auto *FnType = T->getAs(); + if (FnType) T = getContext().getFunctionType( FnType->getReturnType(), FnType->getParamTypes(), FnType->getExtProtoInfo().withExceptionSpec(EST_None)); @@ -2196,8 +2197,32 @@ llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) { if (getCodeGenOpts().SanitizeCfiICallNormalizeIntegers) Out << ".normalized"; - return llvm::ConstantInt::get(Int32Ty, - static_cast(llvm::xxHash64(OutName))); + uint32_t OutHash = static_cast(llvm::xxHash64(OutName)); + const auto &Triple = getTarget().getTriple(); + if (FnType && Triple.isX86() && Triple.isArch64Bit() && Triple.isOSLinux()) { +// Estimate the function's arity (i.e., the number of arguments) at the ABI +// level by counting the number of parameters that are likely to be passed +// as registers, such as pointers and 64-bit (or smaller) integers. The +// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs. +// Additional parameters or parameters larger than 64 bits may be passed on +// the stack, in which case the arity is denoted as 7. +bool MayHaveStackArgs = FnType->getNumParams() > 6; + +for (unsigned int i = 0; !MayHaveStackArgs && i < FnType->getNumParams(); + ++i) { + const Type *PT = FnType->getParamType(i).getTypePtr(); + if (!(PT->isPointerType() || (PT->isIntegralOrEnumerationType() && +getContext().getTypeSize(PT) <= 64))) +MayHaveStackArgs = true; +} + +// The 3-bit arity is concatenated with the lower 29 bits of the KCFI hash +// to form an enhanced KCFI type ID. This can prevent, for example, a +// 3-arity function's ID from ever colliding with a 2-arity function's ID. +OutHash = (OutHash << 3) | (MayHaveStackArgs ? 7 : FnType->getNumParams()); + } + + return llvm::ConstantInt::get(Int32Ty, OutHash); } void CodeGenModule::SetLLVMFunctionAttributes(GlobalDecl GD, diff --git a/clang/test/CodeGen/kcfi-normalize.c b/clang/test/CodeGen/kcfi-normalize.c index b9150e88f6ab5f..8b7445fc85e490 100644 --- a/clang/test/CodeGen/kcfi-normalize.c +++ b/clang/test/CodeGen/kcfi-normalize.c @@ -10,25 +10,31 @@ void foo(void (*fn)(int), int arg) { // CHECK-LABEL: define{{.*}}foo // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE1:[0-9]+]] -// CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 1162514891) ] +// KCFI ID = 0x2A548E59 +// CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 710184537) ] fn(arg); } void bar(void (*fn)(int, int), int arg1, int arg2) { // CHECK-LABEL: define{{.*}}bar // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE2:[0-9]+]] -// CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 448046469) ] +// KCFI ID = 0xD5A52C2A +// CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 -710595542) ] fn(arg1, arg2); } void baz(void (*fn)(int, int, int), int arg1, int arg2, int arg3) { // CHECK-LABEL: define{{.*}}baz // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE3:[0-9]+]] -// CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef %3){{.*}}[ "kcfi"(i32 -2049681433) ] +// KCFI ID = 0x2EA2BF3B +// CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef %3){{.*}}[ "kcfi"(i32 782417723) ] fn(arg1, arg2, arg3); } // CHECK: ![[#]] = !{i32 4, !"cfi-normalize-integers", i32 1} -// CHECK: ![[TYPE1]] = !{i32 -1143117868} -// CHECK: ![[TYPE2]] = !{i32 -460921415} -// CHECK: ![[TYPE3]] = !{i32 -333839615} +// KCFI ID = DEEB3EA2 +// CHECK: ![[TYPE1]] = !{i32 -555008350} +// KCFI ID = 24372DCB +// CHECK: ![[TYPE2]] = !{i32 607595979} +// KCFI ID = 0x60D0180C +// CHECK: ![[TYPE3]] = !{i32 1624250380} diff --git a/clang/test/CodeGen/kcfi.c b/clang/test/CodeGen
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
https://github.com/scottconstable edited https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
scottconstable wrote: > > Second, this scheme reduces the expected number of hash collisions within > > each arity, compared against the expected number of collisions (0.01383765) > > for the 32-bit hashing scheme that includes all arities. The table below > > shows the expected number of collisions for each arity, given the number of > > unique indirect callable function types within that arity in the same > > Ubuntu 24.04 server kernel discussed above. > > The collisions vary a lot with different number of function types. It looks > to me more smooth if we use 2 bits to distinguish 4 cases: 1, 2, 3 and 0 or > others. I re-ran the numbers with a 30-bit hash and 2-bit arity, and you are correct that the distribution of expected collisions is more smooth: | Arity | Unique Indirect Callable Function Types | Number of Expected Collisions | | - | --- | - | | 0 or >3 | 2089 | 0.00201654 | | 1 | 2492 | 0.00287330 | | 2 | 3775 | 0.00660789 | | 3 | 2547 | 0.00300181 | However, a 2-bit arity would undermine what is arguably the more desirable property: > This scheme enhances security in two ways. First, it prevents a j-arity > function pointer from being used to call a k-arity function, unless j=k. The > current 32-bit kCFI hash does not prevent, for example, a 2-arity fptr from > calling a 3-arity target if the kCFI hashes collide. If this were to happen, > then potentially malicious stale/dead data in RDX at the call site could > suddenly become live as the third parameter at the call target. For example, if the 30-bit hash of a 0-arity function type collides with the 30-bit hash of the type of a 4-arity function type, then the `RDI`, `RSI`, `RDX`, and `RCX` registers that die when calling a function of the 0-arity type will unexpectedly become live if a COP attack redirects the call to a function of the 4-arity type. Therefore, I believe that the 29-bit hash and 3-bit arity offers a more favorable security posture. https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
https://github.com/scottconstable updated https://github.com/llvm/llvm-project/pull/117121 >From d5ec228a1d3e059e2a64ac1a150bec6f65d573e7 Mon Sep 17 00:00:00 2001 From: Scott D Constable Date: Tue, 19 Nov 2024 15:51:05 -0800 Subject: [PATCH] Enhance KCFI type IDs with a 3-bit arity indicator. --- clang/lib/CodeGen/CodeGenModule.cpp | 31 +-- clang/test/CodeGen/kcfi-normalize.c | 18 +++ clang/test/CodeGen/kcfi.c | 22 +++-- llvm/lib/Transforms/Utils/ModuleUtils.cpp | 29 +++-- .../GCOVProfiling/kcfi-normalize.ll | 7 +++-- llvm/test/Transforms/GCOVProfiling/kcfi.ll| 7 +++-- 6 files changed, 94 insertions(+), 20 deletions(-) diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp index b854eeb62a80ce..f9c01edf4f0953 100644 --- a/clang/lib/CodeGen/CodeGenModule.cpp +++ b/clang/lib/CodeGen/CodeGenModule.cpp @@ -2183,7 +2183,8 @@ llvm::ConstantInt *CodeGenModule::CreateCrossDsoCfiTypeId(llvm::Metadata *MD) { } llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) { - if (auto *FnType = T->getAs()) + auto *FnType = T->getAs(); + if (FnType) T = getContext().getFunctionType( FnType->getReturnType(), FnType->getParamTypes(), FnType->getExtProtoInfo().withExceptionSpec(EST_None)); @@ -2196,8 +2197,32 @@ llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) { if (getCodeGenOpts().SanitizeCfiICallNormalizeIntegers) Out << ".normalized"; - return llvm::ConstantInt::get(Int32Ty, - static_cast(llvm::xxHash64(OutName))); + uint32_t OutHash = static_cast(llvm::xxHash64(OutName)); + const auto &Triple = getTarget().getTriple(); + if (FnType && Triple.isX86() && Triple.isArch64Bit() && Triple.isOSLinux()) { +// Estimate the function's arity (i.e., the number of arguments) at the ABI +// level by counting the number of parameters that are likely to be passed +// as registers, such as pointers and 64-bit (or smaller) integers. The +// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs. +// Additional parameters or parameters larger than 64 bits may be passed on +// the stack, in which case the arity is denoted as 7. +bool MayHaveStackArgs = FnType->getNumParams() > 6; + +for (unsigned int i = 0; !MayHaveStackArgs && i < FnType->getNumParams(); + ++i) { + const Type *PT = FnType->getParamType(i).getTypePtr(); + if (!(PT->isPointerType() || (PT->isIntegralOrEnumerationType() && +getContext().getTypeSize(PT) <= 64))) +MayHaveStackArgs = true; +} + +// The 3-bit arity is concatenated with the lower 29 bits of the KCFI hash +// to form an enhanced KCFI type ID. This can prevent, for example, a +// 3-arity function's ID from ever colliding with a 2-arity function's ID. +OutHash = (OutHash << 3) | (MayHaveStackArgs ? 7 : FnType->getNumParams()); + } + + return llvm::ConstantInt::get(Int32Ty, OutHash); } void CodeGenModule::SetLLVMFunctionAttributes(GlobalDecl GD, diff --git a/clang/test/CodeGen/kcfi-normalize.c b/clang/test/CodeGen/kcfi-normalize.c index b9150e88f6ab5f..8b7445fc85e490 100644 --- a/clang/test/CodeGen/kcfi-normalize.c +++ b/clang/test/CodeGen/kcfi-normalize.c @@ -10,25 +10,31 @@ void foo(void (*fn)(int), int arg) { // CHECK-LABEL: define{{.*}}foo // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE1:[0-9]+]] -// CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 1162514891) ] +// KCFI ID = 0x2A548E59 +// CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 710184537) ] fn(arg); } void bar(void (*fn)(int, int), int arg1, int arg2) { // CHECK-LABEL: define{{.*}}bar // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE2:[0-9]+]] -// CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 448046469) ] +// KCFI ID = 0xD5A52C2A +// CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 -710595542) ] fn(arg1, arg2); } void baz(void (*fn)(int, int, int), int arg1, int arg2, int arg3) { // CHECK-LABEL: define{{.*}}baz // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE3:[0-9]+]] -// CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef %3){{.*}}[ "kcfi"(i32 -2049681433) ] +// KCFI ID = 0x2EA2BF3B +// CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef %3){{.*}}[ "kcfi"(i32 782417723) ] fn(arg1, arg2, arg3); } // CHECK: ![[#]] = !{i32 4, !"cfi-normalize-integers", i32 1} -// CHECK: ![[TYPE1]] = !{i32 -1143117868} -// CHECK: ![[TYPE2]] = !{i32 -460921415} -// CHECK: ![[TYPE3]] = !{i32 -333839615} +// KCFI ID = DEEB3EA2 +// CHECK: ![[TYPE1]] = !{i32 -555008350} +// KCFI ID = 24372DCB +// CHECK: ![[TYPE2]] = !{i32 607595979} +// KCFI ID = 0x60D0180C +// CHECK: ![[TYPE3]] = !{i32 1624250380} diff --git a/clang/test/CodeGen/kcfi.c b/clang/test/CodeGen/k
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
scottconstable wrote: Hi @maurer, I can honestly say that I appreciate your attention to detail. While your observation that "attackers often have direct control of intentionally passed arguments" is true in general, I do not see evidence of this among indirect calls in the Linux kernel. 77% of all indirect call parameters are pointers, and AFAIK user-mode code cannot exert arbitrary control over pointers within the kernel (especially given that x86 Linux kernels enable SMAP and SMEP when it is available on the CPU). That leaves the other 23% of indirect call parameters to consider, though I admit I don't have a concrete breakdown of what fraction of these *might* be attacker controllable. But let's assume for the sake of discussion that all of the non-pointer indirect call parameters are attacker controllable. And because we don't know which dead registers at a call site *might* be attacker-controllable, let's similarly assume for the sake of discussion that all of these dead registers are attacker-controllable. I built the table below to estimate for the arity-enhanced scheme a kind of exploitability score, i.e., the expected number of attacker-controlled parameters times the frequency of function types of that arity. For example, suppose that a hash collision between two function types occurs within the arity-1 partition. The expected number of attacker-controlled parameters at the first function is 0.23, and likewise for the second. So the total number of attacker-controlled parameters is 0.46, times the relative frequency of arity-1 function types (roughly 26%) gives a weighted exploitability of 0.105138035. | Arity | Expected # of attacker-controlled args | Weighted exploitability | | --- | --- | | | 0 | 0 | 0 | | 1 | 0.46 | 0.105138035 | | 2 | 0.92 | 0.318536183 | | 3 | 1.38 | 0.322375493 | | 4 | 1.84 | 0.197281482 | | 5 | 2.3 | 0.109483628 | | 6+ | 2.76 | 0.093409153 | Total exploitability score for the arity enhancement: **1.146223975** A similar analysis can be applied to the existing 32-bit hash scheme. For example, suppose a 1-arity function collides with a 2-arity function. Then the expected number of attacker-controlled parameters that would be live at the mis-matched target would be 0.23 if the 2-arity function calls the 1-arity function, or 1.23 if the 1-arity function calls the 2-arity function (because the second parameter at the target would be a dead register at the caller). This data is captured in the table below, where the columns and rows refer to the respective arities of the colliding function types. | | 0 | 1 | 2 | 3 | 4 | 5 | 6+ | | | | | | | | | | | **0** | 0 | | | | | | | **1** | 1 | 0.46 | | | | | | **2** | 2 | 1.46 | 0.92 | | | | | **3** | 3 | 2.46 | 1.92 | 1.38 | | | | **4** | 4 | 3.46 | 2.92 | 2.38 | 1.84 | | | **5** | 5 | 4.46 | 3.92 | 3.38 | 2.84 | 2.3 | | **6+**| 6 | 5.46 | 4.92 | 4.38 | 3.84 | 3.3 | 2.76 Assuming that a collision occurs, this table shows the probability density of the arities (note that the probabilities do sum to 1): | | 0 | 1 | 2 | 3 | 4 | 5 | 6+ | |-|-|-|-|-|-|-|-| | **0** | 8.61406E-06 | | | | | | | | **1** | 0.00134164 | 0.052240106 | | | | | | | **2** | 0.00203238 | 0.15827159 | 0.119878662 | | | | | | **3** | 0.001371251 | 0.106786156 | 0.161764743 | 0.054571497 | | | | | **4** | 0.000629365 | 0.049011785 | 0.074245381 | 0.050093506 | 0.011495742 | | | | **5** | 0.000279419 | 0.021759723 | 0.032962663 | 0.022239974 | 0.010207511 | 0.00226591 | | | **6+** | 0.000198662 | 0.015470786 | 0.023435882 | 0.015812236 | 0.007257363 | 0.003222046 | 0.001145409 | Then the weighted exploitability for each collision is: | | 0 | 1 | 2 | 3 | 4 | 5 | 6+ | |-|-|-|-|-|-|-|| | **0** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | **1** | 0.00134164 | 0.024030449 | 0 | 0 | 0 | 0 | 0 | | **2** | 0.00406476 | 0.231076521 | 0.110288369 | 0 | 0 | 0 | 0 | | **3** | 0.004113752 | 0.262693944 | 0.3
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
scottconstable wrote: I also do not want to lose sight of one of the other obvious advantages that was mentioned in the PR description: > One additional benefit of this patch is that it can benefit other CFI > approaches that build on kCFI, such as FineIBT. For example, this proposed > enhancement to FineIBT must be able to infer (at kernel init time) which > registers are live at an indirect call target: > https://lkml.org/lkml/2024/9/27/982. If the arity bits are available in the > kCFI type ID, then this information is trivial to infer. To elaborate, we are concurrently working on a Linux kernel patch to enhance FineIBT (which is a KCFI-like solution that utilizes x86 Indirect Branch Tracking). The goal is to extend FineIBT to poison live argument registers if a hash check fails after a branch mis-prediction. This enhancement can help to mitigate a variety of Spectre attacks in the Linux kernel. https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)
scottconstable wrote: @sirmc @samitolvanen @Darksonn @lvwr @maurer @rcvalle @MaskRay A gentle reminder to please review this PR. https://github.com/llvm/llvm-project/pull/121070 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)
https://github.com/scottconstable updated https://github.com/llvm/llvm-project/pull/121070 >From 54296a3991e186629889611eff9e7cdd2c1cadca Mon Sep 17 00:00:00 2001 From: Scott D Constable Date: Mon, 23 Dec 2024 13:48:48 -0800 Subject: [PATCH] Implement a new kcfi_arity feature that encodes an indirect call target's arity (i.e., the number of live-in registers) in the function's __cfi header. --- clang/docs/ControlFlowIntegrity.rst | 9 + clang/docs/UsersManual.rst| 6 + clang/include/clang/Basic/CodeGenOptions.def | 1 + .../clang/Basic/DiagnosticDriverKinds.td | 2 + clang/include/clang/Basic/Features.def| 1 + clang/include/clang/Driver/Options.td | 4 + clang/include/clang/Driver/SanitizerArgs.h| 1 + clang/lib/CodeGen/CodeGenModule.cpp | 2 + clang/lib/Driver/SanitizerArgs.cpp| 10 + clang/test/CodeGen/kcfi-arity.c | 7 + llvm/lib/Target/X86/X86AsmPrinter.cpp | 23 +- llvm/test/CodeGen/X86/kcfi-arity.ll | 205 ++ 12 files changed, 270 insertions(+), 1 deletion(-) create mode 100644 clang/test/CodeGen/kcfi-arity.c create mode 100644 llvm/test/CodeGen/X86/kcfi-arity.ll diff --git a/clang/docs/ControlFlowIntegrity.rst b/clang/docs/ControlFlowIntegrity.rst index 7de805e2154db8..3c97f7da3d7d99 100644 --- a/clang/docs/ControlFlowIntegrity.rst +++ b/clang/docs/ControlFlowIntegrity.rst @@ -336,6 +336,15 @@ cross-DSO function address equality. These properties make KCFI easier to adopt in low-level software. KCFI is limited to checking only function pointers, and isn't compatible with executable-only memory. +``-fsanitize-kcfi-arity`` +- + +For supported targets, this feature extends kCFI by telling the compiler to +record information about each indirect-callable function's arity (i.e., the +number of arguments passed in registers) into the binary. Some kernel CFI +techniques, such as FineIBT, may be able to use this information to provide +enhanced security. + Member Function Pointer Call Checking = diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst index 260e84910c6f78..e958e2fb353ef8 100644 --- a/clang/docs/UsersManual.rst +++ b/clang/docs/UsersManual.rst @@ -2220,6 +2220,12 @@ are listed below. This option is currently experimental. +.. option:: -fsanitize-kcfi-arity + + Extends kernel indirect call forward-edge control flow integrity with + additional function arity information (for supported targets). See + :doc:`ControlFlowIntegrity` for more details. + .. option:: -fstrict-vtable-pointers Enable optimizations based on the strict rules for overwriting polymorphic diff --git a/clang/include/clang/Basic/CodeGenOptions.def b/clang/include/clang/Basic/CodeGenOptions.def index 1ab8c7fb4d3c33..beb43b3ad8f121 100644 --- a/clang/include/clang/Basic/CodeGenOptions.def +++ b/clang/include/clang/Basic/CodeGenOptions.def @@ -277,6 +277,7 @@ CODEGENOPT(SanitizeCfiICallNormalizeIntegers, 1, 0) ///< Normalize integer types ///< CFI icall function signatures CODEGENOPT(SanitizeCfiCanonicalJumpTables, 1, 0) ///< Make jump table symbols canonical ///< instead of creating a local jump table. +CODEGENOPT(SanitizeKcfiArity, 1, 0) ///< Embed arity in KCFI patchable function prefix CODEGENOPT(SanitizeCoverageType, 2, 0) ///< Type of sanitizer coverage ///< instrumentation. CODEGENOPT(SanitizeCoverageIndirectCalls, 1, 0) ///< Enable sanitizer coverage diff --git a/clang/include/clang/Basic/DiagnosticDriverKinds.td b/clang/include/clang/Basic/DiagnosticDriverKinds.td index 288786b8ce9399..df40c1070376e6 100644 --- a/clang/include/clang/Basic/DiagnosticDriverKinds.td +++ b/clang/include/clang/Basic/DiagnosticDriverKinds.td @@ -209,6 +209,8 @@ def err_drv_cannot_open_randomize_layout_seed_file : Error< "cannot read randomize layout seed file '%0'">; def err_drv_invalid_version_number : Error< "invalid version number in '%0'">; +def err_drv_kcfi_arity_unsupported_target : Error< + "target '%0' is unsupported by -fsanitize-kcfi-arity">; def err_drv_no_linker_llvm_support : Error< "'%0': unable to pass LLVM bit-code files to linker">; def err_drv_no_ast_support : Error< diff --git a/clang/include/clang/Basic/Features.def b/clang/include/clang/Basic/Features.def index c82b6d9b5f6c10..e736b46411ed1e 100644 --- a/clang/include/clang/Basic/Features.def +++ b/clang/include/clang/Basic/Features.def @@ -254,6 +254,7 @@ FEATURE(is_trivially_constructible, LangOpts.CPlusPlus) FEATURE(is_trivially_copyable, LangOpts.CPlusPlus) FEATURE(is_union, LangOpts.CPlusPlus) FEATURE(kcfi, LangOpts.Sanitize.has(SanitizerKind::KCFI)) +FEATURE(kcfi_arity, LangOpts.Sanitize.has(SanitizerKind::KCFI)) FEATURE(modules, LangOpts.M
[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)
scottconstable wrote: @sirmc @samitolvanen @Darksonn @lvwr @maurer @rcvalle A gentle reminder to please review this PR. https://github.com/llvm/llvm-project/pull/121070 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)
scottconstable wrote: > > maurer: I think their point is that even if you are not changing the hash > > scheme, you are proposing breaking compatibility of the identifier with > > existing code. Since we don't want to do this many times, if we are > > breaking compatibility with existing code, they would like to batch it with > > another breaking update so that it doesn't need to be done again. > > (This isn't me reviewing this PR, just trying to clear up some confusion.) > > This:) Thanks for the explanation. > > After my pending lld/MachO change, kcfi will be the only user of the legacy > `xxHash64`. I want to remove `xxHash64` from llvm-project. Going back to @maurer 's comment about compatibility, I do not believe that this PR would break any compatibility with other compilers, or with other compiler versions. If a kernel is built with some of its functions having the 3-bit arity extension and the rest of its functions not having the 3-bit arity extension (e.g., because the user's clang compiler has the extension, and their rust compiler does not), then the kernel will behave the same as it would if all functions or no functions have the 3-bit arity extension. If the kernel is hot-patched with the register hardening scheme in https://lkml.org/lkml/2024/9/27/982, then only the functions that have the 3-bit arity extension will benefit from the register hardening scheme. https://github.com/llvm/llvm-project/pull/121070 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)
https://github.com/scottconstable created https://github.com/llvm/llvm-project/pull/121070 Kernel Control Flow Integrity (kCFI) is a feature that hardens indirect calls by comparing a 32-bit hash of the function pointer's type against a hash of the target function's type. If the hashes do not match, the kernel may panic (or log the hash check failure, depending on the kernel's configuration). These hashes are computed at compile time by applying the xxHash64 algorithm to each mangled canonical function (or function pointer) type, then truncating the result to 32 bits. This hash is written into each indirect-callable function header by encoding it as the 32-bit immediate operand to a `MOVri` instruction, e.g.: ``` __cfi_foo: nop nop nop nop nop nop nop nop nop nop nop movl$199571451, %eax# hash of foo's type = 0xBE537FB foo: ... ``` This PR extends x86-based kCFI with a 3-bit arity indicator encoded in the `MOVri` instruction's register (reg) field as follows: | Arity Indicator | Description | Encoding in reg field | | --- | --- | --- | | 0 | 0 parameters | EAX | | 1 | 1 parameter in RDI | ECX | | 2 | 2 parameters in RDI and RSI | EDX | | 3 | 3 parameters in RDI, RSI, and RDX | EBX | | 4 | 4 parameters in RDI, RSI, RDX, and RCX | ESP | | 5 | 5 parameters in RDI, RSI, RDX, RCX, and R8 | EBP | | 6 | 6 parameters in RDI, RSI, RDX, RCX, R8, and R9 | ESI | | 7 | At least one parameter may be passed on the stack | EDI | For example, if `foo` takes 3 register arguments and no stack arguments then the `MOVri` instruction in its kCFI header would instead be written as: ``` movl$199571451, %ebx# hash of foo's type = 0xBE537FB ``` This PR will benefit other CFI approaches that build on kCFI, such as FineIBT. For example, this proposed enhancement to FineIBT must be able to infer (at kernel init time) which registers are live at an indirect call target: https://lkml.org/lkml/2024/9/27/982. If the arity bits are available in the kCFI function header, then this information is trivial to infer. Note that there is another existing PR proposal that includes the 3-bit arity within the existing 32-bit immediate field, which introduces different security properties: https://github.com/llvm/llvm-project/pull/117121. >From 81ff927ed34e9e8b61a432822f946544b411b3b4 Mon Sep 17 00:00:00 2001 From: Scott D Constable Date: Mon, 23 Dec 2024 13:48:48 -0800 Subject: [PATCH] Implement a new kcfi_x86_arity feature that encodes an indirect call target's arity (i.e., the number of live-in registers) in the function's __cfi header. --- clang/include/clang/Basic/Features.def| 1 + llvm/lib/Target/X86/X86AsmPrinter.cpp | 20 +- .../X86/kcfi-patchable-function-prefix.ll | 4 +- llvm/test/CodeGen/X86/kcfi.ll | 63 ++- 4 files changed, 82 insertions(+), 6 deletions(-) diff --git a/clang/include/clang/Basic/Features.def b/clang/include/clang/Basic/Features.def index c82b6d9b5f6c10..dca8f4dc0fbf76 100644 --- a/clang/include/clang/Basic/Features.def +++ b/clang/include/clang/Basic/Features.def @@ -254,6 +254,7 @@ FEATURE(is_trivially_constructible, LangOpts.CPlusPlus) FEATURE(is_trivially_copyable, LangOpts.CPlusPlus) FEATURE(is_union, LangOpts.CPlusPlus) FEATURE(kcfi, LangOpts.Sanitize.has(SanitizerKind::KCFI)) +FEATURE(kcfi_x86_arity, LangOpts.Sanitize.has(SanitizerKind::KCFI)) FEATURE(modules, LangOpts.Modules) FEATURE(safe_stack, LangOpts.Sanitize.has(SanitizerKind::SafeStack)) FEATURE(shadow_call_stack, diff --git a/llvm/lib/Target/X86/X86AsmPrinter.cpp b/llvm/lib/Target/X86/X86AsmPrinter.cpp index f01e47b41cf5e4..cb21d9bb5f2879 100644 --- a/llvm/lib/Target/X86/X86AsmPrinter.cpp +++ b/llvm/lib/Target/X86/X86AsmPrinter.cpp @@ -181,8 +181,26 @@ void X86AsmPrinter::emitKCFITypeId(const MachineFunction &MF) { // Embed the type hash in the X86::MOV32ri instruction to avoid special // casing object file parsers. EmitKCFITypePadding(MF); + + Register MovReg = X86::EAX; + const auto &Triple = MF.getTarget().getTargetTriple(); + if (Triple.isArch64Bit() && Triple.isOSLinux()) { +// Determine the function's arity (i.e., the number of arguments) at the ABI +// level by counting the number of parameters that are passed +// as registers, such as pointers and 64-bit (or smaller) integers. The +// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs. +// Additional parameters or parameters larger than 64 bits may be passed on +// the stack, in which case the arity is denoted as 7. +const unsigned ArityToRegMap[8] = {X86::EAX, X86::ECX, X86::EDX, X86::EBX, + X86::ESP, X86::EBP, X86::ESI, X86::EDI}; +int Arity = MF.getInfo()->getArgumentStackSize() > 0 +? 7 +
[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)
https://github.com/scottconstable updated https://github.com/llvm/llvm-project/pull/121070 >From 70f1743e23ff8e218799e94b03a5bf033715666a Mon Sep 17 00:00:00 2001 From: Scott D Constable Date: Mon, 23 Dec 2024 13:48:48 -0800 Subject: [PATCH] Implement a new kcfi_x86_arity feature that encodes an indirect call target's arity (i.e., the number of live-in registers) in the function's __cfi header. --- clang/include/clang/Basic/Features.def| 1 + llvm/lib/Target/X86/X86AsmPrinter.cpp | 20 +- .../X86/kcfi-patchable-function-prefix.ll | 4 +- llvm/test/CodeGen/X86/kcfi.ll | 63 ++- 4 files changed, 82 insertions(+), 6 deletions(-) diff --git a/clang/include/clang/Basic/Features.def b/clang/include/clang/Basic/Features.def index c82b6d9b5f6c10..dca8f4dc0fbf76 100644 --- a/clang/include/clang/Basic/Features.def +++ b/clang/include/clang/Basic/Features.def @@ -254,6 +254,7 @@ FEATURE(is_trivially_constructible, LangOpts.CPlusPlus) FEATURE(is_trivially_copyable, LangOpts.CPlusPlus) FEATURE(is_union, LangOpts.CPlusPlus) FEATURE(kcfi, LangOpts.Sanitize.has(SanitizerKind::KCFI)) +FEATURE(kcfi_x86_arity, LangOpts.Sanitize.has(SanitizerKind::KCFI)) FEATURE(modules, LangOpts.Modules) FEATURE(safe_stack, LangOpts.Sanitize.has(SanitizerKind::SafeStack)) FEATURE(shadow_call_stack, diff --git a/llvm/lib/Target/X86/X86AsmPrinter.cpp b/llvm/lib/Target/X86/X86AsmPrinter.cpp index f01e47b41cf5e4..aa5f1916d46ac1 100644 --- a/llvm/lib/Target/X86/X86AsmPrinter.cpp +++ b/llvm/lib/Target/X86/X86AsmPrinter.cpp @@ -181,8 +181,26 @@ void X86AsmPrinter::emitKCFITypeId(const MachineFunction &MF) { // Embed the type hash in the X86::MOV32ri instruction to avoid special // casing object file parsers. EmitKCFITypePadding(MF); + + Register MovReg = X86::EAX; + const auto &Triple = MF.getTarget().getTargetTriple(); + if (Triple.isArch64Bit() && Triple.isOSLinux()) { +// Determine the function's arity (i.e., the number of arguments) at the ABI +// level by counting the number of parameters that are passed +// as registers, such as pointers and 64-bit (or smaller) integers. The +// Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs. +// Additional parameters or parameters larger than 64 bits may be passed on +// the stack, in which case the arity is denoted as 7. +const unsigned ArityToRegMap[8] = {X86::EAX, X86::ECX, X86::EDX, X86::EBX, + X86::ESP, X86::EBP, X86::ESI, X86::EDI}; +int Arity = MF.getInfo()->getArgumentStackSize() > 0 +? 7 +: MF.getRegInfo().liveins().size(); +MovReg = ArityToRegMap[Arity]; + } + EmitAndCountInstruction(MCInstBuilder(X86::MOV32ri) - .addReg(X86::EAX) + .addReg(MovReg) .addImm(MaskKCFIType(Type->getZExtValue(; if (MAI->hasDotTypeDotSizeDirective()) { diff --git a/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll b/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll index 1b7bd7835e890c..deababc7fd5379 100644 --- a/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll +++ b/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll @@ -3,7 +3,7 @@ ; CHECK: .p2align 4 ; CHECK-LABEL:__cfi_f1: ; CHECK-COUNT-11: nop -; CHECK-NEXT: movl $12345678, %eax +; CHECK-NEXT: movl $12345678, %ecx ; CHECK-LABEL:.Lcfi_func_end0: ; CHECK-NEXT: .size __cfi_f1, .Lcfi_func_end0-__cfi_f1 ; CHECK-LABEL:f1: @@ -26,7 +26,7 @@ define void @f2(ptr noundef %x) { ; CHECK: .p2align 4 ; CHECK-LABEL:__cfi_f3: ; CHECK-NOT:nop -; CHECK-NEXT: movl $12345678, %eax +; CHECK-NEXT: movl $12345678, %ecx ; CHECK-COUNT-11: nop ; CHECK-LABEL:f3: define void @f3(ptr noundef %x) #0 !kcfi_type !1 { diff --git a/llvm/test/CodeGen/X86/kcfi.ll b/llvm/test/CodeGen/X86/kcfi.ll index 059efcc71b0eb8..91d706796b8e99 100644 --- a/llvm/test/CodeGen/X86/kcfi.ll +++ b/llvm/test/CodeGen/X86/kcfi.ll @@ -16,7 +16,7 @@ ; ASM-NEXT:nop ; ASM-NEXT:nop ; ASM-NEXT:nop -; ASM-NEXT:movl $12345678, %eax +; ASM-NEXT:movl $12345678, %ecx ; ASM-LABEL: .Lcfi_func_end0: ; ASM-NEXT: .size __cfi_f1, .Lcfi_func_end0-__cfi_f1 define void @f1(ptr noundef %x) !kcfi_type !1 { @@ -90,7 +90,7 @@ define void @f4(ptr noundef %x) #0 { ;; Ensure we emit Value + 1 for unwanted values (e.g. endbr64 == 4196274163). ; ASM-LABEL: __cfi_f5: -; ASM: movl $4196274164, %eax # imm = 0xFA1E0FF4 +; ASM: movl $4196274164, %ecx # imm = 0xFA1E0FF4 define void @f5(ptr noundef %x) !kcfi_type !2 { ; ASM-LABEL: f5: ; ASM: movl $98693132, %r10d # imm = 0x5E1F00C @@ -100,7 +100,7 @@ define void @f5(ptr noundef %x) !kcfi_type !2 { ;; Ensure we emit Value + 1 for unwanted values (e.g. -endbr64 == 98693133). ; ASM-LABEL: __cfi_f6: -; ASM: movl $98693134, %eax # imm = 0x5E1F
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
scottconstable wrote: @phoebewang @sirmc @samitolvanen @Darksonn @lvwr @ojeda @maurer @rcvalle @MaskRay I have created https://github.com/llvm/llvm-project/pull/121070 to implement the alternate proposal summarized in this comment above: https://github.com/llvm/llvm-project/pull/117121#issuecomment-2518240999. https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)
@@ -181,8 +181,26 @@ void X86AsmPrinter::emitKCFITypeId(const MachineFunction &MF) { // Embed the type hash in the X86::MOV32ri instruction to avoid special // casing object file parsers. EmitKCFITypePadding(MF); + + Register MovReg = X86::EAX; + const auto &Triple = MF.getTarget().getTargetTriple(); + if (Triple.isArch64Bit() && Triple.isOSLinux()) { scottconstable wrote: Is this check necessary? I notice that all of the x86-based kCFI tests use the `x86_64-unknown-linux-gnu` triple. If 64-bit Linux is the only supported x86 subtarget then I think this check is redundant. But I'm not 100% certain. https://github.com/llvm/llvm-project/pull/121070 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
scottconstable wrote: > I haven't received a reply for my concerns I commented at [#117121 > (comment)](https://github.com/llvm/llvm-project/pull/117121#issuecomment-2502346476) > and [#117121 > (comment)](https://github.com/llvm/llvm-project/pull/117121#issuecomment-2516251353), > and they still remain for the alternative proposal. > > I'd recommend working on better defining the problem, and then working on a > well-defined solution for the problem that doesn't affect or conflict with a > currently well-defined solution to a different problem (i.e., KCFI). @rcvalle I think that https://github.com/llvm/llvm-project/pull/117121#issuecomment-2516251353 isn't relevant for the alternative proposal in https://github.com/llvm/llvm-project/pull/121070 because that proposal does not attempt to prevent cross-arity collisions. Do you agree? Or have I mis-understood your concern? I'm not sure whether https://github.com/llvm/llvm-project/pull/117121#issuecomment-2502346476 is relevant for the alternative proposal. Please feel free to discuss at https://github.com/llvm/llvm-project/pull/121070. https://github.com/llvm/llvm-project/pull/117121 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)
@@ -254,6 +254,7 @@ FEATURE(is_trivially_constructible, LangOpts.CPlusPlus) FEATURE(is_trivially_copyable, LangOpts.CPlusPlus) FEATURE(is_union, LangOpts.CPlusPlus) FEATURE(kcfi, LangOpts.Sanitize.has(SanitizerKind::KCFI)) +FEATURE(kcfi_x86_arity, LangOpts.Sanitize.has(SanitizerKind::KCFI)) scottconstable wrote: Peter Zijlstra, one of the Linux FineIBT maintainers, suggested to me that `__has_feature()` might be the best interface for Kconfig to discover that clang supports this new kCFI arity extension. If this feature were to be added to rustc, then I think rustc would need a similar interface to expose the feature. https://github.com/llvm/llvm-project/pull/121070 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)
scottconstable wrote: > If hashing is changed, consider replacing xxhash64 with xxh3+_64bits @MaskRay This PR does not change the hashing scheme at all. https://github.com/llvm/llvm-project/pull/121070 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86] Extend kCFI with a 3-bit arity indicator (PR #121070)
@@ -254,6 +254,7 @@ FEATURE(is_trivially_constructible, LangOpts.CPlusPlus) FEATURE(is_trivially_copyable, LangOpts.CPlusPlus) FEATURE(is_union, LangOpts.CPlusPlus) FEATURE(kcfi, LangOpts.Sanitize.has(SanitizerKind::KCFI)) +FEATURE(kcfi_x86_arity, LangOpts.Sanitize.has(SanitizerKind::KCFI)) scottconstable wrote: @Darksonn This PR differs from https://github.com/llvm/llvm-project/pull/117121 in that this PR does not change the hash type. Instead, this PR uses the reg field in the `MOVri32` instruction to encode some additional meta data about the function's arity. https://github.com/llvm/llvm-project/pull/121070 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits