[Bug target/104816] -fcf-protection=branch should generate endbr instead of notrack jumps
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104816 --- Comment #6 from peterz at infradead dot org --- (In reply to H.J. Lu from comment #5) > (In reply to Andrew Cooper from comment #4) > > I've worked around this in Xen with: > > https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff; > > h=9d4a44380d273de22d5753883cbf5581795ff24d and > > https://lore.kernel.org/lkml/YiXpv0q88paPHPqF@hirez.programming.kicks-ass. > > net/ is pending for Linux. > > > > IMO, it's an error that -fcf-protection=branch is not obeyed for jump > > tables, and we don't want to end up in a situation where jump tables are > > unusable with CET. > > Are you suggesting to add an option to generate jump table with ENDBR? I would suggest having -fcf-protection=branch generate ENDBR for jump-tables and never generate NOTRACK prefix. Then add a mode that allows NOTRACK prefixes, perhaps -fcf-protection=branch,notrack. IBT without NOTRACK is the strongest form; it would be daft to require additional parameters for that.
[Bug ipa/105160] [12 regression] ipa modref marks functions with asm volatile as const or pure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105160 peterz at infradead dot org changed: What|Removed |Added CC||peterz at infradead dot org --- Comment #6 from peterz at infradead dot org --- Happy accident; I've been wanting to allow doing something like: static __always_inline __pure bool arch_static_branch(struct static_key * const key) { asm_volatile_goto("1:" ".byte " __stringify(BYTES_NOP5) "\n\t" ".pushsection __jump_table, \"aw\" \n\t" _ASM_ALIGN "\n\t" ".long 1b - ., %l[l_yes] - . \n\t" _ASM_PTR "%c0 - . \n\t" ".popsecion \n\t" : : "i" (key) : : l_yes); return false; l_yes: return true; } Now, since asm-goto is implicitly volatile, and asm volatile (per this thread) destroys __pure due to side-effects, this doesn't actually work. But I'd like to still have the explicit pure attribute override this. If the programmer gets it wrong like this, he/she gets to keep the pieces. Specifically in this case, he result of the function really only depends on the @key argument. Any call to this will have the exact same outcome and merging multiple instances is *good*. Also see this thread on linux-toolchains: https://lore.kernel.org/all/yg80wg%2f2izjxf...@hirez.programming.kicks-ass.net/
[Bug ipa/105160] [12 regression] ipa modref marks functions with asm volatile as const or pure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105160 --- Comment #7 from peterz at infradead dot org --- (In reply to peterz from comment #6) > Happy accident; I've been wanting to allow doing something like: > > static __always_inline __pure bool arch_static_branch(struct static_key * > const key) > { > asm_volatile_goto("1:" > ".byte " __stringify(BYTES_NOP5) "\n\t" > ".pushsection __jump_table, \"aw\" \n\t" > _ASM_ALIGN "\n\t" > ".long 1b - ., %l[l_yes] - . \n\t" > _ASM_PTR "%c0 - . \n\t" > ".popsecion \n\t" > : : "i" (key) : : l_yes); > return false; > l_yes: > return true; > } > > > Now, since asm-goto is implicitly volatile, and asm volatile (per this > thread) destroys __pure due to side-effects, this doesn't actually work. But > I'd like to still have the explicit pure attribute override this. If the > programmer gets it wrong like this, he/she gets to keep the pieces. > > Specifically in this case, he result of the function really only depends on > the @key argument. Any call to this will have the exact same outcome and > merging multiple instances is *good*. > > Also see this thread on linux-toolchains: > > > https://lore.kernel.org/all/YG80wg%2F2iZjXfCDJ@hirez.programming.kicks-ass. > net/ Possible test case here: https://lore.kernel.org/all/yib0vyqmes9hr...@hirez.programming.kicks-ass.net/
[Bug c/107048] New: GCC lacks -fsanitize=kcfi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107048 Bug ID: 107048 Summary: GCC lacks -fsanitize=kcfi Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: peterz at infradead dot org Target Milestone: --- Please implement -fsanitize=kcfi to match llvm/clang: https://github.com/samitolvanen/llvm-project/commit/f7bf6a87c4fd945800115a17b8b61390541fabd0 The Linux kernel patches are queued and slated for the next merge window. Things like FineIBT rely on having this feature enabled.
[Bug target/102952] New code-gen options for retpolines and straight line speculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952 --- Comment #25 from peterz at infradead dot org --- (In reply to H.J. Lu from comment #24) > Should I submit the current patches? Yes, I'd say so. Once merged I'll send a kernel patch to use -mindirect-branch-cs-prefix for all RETPOLINE builds. I posted an SLS patch but haven't really had much feedback on that yet, we'll see.
[Bug target/102953] Improvements to CET-IBT and ENDBR generation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953 peterz at infradead dot org changed: What|Removed |Added CC||peterz at infradead dot org --- Comment #5 from peterz at infradead dot org --- (In reply to H.J. Lu from comment #1) > (In reply to Andrew Cooper from comment #0) > > > > Finally, one minor code generation improvement. When GCC emits a direct > > call/jmp to an ENDBR'd symbol, it can actually use sym+4 as an optimisation > > to skip the ENDBR instruction (not needed for direct call/jmp's) and save on > > decode bandwidth. > > It is only safe for calling a static function whose address has been taken. Could be done wider using LTO, or pushed into the linker if you're willing to change the ELF format and augment it with STT_FUNC_BTI and R_*_BTI32, there the new relocation would mean +4 (or rather, skip landing pad) when aimed at STT_FUNC_BTI and be identical to _PC32 otherwise. The ELF thing doesn't help with reducing ENDBR emissions for global symbols since we can't ever tell who will take the address, but it will help with directly calling avoiding the landing pad. It also allows for a 'pseudo' LTO thing to 'seal' an executable, where it will strip the ENDBRs for all symbols that do not have a _PC32 rela. We can use EXPORT_SYMBOL*() to generate one such on purpose to avoid exported functions from being sealed. Anyway, I think there's much value in reducing the number of ENDBR instructions as much as possible and we should investigate how the toolchains can best help there.
[Bug target/102952] New code-gen options for retpolines and straight line speculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952 peterz at infradead dot org changed: What|Removed |Added CC||peterz at infradead dot org --- Comment #7 from peterz at infradead dot org --- (In reply to H.J. Lu from comment #3) > Created attachment 51678 [details] > A patch to add -mharden-sls= > > x86: Add -mharden-sls=[none|all|return|indirect-branch] > > Generate code to mitigate against straight line speculation. I'm getting (a lot) spurious int3 instructions with this, for example: 0280 : 280: 48 81 8f 90 00 00 00 00 02 00 00orq$0x200,0x90(%rdi) 28b: 48 8b 47 20 mov0x20(%rdi),%rax 28f: 48 89 87 98 00 00 00mov%rax,0x98(%rdi) 296: e9 75 ff ff ff jmp210 29b: cc int3 That's not an *indirect* jump there.
[Bug target/102952] New code-gen options for retpolines and straight line speculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952 --- Comment #9 from peterz at infradead dot org --- Created attachment 51683 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51683&action=edit kernel patch to test -mharden-sls=all $ make O=defconfig CC=gcc-12.0.0 arch/x86/entry/common.o ... arch/x86/entry/common.o: warning: objtool: do_SYSENTER_32()+0x1b: unreachable instruction
[Bug target/102952] New code-gen options for retpolines and straight line speculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952 --- Comment #10 from peterz at infradead dot org --- (In reply to H.J. Lu from comment #4) > Created attachment 51679 [details] > A patch to add -mindirect-branch-cs-prefix > > It adds CS prefix to call and jmp to thunk when converting indirect call > and jump. This seems to work as expected, I get CS prefix on all high reg thunk calls and no CS prefix on the low reg calls. Example: 29241: e8 00 00 00 00 call 29246 29242: R_X86_64_PLT32 __x86_indirect_thunk_rax-0x4 292c8: 2e e8 00 00 00 00 cs call 292ce 292ca: R_X86_64_PLT32 __x86_indirect_thunk_r10-0x4
[Bug target/102952] New code-gen options for retpolines and straight line speculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952 --- Comment #13 from peterz at infradead dot org --- (In reply to H.J. Lu from comment #12) > (In reply to peterz from comment #9) > > Created attachment 51683 [details] > > kernel patch to test -mharden-sls=all > > > > $ make O=defconfig CC=gcc-12.0.0 arch/x86/entry/common.o > > ... > > arch/x86/entry/common.o: warning: objtool: do_SYSENTER_32()+0x1b: > > unreachable instruction > > Please try the v2 patch. Per comment #6 this should be v3, no? Anyway, the good news is that I now seem to have a kernel image with lots of extra int3 instructions, but all in the right place. *However*, I seem to be missing a few: 36f4: 41 5f pop%r15 36f6: e9 00 00 00 00 jmp36fb <__do_set_cpus_allowed+0x5b> 36f7: R_X86_64_PLT32__x86_indirect_thunk_rax-0x4 36fb: 48 8b 87 90 02 00 00mov0x290(%rdi),%rax There should be one after the jmp __x86_indirect_thunk_* thingy. I'll do an objtool patch to search for missing int3, but that'll have to wait until tomorrow, it's past midnight.
[Bug target/102952] New code-gen options for retpolines and straight line speculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952 --- Comment #20 from peterz at infradead dot org --- (In reply to H.J. Lu from comment #19) > Created attachment 51685 [details] > The v4 patch to add -mharden-sls= That looks to do the right thing! Let me go write more validation stuff to double check. Thanks!
[Bug target/102952] New code-gen options for retpolines and straight line speculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952 --- Comment #21 from peterz at infradead dot org --- (In reply to H.J. Lu from comment #19) > Created attachment 51685 [details] > The v4 patch to add -mharden-sls= I seem to have found one 'funny': kernel/bpf/core.o: warning: objtool: ___bpf_prog_run()+0x23: unreachable instruction 27e6: 48 8b 14 d5 00 00 00mov0x0(,%rdx,8),%rdx 27ed: 00 27ea: R_X86_64_32S .rodata..c_jump_table 27ee: e9 00 00 00 00 jmp27f3 <___bpf_prog_run+0x23> 27ef: R_X86_64_PLT32__x86_indirect_thunk_rdx-0x4 27f3: cc int3 27f4: cc int3 27f5: 8b 53 04mov0x4(%rbx),%edx Other than that, it seems I need to go clean up our asm ...
[Bug target/104816] -fcf-protection=branch should generate endbr instead of notrack jumps
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104816 --- Comment #12 from peterz at infradead dot org --- On Tue, May 24, 2022 at 04:06:08PM +, cvs-commit at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104816 > > --- Comment #11 from CVS Commits --- > The master branch has been updated by H.J. Lu : > > https://gcc.gnu.org/g:2f4f7de787e5844515d27b2269fc472f95a9916a > > commit r13-744-g2f4f7de787e5844515d27b2269fc472f95a9916a > Author: H.J. Lu > Date: Fri Mar 11 12:51:34 2022 -0800 > > x86: Document -mcet-switch > > When -fcf-protection=branch is used, the compiler will generate jump > tables for switch statements where the indirect jump is prefixed with > the NOTRACK prefix, so it can jump to non-ENDBR targets. Since the > indirect jump targets are generated by the compiler and stored in > read-only memory, this does not result in a direct loss of hardening. > But if the jump table index is attacker-controlled, the indirect jump > may not be constrained by CET. Notrack indirect jumps are fully susceptible to speculation attacks.
[Bug middle-end/119279] Specifying frame pointer dependency in inline asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119279 --- Comment #8 from peterz at infradead dot org --- On Fri, Mar 14, 2025 at 08:44:19AM +, torva...@linux-foundation.org wrote: > Now, I agree that the kernel could likely solve issues by just "don't check so > much, then". But our build-time object file checking does actually find real > issues, because we tend to have various constraints that normal programs don't > have (and we do various rewriting of the object files due to them, which is > another big reason for checking: making sure the rewriting tool understands > the > code it rewrites!). There is the additional constraint that as long as the frame pointer unwinder does not have to guess, it is assumed to be 100% correct. By having calls before frame setup, we get functions missing from the unwind. This means that unwind can no longer be relied upon to determine live-ness of a function. Live Patch in particular relies on this; it needs to determine if a function is in-use. Notably replacing functions that are in-use is a very bad idea.
[Bug middle-end/119279] Specifying frame pointer dependency in inline asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119279 --- Comment #10 from peterz at infradead dot org --- On Fri, Mar 14, 2025 at 09:41:07AM +, jakub at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119279 > > --- Comment #9 from Jakub Jelinek --- > (In reply to peterz from comment #8) > > There is the additional constraint that as long as the frame pointer > > unwinder does not have to guess, it is assumed to be 100% correct. > > > > By having calls before frame setup, we get functions missing from the > > unwind. This means that unwind can no longer be relied upon to determine > > live-ness of a function. > > > > Live Patch in particular relies on this; it needs to determine if a > > function is in-use. Notably replacing functions that are in-use is a > > very bad idea. > > So you build with -fno-optimize-sibling-calls and somehow ensure that the > kernel never uses [[gnu::musttail]], [[clang::musttail]] or > __attribute__((musttail)) (those will tail call even if > -fno-optimize-sibling-calls)? No, tail calls are not a problem. The JMP is done instead of RET, this frame is done. The new function will set up a new frame. There is absolutely no difference between doing RET or JMP from the in-use perspective. After either, the function is done and no longer used. For some purposes (call-graph reconstruction etc.) tail-calls are a pain in the arse because they make the original function go missing from unwindws, but from a live-ness pov it doesn't matter, the function is complete.