rnk added a comment. For maintenance reasons, I'd really prefer it if we could find a way to reuse the existing code that calls an external stack probe function. What do you think about taking a look at X86RetpolineThunks.cpp and doing something similar to that? Basically, when the user sets `-fstack-clash-protection`, LLVM will emit a small comdat+weak function into every object file that has the same ABI as the existing stack probe mechanism. For other prior art, you can also look at how `__clang_call_terminate` works.
================ Comment at: llvm/lib/Target/X86/X86FrameLowering.cpp:400 + !(STI.isOSWindows() && !STI.isTargetMachO()); + if (InlineStackClashProtector && !InEpilogue) { + const uint64_t PageSize = TLI.getStackProbeSize(MF); ---------------- serge-sans-paille wrote: > efriedma wrote: > > Why is this code in a different location from the stack probing code that > > generates a call? > Because `BuildStackAdjustment` has more callsites than just emitPrologue and > we want to capture all stack manipulation. If you care about those cases, we should have tests for all of them. These are all the cases: 1. TailCallReturnAddrDelta: When doing guaranteed TCO for a callee with >4K of argument memory 2. eliminateCallFramePseudoInstr: When doing stack adjustments to set up a call with more than 4K of arguments Both of these cases involve setting up arguments to calls, and I think we can guarantee that the compiler will emit writes to every argument stack slot. So, to set up one of these cases, we'd have this code: ``` struct EightK { int a[2048]; }; void passEightK(EightK); void foo() { EightK x; memset(&x, 0, sizeof(x)); passEightK(x); // some targets will pass indirect, 32-bit probably uses byval. } ``` In this case, LLVM happens to use rep;movsl to copy the argument memory. It could use memcpy, but either way, it will probe all those bytes. I think that means it's safe to move to emitSPUpdate. I would greatly prefer that `BuildStackAdjustment` remain a simple primitive that generates a single instruction. ================ Comment at: llvm/lib/Target/X86/X86FrameLowering.cpp:353 +static const DenseMap<unsigned, unsigned> OpcodeToSPOperandIndex = { + {X86::MOV8mi, 3}, {X86::MOV16mi, 3}, {X86::MOV32mi, 3}, {X86::MOV64mi32, 3}, ---------------- Please no dynamically initialized globals. The LLVM-y thing would be to use a switch. ================ Comment at: llvm/test/CodeGen/X86/stack-clash-dynamic-alloca.ll:9-11 +; CHECK: subq $4096, %rsp +; CHECK: movq $0, (%rsp) + %a = alloca i32, i32 %n, align 16 ---------------- This seems like a good use case for update_llc_test_checks.py. I'd want to see the body of the loop, the cmp, the jl, etc. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D68720/new/ https://reviews.llvm.org/D68720 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits