rnk added a comment.

For maintenance reasons, I'd really prefer it if we could find a way to reuse 
the existing code that calls an external stack probe function. What do you 
think about taking a look at X86RetpolineThunks.cpp and doing something similar 
to that? Basically, when the user sets `-fstack-clash-protection`, LLVM will 
emit a small comdat+weak function into every object file that has the same ABI 
as the existing stack probe mechanism. For other prior art, you can also look 
at how `__clang_call_terminate` works.



================
Comment at: llvm/lib/Target/X86/X86FrameLowering.cpp:400
+        !(STI.isOSWindows() && !STI.isTargetMachO());
+    if (InlineStackClashProtector && !InEpilogue) {
+      const uint64_t PageSize = TLI.getStackProbeSize(MF);
----------------
serge-sans-paille wrote:
> efriedma wrote:
> > Why is this code in a different location from the stack probing code that 
> > generates a call?
> Because `BuildStackAdjustment` has more callsites than just emitPrologue and 
> we want to capture all stack manipulation.
If you care about those cases, we should have tests for all of them. These are 
all the cases:
1. TailCallReturnAddrDelta: When doing guaranteed TCO for a callee with >4K of 
argument memory
2. eliminateCallFramePseudoInstr: When doing stack adjustments to set up a call 
with more than 4K of arguments

Both of these cases involve setting up arguments to calls, and I think we can 
guarantee that the compiler will emit writes to every argument stack slot. So, 
to set up one of these cases, we'd have this code:
```
struct EightK { int a[2048]; };
void passEightK(EightK);
void foo() {
  EightK x;
  memset(&x, 0, sizeof(x));
  passEightK(x); // some targets will pass indirect, 32-bit probably uses byval.
}
```

In this case, LLVM happens to use rep;movsl to copy the argument memory. It 
could use memcpy, but either way, it will probe all those bytes.

I think that means it's safe to move to emitSPUpdate. I would greatly prefer 
that `BuildStackAdjustment` remain a simple primitive that generates a single 
instruction.


================
Comment at: llvm/lib/Target/X86/X86FrameLowering.cpp:353
 
+static const DenseMap<unsigned, unsigned> OpcodeToSPOperandIndex = {
+    {X86::MOV8mi, 3}, {X86::MOV16mi, 3}, {X86::MOV32mi, 3}, {X86::MOV64mi32, 
3},
----------------
Please no dynamically initialized globals. The LLVM-y thing would be to use a 
switch.


================
Comment at: llvm/test/CodeGen/X86/stack-clash-dynamic-alloca.ll:9-11
+; CHECK: subq   $4096, %rsp
+; CHECK: movq   $0, (%rsp)
+  %a = alloca i32, i32 %n, align 16
----------------
This seems like a good use case for update_llc_test_checks.py. I'd want to see 
the body of the loop, the cmp, the jl, etc.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68720/new/

https://reviews.llvm.org/D68720



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to