nickdesaulniers added a comment. It would be really nice if we could limit this to a specific function somehow.
Quick feedback from giving Diff 488727 a spin on the Linux kernel: via `ARCH=arm make LLVM=1 -j128 -s allyesconfig all` I found: CC drivers/net/ethernet/mellanox/mlx5/core/en_main.o drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3597:12: error: stack frame size (1256) exceeds limit (1024) in 'mlx5e_setup_tc' [-Werror,-Wframe-larger-than] static int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type, ^ When I rebuild with: ARCH=arm make LLVM=1 -j128 drivers/net/ethernet/mellanox/mlx5/core/en_main.o KCFLAGS=-Rpass-analysis=stack-frame-layout I get a printout of the stack usage of every function in this TU. That's 1929 lines of text output...I only care about `mlx5e_setup_tc`. Is there a way to limit that? Filtering through the logs though, I do see: drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3599:1: remark: Function: mlx5e_setup_tc ... Offset: [SP-400], Type: Variable, Align: 8, Size: 352 Offset: [SP-752], Type: Variable, Align: 8, Size: 352 Offset: [SP-1088], Type: Variable, Align: 8, Size: 336 which is good! If I flip on debug info and rebuild, this looks like: drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3599:1: remark: Function: mlx5e_setup_tc ... Offset: [SP-400], Type: Variable, Align: 8, Size: 352 old_params @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2934 old_chs @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2958 new_params @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3539 Offset: [SP-752], Type: Variable, Align: 8, Size: 352 new_chs @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3000 Offset: [SP-1088], Type: Variable, Align: 8, Size: 336 new_params @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3424 Which is pretty awesome! https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_main.c#n3424 for the last one, which shows we should probably be heap allocating instances of `struct mlx5e_params` rather than stack allocating them! https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_main.c#n3000 shows the same for `struct mlx5e_channels`. Cool stuff! Now we can finally debug `-Wframe-larger-than=`!!! ================ Comment at: clang/test/Frontend/stack-layout-remark.c:8 +// RUN: mkdir -p %t +// RUN: %clang_cc1 %s -emit-codegen-only -triple x86_64-unknown-linux-gnu -target-cpu corei7 -Rpass-analysis=stack-frame-layout -o /dev/null -O0 2>&1 | FileCheck %s --check-prefix=O0-NODEBUG +// RUN: %clang_cc1 %s -emit-codegen-only -triple x86_64-unknown-linux-gnu -target-cpu corei7 -Rpass-analysis=stack-frame-layout -o /dev/null -O0 -debug-info-kind=constructor -dwarf-version=5 -debugger-tuning=gdb 2>&1 | FileCheck %s --check-prefix=O0-DEBUG ---------------- Please update: 1. the patch description/commit message 2. clang/docs/ReleaseNotes.rst to mention this new flag. I kind of wish that `-Wstack-frame-larger-than=` alluded to this somehow. ================ Comment at: llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp:191 + // initialize slot info + for (int Idx = ObjBeg, EndIdx = ObjEnd; Idx != EndIdx; ++Idx) { + // This is a dead slot, so skip it ---------------- do we need `EndIdx`, or can we simply use `Idx != ObjEnd` as the loop terminating condition? Doesn't look like we use `ObjBeg` afterwards either; is that necessary? Seems like the call to `MFI.getObjectIndexBegin();` could happen in the initialization list of this `for `loop? ================ Comment at: llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp:200-202 + sort(SlotInfo, [](const SlotData &A, const SlotData &B) { + return A.Offset > B.Offset; + }); ---------------- Consider implementing `operator<` on `SlotData`; then I think you can simply call std::sort on SlotInfo and drop this lamda. ================ Comment at: llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp:206-208 + for (const DILocalVariable *N : SlotMap[Info.Slot]) { + emitSourceLocRemark(MF, N, Rem); + } ---------------- remove braces https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements ================ Comment at: llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp:216 + SlotDbgMap genSlotDbgMapping(MachineFunction &MF) { + SlotDbgMap SlotDebugMap; + SmallVector<MachineInstr *> Dbg; ---------------- If you sink this def into the for loop, then you don't need to clear it. ================ Comment at: llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp:220-222 + for (MachineFunction::VariableDbgInfo &DI : MF.getVariableDbgInfo()) { + SlotDebugMap[DI.Slot].insert(DI.Var); + } ---------------- remove braces https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements ================ Comment at: llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp:227-228 + for (MachineInstr &MI : MBB) { + if (MI.getNumMemOperands() == 0) + continue; + for (MachineMemOperand *MO : MI.memoperands()) { ---------------- I guess this can be removed since the `for` loop below wont do anything if there's 0 memoperands? ================ Comment at: llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp:65 + + SlotData(const MachineFrameInfo &MFI, const int ValOffset, const int Idx) { + Slot = Idx; ---------------- thegameg wrote: > @paulkirth make sure to mark these code review comments as done when implemented, please! ================ Comment at: llvm/test/CodeGen/AArch64/O0-pipeline.ll:76-77 ; CHECK-NEXT: Insert CFI remember/restore state instructions +; CHECK-NEXT: Lazy Machine Block Frequency Analysis +; CHECK-NEXT: Machine Optimization Remark Emitter +; CHECK-NEXT: Stack Frame Layout Analysis ---------------- Dang, this adds a bunch of passes to O0 pipelines...any creative ideas on how to not do that? ================ Comment at: llvm/test/CodeGen/AArch64/arm64-opt-remarks-lazy-bfi.ll:43-46 +; HOTNESS: Executing Pass 'Stack Frame Layout Analysis' +; HOTNESS-NEXT: Freeing Pass 'Machine Optimization Remark Emitter' +; HOTNESS-NEXT: Freeing Pass 'Lazy Machine Block Frequency Analysis' +; HOTNESS-NEXT: Freeing Pass 'Stack Frame Layout Analysis' ---------------- what's going on in this test? Looks like the pass is being run twice or something? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D135488/new/ https://reviews.llvm.org/D135488 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits