[llvm-branch-commits] [flang] [flang][OpenMP] Update `do concurrent` mapping pass to use `fir.do_concurrent` op (PR #138489)
llvmbot wrote: @llvm/pr-subscribers-flang-fir-hlfir Author: Kareem Ergawy (ergawy) Changes This PR updates the `do concurrent` to OpenMP mapping pass to use the newly added `fir.do_concurrent` ops that were recently added upstream instead of handling nests of `fir.do_loop ... unordered` ops. Parent PR: https://github.com/llvm/llvm-project/pull/137928. --- Patch is 34.21 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/138489.diff 9 Files Affected: - (modified) flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp (+84-278) - (modified) flang/test/Transforms/DoConcurrent/basic_device.mlir (+13-11) - (modified) flang/test/Transforms/DoConcurrent/basic_host.f90 (-3) - (modified) flang/test/Transforms/DoConcurrent/basic_host.mlir (+14-12) - (modified) flang/test/Transforms/DoConcurrent/locally_destroyed_temp.f90 (-3) - (removed) flang/test/Transforms/DoConcurrent/loop_nest_test.f90 (-92) - (modified) flang/test/Transforms/DoConcurrent/multiple_iteration_ranges.f90 (-3) - (modified) flang/test/Transforms/DoConcurrent/non_const_bounds.f90 (-3) - (modified) flang/test/Transforms/DoConcurrent/not_perfectly_nested.f90 (+11-13) ``diff diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp index 2c069860ffdca..0fdb302fe10ca 100644 --- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp +++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp @@ -6,6 +6,7 @@ // //===--===// +#include "flang/Optimizer/Builder/FIRBuilder.h" #include "flang/Optimizer/Dialect/FIROps.h" #include "flang/Optimizer/OpenMP/Passes.h" #include "flang/Optimizer/OpenMP/Utils.h" @@ -28,8 +29,10 @@ namespace looputils { /// Stores info needed about the induction/iteration variable for each `do /// concurrent` in a loop nest. 
struct InductionVariableInfo { - InductionVariableInfo(fir::DoLoopOp doLoop) { populateInfo(doLoop); } - + InductionVariableInfo(fir::DoConcurrentLoopOp loop, +mlir::Value inductionVar) { +populateInfo(loop, inductionVar); + } /// The operation allocating memory for iteration variable. mlir::Operation *iterVarMemDef; /// the operation(s) updating the iteration variable with the current @@ -45,7 +48,7 @@ struct InductionVariableInfo { /// ... /// %i:2 = hlfir.declare %0 {uniq_name = "_QFEi"} : ... /// ... - /// fir.do_loop %ind_var = %lb to %ub step %s unordered { + /// fir.do_concurrent.loop (%ind_var) = (%lb) to (%ub) step (%s) { /// %ind_var_conv = fir.convert %ind_var : (index) -> i32 /// fir.store %ind_var_conv to %i#1 : !fir.ref /// ... @@ -62,14 +65,14 @@ struct InductionVariableInfo { /// Note: The current implementation is dependent on how flang emits loop /// bodies; which is sufficient for the current simple test/use cases. If this /// proves to be insufficient, this should be made more generic. - void populateInfo(fir::DoLoopOp doLoop) { + void populateInfo(fir::DoConcurrentLoopOp loop, mlir::Value inductionVar) { mlir::Value result = nullptr; // Checks if a StoreOp is updating the memref of the loop's iteration // variable. auto isStoringIV = [&](fir::StoreOp storeOp) { // Direct store into the IV memref. - if (storeOp.getValue() == doLoop.getInductionVar()) { + if (storeOp.getValue() == inductionVar) { indVarUpdateOps.push_back(storeOp); return true; } @@ -77,7 +80,7 @@ struct InductionVariableInfo { // Indirect store into the IV memref. 
if (auto convertOp = mlir::dyn_cast( storeOp.getValue().getDefiningOp())) { -if (convertOp.getOperand() == doLoop.getInductionVar()) { +if (convertOp.getOperand() == inductionVar) { indVarUpdateOps.push_back(convertOp); indVarUpdateOps.push_back(storeOp); return true; @@ -87,7 +90,7 @@ struct InductionVariableInfo { return false; }; -for (mlir::Operation &op : doLoop) { +for (mlir::Operation &op : loop) { if (auto storeOp = mlir::dyn_cast(op)) if (isStoringIV(storeOp)) { result = storeOp.getMemref(); @@ -100,219 +103,7 @@ struct InductionVariableInfo { } }; -using LoopNestToIndVarMap = -llvm::MapVector; - -/// Loop \p innerLoop is considered perfectly-nested inside \p outerLoop iff -/// there are no operations in \p outerloop's body other than: -/// -/// 1. the operations needed to assign/update \p outerLoop's induction variable. -/// 2. \p innerLoop itself. -/// -/// \p return true if \p innerLoop is perfectly nested inside \p outerLoop -/// according to the above definition. -bool isPerfectlyNested(fir::DoLoopOp outerLoop, fir::DoLoopOp innerLoop) { - mlir::ForwardSliceOptions forwardSliceOptions; - forwardSliceOptions.inclusive = true; - // The following
[llvm-branch-commits] [flang] [flang][OpenMP] Update `do concurrent` mapping pass to use `fir.do_concurrent` op (PR #138489)
https://github.com/ergawy created https://github.com/llvm/llvm-project/pull/138489 This PR updates the `do concurrent` to OpenMP mapping pass to use the newly added `fir.do_concurrent` ops that were recently added upstream instead of handling nests of `fir.do_loop ... unordered` ops. Parent PR: https://github.com/llvm/llvm-project/pull/137928.
___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][NPM] Complete optimized regalloc pipeline (PR #138491)
https://github.com/optimisan created https://github.com/llvm/llvm-project/pull/138491 Also fill in some other passes. >From d415354c8ef7e761a8bcbc83420501bca8abc2f3 Mon Sep 17 00:00:00 2001 From: Akshat Oke Date: Mon, 5 May 2025 06:30:03 + Subject: [PATCH] [AMDGPU][NPM] Complete optimized regalloc pipeline Also fill in some other passes. --- llvm/include/llvm/Passes/CodeGenPassBuilder.h | 2 +- .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 41 +-- llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h | 1 + llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll | 8 +--- 4 files changed, 42 insertions(+), 10 deletions(-) diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h index ddd258c21f593..982bb16e71eab 100644 --- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h +++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h @@ -574,7 +574,7 @@ template class CodeGenPassBuilder { /// Insert InsertedPass pass after TargetPass pass. /// Only machine function passes are supported. 
template - void insertPass(InsertedPassT &&Pass) { + void insertPass(InsertedPassT &&Pass) const { AfterCallbacks.emplace_back( [&](StringRef Name, MachineFunctionPassManager &MFPM) mutable { if (Name == TargetPassT::name()) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp index 56f808a553388..076440f869cd0 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp @@ -2162,7 +2162,44 @@ void AMDGPUCodeGenPassBuilder::addMachineSSAOptimization( addPass(SIShrinkInstructionsPass()); } +void AMDGPUCodeGenPassBuilder::addOptimizedRegAlloc( +AddMachinePass &addPass) const { + if (EnableDCEInRA) +insertPass(DeadMachineInstructionElimPass()); + + // FIXME: when an instruction has a Killed operand, and the instruction is + // inside a bundle, seems only the BUNDLE instruction appears as the Kills of + // the register in LiveVariables, this would trigger a failure in verifier, + // we should fix it and enable the verifier. + if (OptVGPRLiveRange) +insertPass>( +SIOptimizeVGPRLiveRangePass()); + + // This must be run immediately after phi elimination and before + // TwoAddressInstructions, otherwise the processing of the tied operand of + // SI_ELSE will introduce a copy of the tied operand source after the else. + insertPass(SILowerControlFlowPass()); + + if (EnableRewritePartialRegUses) +insertPass(GCNRewritePartialRegUsesPass()); + + if (isPassEnabled(EnablePreRAOptimizations)) +insertPass(GCNPreRAOptimizationsPass()); + // Allow the scheduler to run before SIWholeQuadMode inserts exec manipulation + // instructions that cause scheduling barriers. + insertPass(SIWholeQuadModePass()); + + if (OptExecMaskPreRA) +insertPass(SIOptimizeExecMaskingPreRAPass()); + + // This is not an essential optimization and it has a noticeable impact on + // compilation time, so we only enable it from O2. 
+ if (TM.getOptLevel() > CodeGenOptLevel::Less) +insertPass(SIFormMemoryClausesPass()); + + Base::addOptimizedRegAlloc(addPass); +} Error AMDGPUCodeGenPassBuilder::addRegAssignmentOptimized( AddMachinePass &addPass) const { @@ -2190,21 +2227,19 @@ Error AMDGPUCodeGenPassBuilder::addRegAssignmentOptimized( addPass(SIPreAllocateWWMRegsPass()); // For allocating other wwm register operands. - // addRegAlloc(addPass, RegAllocPhase::WWM); addPass(RAGreedyPass({onlyAllocateWWMRegs, "wwm"})); addPass(SILowerWWMCopiesPass()); addPass(VirtRegRewriterPass(false)); addPass(AMDGPUReserveWWMRegsPass()); // For allocating per-thread VGPRs. - // addRegAlloc(addPass, RegAllocPhase::VGPR); addPass(RAGreedyPass({onlyAllocateVGPRs, "vgpr"})); addPreRewrite(addPass); addPass(VirtRegRewriterPass(true)); - // TODO: addPass(AMDGPUMarkLastScratchLoadPass()); + addPass(AMDGPUMarkLastScratchLoadPass()); return Error::success(); } diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h index 589123274d0f5..3c62cd19c6e57 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h @@ -182,6 +182,7 @@ class AMDGPUCodeGenPassBuilder void addPostRegAlloc(AddMachinePass &) const; void addPreEmitPass(AddMachinePass &) const; Error addRegAssignmentOptimized(AddMachinePass &) const; + void addOptimizedRegAlloc(AddMachinePass &) const; /// Check if a pass is enabled given \p Opt option. The option always /// overrides defaults if explicitly used. Otherwise its default will be used diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll index e9b57515e71e0..e0b9e2c0500e6 100644 --- a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll +++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll @@ -7,15 +7,11 @@ ;
[llvm-branch-commits] [llvm] [AMDGPU][NPM] Complete optimized regalloc pipeline (PR #138491)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Akshat Oke (optimisan) Changes Also fill in some other passes. --- Full diff: https://github.com/llvm/llvm-project/pull/138491.diff 4 Files Affected: - (modified) llvm/include/llvm/Passes/CodeGenPassBuilder.h (+1-1) - (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+38-3) - (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h (+1) - (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll (+2-6) ``diff diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h index ddd258c21f593..982bb16e71eab 100644 --- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h +++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h @@ -574,7 +574,7 @@ template class CodeGenPassBuilder { /// Insert InsertedPass pass after TargetPass pass. /// Only machine function passes are supported. template - void insertPass(InsertedPassT &&Pass) { + void insertPass(InsertedPassT &&Pass) const { AfterCallbacks.emplace_back( [&](StringRef Name, MachineFunctionPassManager &MFPM) mutable { if (Name == TargetPassT::name()) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp index 56f808a553388..076440f869cd0 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp @@ -2162,7 +2162,44 @@ void AMDGPUCodeGenPassBuilder::addMachineSSAOptimization( addPass(SIShrinkInstructionsPass()); } +void AMDGPUCodeGenPassBuilder::addOptimizedRegAlloc( +AddMachinePass &addPass) const { + if (EnableDCEInRA) +insertPass(DeadMachineInstructionElimPass()); + + // FIXME: when an instruction has a Killed operand, and the instruction is + // inside a bundle, seems only the BUNDLE instruction appears as the Kills of + // the register in LiveVariables, this would trigger a failure in verifier, + // we should fix it and enable the verifier. 
+ if (OptVGPRLiveRange) +insertPass>( +SIOptimizeVGPRLiveRangePass()); + + // This must be run immediately after phi elimination and before + // TwoAddressInstructions, otherwise the processing of the tied operand of + // SI_ELSE will introduce a copy of the tied operand source after the else. + insertPass(SILowerControlFlowPass()); + + if (EnableRewritePartialRegUses) +insertPass(GCNRewritePartialRegUsesPass()); + + if (isPassEnabled(EnablePreRAOptimizations)) +insertPass(GCNPreRAOptimizationsPass()); + // Allow the scheduler to run before SIWholeQuadMode inserts exec manipulation + // instructions that cause scheduling barriers. + insertPass(SIWholeQuadModePass()); + + if (OptExecMaskPreRA) +insertPass(SIOptimizeExecMaskingPreRAPass()); + + // This is not an essential optimization and it has a noticeable impact on + // compilation time, so we only enable it from O2. + if (TM.getOptLevel() > CodeGenOptLevel::Less) +insertPass(SIFormMemoryClausesPass()); + + Base::addOptimizedRegAlloc(addPass); +} Error AMDGPUCodeGenPassBuilder::addRegAssignmentOptimized( AddMachinePass &addPass) const { @@ -2190,21 +2227,19 @@ Error AMDGPUCodeGenPassBuilder::addRegAssignmentOptimized( addPass(SIPreAllocateWWMRegsPass()); // For allocating other wwm register operands. - // addRegAlloc(addPass, RegAllocPhase::WWM); addPass(RAGreedyPass({onlyAllocateWWMRegs, "wwm"})); addPass(SILowerWWMCopiesPass()); addPass(VirtRegRewriterPass(false)); addPass(AMDGPUReserveWWMRegsPass()); // For allocating per-thread VGPRs. 
- // addRegAlloc(addPass, RegAllocPhase::VGPR); addPass(RAGreedyPass({onlyAllocateVGPRs, "vgpr"})); addPreRewrite(addPass); addPass(VirtRegRewriterPass(true)); - // TODO: addPass(AMDGPUMarkLastScratchLoadPass()); + addPass(AMDGPUMarkLastScratchLoadPass()); return Error::success(); } diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h index 589123274d0f5..3c62cd19c6e57 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h @@ -182,6 +182,7 @@ class AMDGPUCodeGenPassBuilder void addPostRegAlloc(AddMachinePass &) const; void addPreEmitPass(AddMachinePass &) const; Error addRegAssignmentOptimized(AddMachinePass &) const; + void addOptimizedRegAlloc(AddMachinePass &) const; /// Check if a pass is enabled given \p Opt option. The option always /// overrides defaults if explicitly used. Otherwise its default will be used diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll index e9b57515e71e0..e0b9e2c0500e6 100644 --- a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll +++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll @@ -7,15 +7,11 @@ ; RUN: llc -O3 -enable-new-pm -mtriple=amdgcn--amdhsa -print-pipeline-passes < %s 2>&1 \ ; RUN: | FileCheck -check-prefix=GCN-O3 %s - ;
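The `insertPass` calls in the diff above rely on a deferred-callback mechanism in `CodeGenPassBuilder`: each request is recorded as a callback keyed on the target pass's name and fires while the pipeline is assembled, so passes can be spliced in after a target pass regardless of where it ends up. The following stand-alone sketch models that mechanism with string names instead of real pass types; it is an assumption-laden toy, not the actual LLVM API.

```cpp
#include <functional>
#include <string>
#include <vector>

// Minimal model of an "insert after" request registry, in the spirit of
// CodeGenPassBuilder's AfterCallbacks.
struct PassManager {
  std::vector<std::string> Passes;
  void addPass(const std::string &Name) { Passes.push_back(Name); }
};

class PipelineBuilder {
  // Each callback runs after every pass is added and may append more.
  std::vector<std::function<void(const std::string &, PassManager &)>>
      AfterCallbacks;

public:
  // Request that InsertedName run right after TargetName, whenever (and
  // only if) TargetName appears in the pipeline.
  void insertPass(const std::string &TargetName,
                  const std::string &InsertedName) {
    AfterCallbacks.emplace_back(
        [TargetName, InsertedName](const std::string &Name, PassManager &PM) {
          if (Name == TargetName)
            PM.addPass(InsertedName);
        });
  }

  void buildPipeline(PassManager &PM,
                     const std::vector<std::string> &BasePasses) {
    for (const std::string &P : BasePasses) {
      PM.addPass(P);
      for (auto &CB : AfterCallbacks)
        CB(P, PM);  // fire deferred insertions at their anchor point
    }
  }
};
```

In this model, requesting `si-lower-control-flow` after `phi-node-elimination` yields the order phi-elim, lowered control flow, two-address — mirroring how the patch anchors `SILowerControlFlowPass` between phi elimination and TwoAddressInstructions.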
[llvm-branch-commits] [llvm] [AMDGPU][NPM] Complete optimized regalloc pipeline (PR #138491)
https://github.com/optimisan ready_for_review https://github.com/llvm/llvm-project/pull/138491
[llvm-branch-commits] [llvm] [AMDGPU][NPM] Complete optimized regalloc pipeline (PR #138491)
optimisan wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests * **#138491** (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/138491) * **#136818** * `main` This stack of pull requests is managed by Graphite. Learn more about stacking: https://stacking.dev https://github.com/llvm/llvm-project/pull/138491
[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)
@@ -0,0 +1,19 @@ +;; Test if the callee_type metadata is dropped when it is attached +;; to a direct function call during instcombine. + +; RUN: opt < %s -passes="instcombine" -disable-verify -S | FileCheck %s arsenm wrote: ```suggestion ; RUN: opt -S -passes=instcombine -S < %s | FileCheck %s ``` Definitely do not disable the verifier https://github.com/llvm/llvm-project/pull/87573
[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)
@@ -0,0 +1,19 @@ +;; Test if the callee_type metadata is dropped when it is attached +;; to a direct function call during instcombine. + +; RUN: opt < %s -passes="instcombine" -disable-verify -S | FileCheck %s + +define i32 @_Z3barv() local_unnamed_addr !type !3 { +entry: + ; CHECK: %call = call i32 @_Z3fooc(i8 97) + ; CHECK-NOT: %call = call i32 @_Z3fooc(i8 97), !callee_type !1 arsenm wrote: NOT checks are fragile, and this is far too specific, Generate full checks https://github.com/llvm/llvm-project/pull/87573
[llvm-branch-commits] [lld] release/20.x: [wasm-ld] Refactor WasmSym from static globals to per-link context (#134970) (PR #137620)
https://github.com/vgvassilev milestoned https://github.com/llvm/llvm-project/pull/137620
[llvm-branch-commits] [flang] [flang][OpenMP] Update `do concurrent` mapping pass to use `fir.do_concurrent` op (PR #138489)
https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/138489 >From ac0cde7576c49a00e3591254d6891879b52bee81 Mon Sep 17 00:00:00 2001 From: ergawy Date: Mon, 5 May 2025 02:23:04 -0500 Subject: [PATCH] [flang][OpenMP] Update `do concurrent` mapping pass to use `fir.do_concurrent` op This PR updates the `do concurrent` to OpenMP mapping pass to use the newly added `fir.do_concurrent` ops that were recently added upstream instead of handling nests of `fir.do_loop ... unordered` ops. Parent PR: https://github.com/llvm/llvm-project/pull/137928. --- .../OpenMP/DoConcurrentConversion.cpp | 362 -- .../Transforms/DoConcurrent/basic_device.mlir | 24 +- .../Transforms/DoConcurrent/basic_host.f90| 3 - .../Transforms/DoConcurrent/basic_host.mlir | 26 +- .../DoConcurrent/locally_destroyed_temp.f90 | 3 - .../DoConcurrent/loop_nest_test.f90 | 92 - .../multiple_iteration_ranges.f90 | 3 - .../DoConcurrent/non_const_bounds.f90 | 3 - .../DoConcurrent/not_perfectly_nested.f90 | 24 +- 9 files changed, 122 insertions(+), 418 deletions(-) delete mode 100644 flang/test/Transforms/DoConcurrent/loop_nest_test.f90 diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp index 2c069860ffdca..0fdb302fe10ca 100644 --- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp +++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp @@ -6,6 +6,7 @@ // //===--===// +#include "flang/Optimizer/Builder/FIRBuilder.h" #include "flang/Optimizer/Dialect/FIROps.h" #include "flang/Optimizer/OpenMP/Passes.h" #include "flang/Optimizer/OpenMP/Utils.h" @@ -28,8 +29,10 @@ namespace looputils { /// Stores info needed about the induction/iteration variable for each `do /// concurrent` in a loop nest. 
struct InductionVariableInfo { - InductionVariableInfo(fir::DoLoopOp doLoop) { populateInfo(doLoop); } - + InductionVariableInfo(fir::DoConcurrentLoopOp loop, +mlir::Value inductionVar) { +populateInfo(loop, inductionVar); + } /// The operation allocating memory for iteration variable. mlir::Operation *iterVarMemDef; /// the operation(s) updating the iteration variable with the current @@ -45,7 +48,7 @@ struct InductionVariableInfo { /// ... /// %i:2 = hlfir.declare %0 {uniq_name = "_QFEi"} : ... /// ... - /// fir.do_loop %ind_var = %lb to %ub step %s unordered { + /// fir.do_concurrent.loop (%ind_var) = (%lb) to (%ub) step (%s) { /// %ind_var_conv = fir.convert %ind_var : (index) -> i32 /// fir.store %ind_var_conv to %i#1 : !fir.ref /// ... @@ -62,14 +65,14 @@ struct InductionVariableInfo { /// Note: The current implementation is dependent on how flang emits loop /// bodies; which is sufficient for the current simple test/use cases. If this /// proves to be insufficient, this should be made more generic. - void populateInfo(fir::DoLoopOp doLoop) { + void populateInfo(fir::DoConcurrentLoopOp loop, mlir::Value inductionVar) { mlir::Value result = nullptr; // Checks if a StoreOp is updating the memref of the loop's iteration // variable. auto isStoringIV = [&](fir::StoreOp storeOp) { // Direct store into the IV memref. - if (storeOp.getValue() == doLoop.getInductionVar()) { + if (storeOp.getValue() == inductionVar) { indVarUpdateOps.push_back(storeOp); return true; } @@ -77,7 +80,7 @@ struct InductionVariableInfo { // Indirect store into the IV memref. 
if (auto convertOp = mlir::dyn_cast( storeOp.getValue().getDefiningOp())) { -if (convertOp.getOperand() == doLoop.getInductionVar()) { +if (convertOp.getOperand() == inductionVar) { indVarUpdateOps.push_back(convertOp); indVarUpdateOps.push_back(storeOp); return true; @@ -87,7 +90,7 @@ struct InductionVariableInfo { return false; }; -for (mlir::Operation &op : doLoop) { +for (mlir::Operation &op : loop) { if (auto storeOp = mlir::dyn_cast(op)) if (isStoringIV(storeOp)) { result = storeOp.getMemref(); @@ -100,219 +103,7 @@ struct InductionVariableInfo { } }; -using LoopNestToIndVarMap = -llvm::MapVector; - -/// Loop \p innerLoop is considered perfectly-nested inside \p outerLoop iff -/// there are no operations in \p outerloop's body other than: -/// -/// 1. the operations needed to assign/update \p outerLoop's induction variable. -/// 2. \p innerLoop itself. -/// -/// \p return true if \p innerLoop is perfectly nested inside \p outerLoop -/// according to the above definition. -bool isPerfectlyNested(fir::DoLoopOp outerLoop, fir::DoLoopOp innerLoop) { - mlir::ForwardSliceOptions forwardSliceOptions; - forwardSliceOptions.inclusive = true; - // The follow
[llvm-branch-commits] [SPARC] Use op-then-neg instructions when we have VIS3 (PR #135717)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/135717
[llvm-branch-commits] [SPARC] Use op-then-neg instructions when we have VIS3 (PR #135717)
https://github.com/arsenm approved this pull request. Not sure isFNegFree is doing anything in this patch, would be best to drop it and separately test the combines it enables https://github.com/llvm/llvm-project/pull/135717
[llvm-branch-commits] [SPARC] Use op-then-neg instructions when we have VIS3 (PR #135717)
@@ -3605,6 +3605,12 @@ bool SparcTargetLowering::useLoadStackGuardNode(const Module &M) const { return true; } +bool SparcTargetLowering::isFNegFree(EVT VT) const { + if (Subtarget->isVIS3()) +return VT == MVT::f32 || VT == MVT::f64; + return false; +} arsenm wrote: This ideally would be completely separate patch from the patterns. This enables independent combines https://github.com/llvm/llvm-project/pull/135717
[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #120640)
@@ -1421,6 +1424,40 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SetSplitVector(SDValue(N, ResNo), Lo, Hi); } +void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, + SDValue &Hi) { + EVT LoVT, HiVT; + SDLoc dl(LD); + std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(LD->getValueType(0)); + + ISD::LoadExtType ExtType = LD->getExtensionType(); + SDValue Ch = LD->getChain(); + SDValue Ptr = LD->getBasePtr(); + EVT MemoryVT = LD->getMemoryVT(); + + EVT LoMemVT, HiMemVT; + std::tie(LoMemVT, HiMemVT) = DAG.GetSplitDestVTs(MemoryVT); + + EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), LD->getValueType(0).getSizeInBits()); + EVT MemIntVT = EVT::getIntegerVT(*DAG.getContext(), LD->getMemoryVT().getSizeInBits()); + SDValue ALD = DAG.getAtomicLoad(ExtType, dl, MemIntVT, IntVT, Ch, Ptr, + LD->getMemOperand()); jofrn wrote: This will fail for some reason, on at least atomic_vec2_ptr270 and atomic_vec2_half. https://github.com/llvm/llvm-project/pull/120640
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120716 >From d600ac360ed1e19693cc99f7785fd3f4cef579da Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 20 Dec 2024 06:14:28 -0500 Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector AtomicExpand fails for aligned `load atomic ` because it does not find a compatible library call. This change adds appropriate bitcasts so that the call can be lowered. commit-id:f430c1af --- llvm/lib/CodeGen/AtomicExpandPass.cpp | 20 +- llvm/test/CodeGen/ARM/atomic-load-store.ll| 51 +++ llvm/test/CodeGen/X86/atomic-load-store.ll| 30 + .../X86/expand-atomic-non-integer.ll | 65 +++ 4 files changed, 163 insertions(+), 3 deletions(-) diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp index c376de877ac7d..b6f1e9db9ce35 100644 --- a/llvm/lib/CodeGen/AtomicExpandPass.cpp +++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp @@ -2066,9 +2066,23 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall( I->replaceAllUsesWith(V); } else if (HasResult) { Value *V; -if (UseSizedLibcall) - V = Builder.CreateBitOrPointerCast(Result, I->getType()); -else { +if (UseSizedLibcall) { + // Add bitcasts from Result's scalar type to I's vector type + if (I->getType()->getScalarType()->isPointerTy() && + I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) { +unsigned AS = + cast(I->getType()->getScalarType())->getAddressSpace(); +ElementCount EC = cast(I->getType())->getElementCount(); +Value *BC = Builder.CreateBitCast( +Result, +VectorType::get(IntegerType::get(Ctx, DL.getPointerSizeInBits(AS)), +EC)); +Value *IntToPtr = Builder.CreateIntToPtr( +BC, VectorType::get(PointerType::get(Ctx, AS), EC)); +V = Builder.CreateBitOrPointerCast(IntToPtr, I->getType()); + } else +V = Builder.CreateBitOrPointerCast(Result, I->getType()); +} else { V = Builder.CreateAlignedLoad(I->getType(), AllocaResult, AllocaAlignment); Builder.CreateLifetimeEnd(AllocaResult, SizeVal64); diff --git 
a/llvm/test/CodeGen/ARM/atomic-load-store.ll b/llvm/test/CodeGen/ARM/atomic-load-store.ll index 560dfde356c29..36c1305a7c5df 100644 --- a/llvm/test/CodeGen/ARM/atomic-load-store.ll +++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll @@ -983,3 +983,54 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double %val1) { store atomic double %val1, ptr %ptr seq_cst, align 8 ret void } + +define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 { +; ARM-LABEL: atomic_vec1_ptr: +; ARM: @ %bb.0: +; ARM-NEXT:ldr r0, [r0] +; ARM-NEXT:dmb ish +; ARM-NEXT:bx lr +; +; ARMOPTNONE-LABEL: atomic_vec1_ptr: +; ARMOPTNONE: @ %bb.0: +; ARMOPTNONE-NEXT:ldr r0, [r0] +; ARMOPTNONE-NEXT:dmb ish +; ARMOPTNONE-NEXT:bx lr +; +; THUMBTWO-LABEL: atomic_vec1_ptr: +; THUMBTWO: @ %bb.0: +; THUMBTWO-NEXT:ldr r0, [r0] +; THUMBTWO-NEXT:dmb ish +; THUMBTWO-NEXT:bx lr +; +; THUMBONE-LABEL: atomic_vec1_ptr: +; THUMBONE: @ %bb.0: +; THUMBONE-NEXT:push {r7, lr} +; THUMBONE-NEXT:movs r1, #0 +; THUMBONE-NEXT:mov r2, r1 +; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4 +; THUMBONE-NEXT:pop {r7, pc} +; +; ARMV4-LABEL: atomic_vec1_ptr: +; ARMV4: @ %bb.0: +; ARMV4-NEXT:push {r11, lr} +; ARMV4-NEXT:mov r1, #2 +; ARMV4-NEXT:bl __atomic_load_4 +; ARMV4-NEXT:pop {r11, lr} +; ARMV4-NEXT:mov pc, lr +; +; ARMV6-LABEL: atomic_vec1_ptr: +; ARMV6: @ %bb.0: +; ARMV6-NEXT:mov r1, #0 +; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5 +; ARMV6-NEXT:ldr r0, [r0] +; ARMV6-NEXT:bx lr +; +; THUMBM-LABEL: atomic_vec1_ptr: +; THUMBM: @ %bb.0: +; THUMBM-NEXT:ldr r0, [r0] +; THUMBM-NEXT:dmb sy +; THUMBM-NEXT:bx lr + %ret = load atomic <1 x ptr>, ptr %x acquire, align 4 + ret <1 x ptr> %ret +} diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 08d0405345f57..4293df8c13571 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -371,6 +371,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { ret <2 x i32> %ret } +define <2 x ptr> 
@atomic_vec2_ptr_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec2_ptr_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:pushq %rax +; CHECK-NEXT:movl $2, %esi +; CHECK-NEXT:callq ___atomic_load_16 +; CHECK-NEXT:movq %rdx, %xmm1 +; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT:popq %rax +; CHECK-NEXT:retq + %ret = load atomic <2 x ptr>, ptr %x acquire, align 16 + ret <2 x ptr> %ret +} + define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind { ; CHECK3-LAB
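The AtomicExpand patch above lowers `load atomic <2 x ptr>` through a sized libcall that returns one wide integer, then rebuilds the vector with a bitcast to a vector of pointer-sized integers followed by a per-lane inttoptr. A rough host-side model of that cast chain, assuming a 64-bit target (the 16-byte buffer stands in for the `__atomic_load_16` result; this is not the IR builder code itself):

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Model of: i128 result --bitcast--> <2 x i64> --inttoptr--> <2 x ptr>.
std::array<void *, 2> castResultToPtrVec(const unsigned char raw[16]) {
  uint64_t lanes[2];
  std::memcpy(lanes, raw, 16);  // "bitcast" the wide integer to two lanes
  std::array<void *, 2> v;
  for (int i = 0; i < 2; ++i)   // "inttoptr" on each lane
    v[i] = reinterpret_cast<void *>(static_cast<uintptr_t>(lanes[i]));
  return v;
}
```

Round-tripping two real addresses through the buffer recovers the same pointers, which is the property the added casts preserve.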
[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #120640)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120640 >From 28f6bf33268da6361d71070d5bf75f93a85b03eb Mon Sep 17 00:00:00 2001 From: jofrn Date: Thu, 19 Dec 2024 16:25:55 -0500 Subject: [PATCH] [SelectionDAG] Split vector types for atomic load Vector types that aren't widened are split so that a single ATOMIC_LOAD is issued for the entire vector at once. This change utilizes the load vectorization infrastructure in SelectionDAG in order to group the vectors. This enables SelectionDAG to translate vectors with type bfloat,half. commit-id:3a045357 --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 1 + .../SelectionDAG/LegalizeVectorTypes.cpp | 39 llvm/test/CodeGen/X86/atomic-load-store.ll| 171 ++ 3 files changed, 211 insertions(+) diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h index bdfa5f7741ad3..d8f402f529632 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -960,6 +960,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer { void SplitVecRes_FPOp_MultiType(SDNode *N, SDValue &Lo, SDValue &Hi); void SplitVecRes_IS_FPCLASS(SDNode *N, SDValue &Lo, SDValue &Hi); void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi); + void SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, SDValue &Hi); void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi); void SplitVecRes_VP_LOAD(VPLoadSDNode *LD, SDValue &Lo, SDValue &Hi); void SplitVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *SLD, SDValue &Lo, diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index 0edbbe8af623a..3e364276d1bcf 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -1172,6 +1172,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SplitVecRes_STEP_VECTOR(N, Lo, Hi); 
break; case ISD::SIGN_EXTEND_INREG: SplitVecRes_InregOp(N, Lo, Hi); break; + case ISD::ATOMIC_LOAD: +SplitVecRes_ATOMIC_LOAD(cast(N), Lo, Hi); +break; case ISD::LOAD: SplitVecRes_LOAD(cast(N), Lo, Hi); break; @@ -1421,6 +1424,42 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SetSplitVector(SDValue(N, ResNo), Lo, Hi); } +void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, + SDValue &Hi) { + EVT LoVT, HiVT; + SDLoc dl(LD); + std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(LD->getValueType(0)); + + ISD::LoadExtType ExtType = LD->getExtensionType(); + SDValue Ch = LD->getChain(); + SDValue Ptr = LD->getBasePtr(); + EVT MemoryVT = LD->getMemoryVT(); + + EVT LoMemVT, HiMemVT; + std::tie(LoMemVT, HiMemVT) = DAG.GetSplitDestVTs(MemoryVT); + + EVT IntVT = + EVT::getIntegerVT(*DAG.getContext(), LD->getValueType(0).getSizeInBits()); + EVT MemIntVT = + EVT::getIntegerVT(*DAG.getContext(), 2 * LoMemVT.getSizeInBits()); + SDValue ALD = DAG.getAtomicLoad(ExtType, dl, MemIntVT, IntVT, Ch, Ptr, + LD->getMemOperand()); + + EVT LoIntVT = EVT::getIntegerVT(*DAG.getContext(), LoVT.getSizeInBits()); + EVT HiIntVT = EVT::getIntegerVT(*DAG.getContext(), HiVT.getSizeInBits()); + SDValue ExtractLo = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, LoIntVT, ALD, + DAG.getIntPtrConstant(0, dl)); + SDValue ExtractHi = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, HiIntVT, ALD, + DAG.getIntPtrConstant(1, dl)); + + Lo = DAG.getBitcast(LoVT, ExtractLo); + Hi = DAG.getBitcast(HiVT, ExtractHi); + + // Legalize the chain result - switch anything that used the old chain to + // use the new one. 
+ ReplaceValueWith(SDValue(LD, 1), ALD.getValue(1)); +} + void DAGTypeLegalizer::IncrementPointer(MemSDNode *N, EVT MemVT, MachinePointerInfo &MPI, SDValue &Ptr, uint64_t *ScaledOffset) { diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 935d058a52f8f..42b0955824293 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -204,6 +204,68 @@ define <2 x float> @atomic_vec2_float_align(ptr %x) { ret <2 x float> %ret } +define <2 x half> @atomic_vec2_half(ptr %x) { +; CHECK3-LABEL: atomic_vec2_half: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movl (%rdi), %eax +; CHECK3-NEXT:pinsrw $0, %eax, %xmm0 +; CHECK3-NEXT:shrl $16, %eax +; CHECK3-NEXT:pinsrw $0, %eax, %xmm1 +; CHECK3-NEXT:punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3] +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec2_half: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movl
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/125432 >From 6897d31ad46d5be2a46ef5c852d82675ddd65cd2 Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 31 Jan 2025 13:12:56 -0500 Subject: [PATCH] [SelectionDAG][X86] Remove unused elements from atomic vector. After splitting, all elements are created. The two components must be found by looking at the upper and lower half of EXTRACT_ELEMENT. This change extends EltsFromConsecutiveLoads to understand AtomicSDNode so that unused elements can be removed. commit-id:b83937a8 --- llvm/include/llvm/CodeGen/SelectionDAG.h | 4 +- .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 20 ++- .../SelectionDAGAddressAnalysis.cpp | 30 ++-- .../SelectionDAG/SelectionDAGBuilder.cpp | 6 +- llvm/lib/Target/X86/X86ISelLowering.cpp | 43 +++-- llvm/test/CodeGen/X86/atomic-load-store.ll| 167 ++ 6 files changed, 83 insertions(+), 187 deletions(-) diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h index ba11ddbb5b731..d3cd81c146280 100644 --- a/llvm/include/llvm/CodeGen/SelectionDAG.h +++ b/llvm/include/llvm/CodeGen/SelectionDAG.h @@ -1843,7 +1843,7 @@ class SelectionDAG { /// chain to the token factor. This ensures that the new memory node will have /// the same relative memory dependency position as the old load. Returns the /// new merged load chain. - SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp); + SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp); /// Topological-sort the AllNodes list and a /// assign a unique node id for each node in the DAG based on their @@ -2281,7 +2281,7 @@ class SelectionDAG { /// merged. Check that both are nonvolatile and if LD is loading /// 'Bytes' bytes from a location that is 'Dist' units away from the /// location that the 'Base' load is loading from. 
- bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base, + bool areNonVolatileConsecutiveLoads(MemSDNode *LD, MemSDNode *Base, unsigned Bytes, int Dist) const; /// Infer alignment of a load / store address. Return std::nullopt if it diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp index 2a68903c34cef..8e77a542ab029 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -12218,7 +12218,7 @@ SDValue SelectionDAG::makeEquivalentMemoryOrdering(SDValue OldChain, return TokenFactor; } -SDValue SelectionDAG::makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, +SDValue SelectionDAG::makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp) { assert(isa(NewMemOp.getNode()) && "Expected a memop node"); SDValue OldChain = SDValue(OldLoad, 1); @@ -12911,17 +12911,21 @@ std::pair SelectionDAG::UnrollVectorOverflowOp( getBuildVector(NewOvVT, dl, OvScalars)); } -bool SelectionDAG::areNonVolatileConsecutiveLoads(LoadSDNode *LD, - LoadSDNode *Base, +bool SelectionDAG::areNonVolatileConsecutiveLoads(MemSDNode *LD, + MemSDNode *Base, unsigned Bytes, int Dist) const { if (LD->isVolatile() || Base->isVolatile()) return false; - // TODO: probably too restrictive for atomics, revisit - if (!LD->isSimple()) -return false; - if (LD->isIndexed() || Base->isIndexed()) -return false; + if (auto Ld = dyn_cast(LD)) { +if (!Ld->isSimple()) + return false; +if (Ld->isIndexed()) + return false; + } + if (auto Ld = dyn_cast(Base)) +if (Ld->isIndexed()) + return false; if (LD->getChain() != Base->getChain()) return false; EVT VT = LD->getMemoryVT(); diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp index f2ab88851b780..c29cb424c7a4c 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp +++ 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp @@ -195,8 +195,8 @@ bool BaseIndexOffset::contains(const SelectionDAG &DAG, int64_t BitSize, } /// Parses tree in Ptr for base, index, offset addresses. -static BaseIndexOffset matchLSNode(const LSBaseSDNode *N, - const SelectionDAG &DAG) { +template +static BaseIndexOffset matchSDNode(const T *N, const SelectionDAG &DAG) { SDValue Ptr = N->getBasePtr(); // (((B + I*M) + c)) + c ... @@ -206,16 +206,18 @@ static BaseIndexOffset matchLSNode(const LSBaseSDNode *N, bool IsIndexSignExt = false; // pre-inc/pre-dec ops are components of EA. - if (N->getAddressingMode() == ISD::P
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120716 >From d600ac360ed1e19693cc99f7785fd3f4cef579da Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 20 Dec 2024 06:14:28 -0500 Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector AtomicExpand fails for aligned `load atomic ` because it does not find a compatible library call. This change adds appropriate bitcasts so that the call can be lowered. commit-id:f430c1af --- llvm/lib/CodeGen/AtomicExpandPass.cpp | 20 +- llvm/test/CodeGen/ARM/atomic-load-store.ll| 51 +++ llvm/test/CodeGen/X86/atomic-load-store.ll| 30 + .../X86/expand-atomic-non-integer.ll | 65 +++ 4 files changed, 163 insertions(+), 3 deletions(-) diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp index c376de877ac7d..b6f1e9db9ce35 100644 --- a/llvm/lib/CodeGen/AtomicExpandPass.cpp +++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp @@ -2066,9 +2066,23 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall( I->replaceAllUsesWith(V); } else if (HasResult) { Value *V; -if (UseSizedLibcall) - V = Builder.CreateBitOrPointerCast(Result, I->getType()); -else { +if (UseSizedLibcall) { + // Add bitcasts from Result's scalar type to I's vector type + if (I->getType()->getScalarType()->isPointerTy() && + I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) { +unsigned AS = + cast(I->getType()->getScalarType())->getAddressSpace(); +ElementCount EC = cast(I->getType())->getElementCount(); +Value *BC = Builder.CreateBitCast( +Result, +VectorType::get(IntegerType::get(Ctx, DL.getPointerSizeInBits(AS)), +EC)); +Value *IntToPtr = Builder.CreateIntToPtr( +BC, VectorType::get(PointerType::get(Ctx, AS), EC)); +V = Builder.CreateBitOrPointerCast(IntToPtr, I->getType()); + } else +V = Builder.CreateBitOrPointerCast(Result, I->getType()); +} else { V = Builder.CreateAlignedLoad(I->getType(), AllocaResult, AllocaAlignment); Builder.CreateLifetimeEnd(AllocaResult, SizeVal64); diff --git 
a/llvm/test/CodeGen/ARM/atomic-load-store.ll b/llvm/test/CodeGen/ARM/atomic-load-store.ll index 560dfde356c29..36c1305a7c5df 100644 --- a/llvm/test/CodeGen/ARM/atomic-load-store.ll +++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll @@ -983,3 +983,54 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double %val1) { store atomic double %val1, ptr %ptr seq_cst, align 8 ret void } + +define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 { +; ARM-LABEL: atomic_vec1_ptr: +; ARM: @ %bb.0: +; ARM-NEXT:ldr r0, [r0] +; ARM-NEXT:dmb ish +; ARM-NEXT:bx lr +; +; ARMOPTNONE-LABEL: atomic_vec1_ptr: +; ARMOPTNONE: @ %bb.0: +; ARMOPTNONE-NEXT:ldr r0, [r0] +; ARMOPTNONE-NEXT:dmb ish +; ARMOPTNONE-NEXT:bx lr +; +; THUMBTWO-LABEL: atomic_vec1_ptr: +; THUMBTWO: @ %bb.0: +; THUMBTWO-NEXT:ldr r0, [r0] +; THUMBTWO-NEXT:dmb ish +; THUMBTWO-NEXT:bx lr +; +; THUMBONE-LABEL: atomic_vec1_ptr: +; THUMBONE: @ %bb.0: +; THUMBONE-NEXT:push {r7, lr} +; THUMBONE-NEXT:movs r1, #0 +; THUMBONE-NEXT:mov r2, r1 +; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4 +; THUMBONE-NEXT:pop {r7, pc} +; +; ARMV4-LABEL: atomic_vec1_ptr: +; ARMV4: @ %bb.0: +; ARMV4-NEXT:push {r11, lr} +; ARMV4-NEXT:mov r1, #2 +; ARMV4-NEXT:bl __atomic_load_4 +; ARMV4-NEXT:pop {r11, lr} +; ARMV4-NEXT:mov pc, lr +; +; ARMV6-LABEL: atomic_vec1_ptr: +; ARMV6: @ %bb.0: +; ARMV6-NEXT:mov r1, #0 +; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5 +; ARMV6-NEXT:ldr r0, [r0] +; ARMV6-NEXT:bx lr +; +; THUMBM-LABEL: atomic_vec1_ptr: +; THUMBM: @ %bb.0: +; THUMBM-NEXT:ldr r0, [r0] +; THUMBM-NEXT:dmb sy +; THUMBM-NEXT:bx lr + %ret = load atomic <1 x ptr>, ptr %x acquire, align 4 + ret <1 x ptr> %ret +} diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 08d0405345f57..4293df8c13571 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -371,6 +371,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { ret <2 x i32> %ret } +define <2 x ptr> 
@atomic_vec2_ptr_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec2_ptr_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:pushq %rax +; CHECK-NEXT:movl $2, %esi +; CHECK-NEXT:callq ___atomic_load_16 +; CHECK-NEXT:movq %rdx, %xmm1 +; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT:popq %rax +; CHECK-NEXT:retq + %ret = load atomic <2 x ptr>, ptr %x acquire, align 16 + ret <2 x ptr> %ret +} + define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind { ; CHECK3-LAB
[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #120640)
@@ -1421,6 +1424,40 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SetSplitVector(SDValue(N, ResNo), Lo, Hi); } +void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, + SDValue &Hi) { + EVT LoVT, HiVT; + SDLoc dl(LD); + std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(LD->getValueType(0)); + + ISD::LoadExtType ExtType = LD->getExtensionType(); + SDValue Ch = LD->getChain(); + SDValue Ptr = LD->getBasePtr(); + EVT MemoryVT = LD->getMemoryVT(); + + EVT LoMemVT, HiMemVT; + std::tie(LoMemVT, HiMemVT) = DAG.GetSplitDestVTs(MemoryVT); jofrn wrote: Only one has to be used to get double that size. May we just remove it all and use `MemoryVT.getSizeInBits()`? https://github.com/llvm/llvm-project/pull/120640 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
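The review comment above rests on a simple identity: GetSplitDestVTs halves an even-width type, so doubling the low half's width just recovers the original memory width. A toy stand-in for EVT (not the real SelectionDAG class, names are placeholders) makes the equivalence concrete:

```cpp
#include <cassert>
#include <utility>

// Toy stand-in for SelectionDAG's EVT: splitting an even-width type halves
// its bit width, so 2 * LoMemVT.getSizeInBits() == MemoryVT.getSizeInBits().
// This is the equivalence behind the suggestion to drop the extra
// GetSplitDestVTs call and use MemoryVT's size directly.
struct SimpleVT {
  unsigned Bits;
  unsigned getSizeInBits() const { return Bits; }
};

// Mirrors GetSplitDestVTs for power-of-two vector widths.
inline std::pair<SimpleVT, SimpleVT> getSplitDestVTs(SimpleVT VT) {
  return {SimpleVT{VT.Bits / 2}, SimpleVT{VT.Bits / 2}};
}
```

Under that assumption, `2 * LoMemVT.getSizeInBits()` and `MemoryVT.getSizeInBits()` are interchangeable, which is what the comment proposes.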
[llvm-branch-commits] [llvm] IR: Remove reference counts from ConstantData (PR #137314)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/137314 >From bdeb993a7b252399949a004ff722da3f30f56b2a Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sat, 19 Apr 2025 21:11:23 +0200 Subject: [PATCH 1/2] IR: Remove reference counts from ConstantData This is a follow up change to eliminating uselists for ConstantData. In the previous revision, ConstantData had a replacement reference count instead of a uselist. This reference count was misleading, and not useful in the same way as it would be for another value. The references may not have even been in the current module, since these are shared throughout the LLVMContext. This doesn't space leak any more than we previously did; nothing was attempting to garbage collect unused constants. Previously the use_empty, and hasNUses type of APIs were supported through the reference count. These now behave as if the uses are always empty. Ideally it would be illegal to inspect these, but this forces API complexity into quite a few places. It may be doable to make it illegal to check these counts, but I would like there to be a targeted fuzzing effort to make sure every transform properly deals with a constant in every operand position. All tests pass if I turn the hasNUses* and getNumUses queries into assertions, only hasOneUse in particular appears to hit in some set of contexts. 
I've added unit tests to ensure logical consistency between these cases --- llvm/docs/ReleaseNotes.md | 4 +- llvm/include/llvm/IR/Constants.h | 3 +- llvm/include/llvm/IR/Use.h | 9 +- llvm/include/llvm/IR/Value.h | 118 +++-- llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp | 2 +- llvm/lib/IR/AsmWriter.cpp | 3 +- llvm/lib/IR/Instruction.cpp| 4 +- llvm/lib/IR/Value.cpp | 28 +++-- llvm/unittests/IR/ConstantsTest.cpp| 36 +++ 9 files changed, 100 insertions(+), 107 deletions(-) diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md index 504db733308c1..05318362b99c9 100644 --- a/llvm/docs/ReleaseNotes.md +++ b/llvm/docs/ReleaseNotes.md @@ -56,7 +56,9 @@ Makes programs 10x faster by doing Special New Thing. Changes to the LLVM IR -- -* It is no longer permitted to inspect the uses of ConstantData +* It is no longer permitted to inspect the uses of ConstantData. Use + count APIs will behave as if they have no uses (i.e. use_empty() is + always true). * The `nocapture` attribute has been replaced by `captures(none)`. * The constant expression variants of the following instructions have been diff --git a/llvm/include/llvm/IR/Constants.h b/llvm/include/llvm/IR/Constants.h index ff51f59b6ec68..76efa9bd63522 100644 --- a/llvm/include/llvm/IR/Constants.h +++ b/llvm/include/llvm/IR/Constants.h @@ -51,7 +51,8 @@ template struct ConstantAggrKeyType; /// Since they can be in use by unrelated modules (and are never based on /// GlobalValues), it never makes sense to RAUW them. /// -/// These do not have use lists. It is illegal to inspect the uses. +/// These do not have use lists. It is illegal to inspect the uses. These behave +/// as if they have no uses (i.e. use_empty() is always true). 
class ConstantData : public Constant { constexpr static IntrusiveOperandsAllocMarker AllocMarker{0}; diff --git a/llvm/include/llvm/IR/Use.h b/llvm/include/llvm/IR/Use.h index bcd1fd6677497..dc22d69ba561d 100644 --- a/llvm/include/llvm/IR/Use.h +++ b/llvm/include/llvm/IR/Use.h @@ -23,7 +23,6 @@ namespace llvm { template struct simplify_type; -class ConstantData; class User; class Value; @@ -43,7 +42,7 @@ class Use { private: /// Destructor - Only for zap() - ~Use(); + ~Use() { removeFromList(); } /// Constructor Use(User *Parent) : Parent(Parent) {} @@ -85,10 +84,8 @@ class Use { Use **Prev = nullptr; User *Parent = nullptr; - inline void addToList(unsigned &Count); - inline void addToList(Use *&List); - inline void removeFromList(unsigned &Count); - inline void removeFromList(Use *&List); + inline void addToList(Use **List); + inline void removeFromList(); }; /// Allow clients to treat uses just like values when using diff --git a/llvm/include/llvm/IR/Value.h b/llvm/include/llvm/IR/Value.h index 180b6238eda6c..ae874304c4316 100644 --- a/llvm/include/llvm/IR/Value.h +++ b/llvm/include/llvm/IR/Value.h @@ -116,10 +116,7 @@ class Value { private: Type *VTy; - union { -Use *List = nullptr; -unsigned Count; - } Uses; + Use *UseList = nullptr; friend class ValueAsMetadata; // Allow access to IsUsedByMD. friend class ValueHandleBase; // Allow access to HasValueHandle. @@ -347,23 +344,21 @@ class Value { bool use_empty() const { assertModuleIsMaterialized(); -return hasUseList() ? Uses.List == nullptr : Uses.Count == 0; +return UseList == nullptr; } - bool materialized_use_empty() const { -
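The patch keeps an intrusive use list (with a `Use **Prev` back-pointer) for all values except ConstantData. A minimal standalone analog — not LLVM's actual classes, just the same linked-list shape — shows why `removeFromList()` needs no list head and why `use_empty()` reduces to a null check on `UseList`:

```cpp
// Illustrative analog of LLVM's intrusive use list: each Use threads into
// a singly linked list through a Use** back-pointer, so removal is O(1)
// without knowing which Value's list it is on. A Use with a null Prev is
// on no list at all (the ConstantData case after this patch).
struct Use {
  Use *Next = nullptr;
  Use **Prev = nullptr;

  void addToList(Use **List) {
    Next = *List;
    if (Next)
      Next->Prev = &Next;
    Prev = List;
    *List = this;
  }

  void removeFromList() {
    if (!Prev)
      return; // never added to a list, e.g. an operand of ConstantData
    *Prev = Next;
    if (Next)
      Next->Prev = Prev;
    Next = nullptr;
    Prev = nullptr;
  }
};

struct Value {
  Use *UseList = nullptr;
  bool use_empty() const { return UseList == nullptr; }
};
```

With no reference count to maintain, a Value whose uses were never linked (ConstantData) reports `use_empty() == true` unconditionally, matching the new documented behavior.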
[llvm-branch-commits] [llvm] IR: Reorder ConstantData enum values (PR #138638)
arsenm wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests * **#138638** (this PR, view in Graphite) * **#137314** * **#137313** * `main` This stack of pull requests is managed by Graphite. Learn more about stacking: https://stacking.dev/ https://github.com/llvm/llvm-project/pull/138638
[llvm-branch-commits] [llvm] IR: Reorder ConstantData enum values (PR #138638)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/138638
[llvm-branch-commits] [llvm] IR: Reorder ConstantData enum values (PR #138638)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/138638 This sorts ConstantData to the low values, so we can perform a hasUseList check in a single compare instead of requiring 2 compares plus an and for the range check.
[llvm-branch-commits] [llvm] IR: Reorder ConstantData enum values (PR #138638)
llvmbot wrote: @llvm/pr-subscribers-llvm-ir Author: Matt Arsenault (arsenm) Changes This sorts ConstantData to the low values, so we can perform a hasUseList check in a single compare instead of requiring 2 compares plus an and for the range check. --- Full diff: https://github.com/llvm/llvm-project/pull/138638.diff 1 Files Affected: - (modified) llvm/include/llvm/IR/Value.def (+20-18) ``diff diff --git a/llvm/include/llvm/IR/Value.def b/llvm/include/llvm/IR/Value.def index 160e0f8513e2a..34b8d4967b28a 100644 --- a/llvm/include/llvm/IR/Value.def +++ b/llvm/include/llvm/IR/Value.def @@ -69,24 +69,11 @@ #define HANDLE_CONSTANT_EXCLUDE_LLVM_C_API(ValueName) #endif -// Having constant first makes the range check for isa faster -// and smaller by one operation. +// Having constant first makes the range check for isa faster and +// smaller by one operation. Further, keep ConstantData as the first subset so +// it's also as fast. // Constant -HANDLE_GLOBAL_VALUE(Function) -HANDLE_GLOBAL_VALUE(GlobalAlias) -HANDLE_GLOBAL_VALUE(GlobalIFunc) -HANDLE_GLOBAL_VALUE(GlobalVariable) -HANDLE_CONSTANT(BlockAddress) -HANDLE_CONSTANT(ConstantExpr) -HANDLE_CONSTANT_EXCLUDE_LLVM_C_API(DSOLocalEquivalent) -HANDLE_CONSTANT_EXCLUDE_LLVM_C_API(NoCFIValue) -HANDLE_CONSTANT(ConstantPtrAuth) - -// ConstantAggregate. -HANDLE_CONSTANT(ConstantArray) -HANDLE_CONSTANT(ConstantStruct) -HANDLE_CONSTANT(ConstantVector) // ConstantData. HANDLE_CONSTANT(UndefValue) @@ -100,8 +87,23 @@ HANDLE_CONSTANT(ConstantTargetNone) HANDLE_CONSTANT(ConstantPointerNull) HANDLE_CONSTANT(ConstantTokenNone) -HANDLE_CONSTANT_MARKER(ConstantFirstVal, Function) -HANDLE_CONSTANT_MARKER(ConstantLastVal, ConstantTokenNone) +// ConstantAggregate. 
+HANDLE_CONSTANT(ConstantArray) +HANDLE_CONSTANT(ConstantStruct) +HANDLE_CONSTANT(ConstantVector) + +HANDLE_GLOBAL_VALUE(Function) +HANDLE_GLOBAL_VALUE(GlobalAlias) +HANDLE_GLOBAL_VALUE(GlobalIFunc) +HANDLE_GLOBAL_VALUE(GlobalVariable) +HANDLE_CONSTANT(BlockAddress) +HANDLE_CONSTANT(ConstantExpr) +HANDLE_CONSTANT_EXCLUDE_LLVM_C_API(DSOLocalEquivalent) +HANDLE_CONSTANT_EXCLUDE_LLVM_C_API(NoCFIValue) +HANDLE_CONSTANT(ConstantPtrAuth) + +HANDLE_CONSTANT_MARKER(ConstantFirstVal, UndefValue) +HANDLE_CONSTANT_MARKER(ConstantLastVal, ConstantPtrAuth) HANDLE_CONSTANT_MARKER(ConstantDataFirstVal, UndefValue) HANDLE_CONSTANT_MARKER(ConstantDataLastVal, ConstantTokenNone) HANDLE_CONSTANT_MARKER(ConstantAggregateFirstVal, ConstantArray) `` https://github.com/llvm/llvm-project/pull/138638 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
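The payoff of the reordering can be sketched with a placeholder enum (these are not LLVM's real value IDs): once every ConstantData kind occupies the lowest values, "does this value have a use list?" is a single unsigned comparison rather than a two-sided range check:

```cpp
// Placeholder value-ID enum illustrating the reordered layout: all
// ConstantData kinds packed at the low end, everything else above them.
enum ValueID : unsigned {
  UndefValue,
  PoisonValue,
  ConstantInt,
  ConstantFP,
  ConstantDataLastVal = ConstantFP,
  ConstantArray, // first non-ConstantData constant
  Function,
  GlobalVariable,
};

// One compare, versus (ID >= First && ID <= Last) before the reorder.
inline bool hasUseList(unsigned ID) { return ID > ConstantDataLastVal; }
```

The same trick is already used for the Constant range itself ("having constant first makes the range check for isa faster"); this change extends it to the ConstantData subset.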
[llvm-branch-commits] [llvm] release/20.x: [OpenMP] Add pre sm_70 load hack back in (#138589) (PR #138626)
llvmbot wrote: @shiltian What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/138626
[llvm-branch-commits] [llvm] release/20.x: [OpenMP] Add pre sm_70 load hack back in (#138589) (PR #138626)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/138626 Backport dfcb8cb Requested by: @ye-luo >From dd2f2eeb12fe3349944a12caf30ec874752dea34 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Mon, 5 May 2025 16:33:41 -0500 Subject: [PATCH] [OpenMP] Add pre sm_70 load hack back in (#138589) Summary: Different ordering modes aren't supported for an atomic load, so we just do an add of zero as the same thing. It's less efficient, but it works. Fixes https://github.com/llvm/llvm-project/issues/138560 (cherry picked from commit dfcb8cb2a92c9f72ddde5ea08dadf2f640197d32) --- offload/DeviceRTL/include/Synchronization.h | 4 1 file changed, 4 insertions(+) diff --git a/offload/DeviceRTL/include/Synchronization.h b/offload/DeviceRTL/include/Synchronization.h index 5a789441b9d35..c510fbf0774c2 100644 --- a/offload/DeviceRTL/include/Synchronization.h +++ b/offload/DeviceRTL/include/Synchronization.h @@ -61,7 +61,11 @@ V add(Ty *Address, V Val, atomic::OrderingTy Ordering, template > V load(Ty *Address, atomic::OrderingTy Ordering, MemScopeTy MemScope = MemScopeTy::device) { +#ifdef __NVPTX__ + return __scoped_atomic_fetch_add(Address, V(0), Ordering, MemScope); +#else return __scoped_atomic_load_n(Address, Ordering, MemScope); +#endif } template > ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: [OpenMP] Add pre sm_70 load hack back in (#138589) (PR #138626)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/138626
[llvm-branch-commits] [llvm] release/20.x: [OpenMP] Add pre sm_70 load hack back in (#138589) (PR #138626)
llvmbot wrote: @llvm/pr-subscribers-offload Author: None (llvmbot) Changes Backport dfcb8cb Requested by: @ye-luo --- Full diff: https://github.com/llvm/llvm-project/pull/138626.diff 1 Files Affected: - (modified) offload/DeviceRTL/include/Synchronization.h (+4) ``diff diff --git a/offload/DeviceRTL/include/Synchronization.h b/offload/DeviceRTL/include/Synchronization.h index 5a789441b9d35..c510fbf0774c2 100644 --- a/offload/DeviceRTL/include/Synchronization.h +++ b/offload/DeviceRTL/include/Synchronization.h @@ -61,7 +61,11 @@ V add(Ty *Address, V Val, atomic::OrderingTy Ordering, template > V load(Ty *Address, atomic::OrderingTy Ordering, MemScopeTy MemScope = MemScopeTy::device) { +#ifdef __NVPTX__ + return __scoped_atomic_fetch_add(Address, V(0), Ordering, MemScope); +#else return __scoped_atomic_load_n(Address, Ordering, MemScope); +#endif } template > `` https://github.com/llvm/llvm-project/pull/138626
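The workaround this backport restores can be sketched in terms of the C++ memory model rather than the DeviceRTL's `__scoped_atomic_*` builtins: on targets where an ordered atomic load is unavailable, a fetch-add of zero with the same ordering returns the current value with equivalent semantics, at the cost of a read-modify-write:

```cpp
#include <atomic>

// Sketch of the pre-sm_70 hack: emulate an atomic load with an atomic
// add of zero. Correct for any ordering, but slower than a plain atomic
// load because it performs a read-modify-write on the location.
template <typename T>
T atomicLoadViaAdd(std::atomic<T> &Addr, std::memory_order Order) {
  return Addr.fetch_add(T(0), Order);
}
```

The value observed is the one the add operated on, and adding zero leaves the location unchanged, so callers see exactly the semantics of `load` with the requested ordering.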
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120716 >From 681d443e5e9a069a73eaa0ce50684b41b95c48fd Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 20 Dec 2024 06:14:28 -0500 Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector AtomicExpand fails for aligned `load atomic ` because it does not find a compatible library call. This change adds appropriate bitcasts so that the call can be lowered. commit-id:f430c1af --- llvm/lib/CodeGen/AtomicExpandPass.cpp | 20 +- llvm/test/CodeGen/ARM/atomic-load-store.ll| 51 +++ llvm/test/CodeGen/X86/atomic-load-store.ll| 30 + .../X86/expand-atomic-non-integer.ll | 65 +++ 4 files changed, 163 insertions(+), 3 deletions(-) diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp index c376de877ac7d..b6f1e9db9ce35 100644 --- a/llvm/lib/CodeGen/AtomicExpandPass.cpp +++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp @@ -2066,9 +2066,23 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall( I->replaceAllUsesWith(V); } else if (HasResult) { Value *V; -if (UseSizedLibcall) - V = Builder.CreateBitOrPointerCast(Result, I->getType()); -else { +if (UseSizedLibcall) { + // Add bitcasts from Result's scalar type to I's vector type + if (I->getType()->getScalarType()->isPointerTy() && + I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) { +unsigned AS = + cast(I->getType()->getScalarType())->getAddressSpace(); +ElementCount EC = cast(I->getType())->getElementCount(); +Value *BC = Builder.CreateBitCast( +Result, +VectorType::get(IntegerType::get(Ctx, DL.getPointerSizeInBits(AS)), +EC)); +Value *IntToPtr = Builder.CreateIntToPtr( +BC, VectorType::get(PointerType::get(Ctx, AS), EC)); +V = Builder.CreateBitOrPointerCast(IntToPtr, I->getType()); + } else +V = Builder.CreateBitOrPointerCast(Result, I->getType()); +} else { V = Builder.CreateAlignedLoad(I->getType(), AllocaResult, AllocaAlignment); Builder.CreateLifetimeEnd(AllocaResult, SizeVal64); diff --git 
a/llvm/test/CodeGen/ARM/atomic-load-store.ll b/llvm/test/CodeGen/ARM/atomic-load-store.ll index 560dfde356c29..36c1305a7c5df 100644 --- a/llvm/test/CodeGen/ARM/atomic-load-store.ll +++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll @@ -983,3 +983,54 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double %val1) { store atomic double %val1, ptr %ptr seq_cst, align 8 ret void } + +define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 { +; ARM-LABEL: atomic_vec1_ptr: +; ARM: @ %bb.0: +; ARM-NEXT:ldr r0, [r0] +; ARM-NEXT:dmb ish +; ARM-NEXT:bx lr +; +; ARMOPTNONE-LABEL: atomic_vec1_ptr: +; ARMOPTNONE: @ %bb.0: +; ARMOPTNONE-NEXT:ldr r0, [r0] +; ARMOPTNONE-NEXT:dmb ish +; ARMOPTNONE-NEXT:bx lr +; +; THUMBTWO-LABEL: atomic_vec1_ptr: +; THUMBTWO: @ %bb.0: +; THUMBTWO-NEXT:ldr r0, [r0] +; THUMBTWO-NEXT:dmb ish +; THUMBTWO-NEXT:bx lr +; +; THUMBONE-LABEL: atomic_vec1_ptr: +; THUMBONE: @ %bb.0: +; THUMBONE-NEXT:push {r7, lr} +; THUMBONE-NEXT:movs r1, #0 +; THUMBONE-NEXT:mov r2, r1 +; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4 +; THUMBONE-NEXT:pop {r7, pc} +; +; ARMV4-LABEL: atomic_vec1_ptr: +; ARMV4: @ %bb.0: +; ARMV4-NEXT:push {r11, lr} +; ARMV4-NEXT:mov r1, #2 +; ARMV4-NEXT:bl __atomic_load_4 +; ARMV4-NEXT:pop {r11, lr} +; ARMV4-NEXT:mov pc, lr +; +; ARMV6-LABEL: atomic_vec1_ptr: +; ARMV6: @ %bb.0: +; ARMV6-NEXT:mov r1, #0 +; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5 +; ARMV6-NEXT:ldr r0, [r0] +; ARMV6-NEXT:bx lr +; +; THUMBM-LABEL: atomic_vec1_ptr: +; THUMBM: @ %bb.0: +; THUMBM-NEXT:ldr r0, [r0] +; THUMBM-NEXT:dmb sy +; THUMBM-NEXT:bx lr + %ret = load atomic <1 x ptr>, ptr %x acquire, align 4 + ret <1 x ptr> %ret +} diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 08d0405345f57..4293df8c13571 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -371,6 +371,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { ret <2 x i32> %ret } +define <2 x ptr> 
@atomic_vec2_ptr_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec2_ptr_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:pushq %rax +; CHECK-NEXT:movl $2, %esi +; CHECK-NEXT:callq ___atomic_load_16 +; CHECK-NEXT:movq %rdx, %xmm1 +; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT:popq %rax +; CHECK-NEXT:retq + %ret = load atomic <2 x ptr>, ptr %x acquire, align 16 + ret <2 x ptr> %ret +} + define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind { ; CHECK3-LAB
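The cast chain the AtomicExpand patch inserts (wide integer result → `<N x iPtr>` → `<N x ptr>`) has a simple scalar analog: the sized `__atomic_load` libcall yields one pointer-sized integer per lane, and each lane is reinterpreted as a pointer. In this sketch, `memcpy` stands in for the IR-level bitcast/inttoptr, and 64-bit pointers are assumed:

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Reinterpret two pointer-sized integer lanes as a <2 x ptr>-style array.
// memcpy models the bitcast + inttoptr pair the pass emits; this sketch
// assumes 64-bit pointers.
inline std::array<void *, 2> intLanesToPtrVec(std::uint64_t Lo,
                                              std::uint64_t Hi) {
  static_assert(sizeof(void *) == 8, "sketch assumes 64-bit pointers");
  std::array<void *, 2> Vec;
  std::memcpy(&Vec[0], &Lo, sizeof(void *));
  std::memcpy(&Vec[1], &Hi, sizeof(void *));
  return Vec;
}
```

This mirrors why the X86 test above moves `%rax`/`%rdx` (the libcall's 128-bit integer result) into XMM lanes: the bits are unchanged, only the type is.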
[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #120640)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120640 >From db5b862de7faed37cb9c40c170d4cd1e9612b489 Mon Sep 17 00:00:00 2001 From: jofrn Date: Thu, 19 Dec 2024 16:25:55 -0500 Subject: [PATCH] [SelectionDAG] Split vector types for atomic load Vector types that aren't widened are split so that a single ATOMIC_LOAD is issued for the entire vector at once. This change utilizes the load vectorization infrastructure in SelectionDAG in order to group the vectors. This enables SelectionDAG to translate vectors with type bfloat,half. commit-id:3a045357 --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 1 + .../SelectionDAG/LegalizeVectorTypes.cpp | 37 llvm/test/CodeGen/X86/atomic-load-store.ll| 171 ++ 3 files changed, 209 insertions(+) diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h index bdfa5f7741ad3..d8f402f529632 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -960,6 +960,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer { void SplitVecRes_FPOp_MultiType(SDNode *N, SDValue &Lo, SDValue &Hi); void SplitVecRes_IS_FPCLASS(SDNode *N, SDValue &Lo, SDValue &Hi); void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi); + void SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, SDValue &Hi); void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi); void SplitVecRes_VP_LOAD(VPLoadSDNode *LD, SDValue &Lo, SDValue &Hi); void SplitVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *SLD, SDValue &Lo, diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index 0edbbe8af623a..895d3c51e0e1e 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -1172,6 +1172,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SplitVecRes_STEP_VECTOR(N, Lo, Hi); 
break; case ISD::SIGN_EXTEND_INREG: SplitVecRes_InregOp(N, Lo, Hi); break; + case ISD::ATOMIC_LOAD: +SplitVecRes_ATOMIC_LOAD(cast(N), Lo, Hi); +break; case ISD::LOAD: SplitVecRes_LOAD(cast(N), Lo, Hi); break; @@ -1421,6 +1424,40 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SetSplitVector(SDValue(N, ResNo), Lo, Hi); } +void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, + SDValue &Hi) { + EVT LoVT, HiVT; + SDLoc dl(LD); + std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(LD->getValueType(0)); + + ISD::LoadExtType ExtType = LD->getExtensionType(); + SDValue Ch = LD->getChain(); + SDValue Ptr = LD->getBasePtr(); + EVT MemoryVT = LD->getMemoryVT(); + + EVT LoMemVT, HiMemVT; + std::tie(LoMemVT, HiMemVT) = DAG.GetSplitDestVTs(MemoryVT); + + EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), LD->getValueType(0).getSizeInBits()); + EVT MemIntVT = EVT::getIntegerVT(*DAG.getContext(), 2 * LoMemVT.getSizeInBits()); + SDValue ALD = DAG.getAtomicLoad(ExtType, dl, MemIntVT, IntVT, Ch, Ptr, + LD->getMemOperand()); + + EVT LoIntVT = EVT::getIntegerVT(*DAG.getContext(), LoVT.getSizeInBits()); + EVT HiIntVT = EVT::getIntegerVT(*DAG.getContext(), HiVT.getSizeInBits()); + SDValue ExtractLo = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, LoIntVT, ALD, + DAG.getIntPtrConstant(0, dl)); + SDValue ExtractHi = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, HiIntVT, ALD, + DAG.getIntPtrConstant(1, dl)); + + Lo = DAG.getBitcast(LoVT, ExtractLo); + Hi = DAG.getBitcast(HiVT, ExtractHi); + + // Legalize the chain result - switch anything that used the old chain to + // use the new one. 
+ ReplaceValueWith(SDValue(LD, 1), ALD.getValue(1)); +} + void DAGTypeLegalizer::IncrementPointer(MemSDNode *N, EVT MemVT, MachinePointerInfo &MPI, SDValue &Ptr, uint64_t *ScaledOffset) { diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 935d058a52f8f..42b0955824293 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -204,6 +204,68 @@ define <2 x float> @atomic_vec2_float_align(ptr %x) { ret <2 x float> %ret } +define <2 x half> @atomic_vec2_half(ptr %x) { +; CHECK3-LABEL: atomic_vec2_half: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movl (%rdi), %eax +; CHECK3-NEXT:pinsrw $0, %eax, %xmm0 +; CHECK3-NEXT:shrl $16, %eax +; CHECK3-NEXT:pinsrw $0, %eax, %xmm1 +; CHECK3-NEXT:punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3] +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec2_half: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movl (%rdi), %eax
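The strategy in `SplitVecRes_ATOMIC_LOAD` — issue one atomic load of an integer wide enough for the whole vector, then peel off the low and high halves with the `EXTRACT_ELEMENT` pair — has a direct scalar analog. Here a `<2 x i32>`-style payload held in a 64-bit atomic stands in for the vector (a simplified illustration, not the legalizer itself):

```cpp
#include <atomic>
#include <cstdint>

// One atomic load covers the whole "vector"; the two halves are then
// extracted and would be bitcast back to LoVT/HiVT in the real pass.
struct Halves {
  std::uint32_t Lo, Hi;
};

inline Halves splitAtomicLoad(std::atomic<std::uint64_t> &Addr) {
  std::uint64_t Wide = Addr.load(std::memory_order_acquire); // single atomic op
  return {static_cast<std::uint32_t>(Wide),        // EXTRACT_ELEMENT 0
          static_cast<std::uint32_t>(Wide >> 32)}; // EXTRACT_ELEMENT 1
}
```

Because the whole value comes from a single atomic operation, the two halves are guaranteed to be mutually consistent — the property that splitting into two separate atomic loads would lose.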
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/125432 >From fc2debee17c4ded2edbe2f1803f3184cea78bfdc Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 31 Jan 2025 13:12:56 -0500 Subject: [PATCH] [SelectionDAG][X86] Remove unused elements from atomic vector. After splitting, all elements are created. The two components must be found by looking at the upper and lower half of EXTRACT_ELEMENT. This change extends EltsFromConsecutiveLoads to understand AtomicSDNode so that unused elements can be removed. commit-id:b83937a8 --- llvm/include/llvm/CodeGen/SelectionDAG.h | 4 +- .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 20 ++- .../SelectionDAGAddressAnalysis.cpp | 30 ++-- .../SelectionDAG/SelectionDAGBuilder.cpp | 6 +- llvm/lib/Target/X86/X86ISelLowering.cpp | 43 +++-- llvm/test/CodeGen/X86/atomic-load-store.ll| 167 ++ 6 files changed, 83 insertions(+), 187 deletions(-) diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h index ba11ddbb5b731..d3cd81c146280 100644 --- a/llvm/include/llvm/CodeGen/SelectionDAG.h +++ b/llvm/include/llvm/CodeGen/SelectionDAG.h @@ -1843,7 +1843,7 @@ class SelectionDAG { /// chain to the token factor. This ensures that the new memory node will have /// the same relative memory dependency position as the old load. Returns the /// new merged load chain. - SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp); + SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp); /// Topological-sort the AllNodes list and a /// assign a unique node id for each node in the DAG based on their @@ -2281,7 +2281,7 @@ class SelectionDAG { /// merged. Check that both are nonvolatile and if LD is loading /// 'Bytes' bytes from a location that is 'Dist' units away from the /// location that the 'Base' load is loading from. 
- bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base, + bool areNonVolatileConsecutiveLoads(MemSDNode *LD, MemSDNode *Base, unsigned Bytes, int Dist) const; /// Infer alignment of a load / store address. Return std::nullopt if it diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp index 2a68903c34cef..8e77a542ab029 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -12218,7 +12218,7 @@ SDValue SelectionDAG::makeEquivalentMemoryOrdering(SDValue OldChain, return TokenFactor; } -SDValue SelectionDAG::makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, +SDValue SelectionDAG::makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp) { assert(isa(NewMemOp.getNode()) && "Expected a memop node"); SDValue OldChain = SDValue(OldLoad, 1); @@ -12911,17 +12911,21 @@ std::pair SelectionDAG::UnrollVectorOverflowOp( getBuildVector(NewOvVT, dl, OvScalars)); } -bool SelectionDAG::areNonVolatileConsecutiveLoads(LoadSDNode *LD, - LoadSDNode *Base, +bool SelectionDAG::areNonVolatileConsecutiveLoads(MemSDNode *LD, + MemSDNode *Base, unsigned Bytes, int Dist) const { if (LD->isVolatile() || Base->isVolatile()) return false; - // TODO: probably too restrictive for atomics, revisit - if (!LD->isSimple()) -return false; - if (LD->isIndexed() || Base->isIndexed()) -return false; + if (auto Ld = dyn_cast(LD)) { +if (!Ld->isSimple()) + return false; +if (Ld->isIndexed()) + return false; + } + if (auto Ld = dyn_cast(Base)) +if (Ld->isIndexed()) + return false; if (LD->getChain() != Base->getChain()) return false; EVT VT = LD->getMemoryVT(); diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp index f2ab88851b780..c29cb424c7a4c 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp +++ 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp @@ -195,8 +195,8 @@ bool BaseIndexOffset::contains(const SelectionDAG &DAG, int64_t BitSize, } /// Parses tree in Ptr for base, index, offset addresses. -static BaseIndexOffset matchLSNode(const LSBaseSDNode *N, - const SelectionDAG &DAG) { +template +static BaseIndexOffset matchSDNode(const T *N, const SelectionDAG &DAG) { SDValue Ptr = N->getBasePtr(); // (((B + I*M) + c)) + c ... @@ -206,16 +206,18 @@ static BaseIndexOffset matchLSNode(const LSBaseSDNode *N, bool IsIndexSignExt = false; // pre-inc/pre-dec ops are components of EA. - if (N->getAddressingMode() == ISD::P
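The relaxed predicate in the patch above keeps the same address contract: `LD` and `Base` are consecutive only when `LD` reads `Bytes` bytes from a location `Dist` units away from `Base`'s address, with volatile and indexed nodes still rejected and only non-atomic loads additionally required to be simple. A minimal Python model of that logic (the dict fields are illustrative stand-ins, not LLVM's API):

```python
def are_consecutive_loads(ld, base, num_bytes, dist):
    """Conceptual sketch of SelectionDAG::areNonVolatileConsecutiveLoads
    after the change: AtomicSDNodes are no longer rejected outright."""
    # Volatile nodes are never merged.
    if ld["volatile"] or base["volatile"]:
        return False
    # Only plain (non-atomic) loads must also be simple and unindexed;
    # atomics have no addressing modes to worry about in this model.
    for node in (ld, base):
        if not node["atomic"]:
            if not node["simple"] or node["indexed"]:
                return False
    # Both loads must hang off the same chain.
    if node_chain(ld) != node_chain(base):
        return False
    # LD must read 'num_bytes' bytes, 'dist' units from Base's address.
    return ld["addr"] == base["addr"] + dist * num_bytes

def node_chain(node):
    return node["chain"]
```

A quick usage check of the sketch: two 8-byte atomic loads at offsets 0 and 16 on the same chain are consecutive with `dist=2`, but not with `dist=1`.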
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120598 >From 218ce15319d641c89ce8a5ea7e770fd0c2e5223d Mon Sep 17 00:00:00 2001 From: jofrn Date: Thu, 19 Dec 2024 11:19:39 -0500 Subject: [PATCH] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load Vector types of 2 elements must be widened. This change does this for vector types of atomic load in SelectionDAG so that it can translate aligned vectors of >1 size. It also adds Pats to remove an extra MOV. commit-id:2894ccd1 --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 1 + .../SelectionDAG/LegalizeVectorTypes.cpp | 104 ++ llvm/lib/Target/X86/X86InstrCompiler.td | 7 ++ llvm/test/CodeGen/X86/atomic-load-store.ll| 81 ++ llvm/test/CodeGen/X86/atomic-unordered.ll | 3 +- 5 files changed, 173 insertions(+), 23 deletions(-) diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h index 89ea7ef4dbe89..bdfa5f7741ad3 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -1062,6 +1062,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer { SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N); SDValue WidenVecRes_INSERT_SUBVECTOR(SDNode *N); SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N); + SDValue WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N); SDValue WidenVecRes_LOAD(SDNode* N); SDValue WidenVecRes_VP_LOAD(VPLoadSDNode *N); SDValue WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N); diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index 8eee7a4c61fe6..0edbbe8af623a 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -4625,6 +4625,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, unsigned ResNo) { break; case ISD::EXTRACT_SUBVECTOR: Res = WidenVecRes_EXTRACT_SUBVECTOR(N); break; case ISD::INSERT_VECTOR_ELT: Res = 
WidenVecRes_INSERT_VECTOR_ELT(N); break; + case ISD::ATOMIC_LOAD: +Res = WidenVecRes_ATOMIC_LOAD(cast(N)); +break; case ISD::LOAD: Res = WidenVecRes_LOAD(N); break; case ISD::STEP_VECTOR: case ISD::SPLAT_VECTOR: @@ -6014,6 +6017,85 @@ SDValue DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) { N->getOperand(1), N->getOperand(2)); } +/// Either return the same load or provide appropriate casts +/// from the load and return that. +static SDValue loadElement(SDValue LdOp, EVT FirstVT, EVT WidenVT, + TypeSize LdWidth, TypeSize FirstVTWidth, SDLoc dl, + SelectionDAG &DAG) { + assert(TypeSize::isKnownLE(LdWidth, FirstVTWidth)); + TypeSize WidenWidth = WidenVT.getSizeInBits(); + if (!FirstVT.isVector()) { +unsigned NumElts = +WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue(); +EVT NewVecVT = EVT::getVectorVT(*DAG.getContext(), FirstVT, NumElts); +SDValue VecOp = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, NewVecVT, LdOp); +return DAG.getNode(ISD::BITCAST, dl, WidenVT, VecOp); + } else if (FirstVT == WidenVT) +return LdOp; + else { +// TODO: We don't currently have any tests that exercise this code path. 
+assert(WidenWidth.getFixedValue() % FirstVTWidth.getFixedValue() == 0); +unsigned NumConcat = +WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue(); +SmallVector ConcatOps(NumConcat); +SDValue UndefVal = DAG.getUNDEF(FirstVT); +ConcatOps[0] = LdOp; +for (unsigned i = 1; i != NumConcat; ++i) + ConcatOps[i] = UndefVal; +return DAG.getNode(ISD::CONCAT_VECTORS, dl, WidenVT, ConcatOps); + } +} + +static std::optional findMemType(SelectionDAG &DAG, + const TargetLowering &TLI, unsigned Width, + EVT WidenVT, unsigned Align, + unsigned WidenEx); + +SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *LD) { + EVT WidenVT = + TLI.getTypeToTransformTo(*DAG.getContext(), LD->getValueType(0)); + EVT LdVT = LD->getMemoryVT(); + SDLoc dl(LD); + assert(LdVT.isVector() && WidenVT.isVector() && "Expected vectors"); + assert(LdVT.isScalableVector() == WidenVT.isScalableVector() && + "Must be scalable"); + assert(LdVT.getVectorElementType() == WidenVT.getVectorElementType() && + "Expected equivalent element types"); + + // Load information + SDValue Chain = LD->getChain(); + SDValue BasePtr = LD->getBasePtr(); + MachineMemOperand::Flags MMOFlags = LD->getMemOperand()->getFlags(); + AAMDNodes AAInfo = LD->getAAInfo(); + + TypeSize LdWidth = LdVT.getSizeInBits(); + TypeSize WidenWidth = WidenVT.getSizeInBits(); + TypeSize WidthDiff = WidenWidth - LdWidth; + + // Find the vector type that can load from. + std::optional FirstVT = +
[llvm-branch-commits] [llvm] [X86] Add atomic vector tests for unaligned >1 sizes. (PR #120387)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120387 >From acfcbcc08b856f6e55a7065f28df2691f940ad76 Mon Sep 17 00:00:00 2001 From: jofrn Date: Wed, 18 Dec 2024 03:40:32 -0500 Subject: [PATCH] [X86] Add atomic vector tests for unaligned >1 sizes. Unaligned atomic vectors with size >1 are lowered to calls. Adding their tests separately here. commit-id:a06a5cc6 --- llvm/test/CodeGen/X86/atomic-load-store.ll | 253 + 1 file changed, 253 insertions(+) diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 6efcbb80c0ce6..39e9fdfa5e62b 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -146,6 +146,34 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind { ret <1 x i64> %ret } +define <1 x ptr> @atomic_vec1_ptr(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec1_ptr: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movq (%rsp), %rax +; CHECK3-NEXT:popq %rcx +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_ptr: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq %rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movq (%rsp), %rax +; CHECK0-NEXT:popq %rcx +; CHECK0-NEXT:retq + %ret = load atomic <1 x ptr>, ptr %x acquire, align 4 + ret <1 x ptr> %ret +} + define <1 x half> @atomic_vec1_half(ptr %x) { ; CHECK3-LABEL: atomic_vec1_half: ; CHECK3: ## %bb.0: @@ -182,3 +210,228 @@ define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind { %ret = load atomic <1 x double>, ptr %x acquire, align 8 ret <1 x double> %ret } + +define <1 x i64> @atomic_vec1_i64(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec1_i64: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; 
CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movq (%rsp), %rax +; CHECK3-NEXT:popq %rcx +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_i64: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq %rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movq (%rsp), %rax +; CHECK0-NEXT:popq %rcx +; CHECK0-NEXT:retq + %ret = load atomic <1 x i64>, ptr %x acquire, align 4 + ret <1 x i64> %ret +} + +define <1 x double> @atomic_vec1_double(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec1_double: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK3-NEXT:popq %rax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_double: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq %rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK0-NEXT:popq %rax +; CHECK0-NEXT:retq + %ret = load atomic <1 x double>, ptr %x acquire, align 4 + ret <1 x double> %ret +} + +define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec2_i32: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK3-NEXT:popq %rax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec2_i32: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq 
%rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movq {{.*#+}} xmm0 = mem[0],zero +; CHECK0-NEXT:popq %rax +; CHECK0-NEXT:retq + %ret = load atomic <2 x i32>, ptr %x acquire, align 4 + ret <2 x i32> %ret +} + +define <4 x float> @atomic_vec4_float_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec4_float_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:pushq %rax +; CHECK-NEXT:movl $2, %esi +; CHECK-NEXT:callq ___atomic_load_16 +; CHECK-NEXT:movq %rdx, %xmm1 +; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[
[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120385 >From e7805ff5855e1b5117c143e700e83ab7dd1557d6 Mon Sep 17 00:00:00 2001 From: jofrn Date: Wed, 18 Dec 2024 03:37:17 -0500 Subject: [PATCH] [SelectionDAG] Legalize <1 x T> vector types for atomic load `load atomic <1 x T>` is not valid. This change legalizes vector types of atomic load via scalarization in SelectionDAG so that it can, for example, translate from `v1i32` to `i32`. commit-id:5c36cc8c --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 1 + .../SelectionDAG/LegalizeVectorTypes.cpp | 15 +++ llvm/test/CodeGen/X86/atomic-load-store.ll| 121 +- 3 files changed, 135 insertions(+), 2 deletions(-) diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h index 720393158aa5e..89ea7ef4dbe89 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -874,6 +874,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer { SDValue ScalarizeVecRes_UnaryOpWithExtraInput(SDNode *N); SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N); SDValue ScalarizeVecRes_LOAD(LoadSDNode *N); + SDValue ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N); SDValue ScalarizeVecRes_SCALAR_TO_VECTOR(SDNode *N); SDValue ScalarizeVecRes_VSELECT(SDNode *N); SDValue ScalarizeVecRes_SELECT(SDNode *N); diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index d0b69b88748a9..8eee7a4c61fe6 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -64,6 +64,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, unsigned ResNo) { R = ScalarizeVecRes_UnaryOpWithExtraInput(N); break; case ISD::INSERT_VECTOR_ELT: R = ScalarizeVecRes_INSERT_VECTOR_ELT(N); break; + case ISD::ATOMIC_LOAD: +R = ScalarizeVecRes_ATOMIC_LOAD(cast(N)); +break; case ISD::LOAD: R = 
ScalarizeVecRes_LOAD(cast(N));break; case ISD::SCALAR_TO_VECTOR: R = ScalarizeVecRes_SCALAR_TO_VECTOR(N); break; case ISD::SIGN_EXTEND_INREG: R = ScalarizeVecRes_InregOp(N); break; @@ -458,6 +461,18 @@ SDValue DAGTypeLegalizer::ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N) { return Op; } +SDValue DAGTypeLegalizer::ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N) { + SDValue Result = DAG.getAtomicLoad( + ISD::NON_EXTLOAD, SDLoc(N), N->getMemoryVT().getVectorElementType(), + N->getValueType(0).getVectorElementType(), N->getChain(), N->getBasePtr(), + N->getMemOperand()); + + // Legalize the chain result - switch anything that used the old chain to + // use the new one. + ReplaceValueWith(SDValue(N, 1), Result.getValue(1)); + return Result; +} + SDValue DAGTypeLegalizer::ScalarizeVecRes_LOAD(LoadSDNode *N) { assert(N->isUnindexed() && "Indexed vector load?"); diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 5bce4401f7bdb..d23cfb89f9fc8 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -1,6 +1,6 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | FileCheck %s -; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | FileCheck %s +; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | FileCheck %s --check-prefixes=CHECK,CHECK3 +; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | FileCheck %s --check-prefixes=CHECK,CHECK0 define void @test1(ptr %ptr, i32 %val1) { ; CHECK-LABEL: test1: @@ -28,3 +28,120 @@ define i32 @test3(ptr %ptr) { %val = load atomic i32, ptr %ptr seq_cst, align 4 ret i32 %val } + +define <1 x i32> @atomic_vec1_i32(ptr %x) { +; CHECK-LABEL: atomic_vec1_i32: +; CHECK: ## %bb.0: +; CHECK-NEXT:movl (%rdi), %eax +; CHECK-NEXT:retq + %ret = load atomic <1 x i32>, ptr %x 
acquire, align 4 + ret <1 x i32> %ret +} + +define <1 x i8> @atomic_vec1_i8(ptr %x) { +; CHECK3-LABEL: atomic_vec1_i8: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movzbl (%rdi), %eax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_i8: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movb (%rdi), %al +; CHECK0-NEXT:retq + %ret = load atomic <1 x i8>, ptr %x acquire, align 1 + ret <1 x i8> %ret +} + +define <1 x i16> @atomic_vec1_i16(ptr %x) { +; CHECK3-LABEL: atomic_vec1_i16: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movzwl (%rdi), %eax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_i16: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movw (%rdi), %ax +; CHECK0-NEXT:retq + %ret = load atomic <1 x i16>, ptr %x acquire, align 2 + ret <1 x i16> %ret +} + +define <1 x i32> @atomic_vec1_i8_zext(ptr %x) { +; CHECK3-LABEL: atomic_ve
[llvm-branch-commits] [llvm] [X86] Manage atomic load of fp -> int promotion in DAG (PR #120386)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120386 >From fdc21070689116f6f220f29686b09c93314ad075 Mon Sep 17 00:00:00 2001 From: jofrn Date: Wed, 18 Dec 2024 03:38:23 -0500 Subject: [PATCH] [X86] Manage atomic load of fp -> int promotion in DAG When lowering atomic <1 x T> vector types with floats, selection can fail since this pattern is unsupported. To support this, floats can be casted to an integer type of the same size. commit-id:f9d761c5 --- llvm/lib/Target/X86/X86ISelLowering.cpp| 4 +++ llvm/test/CodeGen/X86/atomic-load-store.ll | 37 ++ 2 files changed, 41 insertions(+) diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 3d9c76f3d05f5..4e59a3fb16369 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -2651,6 +2651,10 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM, setOperationAction(Op, MVT::f32, Promote); } + setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f16, MVT::i16); + setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f32, MVT::i32); + setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f64, MVT::i64); + // We have target-specific dag combine patterns for the following nodes: setTargetDAGCombine({ISD::VECTOR_SHUFFLE, ISD::SCALAR_TO_VECTOR, diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index d23cfb89f9fc8..6efcbb80c0ce6 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -145,3 +145,40 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind { %ret = load atomic <1 x i64>, ptr %x acquire, align 8 ret <1 x i64> %ret } + +define <1 x half> @atomic_vec1_half(ptr %x) { +; CHECK3-LABEL: atomic_vec1_half: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movzwl (%rdi), %eax +; CHECK3-NEXT:pinsrw $0, %eax, %xmm0 +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_half: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movw 
(%rdi), %cx +; CHECK0-NEXT:## implicit-def: $eax +; CHECK0-NEXT:movw %cx, %ax +; CHECK0-NEXT:## implicit-def: $xmm0 +; CHECK0-NEXT:pinsrw $0, %eax, %xmm0 +; CHECK0-NEXT:retq + %ret = load atomic <1 x half>, ptr %x acquire, align 2 + ret <1 x half> %ret +} + +define <1 x float> @atomic_vec1_float(ptr %x) { +; CHECK-LABEL: atomic_vec1_float: +; CHECK: ## %bb.0: +; CHECK-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK-NEXT:retq + %ret = load atomic <1 x float>, ptr %x acquire, align 4 + ret <1 x float> %ret +} + +define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec1_double_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK-NEXT:retq + %ret = load atomic <1 x double>, ptr %x acquire, align 8 + ret <1 x double> %ret +} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
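The promotion table added in the patch above (`setOperationPromotedToType(ISD::ATOMIC_LOAD, fp, int)`) can be pictured as a simple type mapping: an atomic load of a float type is performed as the same-width integer load and the result is bitcast back. A hedged sketch in Python (the string type names stand in for MVTs; `lower_atomic_fp_load` is an illustrative helper, not LLVM's API):

```python
# Conceptual model of the fp -> int promotion for ISD::ATOMIC_LOAD.
ATOMIC_LOAD_PROMOTIONS = {"f16": "i16", "f32": "i32", "f64": "i64"}

def lower_atomic_fp_load(value_type):
    """Return (load_type, needs_bitcast): the type the atomic load is
    actually issued with, and whether a bitcast back is required."""
    promoted = ATOMIC_LOAD_PROMOTIONS.get(value_type)
    if promoted is None:
        # Integer (or already-legal) types load directly.
        return value_type, False
    # Float types load as the same-width integer, then bitcast back.
    return promoted, True
```

For example, an atomic `f32` load is issued as an `i32` load plus a bitcast, while an `i64` load is left untouched.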
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120716 >From 681d443e5e9a069a73eaa0ce50684b41b95c48fd Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 20 Dec 2024 06:14:28 -0500 Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector AtomicExpand fails for aligned `load atomic ` because it does not find a compatible library call. This change adds appropriate bitcasts so that the call can be lowered. commit-id:f430c1af --- llvm/lib/CodeGen/AtomicExpandPass.cpp | 20 +- llvm/test/CodeGen/ARM/atomic-load-store.ll| 51 +++ llvm/test/CodeGen/X86/atomic-load-store.ll| 30 + .../X86/expand-atomic-non-integer.ll | 65 +++ 4 files changed, 163 insertions(+), 3 deletions(-) diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp index c376de877ac7d..b6f1e9db9ce35 100644 --- a/llvm/lib/CodeGen/AtomicExpandPass.cpp +++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp @@ -2066,9 +2066,23 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall( I->replaceAllUsesWith(V); } else if (HasResult) { Value *V; -if (UseSizedLibcall) - V = Builder.CreateBitOrPointerCast(Result, I->getType()); -else { +if (UseSizedLibcall) { + // Add bitcasts from Result's scalar type to I's vector type + if (I->getType()->getScalarType()->isPointerTy() && + I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) { +unsigned AS = + cast(I->getType()->getScalarType())->getAddressSpace(); +ElementCount EC = cast(I->getType())->getElementCount(); +Value *BC = Builder.CreateBitCast( +Result, +VectorType::get(IntegerType::get(Ctx, DL.getPointerSizeInBits(AS)), +EC)); +Value *IntToPtr = Builder.CreateIntToPtr( +BC, VectorType::get(PointerType::get(Ctx, AS), EC)); +V = Builder.CreateBitOrPointerCast(IntToPtr, I->getType()); + } else +V = Builder.CreateBitOrPointerCast(Result, I->getType()); +} else { V = Builder.CreateAlignedLoad(I->getType(), AllocaResult, AllocaAlignment); Builder.CreateLifetimeEnd(AllocaResult, SizeVal64); diff --git 
a/llvm/test/CodeGen/ARM/atomic-load-store.ll b/llvm/test/CodeGen/ARM/atomic-load-store.ll index 560dfde356c29..36c1305a7c5df 100644 --- a/llvm/test/CodeGen/ARM/atomic-load-store.ll +++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll @@ -983,3 +983,54 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double %val1) { store atomic double %val1, ptr %ptr seq_cst, align 8 ret void } + +define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 { +; ARM-LABEL: atomic_vec1_ptr: +; ARM: @ %bb.0: +; ARM-NEXT:ldr r0, [r0] +; ARM-NEXT:dmb ish +; ARM-NEXT:bx lr +; +; ARMOPTNONE-LABEL: atomic_vec1_ptr: +; ARMOPTNONE: @ %bb.0: +; ARMOPTNONE-NEXT:ldr r0, [r0] +; ARMOPTNONE-NEXT:dmb ish +; ARMOPTNONE-NEXT:bx lr +; +; THUMBTWO-LABEL: atomic_vec1_ptr: +; THUMBTWO: @ %bb.0: +; THUMBTWO-NEXT:ldr r0, [r0] +; THUMBTWO-NEXT:dmb ish +; THUMBTWO-NEXT:bx lr +; +; THUMBONE-LABEL: atomic_vec1_ptr: +; THUMBONE: @ %bb.0: +; THUMBONE-NEXT:push {r7, lr} +; THUMBONE-NEXT:movs r1, #0 +; THUMBONE-NEXT:mov r2, r1 +; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4 +; THUMBONE-NEXT:pop {r7, pc} +; +; ARMV4-LABEL: atomic_vec1_ptr: +; ARMV4: @ %bb.0: +; ARMV4-NEXT:push {r11, lr} +; ARMV4-NEXT:mov r1, #2 +; ARMV4-NEXT:bl __atomic_load_4 +; ARMV4-NEXT:pop {r11, lr} +; ARMV4-NEXT:mov pc, lr +; +; ARMV6-LABEL: atomic_vec1_ptr: +; ARMV6: @ %bb.0: +; ARMV6-NEXT:mov r1, #0 +; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5 +; ARMV6-NEXT:ldr r0, [r0] +; ARMV6-NEXT:bx lr +; +; THUMBM-LABEL: atomic_vec1_ptr: +; THUMBM: @ %bb.0: +; THUMBM-NEXT:ldr r0, [r0] +; THUMBM-NEXT:dmb sy +; THUMBM-NEXT:bx lr + %ret = load atomic <1 x ptr>, ptr %x acquire, align 4 + ret <1 x ptr> %ret +} diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 08d0405345f57..4293df8c13571 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -371,6 +371,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { ret <2 x i32> %ret } +define <2 x ptr> 
@atomic_vec2_ptr_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec2_ptr_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:pushq %rax +; CHECK-NEXT:movl $2, %esi +; CHECK-NEXT:callq ___atomic_load_16 +; CHECK-NEXT:movq %rdx, %xmm1 +; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT:popq %rax +; CHECK-NEXT:retq + %ret = load atomic <2 x ptr>, ptr %x acquire, align 16 + ret <2 x ptr> %ret +} + define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind { ; CHECK3-LAB
+assert(WidenWidth.getFixedValue() % FirstVTWidth.getFixedValue() == 0); +unsigned NumConcat = +WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue(); +SmallVector ConcatOps(NumConcat); +SDValue UndefVal = DAG.getUNDEF(FirstVT); +ConcatOps[0] = LdOp; +for (unsigned i = 1; i != NumConcat; ++i) + ConcatOps[i] = UndefVal; +return DAG.getNode(ISD::CONCAT_VECTORS, dl, WidenVT, ConcatOps); + } +} + +static std::optional findMemType(SelectionDAG &DAG, + const TargetLowering &TLI, unsigned Width, + EVT WidenVT, unsigned Align, + unsigned WidenEx); + +SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *LD) { + EVT WidenVT = + TLI.getTypeToTransformTo(*DAG.getContext(), LD->getValueType(0)); + EVT LdVT = LD->getMemoryVT(); + SDLoc dl(LD); + assert(LdVT.isVector() && WidenVT.isVector() && "Expected vectors"); + assert(LdVT.isScalableVector() == WidenVT.isScalableVector() && + "Must be scalable"); + assert(LdVT.getVectorElementType() == WidenVT.getVectorElementType() && + "Expected equivalent element types"); + + // Load information + SDValue Chain = LD->getChain(); + SDValue BasePtr = LD->getBasePtr(); + MachineMemOperand::Flags MMOFlags = LD->getMemOperand()->getFlags(); + AAMDNodes AAInfo = LD->getAAInfo(); + + TypeSize LdWidth = LdVT.getSizeInBits(); + TypeSize WidenWidth = WidenVT.getSizeInBits(); + TypeSize WidthDiff = WidenWidth - LdWidth; + + // Find the vector type that can load from. + std::optional FirstVT = +
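The commit message above describes widening a `<2 x T>` atomic load: the whole vector is fetched by a single sufficiently wide atomic load, and the result is placed into a wider register whose extra lanes are don't-care padding. A minimal C++ model of that idea (this is not the SelectionDAG code; the function name and the zero-padding of the upper lanes are illustrative, and little-endian lane order is assumed):

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Model of widening: a <2 x i32> atomic load is performed as one 64-bit
// atomic load, then the two lanes are bitcast into a wider 4-lane value.
// Lanes 2..3 are undefined in the real lowering; here they are zeroed.
std::array<int32_t, 4> widened_atomic_vec2_i32(const std::atomic<uint64_t> *p) {
    uint64_t bits = p->load(std::memory_order_acquire); // one atomic load
    std::array<int32_t, 4> widened{};                   // upper lanes: padding
    std::memcpy(widened.data(), &bits, sizeof(bits));   // bitcast i64 -> <2 x i32>
    return widened;
}
```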
[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #120640)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120640 >From db5b862de7faed37cb9c40c170d4cd1e9612b489 Mon Sep 17 00:00:00 2001 From: jofrn Date: Thu, 19 Dec 2024 16:25:55 -0500 Subject: [PATCH] [SelectionDAG] Split vector types for atomic load Vector types that aren't widened are split so that a single ATOMIC_LOAD is issued for the entire vector at once. This change utilizes the load vectorization infrastructure in SelectionDAG in order to group the vectors. This enables SelectionDAG to translate vectors with type bfloat,half. commit-id:3a045357 --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 1 + .../SelectionDAG/LegalizeVectorTypes.cpp | 37 llvm/test/CodeGen/X86/atomic-load-store.ll| 171 ++ 3 files changed, 209 insertions(+) diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h index bdfa5f7741ad3..d8f402f529632 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -960,6 +960,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer { void SplitVecRes_FPOp_MultiType(SDNode *N, SDValue &Lo, SDValue &Hi); void SplitVecRes_IS_FPCLASS(SDNode *N, SDValue &Lo, SDValue &Hi); void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi); + void SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, SDValue &Hi); void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi); void SplitVecRes_VP_LOAD(VPLoadSDNode *LD, SDValue &Lo, SDValue &Hi); void SplitVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *SLD, SDValue &Lo, diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index 0edbbe8af623a..895d3c51e0e1e 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -1172,6 +1172,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SplitVecRes_STEP_VECTOR(N, Lo, Hi); 
break; case ISD::SIGN_EXTEND_INREG: SplitVecRes_InregOp(N, Lo, Hi); break; + case ISD::ATOMIC_LOAD: +SplitVecRes_ATOMIC_LOAD(cast(N), Lo, Hi); +break; case ISD::LOAD: SplitVecRes_LOAD(cast(N), Lo, Hi); break; @@ -1421,6 +1424,40 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SetSplitVector(SDValue(N, ResNo), Lo, Hi); } +void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, + SDValue &Hi) { + EVT LoVT, HiVT; + SDLoc dl(LD); + std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(LD->getValueType(0)); + + ISD::LoadExtType ExtType = LD->getExtensionType(); + SDValue Ch = LD->getChain(); + SDValue Ptr = LD->getBasePtr(); + EVT MemoryVT = LD->getMemoryVT(); + + EVT LoMemVT, HiMemVT; + std::tie(LoMemVT, HiMemVT) = DAG.GetSplitDestVTs(MemoryVT); + + EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), LD->getValueType(0).getSizeInBits()); + EVT MemIntVT = EVT::getIntegerVT(*DAG.getContext(), 2 * LoMemVT.getSizeInBits()); + SDValue ALD = DAG.getAtomicLoad(ExtType, dl, MemIntVT, IntVT, Ch, Ptr, + LD->getMemOperand()); + + EVT LoIntVT = EVT::getIntegerVT(*DAG.getContext(), LoVT.getSizeInBits()); + EVT HiIntVT = EVT::getIntegerVT(*DAG.getContext(), HiVT.getSizeInBits()); + SDValue ExtractLo = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, LoIntVT, ALD, + DAG.getIntPtrConstant(0, dl)); + SDValue ExtractHi = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, HiIntVT, ALD, + DAG.getIntPtrConstant(1, dl)); + + Lo = DAG.getBitcast(LoVT, ExtractLo); + Hi = DAG.getBitcast(HiVT, ExtractHi); + + // Legalize the chain result - switch anything that used the old chain to + // use the new one. 
+ ReplaceValueWith(SDValue(LD, 1), ALD.getValue(1)); +} + void DAGTypeLegalizer::IncrementPointer(MemSDNode *N, EVT MemVT, MachinePointerInfo &MPI, SDValue &Ptr, uint64_t *ScaledOffset) { diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 935d058a52f8f..42b0955824293 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -204,6 +204,68 @@ define <2 x float> @atomic_vec2_float_align(ptr %x) { ret <2 x float> %ret } +define <2 x half> @atomic_vec2_half(ptr %x) { +; CHECK3-LABEL: atomic_vec2_half: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movl (%rdi), %eax +; CHECK3-NEXT:pinsrw $0, %eax, %xmm0 +; CHECK3-NEXT:shrl $16, %eax +; CHECK3-NEXT:pinsrw $0, %eax, %xmm1 +; CHECK3-NEXT:punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3] +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec2_half: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movl (%rdi), %eax
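`SplitVecRes_ATOMIC_LOAD` above issues one wide integer atomic load for the whole vector, extracts the low and high halves with `EXTRACT_ELEMENT`, and bitcasts each half back to a sub-vector. A minimal C++ sketch of the same data flow (illustrative names only, not the actual legalizer; a 64-bit `<4 x i16>` stands in for the wider cases, and little-endian lane order is assumed):

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

struct SplitHalves {
    std::array<int16_t, 2> lo, hi;
};

// One 64-bit atomic load covers the whole <4 x i16>; the two 32-bit
// halves are then extracted and bitcast back to two <2 x i16> parts.
SplitHalves split_atomic_vec4_i16(const std::atomic<uint64_t> *p) {
    uint64_t bits = p->load(std::memory_order_acquire); // single atomic load
    uint32_t lo32 = static_cast<uint32_t>(bits);        // EXTRACT_ELEMENT 0
    uint32_t hi32 = static_cast<uint32_t>(bits >> 32);  // EXTRACT_ELEMENT 1
    SplitHalves out;
    std::memcpy(out.lo.data(), &lo32, sizeof lo32);     // bitcast i32 -> <2 x i16>
    std::memcpy(out.hi.data(), &hi32, sizeof hi32);
    return out;
}
```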
[llvm-branch-commits] [llvm] [X86] Add atomic vector tests for unaligned >1 sizes. (PR #120387)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120387 >From acfcbcc08b856f6e55a7065f28df2691f940ad76 Mon Sep 17 00:00:00 2001 From: jofrn Date: Wed, 18 Dec 2024 03:40:32 -0500 Subject: [PATCH] [X86] Add atomic vector tests for unaligned >1 sizes. Unaligned atomic vectors with size >1 are lowered to calls. Adding their tests separately here. commit-id:a06a5cc6 --- llvm/test/CodeGen/X86/atomic-load-store.ll | 253 + 1 file changed, 253 insertions(+) diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 6efcbb80c0ce6..39e9fdfa5e62b 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -146,6 +146,34 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind { ret <1 x i64> %ret } +define <1 x ptr> @atomic_vec1_ptr(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec1_ptr: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movq (%rsp), %rax +; CHECK3-NEXT:popq %rcx +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_ptr: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq %rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movq (%rsp), %rax +; CHECK0-NEXT:popq %rcx +; CHECK0-NEXT:retq + %ret = load atomic <1 x ptr>, ptr %x acquire, align 4 + ret <1 x ptr> %ret +} + define <1 x half> @atomic_vec1_half(ptr %x) { ; CHECK3-LABEL: atomic_vec1_half: ; CHECK3: ## %bb.0: @@ -182,3 +210,228 @@ define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind { %ret = load atomic <1 x double>, ptr %x acquire, align 8 ret <1 x double> %ret } + +define <1 x i64> @atomic_vec1_i64(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec1_i64: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; 
CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movq (%rsp), %rax +; CHECK3-NEXT:popq %rcx +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_i64: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq %rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movq (%rsp), %rax +; CHECK0-NEXT:popq %rcx +; CHECK0-NEXT:retq + %ret = load atomic <1 x i64>, ptr %x acquire, align 4 + ret <1 x i64> %ret +} + +define <1 x double> @atomic_vec1_double(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec1_double: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK3-NEXT:popq %rax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_double: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq %rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK0-NEXT:popq %rax +; CHECK0-NEXT:retq + %ret = load atomic <1 x double>, ptr %x acquire, align 4 + ret <1 x double> %ret +} + +define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec2_i32: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK3-NEXT:popq %rax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec2_i32: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq 
%rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movq {{.*#+}} xmm0 = mem[0],zero +; CHECK0-NEXT:popq %rax +; CHECK0-NEXT:retq + %ret = load atomic <2 x i32>, ptr %x acquire, align 4 + ret <2 x i32> %ret +} + +define <4 x float> @atomic_vec4_float_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec4_float_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:pushq %rax +; CHECK-NEXT:movl $2, %esi +; CHECK-NEXT:callq ___atomic_load_16 +; CHECK-NEXT:movq %rdx, %xmm1 +; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[
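The tests above exercise the fallback path the commit message describes: an atomic vector load whose size or alignment has no native instruction is lowered to the generic `__atomic_load(size, src, dest, memorder)` libcall. A C-level analogue using the compiler's type-generic builtin (GCC/Clang only; the struct is a hypothetical stand-in for `<1 x ptr>`):

```cpp
#include <cassert>

struct Vec1Ptr { void *p; };  // stand-in for <1 x ptr>

// The builtin may expand inline or to the __atomic_load libcall,
// depending on size, alignment, and target support.
Vec1Ptr load_vec1_ptr_acquire(Vec1Ptr *x) {
    Vec1Ptr ret;
    __atomic_load(x, &ret, __ATOMIC_ACQUIRE);
    return ret;
}
```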
[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120385 >From e7805ff5855e1b5117c143e700e83ab7dd1557d6 Mon Sep 17 00:00:00 2001 From: jofrn Date: Wed, 18 Dec 2024 03:37:17 -0500 Subject: [PATCH] [SelectionDAG] Legalize <1 x T> vector types for atomic load `load atomic <1 x T>` is not valid. This change legalizes vector types of atomic load via scalarization in SelectionDAG so that it can, for example, translate from `v1i32` to `i32`. commit-id:5c36cc8c --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 1 + .../SelectionDAG/LegalizeVectorTypes.cpp | 15 +++ llvm/test/CodeGen/X86/atomic-load-store.ll| 121 +- 3 files changed, 135 insertions(+), 2 deletions(-) diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h index 720393158aa5e..89ea7ef4dbe89 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -874,6 +874,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer { SDValue ScalarizeVecRes_UnaryOpWithExtraInput(SDNode *N); SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N); SDValue ScalarizeVecRes_LOAD(LoadSDNode *N); + SDValue ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N); SDValue ScalarizeVecRes_SCALAR_TO_VECTOR(SDNode *N); SDValue ScalarizeVecRes_VSELECT(SDNode *N); SDValue ScalarizeVecRes_SELECT(SDNode *N); diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index d0b69b88748a9..8eee7a4c61fe6 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -64,6 +64,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, unsigned ResNo) { R = ScalarizeVecRes_UnaryOpWithExtraInput(N); break; case ISD::INSERT_VECTOR_ELT: R = ScalarizeVecRes_INSERT_VECTOR_ELT(N); break; + case ISD::ATOMIC_LOAD: +R = ScalarizeVecRes_ATOMIC_LOAD(cast(N)); +break; case ISD::LOAD: R = 
ScalarizeVecRes_LOAD(cast(N));break; case ISD::SCALAR_TO_VECTOR: R = ScalarizeVecRes_SCALAR_TO_VECTOR(N); break; case ISD::SIGN_EXTEND_INREG: R = ScalarizeVecRes_InregOp(N); break; @@ -458,6 +461,18 @@ SDValue DAGTypeLegalizer::ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N) { return Op; } +SDValue DAGTypeLegalizer::ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N) { + SDValue Result = DAG.getAtomicLoad( + ISD::NON_EXTLOAD, SDLoc(N), N->getMemoryVT().getVectorElementType(), + N->getValueType(0).getVectorElementType(), N->getChain(), N->getBasePtr(), + N->getMemOperand()); + + // Legalize the chain result - switch anything that used the old chain to + // use the new one. + ReplaceValueWith(SDValue(N, 1), Result.getValue(1)); + return Result; +} + SDValue DAGTypeLegalizer::ScalarizeVecRes_LOAD(LoadSDNode *N) { assert(N->isUnindexed() && "Indexed vector load?"); diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 5bce4401f7bdb..d23cfb89f9fc8 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -1,6 +1,6 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | FileCheck %s -; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | FileCheck %s +; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | FileCheck %s --check-prefixes=CHECK,CHECK3 +; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | FileCheck %s --check-prefixes=CHECK,CHECK0 define void @test1(ptr %ptr, i32 %val1) { ; CHECK-LABEL: test1: @@ -28,3 +28,120 @@ define i32 @test3(ptr %ptr) { %val = load atomic i32, ptr %ptr seq_cst, align 4 ret i32 %val } + +define <1 x i32> @atomic_vec1_i32(ptr %x) { +; CHECK-LABEL: atomic_vec1_i32: +; CHECK: ## %bb.0: +; CHECK-NEXT:movl (%rdi), %eax +; CHECK-NEXT:retq + %ret = load atomic <1 x i32>, ptr %x 
acquire, align 4 + ret <1 x i32> %ret +} + +define <1 x i8> @atomic_vec1_i8(ptr %x) { +; CHECK3-LABEL: atomic_vec1_i8: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movzbl (%rdi), %eax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_i8: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movb (%rdi), %al +; CHECK0-NEXT:retq + %ret = load atomic <1 x i8>, ptr %x acquire, align 1 + ret <1 x i8> %ret +} + +define <1 x i16> @atomic_vec1_i16(ptr %x) { +; CHECK3-LABEL: atomic_vec1_i16: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movzwl (%rdi), %eax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_i16: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movw (%rdi), %ax +; CHECK0-NEXT:retq + %ret = load atomic <1 x i16>, ptr %x acquire, align 2 + ret <1 x i16> %ret +} + +define <1 x i32> @atomic_vec1_i8_zext(ptr %x) { +; CHECK3-LABEL: atomic_ve
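Scalarization of `<1 x T>` is the simplest of the three legalization strategies in this patch stack: a one-element vector atomic load is just the scalar atomic load of its single element, as the `movl (%rdi), %eax` check lines above show. A trivial C++ analogue (hypothetical function name mirroring the test):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// <1 x i32> atomic load == scalar i32 atomic load of the lone element.
int32_t atomic_vec1_i32(const std::atomic<int32_t> *x) {
    return x->load(std::memory_order_acquire);
}
```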
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/125432 >From fc2debee17c4ded2edbe2f1803f3184cea78bfdc Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 31 Jan 2025 13:12:56 -0500 Subject: [PATCH] [SelectionDAG][X86] Remove unused elements from atomic vector. After splitting, all elements are created. The two components must be found by looking at the upper and lower half of EXTRACT_ELEMENT. This change extends EltsFromConsecutiveLoads to understand AtomicSDNode so that unused elements can be removed. commit-id:b83937a8 --- llvm/include/llvm/CodeGen/SelectionDAG.h | 4 +- .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 20 ++- .../SelectionDAGAddressAnalysis.cpp | 30 ++-- .../SelectionDAG/SelectionDAGBuilder.cpp | 6 +- llvm/lib/Target/X86/X86ISelLowering.cpp | 43 +++-- llvm/test/CodeGen/X86/atomic-load-store.ll| 167 ++ 6 files changed, 83 insertions(+), 187 deletions(-) diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h index ba11ddbb5b731..d3cd81c146280 100644 --- a/llvm/include/llvm/CodeGen/SelectionDAG.h +++ b/llvm/include/llvm/CodeGen/SelectionDAG.h @@ -1843,7 +1843,7 @@ class SelectionDAG { /// chain to the token factor. This ensures that the new memory node will have /// the same relative memory dependency position as the old load. Returns the /// new merged load chain. - SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp); + SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp); /// Topological-sort the AllNodes list and a /// assign a unique node id for each node in the DAG based on their @@ -2281,7 +2281,7 @@ class SelectionDAG { /// merged. Check that both are nonvolatile and if LD is loading /// 'Bytes' bytes from a location that is 'Dist' units away from the /// location that the 'Base' load is loading from. 
- bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base, + bool areNonVolatileConsecutiveLoads(MemSDNode *LD, MemSDNode *Base, unsigned Bytes, int Dist) const; /// Infer alignment of a load / store address. Return std::nullopt if it diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp index 2a68903c34cef..8e77a542ab029 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -12218,7 +12218,7 @@ SDValue SelectionDAG::makeEquivalentMemoryOrdering(SDValue OldChain, return TokenFactor; } -SDValue SelectionDAG::makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, +SDValue SelectionDAG::makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp) { assert(isa(NewMemOp.getNode()) && "Expected a memop node"); SDValue OldChain = SDValue(OldLoad, 1); @@ -12911,17 +12911,21 @@ std::pair SelectionDAG::UnrollVectorOverflowOp( getBuildVector(NewOvVT, dl, OvScalars)); } -bool SelectionDAG::areNonVolatileConsecutiveLoads(LoadSDNode *LD, - LoadSDNode *Base, +bool SelectionDAG::areNonVolatileConsecutiveLoads(MemSDNode *LD, + MemSDNode *Base, unsigned Bytes, int Dist) const { if (LD->isVolatile() || Base->isVolatile()) return false; - // TODO: probably too restrictive for atomics, revisit - if (!LD->isSimple()) -return false; - if (LD->isIndexed() || Base->isIndexed()) -return false; + if (auto Ld = dyn_cast(LD)) { +if (!Ld->isSimple()) + return false; +if (Ld->isIndexed()) + return false; + } + if (auto Ld = dyn_cast(Base)) +if (Ld->isIndexed()) + return false; if (LD->getChain() != Base->getChain()) return false; EVT VT = LD->getMemoryVT(); diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp index f2ab88851b780..c29cb424c7a4c 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp +++ 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp @@ -195,8 +195,8 @@ bool BaseIndexOffset::contains(const SelectionDAG &DAG, int64_t BitSize, } /// Parses tree in Ptr for base, index, offset addresses. -static BaseIndexOffset matchLSNode(const LSBaseSDNode *N, - const SelectionDAG &DAG) { +template +static BaseIndexOffset matchSDNode(const T *N, const SelectionDAG &DAG) { SDValue Ptr = N->getBasePtr(); // (((B + I*M) + c)) + c ... @@ -206,16 +206,18 @@ static BaseIndexOffset matchLSNode(const LSBaseSDNode *N, bool IsIndexSignExt = false; // pre-inc/pre-dec ops are components of EA. - if (N->getAddressingMode() == ISD::P
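The `areNonVolatileConsecutiveLoads` change above generalizes a consecutiveness check from `LoadSDNode` to `MemSDNode`: two accesses are merge candidates when they share a base and the second sits exactly `Dist * Bytes` bytes away. A simplified sketch of that predicate (illustrative types only; the real code decomposes addresses via `BaseIndexOffset`):

```cpp
#include <cassert>
#include <cstdint>

struct MemAccess {
    const void *base;  // symbolic base object
    int64_t offset;    // byte offset from base
};

// ld is "consecutive" with baseLd when it reads bytes bytes from a
// location dist units (of bytes each) past baseLd's location.
bool areConsecutive(const MemAccess &ld, const MemAccess &baseLd,
                    unsigned bytes, int dist) {
    if (ld.base != baseLd.base)
        return false;  // different base objects: cannot prove adjacency
    return ld.offset == baseLd.offset + int64_t(dist) * bytes;
}
```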
[llvm-branch-commits] [llvm] release/20.x: [OpenMP] Add pre sm_70 load hack back in (#138589) (PR #138626)
ye-luo wrote: I manually verified the effectiveness of this patch on 20.x release branch. https://github.com/llvm/llvm-project/pull/138626 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120716 >From bdf9a55d6beecbee24114d60e922ae2c360fc8b3 Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 20 Dec 2024 06:14:28 -0500 Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector AtomicExpand fails for aligned `load atomic ` because it does not find a compatible library call. This change adds appropriate bitcasts so that the call can be lowered. commit-id:f430c1af --- llvm/lib/CodeGen/AtomicExpandPass.cpp | 16 - llvm/test/CodeGen/ARM/atomic-load-store.ll| 51 +++ llvm/test/CodeGen/X86/atomic-load-store.ll| 30 + .../X86/expand-atomic-non-integer.ll | 65 +++ 4 files changed, 159 insertions(+), 3 deletions(-) diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp index c376de877ac7d..2a266bd773a72 100644 --- a/llvm/lib/CodeGen/AtomicExpandPass.cpp +++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp @@ -2066,9 +2066,19 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall( I->replaceAllUsesWith(V); } else if (HasResult) { Value *V; -if (UseSizedLibcall) - V = Builder.CreateBitOrPointerCast(Result, I->getType()); -else { +if (UseSizedLibcall) { + // Add bitcasts from Result's scalar type to I's vector type + auto PtrTy = dyn_cast(I->getType()->getScalarType()); + auto VTy = dyn_cast(I->getType()); + if (VTy && PtrTy && !Result->getType()->isVectorTy()) { +unsigned AS = PtrTy->getAddressSpace(); +Value *BC = Builder.CreateBitCast( +Result, VTy->getWithNewType(DL.getIntPtrType(Ctx, AS))); +V = Builder.CreateIntToPtr( +BC, VTy->getWithNewType(PointerType::get(Ctx, AS))); + } else +V = Builder.CreateBitOrPointerCast(Result, I->getType()); +} else { V = Builder.CreateAlignedLoad(I->getType(), AllocaResult, AllocaAlignment); Builder.CreateLifetimeEnd(AllocaResult, SizeVal64); diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll b/llvm/test/CodeGen/ARM/atomic-load-store.ll index 560dfde356c29..36c1305a7c5df 100644 --- 
a/llvm/test/CodeGen/ARM/atomic-load-store.ll +++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll @@ -983,3 +983,54 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double %val1) { store atomic double %val1, ptr %ptr seq_cst, align 8 ret void } + +define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 { +; ARM-LABEL: atomic_vec1_ptr: +; ARM: @ %bb.0: +; ARM-NEXT:ldr r0, [r0] +; ARM-NEXT:dmb ish +; ARM-NEXT:bx lr +; +; ARMOPTNONE-LABEL: atomic_vec1_ptr: +; ARMOPTNONE: @ %bb.0: +; ARMOPTNONE-NEXT:ldr r0, [r0] +; ARMOPTNONE-NEXT:dmb ish +; ARMOPTNONE-NEXT:bx lr +; +; THUMBTWO-LABEL: atomic_vec1_ptr: +; THUMBTWO: @ %bb.0: +; THUMBTWO-NEXT:ldr r0, [r0] +; THUMBTWO-NEXT:dmb ish +; THUMBTWO-NEXT:bx lr +; +; THUMBONE-LABEL: atomic_vec1_ptr: +; THUMBONE: @ %bb.0: +; THUMBONE-NEXT:push {r7, lr} +; THUMBONE-NEXT:movs r1, #0 +; THUMBONE-NEXT:mov r2, r1 +; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4 +; THUMBONE-NEXT:pop {r7, pc} +; +; ARMV4-LABEL: atomic_vec1_ptr: +; ARMV4: @ %bb.0: +; ARMV4-NEXT:push {r11, lr} +; ARMV4-NEXT:mov r1, #2 +; ARMV4-NEXT:bl __atomic_load_4 +; ARMV4-NEXT:pop {r11, lr} +; ARMV4-NEXT:mov pc, lr +; +; ARMV6-LABEL: atomic_vec1_ptr: +; ARMV6: @ %bb.0: +; ARMV6-NEXT:mov r1, #0 +; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5 +; ARMV6-NEXT:ldr r0, [r0] +; ARMV6-NEXT:bx lr +; +; THUMBM-LABEL: atomic_vec1_ptr: +; THUMBM: @ %bb.0: +; THUMBM-NEXT:ldr r0, [r0] +; THUMBM-NEXT:dmb sy +; THUMBM-NEXT:bx lr + %ret = load atomic <1 x ptr>, ptr %x acquire, align 4 + ret <1 x ptr> %ret +} diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 08d0405345f57..4293df8c13571 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -371,6 +371,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { ret <2 x i32> %ret } +define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec2_ptr_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:pushq %rax +; CHECK-NEXT:movl $2, 
%esi +; CHECK-NEXT:callq ___atomic_load_16 +; CHECK-NEXT:movq %rdx, %xmm1 +; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT:popq %rax +; CHECK-NEXT:retq + %ret = load atomic <2 x ptr>, ptr %x acquire, align 16 + ret <2 x ptr> %ret +} + define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind { ; CHECK3-LABEL: atomic_vec4_i8: ; CHECK3: ## %bb.0: @@ -394,6 +409,21 @@ define <4 x i16> @atomic_vec4_i16(ptr %x) nounwind { ret <4 x i16> %ret } +define <4 x ptr addrspace(270)> @atomic_vec4_ptr270(ptr %x) nounwind { +; CHECK-LA
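A minimal IR sketch of the cast sequence this patch emits after a sized libcall (value names are illustrative, not taken from the patch; the pass builds these casts with IRBuilder as in the hunk above):

```llvm
; The sized libcall returns a plain integer (i128 for a 16-byte load).
; The new code bitcasts it to a vector of pointer-sized integers, then
; uses inttoptr to produce the requested pointer-vector type.
%r  = call i128 @__atomic_load_16(ptr %x, i32 2)
%bc = bitcast i128 %r to <2 x i64>
%v  = inttoptr <2 x i64> %bc to <2 x ptr>
```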
[llvm-branch-commits] [llvm] release/20.x: [OpenMP] Add pre sm_70 load hack back in (#138589) (PR #138626)
https://github.com/shiltian approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/138626 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [X86] Add atomic vector tests for unaligned >1 sizes. (PR #120387)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120387 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [X86] Manage atomic load of fp -> int promotion in DAG (PR #120386)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120386 >From 5005b94e624695e209a41c2f82f438b9bf1b1bb8 Mon Sep 17 00:00:00 2001 From: jofrn Date: Wed, 18 Dec 2024 03:38:23 -0500 Subject: [PATCH] [X86] Manage atomic load of fp -> int promotion in DAG When lowering atomic <1 x T> vector types with floats, selection can fail since this pattern is unsupported. To support this, floats can be casted to an integer type of the same size. commit-id:f9d761c5 --- llvm/lib/Target/X86/X86ISelLowering.cpp| 4 +++ llvm/test/CodeGen/X86/atomic-load-store.ll | 37 ++ 2 files changed, 41 insertions(+) diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 3d9c76f3d05f5..4e59a3fb16369 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -2651,6 +2651,10 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM, setOperationAction(Op, MVT::f32, Promote); } + setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f16, MVT::i16); + setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f32, MVT::i32); + setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f64, MVT::i64); + // We have target-specific dag combine patterns for the following nodes: setTargetDAGCombine({ISD::VECTOR_SHUFFLE, ISD::SCALAR_TO_VECTOR, diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index d23cfb89f9fc8..6efcbb80c0ce6 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -145,3 +145,40 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind { %ret = load atomic <1 x i64>, ptr %x acquire, align 8 ret <1 x i64> %ret } + +define <1 x half> @atomic_vec1_half(ptr %x) { +; CHECK3-LABEL: atomic_vec1_half: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movzwl (%rdi), %eax +; CHECK3-NEXT:pinsrw $0, %eax, %xmm0 +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_half: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movw 
(%rdi), %cx +; CHECK0-NEXT:## implicit-def: $eax +; CHECK0-NEXT:movw %cx, %ax +; CHECK0-NEXT:## implicit-def: $xmm0 +; CHECK0-NEXT:pinsrw $0, %eax, %xmm0 +; CHECK0-NEXT:retq + %ret = load atomic <1 x half>, ptr %x acquire, align 2 + ret <1 x half> %ret +} + +define <1 x float> @atomic_vec1_float(ptr %x) { +; CHECK-LABEL: atomic_vec1_float: +; CHECK: ## %bb.0: +; CHECK-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK-NEXT:retq + %ret = load atomic <1 x float>, ptr %x acquire, align 4 + ret <1 x float> %ret +} + +define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec1_double_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK-NEXT:retq + %ret = load atomic <1 x double>, ptr %x acquire, align 8 + ret <1 x double> %ret +} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
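As a rough illustration of what this promotion amounts to (selection operates on DAG nodes, not IR, and this function name is made up): an atomic float load is handled as an integer atomic load of the same width plus a bitcast:

```llvm
define float @load_atomic_f32(ptr %p) {
  ; ATOMIC_LOAD f32 promoted to i32: load the bits as an integer...
  %bits = load atomic i32, ptr %p acquire, align 4
  ; ...then reinterpret them as a float.
  %f = bitcast i32 %bits to float
  ret float %f
}
```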
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/125432 >From 18fe5c781d1b128edcfef06a71152846fd7d2bec Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 31 Jan 2025 13:12:56 -0500 Subject: [PATCH] [SelectionDAG][X86] Remove unused elements from atomic vector. After splitting, all elements are created. The two components must be found by looking at the upper and lower half of EXTRACT_ELEMENT. This change extends EltsFromConsecutiveLoads to understand AtomicSDNode so that unused elements can be removed. commit-id:b83937a8 --- llvm/include/llvm/CodeGen/SelectionDAG.h | 4 +- .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 20 ++- .../SelectionDAGAddressAnalysis.cpp | 30 ++-- .../SelectionDAG/SelectionDAGBuilder.cpp | 6 +- llvm/lib/Target/X86/X86ISelLowering.cpp | 43 +++-- llvm/test/CodeGen/X86/atomic-load-store.ll| 167 ++ 6 files changed, 83 insertions(+), 187 deletions(-) diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h index ba11ddbb5b731..d3cd81c146280 100644 --- a/llvm/include/llvm/CodeGen/SelectionDAG.h +++ b/llvm/include/llvm/CodeGen/SelectionDAG.h @@ -1843,7 +1843,7 @@ class SelectionDAG { /// chain to the token factor. This ensures that the new memory node will have /// the same relative memory dependency position as the old load. Returns the /// new merged load chain. - SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp); + SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp); /// Topological-sort the AllNodes list and a /// assign a unique node id for each node in the DAG based on their @@ -2281,7 +2281,7 @@ class SelectionDAG { /// merged. Check that both are nonvolatile and if LD is loading /// 'Bytes' bytes from a location that is 'Dist' units away from the /// location that the 'Base' load is loading from. 
- bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base, + bool areNonVolatileConsecutiveLoads(MemSDNode *LD, MemSDNode *Base, unsigned Bytes, int Dist) const; /// Infer alignment of a load / store address. Return std::nullopt if it diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp index 2a68903c34cef..8e77a542ab029 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -12218,7 +12218,7 @@ SDValue SelectionDAG::makeEquivalentMemoryOrdering(SDValue OldChain, return TokenFactor; } -SDValue SelectionDAG::makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, +SDValue SelectionDAG::makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp) { assert(isa(NewMemOp.getNode()) && "Expected a memop node"); SDValue OldChain = SDValue(OldLoad, 1); @@ -12911,17 +12911,21 @@ std::pair SelectionDAG::UnrollVectorOverflowOp( getBuildVector(NewOvVT, dl, OvScalars)); } -bool SelectionDAG::areNonVolatileConsecutiveLoads(LoadSDNode *LD, - LoadSDNode *Base, +bool SelectionDAG::areNonVolatileConsecutiveLoads(MemSDNode *LD, + MemSDNode *Base, unsigned Bytes, int Dist) const { if (LD->isVolatile() || Base->isVolatile()) return false; - // TODO: probably too restrictive for atomics, revisit - if (!LD->isSimple()) -return false; - if (LD->isIndexed() || Base->isIndexed()) -return false; + if (auto Ld = dyn_cast(LD)) { +if (!Ld->isSimple()) + return false; +if (Ld->isIndexed()) + return false; + } + if (auto Ld = dyn_cast(Base)) +if (Ld->isIndexed()) + return false; if (LD->getChain() != Base->getChain()) return false; EVT VT = LD->getMemoryVT(); diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp index f2ab88851b780..c29cb424c7a4c 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp +++ 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp @@ -195,8 +195,8 @@ bool BaseIndexOffset::contains(const SelectionDAG &DAG, int64_t BitSize, } /// Parses tree in Ptr for base, index, offset addresses. -static BaseIndexOffset matchLSNode(const LSBaseSDNode *N, - const SelectionDAG &DAG) { +template +static BaseIndexOffset matchSDNode(const T *N, const SelectionDAG &DAG) { SDValue Ptr = N->getBasePtr(); // (((B + I*M) + c)) + c ... @@ -206,16 +206,18 @@ static BaseIndexOffset matchLSNode(const LSBaseSDNode *N, bool IsIndexSignExt = false; // pre-inc/pre-dec ops are components of EA. - if (N->getAddressingMode() == ISD::P
[llvm-branch-commits] [llvm] [X86] Remove extra MOV after widening atomic load (PR #138635)
https://github.com/jofrn created https://github.com/llvm/llvm-project/pull/138635 This change adds patterns to optimize out an extra MOV present after widening the atomic load. --- **Stack**: - #120716 - #125432 - #120640 - #138635 ⬅ - #120598 - #120387 - #120386 - #120385 - #120384 ⚠️ *Part of a stack created by [spr](https://github.com/ejoffe/spr). Do not merge manually using the UI - doing so may have unexpected results.* >From 438373241048d37fc2ee11419ceae9b53821fcaf Mon Sep 17 00:00:00 2001 From: jofernau_amdeng Date: Tue, 6 May 2025 01:48:11 -0400 Subject: [PATCH] [X86] Remove extra MOV after widening atomic load This change adds patterns to optimize out an extra MOV present after widening the atomic load. commit-id:45989503 --- llvm/lib/Target/X86/X86InstrCompiler.td| 7 llvm/test/CodeGen/X86/atomic-load-store.ll | 43 -- 2 files changed, 30 insertions(+), 20 deletions(-) diff --git a/llvm/lib/Target/X86/X86InstrCompiler.td b/llvm/lib/Target/X86/X86InstrCompiler.td index 167e27eddd71e..8ad8a0a6194d6 100644 --- a/llvm/lib/Target/X86/X86InstrCompiler.td +++ b/llvm/lib/Target/X86/X86InstrCompiler.td @@ -1200,6 +1200,13 @@ def : Pat<(i16 (atomic_load_nonext_16 addr:$src)), (MOV16rm addr:$src)>; def : Pat<(i32 (atomic_load_nonext_32 addr:$src)), (MOV32rm addr:$src)>; def : Pat<(i64 (atomic_load_nonext_64 addr:$src)), (MOV64rm addr:$src)>; +def : Pat<(v4i32 (scalar_to_vector (i32 (anyext (i16 (atomic_load_16 addr:$src)), + (MOVDI2PDIrm addr:$src)>; // load atomic <2 x i8> +def : Pat<(v4i32 (scalar_to_vector (i32 (atomic_load_32 addr:$src, + (MOVDI2PDIrm addr:$src)>; // load atomic <2 x i16> +def : Pat<(v2i64 (scalar_to_vector (i64 (atomic_load_64 addr:$src, + (MOV64toPQIrm addr:$src)>; // load atomic <2 x i32,float> + // Floating point loads/stores.
def : Pat<(atomic_store_32 (i32 (bitconvert (f32 FR32:$src))), addr:$dst), (MOVSSmr addr:$dst, FR32:$src)>, Requires<[UseSSE1]>; diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 9ee8b4fc5ac7f..935d058a52f8f 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -149,8 +149,7 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind { define <2 x i8> @atomic_vec2_i8(ptr %x) { ; CHECK3-LABEL: atomic_vec2_i8: ; CHECK3: ## %bb.0: -; CHECK3-NEXT:movzwl (%rdi), %eax -; CHECK3-NEXT:movd %eax, %xmm0 +; CHECK3-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero ; CHECK3-NEXT:retq ; ; CHECK0-LABEL: atomic_vec2_i8: @@ -165,11 +164,15 @@ define <2 x i8> @atomic_vec2_i8(ptr %x) { } define <2 x i16> @atomic_vec2_i16(ptr %x) { -; CHECK-LABEL: atomic_vec2_i16: -; CHECK: ## %bb.0: -; CHECK-NEXT:movl (%rdi), %eax -; CHECK-NEXT:movd %eax, %xmm0 -; CHECK-NEXT:retq +; CHECK3-LABEL: atomic_vec2_i16: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec2_i16: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movd {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK0-NEXT:retq %ret = load atomic <2 x i16>, ptr %x acquire, align 4 ret <2 x i16> %ret } @@ -177,8 +180,7 @@ define <2 x i16> @atomic_vec2_i16(ptr %x) { define <2 x ptr addrspace(270)> @atomic_vec2_ptr270(ptr %x) { ; CHECK-LABEL: atomic_vec2_ptr270: ; CHECK: ## %bb.0: -; CHECK-NEXT:movq (%rdi), %rax -; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:movq (%rdi), %xmm0 ; CHECK-NEXT:retq %ret = load atomic <2 x ptr addrspace(270)>, ptr %x acquire, align 8 ret <2 x ptr addrspace(270)> %ret @@ -187,8 +189,7 @@ define <2 x ptr addrspace(270)> @atomic_vec2_ptr270(ptr %x) { define <2 x i32> @atomic_vec2_i32_align(ptr %x) { ; CHECK-LABEL: atomic_vec2_i32_align: ; CHECK: ## %bb.0: -; CHECK-NEXT:movq (%rdi), %rax -; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:movq (%rdi), %xmm0 ; 
CHECK-NEXT:retq %ret = load atomic <2 x i32>, ptr %x acquire, align 8 ret <2 x i32> %ret @@ -197,8 +198,7 @@ define <2 x i32> @atomic_vec2_i32_align(ptr %x) { define <2 x float> @atomic_vec2_float_align(ptr %x) { ; CHECK-LABEL: atomic_vec2_float_align: ; CHECK: ## %bb.0: -; CHECK-NEXT:movq (%rdi), %rax -; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:movq (%rdi), %xmm0 ; CHECK-NEXT:retq %ret = load atomic <2 x float>, ptr %x acquire, align 8 ret <2 x float> %ret @@ -354,11 +354,15 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { } define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind { -; CHECK-LABEL: atomic_vec4_i8: -; CHECK: ## %bb.0: -; CHECK-NEXT:movl (%rdi), %eax -; CHECK-NEXT:movd %eax, %xmm0 -; CHECK-NEXT:retq +; CHECK3-LABEL: atomic_vec4_i8: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec4_i8
[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)
https://github.com/jofrn edited https://github.com/llvm/llvm-project/pull/120385 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120716 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120385 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [X86] Manage atomic load of fp -> int promotion in DAG (PR #120386)
https://github.com/jofrn edited https://github.com/llvm/llvm-project/pull/120386 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #120640)
https://github.com/jofrn edited https://github.com/llvm/llvm-project/pull/120640 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SelectionDAG] Widen <2 x T> vector types for atomic load (PR #120598)
https://github.com/jofrn edited https://github.com/llvm/llvm-project/pull/120598 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [X86] Add atomic vector tests for unaligned >1 sizes. (PR #120387)
https://github.com/jofrn edited https://github.com/llvm/llvm-project/pull/120387 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [X86] Remove extra MOV after widening atomic load (PR #138635)
llvmbot wrote: @llvm/pr-subscribers-backend-x86 Author: None (jofrn) Changes This change adds patterns to optimize out an extra MOV present after widening the atomic load. --- **Stack**: - #120716 - #125432 - #120640 - #138635 ⬅ - #120598 - #120387 - #120386 - #120385 - #120384 ⚠️ *Part of a stack created by [spr](https://github.com/ejoffe/spr). Do not merge manually using the UI - doing so may have unexpected results.* --- Full diff: https://github.com/llvm/llvm-project/pull/138635.diff 2 Files Affected: - (modified) llvm/lib/Target/X86/X86InstrCompiler.td (+7) - (modified) llvm/test/CodeGen/X86/atomic-load-store.ll (+23-20) ``diff diff --git a/llvm/lib/Target/X86/X86InstrCompiler.td b/llvm/lib/Target/X86/X86InstrCompiler.td index 167e27eddd71e..8ad8a0a6194d6 100644 --- a/llvm/lib/Target/X86/X86InstrCompiler.td +++ b/llvm/lib/Target/X86/X86InstrCompiler.td @@ -1200,6 +1200,13 @@ def : Pat<(i16 (atomic_load_nonext_16 addr:$src)), (MOV16rm addr:$src)>; def : Pat<(i32 (atomic_load_nonext_32 addr:$src)), (MOV32rm addr:$src)>; def : Pat<(i64 (atomic_load_nonext_64 addr:$src)), (MOV64rm addr:$src)>; +def : Pat<(v4i32 (scalar_to_vector (i32 (anyext (i16 (atomic_load_16 addr:$src)), + (MOVDI2PDIrm addr:$src)>; // load atomic <2 x i8> +def : Pat<(v4i32 (scalar_to_vector (i32 (atomic_load_32 addr:$src, + (MOVDI2PDIrm addr:$src)>; // load atomic <2 x i16> +def : Pat<(v2i64 (scalar_to_vector (i64 (atomic_load_64 addr:$src, + (MOV64toPQIrm addr:$src)>; // load atomic <2 x i32,float> + // Floating point loads/stores.
def : Pat<(atomic_store_32 (i32 (bitconvert (f32 FR32:$src))), addr:$dst), (MOVSSmr addr:$dst, FR32:$src)>, Requires<[UseSSE1]>; diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 9ee8b4fc5ac7f..935d058a52f8f 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -149,8 +149,7 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind { define <2 x i8> @atomic_vec2_i8(ptr %x) { ; CHECK3-LABEL: atomic_vec2_i8: ; CHECK3: ## %bb.0: -; CHECK3-NEXT:movzwl (%rdi), %eax -; CHECK3-NEXT:movd %eax, %xmm0 +; CHECK3-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero ; CHECK3-NEXT:retq ; ; CHECK0-LABEL: atomic_vec2_i8: @@ -165,11 +164,15 @@ define <2 x i8> @atomic_vec2_i8(ptr %x) { } define <2 x i16> @atomic_vec2_i16(ptr %x) { -; CHECK-LABEL: atomic_vec2_i16: -; CHECK: ## %bb.0: -; CHECK-NEXT:movl (%rdi), %eax -; CHECK-NEXT:movd %eax, %xmm0 -; CHECK-NEXT:retq +; CHECK3-LABEL: atomic_vec2_i16: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec2_i16: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movd {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK0-NEXT:retq %ret = load atomic <2 x i16>, ptr %x acquire, align 4 ret <2 x i16> %ret } @@ -177,8 +180,7 @@ define <2 x i16> @atomic_vec2_i16(ptr %x) { define <2 x ptr addrspace(270)> @atomic_vec2_ptr270(ptr %x) { ; CHECK-LABEL: atomic_vec2_ptr270: ; CHECK: ## %bb.0: -; CHECK-NEXT:movq (%rdi), %rax -; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:movq (%rdi), %xmm0 ; CHECK-NEXT:retq %ret = load atomic <2 x ptr addrspace(270)>, ptr %x acquire, align 8 ret <2 x ptr addrspace(270)> %ret @@ -187,8 +189,7 @@ define <2 x ptr addrspace(270)> @atomic_vec2_ptr270(ptr %x) { define <2 x i32> @atomic_vec2_i32_align(ptr %x) { ; CHECK-LABEL: atomic_vec2_i32_align: ; CHECK: ## %bb.0: -; CHECK-NEXT:movq (%rdi), %rax -; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:movq (%rdi), %xmm0 ; 
CHECK-NEXT:retq %ret = load atomic <2 x i32>, ptr %x acquire, align 8 ret <2 x i32> %ret @@ -197,8 +198,7 @@ define <2 x i32> @atomic_vec2_i32_align(ptr %x) { define <2 x float> @atomic_vec2_float_align(ptr %x) { ; CHECK-LABEL: atomic_vec2_float_align: ; CHECK: ## %bb.0: -; CHECK-NEXT:movq (%rdi), %rax -; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:movq (%rdi), %xmm0 ; CHECK-NEXT:retq %ret = load atomic <2 x float>, ptr %x acquire, align 8 ret <2 x float> %ret @@ -354,11 +354,15 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { } define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind { -; CHECK-LABEL: atomic_vec4_i8: -; CHECK: ## %bb.0: -; CHECK-NEXT:movl (%rdi), %eax -; CHECK-NEXT:movd %eax, %xmm0 -; CHECK-NEXT:retq +; CHECK3-LABEL: atomic_vec4_i8: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec4_i8: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movd {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK0-NEXT:retq %ret = load atomic <4 x i8>, ptr %x acquire, align 4 ret <4 x i8> %ret } @@ -366,8 +370,7 @@ define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind {
[llvm-branch-commits] [flang] [flang][fir] Basic PFT to MLIR lowering for do concurrent locality specifiers (PR #138534)
https://github.com/ergawy edited https://github.com/llvm/llvm-project/pull/138534 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][fir] Add `fir.local` op for locality specifiers (PR #138505)
https://github.com/ergawy edited https://github.com/llvm/llvm-project/pull/138505 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Improve StructurizeCFG pass performance by using SSAUpdaterBulk. (PR #135181)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/135181 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] Add missing imports (PR #138550)
https://github.com/FilipLaurentiu edited https://github.com/llvm/llvm-project/pull/138550 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] Add missing imports (PR #138550)
github-actions[bot] wrote: Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using `@` followed by their GitHub username. If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment "Ping". The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the [LLVM GitHub User Guide](https://llvm.org/docs/GitHub.html). You can also ask questions in a comment on this PR, on the [LLVM Discord](https://discord.com/invite/xS7Z362) or on the [forums](https://discourse.llvm.org/). https://github.com/llvm/llvm-project/pull/138550 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] Add missing imports (PR #138550)
https://github.com/FilipLaurentiu created https://github.com/llvm/llvm-project/pull/138550 Build fails without these imports >From 7a5195fd542f71142e4524f4d4720305bb14c2bb Mon Sep 17 00:00:00 2001 From: Filip Laurentiu Date: Mon, 5 May 2025 18:56:30 +0300 Subject: [PATCH] Add missing imports --- mlir/include/mlir/Dialect/Affine/IR/ValueBoundsOpInterfaceImpl.h | 1 + mlir/include/mlir/Target/SPIRV/Deserialization.h | 1 + mlir/include/mlir/Target/SPIRV/Serialization.h | 1 + 3 files changed, 3 insertions(+) diff --git a/mlir/include/mlir/Dialect/Affine/IR/ValueBoundsOpInterfaceImpl.h b/mlir/include/mlir/Dialect/Affine/IR/ValueBoundsOpInterfaceImpl.h index 451c466fa0c95..642e99d963ef6 100644 --- a/mlir/include/mlir/Dialect/Affine/IR/ValueBoundsOpInterfaceImpl.h +++ b/mlir/include/mlir/Dialect/Affine/IR/ValueBoundsOpInterfaceImpl.h @@ -10,6 +10,7 @@ #define MLIR_DIALECT_AFFINE_IR_VALUEBOUNDSOPINTERFACEIMPL_H #include "mlir/Support/LLVM.h" +#include namespace mlir { class DialectRegistry; diff --git a/mlir/include/mlir/Target/SPIRV/Deserialization.h b/mlir/include/mlir/Target/SPIRV/Deserialization.h index e39258beeaac8..a346a7fd1e5f7 100644 --- a/mlir/include/mlir/Target/SPIRV/Deserialization.h +++ b/mlir/include/mlir/Target/SPIRV/Deserialization.h @@ -15,6 +15,7 @@ #include "mlir/IR/OwningOpRef.h" #include "mlir/Support/LLVM.h" +#include namespace mlir { class MLIRContext; diff --git a/mlir/include/mlir/Target/SPIRV/Serialization.h b/mlir/include/mlir/Target/SPIRV/Serialization.h index 613f0a423f9f8..225777e25d607 100644 --- a/mlir/include/mlir/Target/SPIRV/Serialization.h +++ b/mlir/include/mlir/Target/SPIRV/Serialization.h @@ -14,6 +14,7 @@ #define MLIR_TARGET_SPIRV_SERIALIZATION_H #include "mlir/Support/LLVM.h" +#include namespace mlir { class MLIRContext; ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] Add missing imports (PR #138550)
llvmbot wrote: @llvm/pr-subscribers-mlir @llvm/pr-subscribers-mlir-affine Author: Filip Laurentiu (FilipLaurentiu) Changes Build fail without these imports --- Full diff: https://github.com/llvm/llvm-project/pull/138550.diff 3 Files Affected: - (modified) mlir/include/mlir/Dialect/Affine/IR/ValueBoundsOpInterfaceImpl.h (+1) - (modified) mlir/include/mlir/Target/SPIRV/Deserialization.h (+1) - (modified) mlir/include/mlir/Target/SPIRV/Serialization.h (+1) ``diff diff --git a/mlir/include/mlir/Dialect/Affine/IR/ValueBoundsOpInterfaceImpl.h b/mlir/include/mlir/Dialect/Affine/IR/ValueBoundsOpInterfaceImpl.h index 451c466fa0c95..642e99d963ef6 100644 --- a/mlir/include/mlir/Dialect/Affine/IR/ValueBoundsOpInterfaceImpl.h +++ b/mlir/include/mlir/Dialect/Affine/IR/ValueBoundsOpInterfaceImpl.h @@ -10,6 +10,7 @@ #define MLIR_DIALECT_AFFINE_IR_VALUEBOUNDSOPINTERFACEIMPL_H #include "mlir/Support/LLVM.h" +#include namespace mlir { class DialectRegistry; diff --git a/mlir/include/mlir/Target/SPIRV/Deserialization.h b/mlir/include/mlir/Target/SPIRV/Deserialization.h index e39258beeaac8..a346a7fd1e5f7 100644 --- a/mlir/include/mlir/Target/SPIRV/Deserialization.h +++ b/mlir/include/mlir/Target/SPIRV/Deserialization.h @@ -15,6 +15,7 @@ #include "mlir/IR/OwningOpRef.h" #include "mlir/Support/LLVM.h" +#include namespace mlir { class MLIRContext; diff --git a/mlir/include/mlir/Target/SPIRV/Serialization.h b/mlir/include/mlir/Target/SPIRV/Serialization.h index 613f0a423f9f8..225777e25d607 100644 --- a/mlir/include/mlir/Target/SPIRV/Serialization.h +++ b/mlir/include/mlir/Target/SPIRV/Serialization.h @@ -14,6 +14,7 @@ #define MLIR_TARGET_SPIRV_SERIALIZATION_H #include "mlir/Support/LLVM.h" +#include namespace mlir { class MLIRContext; `` https://github.com/llvm/llvm-project/pull/138550 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SelectionDAG] Adaptation for FP operation lowering (PR #138553)
https://github.com/spavloff created https://github.com/llvm/llvm-project/pull/138553 FP operations listed in FloatingPointOps.def are now lowered to DAG in the same way as constrained intrinsics, using special DAG nodes like STRICT_NEARBYINT. This is a temporary solution. Existing nodes like STRICT_NEARBYINT cannot carry information about rounding or other control modes, so they cannot implement static rounding, for example. However they can serve as a first step toward a solution based on the FP operand bundles. >From ef14a9b6151e66ac2b7452d8d7958f55731f35ab Mon Sep 17 00:00:00 2001 From: Serge Pavlov Date: Mon, 28 Apr 2025 00:21:56 +0700 Subject: [PATCH] [SelectionDAG] Adaptation for FP operation lowering FP operations listed in FloatingPointOps.def are now lowered to DAG in the same way as constrained intrinsics, using special DAG nodes like STRICT_NEARBYINT. This is a temporary solution. Existing nodes like STRICT_NEARBYINT cannot carry information about rounding or other control modes, so they cannot implement static rounding, for example. However they can serve as a first step toward a solution based on the FP operand bundles. --- llvm/include/llvm/IR/FloatingPointOps.def | 7 ++- .../SelectionDAG/SelectionDAGBuilder.cpp | 62 +-- .../SelectionDAG/SelectionDAGBuilder.h| 1 + 3 files changed, 65 insertions(+), 5 deletions(-) diff --git a/llvm/include/llvm/IR/FloatingPointOps.def b/llvm/include/llvm/IR/FloatingPointOps.def index 468227e648300..6d653668fe340 100644 --- a/llvm/include/llvm/IR/FloatingPointOps.def +++ b/llvm/include/llvm/IR/FloatingPointOps.def @@ -14,12 +14,17 @@ #define FUNCTION(N,D) #endif +#ifndef CONSTRAINED +#define CONSTRAINED(N,D) FUNCTION(N,D) +#endif + // Arguments of the entries are: // - intrinsic function name, // - DAG node corresponding to the intrinsic. 
-FUNCTION(experimental_constrained_fadd, FADD) +CONSTRAINED(experimental_constrained_fadd, FADD) FUNCTION(nearbyint, FNEARBYINT) FUNCTION(trunc, FTRUNC) #undef FUNCTION +#undef CONSTRAINED diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp index 8cae34d06c8ba..cd97338cfaa93 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp @@ -6802,9 +6802,7 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, case Intrinsic::exp10: case Intrinsic::floor: case Intrinsic::ceil: - case Intrinsic::trunc: case Intrinsic::rint: - case Intrinsic::nearbyint: case Intrinsic::round: case Intrinsic::roundeven: case Intrinsic::canonicalize: { @@ -6826,9 +6824,7 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, case Intrinsic::exp10:Opcode = ISD::FEXP10;break; case Intrinsic::floor:Opcode = ISD::FFLOOR;break; case Intrinsic::ceil: Opcode = ISD::FCEIL; break; -case Intrinsic::trunc:Opcode = ISD::FTRUNC;break; case Intrinsic::rint: Opcode = ISD::FRINT; break; -case Intrinsic::nearbyint:Opcode = ISD::FNEARBYINT;break; case Intrinsic::round:Opcode = ISD::FROUND;break; case Intrinsic::roundeven:Opcode = ISD::FROUNDEVEN;break; case Intrinsic::canonicalize: Opcode = ISD::FCANONICALIZE; break; @@ -6959,6 +6955,11 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, #include "llvm/IR/ConstrainedOps.def" visitConstrainedFPIntrinsic(cast(I)); return; +#define CONSTRAINED(INTRINSIC, DAGN) +#define FUNCTION(INTRINSIC, DAGN) case Intrinsic::INTRINSIC: +#include "llvm/IR/FloatingPointOps.def" +visitFPOperationIntrinsic(I, Intrinsic); +return; #define BEGIN_REGISTER_VP_INTRINSIC(VPID, ...) 
case Intrinsic::VPID: #include "llvm/IR/VPIntrinsics.def" visitVectorPredicationIntrinsic(cast(I)); @@ -8350,6 +8351,59 @@ void SelectionDAGBuilder::visitConstrainedFPIntrinsic( setValue(&FPI, FPResult); } +void SelectionDAGBuilder::visitFPOperationIntrinsic(const CallInst &CI, +unsigned Intrinsic) { + SDLoc sdl = getCurSDLoc(); + bool StrictFP = + FuncInfo.Fn->getAttributes().hasFnAttr(llvm::Attribute::StrictFP); + + int Opcode = -1; + switch (Intrinsic) { +#define CONSTRAINED(NAME, DAGN) +#define FUNCTION(NAME, DAGN) \ + case Intrinsic::NAME: \ +Opcode = StrictFP ? ISD::STRICT_##DAGN : ISD::DAGN; \ +break; +#include "llvm/IR/FloatingPointOps.def" + } + + SDNodeFlags Flags; + if (CI.getExceptionBehavior() == fp::ExceptionBehavior::ebIgnore) +Flags.setNoFPExcept(true); + if (auto *FPOp = dyn_cast(&CI)) +Flags.copyFMF(*FPOp); + + SmallVector Operands;
[llvm-branch-commits] [llvm] [SelectionDAG] Adaptation for FP operation lowering (PR #138553)
llvmbot wrote: @llvm/pr-subscribers-llvm-selectiondag Author: Serge Pavlov (spavloff) Changes FP operations listed in FloatingPointOps.def are now lowered to DAG in the same way as constrained intrinsics, using special DAG nodes like STRICT_NEARBYINT. This is a temporary solution. Existing nodes like STRICT_NEARBYINT cannot carry information about rounding or other control modes, so they cannot implement static rounding, for example. However they can serve as a first step toward a solution based on the FP operand bundles. --- Full diff: https://github.com/llvm/llvm-project/pull/138553.diff 3 Files Affected: - (modified) llvm/include/llvm/IR/FloatingPointOps.def (+6-1) - (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+58-4) - (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h (+1) ``diff diff --git a/llvm/include/llvm/IR/FloatingPointOps.def b/llvm/include/llvm/IR/FloatingPointOps.def index 468227e648300..6d653668fe340 100644 --- a/llvm/include/llvm/IR/FloatingPointOps.def +++ b/llvm/include/llvm/IR/FloatingPointOps.def @@ -14,12 +14,17 @@ #define FUNCTION(N,D) #endif +#ifndef CONSTRAINED +#define CONSTRAINED(N,D) FUNCTION(N,D) +#endif + // Arguments of the entries are: // - intrinsic function name, // - DAG node corresponding to the intrinsic. 
-FUNCTION(experimental_constrained_fadd, FADD) +CONSTRAINED(experimental_constrained_fadd, FADD) FUNCTION(nearbyint, FNEARBYINT) FUNCTION(trunc, FTRUNC) #undef FUNCTION +#undef CONSTRAINED diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp index 8cae34d06c8ba..cd97338cfaa93 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp @@ -6802,9 +6802,7 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, case Intrinsic::exp10: case Intrinsic::floor: case Intrinsic::ceil: - case Intrinsic::trunc: case Intrinsic::rint: - case Intrinsic::nearbyint: case Intrinsic::round: case Intrinsic::roundeven: case Intrinsic::canonicalize: { @@ -6826,9 +6824,7 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, case Intrinsic::exp10:Opcode = ISD::FEXP10;break; case Intrinsic::floor:Opcode = ISD::FFLOOR;break; case Intrinsic::ceil: Opcode = ISD::FCEIL; break; -case Intrinsic::trunc:Opcode = ISD::FTRUNC;break; case Intrinsic::rint: Opcode = ISD::FRINT; break; -case Intrinsic::nearbyint:Opcode = ISD::FNEARBYINT;break; case Intrinsic::round:Opcode = ISD::FROUND;break; case Intrinsic::roundeven:Opcode = ISD::FROUNDEVEN;break; case Intrinsic::canonicalize: Opcode = ISD::FCANONICALIZE; break; @@ -6959,6 +6955,11 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, #include "llvm/IR/ConstrainedOps.def" visitConstrainedFPIntrinsic(cast(I)); return; +#define CONSTRAINED(INTRINSIC, DAGN) +#define FUNCTION(INTRINSIC, DAGN) case Intrinsic::INTRINSIC: +#include "llvm/IR/FloatingPointOps.def" +visitFPOperationIntrinsic(I, Intrinsic); +return; #define BEGIN_REGISTER_VP_INTRINSIC(VPID, ...) 
case Intrinsic::VPID: #include "llvm/IR/VPIntrinsics.def" visitVectorPredicationIntrinsic(cast(I)); @@ -8350,6 +8351,59 @@ void SelectionDAGBuilder::visitConstrainedFPIntrinsic( setValue(&FPI, FPResult); } +void SelectionDAGBuilder::visitFPOperationIntrinsic(const CallInst &CI, +unsigned Intrinsic) { + SDLoc sdl = getCurSDLoc(); + bool StrictFP = + FuncInfo.Fn->getAttributes().hasFnAttr(llvm::Attribute::StrictFP); + + int Opcode = -1; + switch (Intrinsic) { +#define CONSTRAINED(NAME, DAGN) +#define FUNCTION(NAME, DAGN) \ + case Intrinsic::NAME: \ +Opcode = StrictFP ? ISD::STRICT_##DAGN : ISD::DAGN; \ +break; +#include "llvm/IR/FloatingPointOps.def" + } + + SDNodeFlags Flags; + if (CI.getExceptionBehavior() == fp::ExceptionBehavior::ebIgnore) +Flags.setNoFPExcept(true); + if (auto *FPOp = dyn_cast(&CI)) +Flags.copyFMF(*FPOp); + + SmallVector Operands; + if (StrictFP) +Operands.push_back(DAG.getRoot()); + for (unsigned I = 0, E = CI.arg_size(); I != E; ++I) +Operands.push_back(getValue(CI.getArgOperand(I))); + + const TargetLowering &TLI = DAG.getTargetLoweringInfo(); + EVT VT = TLI.getValueType(DAG.getDataLayout(), CI.getType()); + SDVTList VTs = StrictFP ? DAG.getVTList(VT, MVT::Other) : DAG.getVTList(VT); + + SDValue Result = DAG.getNode(Opcode, sdl, VTs, Operands, Flags); + + SDValue OutChain; + if (StrictFP) { +OutChain = Result.getValue(1); +switch (CI.
[llvm-branch-commits] [flang] [Flang][OpenMP] Minimize host ops remaining in device compilation (PR #137200)
https://github.com/skatrak updated https://github.com/llvm/llvm-project/pull/137200
[llvm-branch-commits] [flang] [mlir] [MLIR][OpenMP] Simplify OpenMP device codegen (PR #137201)
https://github.com/skatrak updated https://github.com/llvm/llvm-project/pull/137201 >From 22f22aa0ca2c98dfcc48a70f2f7e0a5b68d7b1d9 Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Tue, 22 Apr 2025 12:04:45 +0100 Subject: [PATCH] [MLIR][OpenMP] Simplify OpenMP device codegen After removing host operations from the device MLIR module, it is no longer necessary to provide special codegen logic to prevent these operations from causing compiler crashes or miscompilations. This patch removes these now unnecessary code paths to simplify codegen logic. Some MLIR tests are now replaced with Flang tests, since the responsibility of dealing with host operations has been moved earlier in the compilation flow. MLIR tests holding target device modules are updated to no longer include now unsupported host operations. --- .../OpenMP/target-nesting-in-host-ops.f90 | 87 .../Integration/OpenMP/task-target-device.f90 | 37 ++ .../OpenMP/threadprivate-target-device.f90| 40 ++ .../OpenMP/OpenMPToLLVMIRTranslation.cpp | 423 +++--- ...arget-constant-indexing-device-region.mlir | 25 +- .../Target/LLVMIR/omptarget-debug-var-1.mlir | 19 +- .../omptarget-memcpy-align-metadata.mlir | 61 +-- .../LLVMIR/omptarget-target-inside-task.mlir | 43 -- ...ptarget-threadprivate-device-lowering.mlir | 31 -- .../Target/LLVMIR/openmp-llvm-invalid.mlir| 45 ++ .../openmp-target-nesting-in-host-ops.mlir| 160 --- .../LLVMIR/openmp-task-target-device.mlir | 26 -- 12 files changed, 409 insertions(+), 588 deletions(-) create mode 100644 flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90 create mode 100644 flang/test/Integration/OpenMP/task-target-device.f90 create mode 100644 flang/test/Integration/OpenMP/threadprivate-target-device.f90 delete mode 100644 mlir/test/Target/LLVMIR/omptarget-target-inside-task.mlir delete mode 100644 mlir/test/Target/LLVMIR/omptarget-threadprivate-device-lowering.mlir delete mode 100644 mlir/test/Target/LLVMIR/openmp-target-nesting-in-host-ops.mlir delete mode 100644 
mlir/test/Target/LLVMIR/openmp-task-target-device.mlir diff --git a/flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90 b/flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90 new file mode 100644 index 0..8c85a3c1784ed --- /dev/null +++ b/flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90 @@ -0,0 +1,87 @@ +!===--===! +! This directory can be used to add Integration tests involving multiple +! stages of the compiler (for eg. from Fortran to LLVM IR). It should not +! contain executable tests. We should only add tests here sparingly and only +! if there is no other way to test. Repeat this message in each test that is +! added to this directory and sub-directories. +!===--===! + +!REQUIRES: amdgpu-registered-target +!RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -emit-llvm -fopenmp -fopenmp-version=50 -fopenmp-is-target-device %s -o - | FileCheck %s + +! CHECK-NOT: define void @nested_target_in_parallel +! CHECK: define weak_odr protected amdgpu_kernel void @__omp_offloading_{{.*}}_nested_target_in_parallel_{{.*}}(ptr %{{.*}}, ptr %{{.*}}) +subroutine nested_target_in_parallel(v) + implicit none + integer, intent(inout) :: v(10) + + !$omp parallel +!$omp target map(tofrom: v) +!$omp end target + !$omp end parallel +end subroutine + +! CHECK-NOT: define void @nested_target_in_wsloop +! CHECK: define weak_odr protected amdgpu_kernel void @__omp_offloading_{{.*}}_nested_target_in_wsloop_{{.*}}(ptr %{{.*}}, ptr %{{.*}}) +subroutine nested_target_in_wsloop(v) + implicit none + integer, intent(inout) :: v(10) + integer :: i + + !$omp do + do i=1, 10 +!$omp target map(tofrom: v) +!$omp end target + end do +end subroutine + +! CHECK-NOT: define void @nested_target_in_parallel_with_private +! 
CHECK: define weak_odr protected amdgpu_kernel void @__omp_offloading_{{.*}}_nested_target_in_parallel_with_private_{{.*}}(ptr %{{.*}}, ptr %{{.*}}, ptr %{{.*}}) +subroutine nested_target_in_parallel_with_private(v) + implicit none + integer, intent(inout) :: v(10) + integer :: x + x = 10 + + !$omp parallel firstprivate(x) +!$omp target map(tofrom: v(1:x)) +!$omp end target + !$omp end parallel +end subroutine + +! CHECK-NOT: define void @nested_target_in_task_with_private +! CHECK: define weak_odr protected amdgpu_kernel void @__omp_offloading_{{.*}}_nested_target_in_task_with_private_{{.*}}(ptr %{{.*}}, ptr %{{.*}}, ptr %{{.*}}) +subroutine nested_target_in_task_with_private(v) + implicit none + integer, intent(inout) :: v(10) + integer :: x + x = 10 + + !$omp task firstprivate(x) +!$omp target map(tofrom: v(1:x)) +!$omp end target + !$omp end task +end subroutine + +! CHECK-NOT: define void @target_and_atomic_update +! CHECK: define weak_odr protec
[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #120640)
@@ -1421,6 +1424,40 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SetSplitVector(SDValue(N, ResNo), Lo, Hi); } +void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, + SDValue &Hi) { + EVT LoVT, HiVT; + SDLoc dl(LD); + std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(LD->getValueType(0)); + + ISD::LoadExtType ExtType = LD->getExtensionType(); + SDValue Ch = LD->getChain(); + SDValue Ptr = LD->getBasePtr(); + EVT MemoryVT = LD->getMemoryVT(); + + EVT LoMemVT, HiMemVT; + std::tie(LoMemVT, HiMemVT) = DAG.GetSplitDestVTs(MemoryVT); arsenm wrote: These are unused. You at most should be bit casting the in memory type to the target integer memory type, there's no splitting https://github.com/llvm/llvm-project/pull/120640
[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #120640)
@@ -1421,6 +1424,40 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SetSplitVector(SDValue(N, ResNo), Lo, Hi); } +void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, + SDValue &Hi) { + EVT LoVT, HiVT; + SDLoc dl(LD); + std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(LD->getValueType(0)); + + ISD::LoadExtType ExtType = LD->getExtensionType(); + SDValue Ch = LD->getChain(); + SDValue Ptr = LD->getBasePtr(); + EVT MemoryVT = LD->getMemoryVT(); + + EVT LoMemVT, HiMemVT; + std::tie(LoMemVT, HiMemVT) = DAG.GetSplitDestVTs(MemoryVT); + + EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), LD->getValueType(0).getSizeInBits()); + EVT MemIntVT = EVT::getIntegerVT(*DAG.getContext(), LD->getMemoryVT().getSizeInBits()); + SDValue ALD = DAG.getAtomicLoad(ExtType, dl, MemIntVT, IntVT, Ch, Ptr, + LD->getMemOperand()); + + EVT LoIntVT = EVT::getIntegerVT(*DAG.getContext(), LoVT.getSizeInBits()); + EVT HiIntVT = EVT::getIntegerVT(*DAG.getContext(), HiVT.getSizeInBits()); + SDValue ExtractLo = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, LoIntVT, ALD, + DAG.getVectorIdxConstant(0, dl)); arsenm wrote: This isn't a vector extract, so don't use getVectorIdxConstant. I'm not actually sure what the type for the second operand is supposed to be for extract_element https://github.com/llvm/llvm-project/pull/120640
[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #120640)
@@ -1421,6 +1424,40 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SetSplitVector(SDValue(N, ResNo), Lo, Hi); } +void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, + SDValue &Hi) { + EVT LoVT, HiVT; + SDLoc dl(LD); + std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(LD->getValueType(0)); + + ISD::LoadExtType ExtType = LD->getExtensionType(); + SDValue Ch = LD->getChain(); + SDValue Ptr = LD->getBasePtr(); + EVT MemoryVT = LD->getMemoryVT(); + + EVT LoMemVT, HiMemVT; + std::tie(LoMemVT, HiMemVT) = DAG.GetSplitDestVTs(MemoryVT); + + EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), LD->getValueType(0).getSizeInBits()); + EVT MemIntVT = EVT::getIntegerVT(*DAG.getContext(), LD->getMemoryVT().getSizeInBits()); + SDValue ALD = DAG.getAtomicLoad(ExtType, dl, MemIntVT, IntVT, Ch, Ptr, + LD->getMemOperand()); arsenm wrote: Maybe should assert this isn't an extending load, although you could in principle handle it https://github.com/llvm/llvm-project/pull/120640
[llvm-branch-commits] [flang] [flang][fir] Add locality specifiers modeling to `fir.do_concurrent.loop` (PR #138506)
https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/138506
[llvm-branch-commits] [flang] [flang][fir] Basic lowering of `fir.do_concurrent` locality specs to `fir.do_loop ... unordered` (PR #138512)
https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/138512
[llvm-branch-commits] [flang] [flang][fir] Basic PFT to MLIR lowering for do concurrent locality specifiers (PR #138534)
https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/138534 >From 098df5a40cae3f7d514bcdb6579c7107ef74c18e Mon Sep 17 00:00:00 2001 From: ergawy Date: Mon, 5 May 2025 07:15:52 -0500 Subject: [PATCH] [flang][fir] Basic PFT to MLIR lowering for do concurrent locality specifiers Extends support for `fir.do_concurrent` locality specifiers to the PFT to MLIR level. This adds code-gen for generating the newly added `fir.local` ops and referencing these ops from `fir.do_concurrent.loop` ops that have locality specifiers attached to them. This reuses the `DataSharingProcessor` component and generalizes it a bit more to allow for handling `omp.private` ops and `fir.local` ops as well. --- flang/include/flang/Lower/AbstractConverter.h | 4 + .../include/flang/Optimizer/Dialect/FIROps.h | 4 + .../include/flang/Optimizer/Dialect/FIROps.td | 15 +++ flang/lib/Lower/Bridge.cpp| 59 -- .../lib/Lower/OpenMP/DataSharingProcessor.cpp | 104 +- flang/lib/Lower/OpenMP/DataSharingProcessor.h | 14 ++- .../Lower/do_concurrent_delayed_locality.f90 | 59 ++ 7 files changed, 219 insertions(+), 40 deletions(-) create mode 100644 flang/test/Lower/do_concurrent_delayed_locality.f90 diff --git a/flang/include/flang/Lower/AbstractConverter.h b/flang/include/flang/Lower/AbstractConverter.h index 1d1323642bf9c..8ae68e143cd2f 100644 --- a/flang/include/flang/Lower/AbstractConverter.h +++ b/flang/include/flang/Lower/AbstractConverter.h @@ -348,6 +348,10 @@ class AbstractConverter { virtual Fortran::lower::SymbolBox lookupOneLevelUpSymbol(const Fortran::semantics::Symbol &sym) = 0; + /// Find the symbol in the inner-most level of the local map or return null. + virtual Fortran::lower::SymbolBox + shallowLookupSymbol(const Fortran::semantics::Symbol &sym) = 0; + /// Return the mlir::SymbolTable associated to the ModuleOp. 
/// Look-ups are faster using it than using module.lookup<>, /// but the module op should be queried in case of failure diff --git a/flang/include/flang/Optimizer/Dialect/FIROps.h b/flang/include/flang/Optimizer/Dialect/FIROps.h index 1bed227afb50d..62ef8b4b502f2 100644 --- a/flang/include/flang/Optimizer/Dialect/FIROps.h +++ b/flang/include/flang/Optimizer/Dialect/FIROps.h @@ -147,6 +147,10 @@ class CoordinateIndicesAdaptor { mlir::ValueRange values; }; +struct LocalitySpecifierOperands { + llvm::SmallVector<::mlir::Value> privateVars; + llvm::SmallVector<::mlir::Attribute> privateSyms; +}; } // namespace fir #endif // FORTRAN_OPTIMIZER_DIALECT_FIROPS_H diff --git a/flang/include/flang/Optimizer/Dialect/FIROps.td b/flang/include/flang/Optimizer/Dialect/FIROps.td index 01248aa0095ec..5c0fc95a47a8e 100644 --- a/flang/include/flang/Optimizer/Dialect/FIROps.td +++ b/flang/include/flang/Optimizer/Dialect/FIROps.td @@ -3600,6 +3600,21 @@ def fir_LocalitySpecifierOp : fir_Op<"local", [IsolatedFromAbove]> { ]; let extraClassDeclaration = [{ +mlir::BlockArgument getInitMoldArg() { + auto ยฎion = getInitRegion(); + return region.empty() ? nullptr : region.getArgument(0); +} +mlir::BlockArgument getInitPrivateArg() { + auto ยฎion = getInitRegion(); + return region.empty() ? nullptr : region.getArgument(1); +} + +/// Returns true if the init region might read from the mold argument +bool initReadsFromMold() { + mlir::BlockArgument moldArg = getInitMoldArg(); + return moldArg && !moldArg.use_empty(); +} + /// Get the type for arguments to nested regions. This should /// generally be either the same as getType() or some pointer /// type (pointing to the type allocated by this op). 
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp index 0a61f61ab8f75..bf55402ec4714 100644 --- a/flang/lib/Lower/Bridge.cpp +++ b/flang/lib/Lower/Bridge.cpp @@ -12,6 +12,8 @@ #include "flang/Lower/Bridge.h" +#include "OpenMP/DataSharingProcessor.h" +#include "OpenMP/Utils.h" #include "flang/Lower/Allocatable.h" #include "flang/Lower/CallInterface.h" #include "flang/Lower/Coarray.h" @@ -1144,6 +1146,14 @@ class FirConverter : public Fortran::lower::AbstractConverter { return name; } + /// Find the symbol in the inner-most level of the local map or return null. + Fortran::lower::SymbolBox + shallowLookupSymbol(const Fortran::semantics::Symbol &sym) override { +if (Fortran::lower::SymbolBox v = localSymbols.shallowLookupSymbol(sym)) + return v; +return {}; + } + private: FirConverter() = delete; FirConverter(const FirConverter &) = delete; @@ -1218,14 +1228,6 @@ class FirConverter : public Fortran::lower::AbstractConverter { return {}; } - /// Find the symbol in the inner-most level of the local map or return null. - Fortran::lower::SymbolBox - shallowLookupSymbol(const Fortran::semantics::Symbol &sym) { -if (Fortran::lower::SymbolBox v = localSymbols.shallowLookupSymbol(sym))
[llvm-branch-commits] [llvm] [CodeGen][NPM] Port PostRAMachineSinking to NPM (PR #138497)
https://github.com/optimisan updated https://github.com/llvm/llvm-project/pull/138497 >From 72ba54e7458fa1b63a9deb01ae3b5131222f516b Mon Sep 17 00:00:00 2001 From: Akshat Oke Date: Mon, 5 May 2025 09:17:40 + Subject: [PATCH] [CodeGen][NPM] Port PostRAMachineSinking to NPM --- llvm/include/llvm/CodeGen/MachineSink.h | 11 +++ llvm/include/llvm/InitializePasses.h | 2 +- .../llvm/Passes/MachinePassRegistry.def | 2 +- llvm/lib/CodeGen/CodeGen.cpp | 2 +- llvm/lib/CodeGen/MachineSink.cpp | 31 +++ .../AArch64/bisect-post-ra-machine-sink.mir | 1 + llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll | 4 +-- .../CodeGen/AMDGPU/postra-machine-sink.mir| 1 + llvm/test/CodeGen/X86/pr38952.mir | 1 + 9 files changed, 44 insertions(+), 11 deletions(-) diff --git a/llvm/include/llvm/CodeGen/MachineSink.h b/llvm/include/llvm/CodeGen/MachineSink.h index 71bd7229b7598..eb9548dc82250 100644 --- a/llvm/include/llvm/CodeGen/MachineSink.h +++ b/llvm/include/llvm/CodeGen/MachineSink.h @@ -26,5 +26,16 @@ class MachineSinkingPass : public PassInfoMixin { function_ref MapClassName2PassName); }; +class PostRAMachineSinkingPass +: public PassInfoMixin { +public: + PreservedAnalyses run(MachineFunction &MF, MachineFunctionAnalysisManager &); + + MachineFunctionProperties getRequiredProperties() const { +return MachineFunctionProperties().set( +MachineFunctionProperties::Property::NoVRegs); + } +}; + } // namespace llvm #endif // LLVM_CODEGEN_MACHINESINK_H diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index 07dc86c6fccf2..e75f9c7a2cfe8 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -241,7 +241,7 @@ void initializePostDominatorTreeWrapperPassPass(PassRegistry &); void initializePostInlineEntryExitInstrumenterPass(PassRegistry &); void initializePostMachineSchedulerLegacyPass(PassRegistry &); void initializePostRAHazardRecognizerLegacyPass(PassRegistry &); -void initializePostRAMachineSinkingPass(PassRegistry &); 
+void initializePostRAMachineSinkingLegacyPass(PassRegistry &); void initializePostRASchedulerLegacyPass(PassRegistry &); void initializePreISelIntrinsicLoweringLegacyPassPass(PassRegistry &); void initializePrintFunctionPassWrapperPass(PassRegistry &); diff --git a/llvm/include/llvm/Passes/MachinePassRegistry.def b/llvm/include/llvm/Passes/MachinePassRegistry.def index 436b26852ce90..c6c00e8f25882 100644 --- a/llvm/include/llvm/Passes/MachinePassRegistry.def +++ b/llvm/include/llvm/Passes/MachinePassRegistry.def @@ -164,6 +164,7 @@ MACHINE_FUNCTION_PASS("phi-node-elimination", PHIEliminationPass()) MACHINE_FUNCTION_PASS("post-RA-hazard-rec", PostRAHazardRecognizerPass()) MACHINE_FUNCTION_PASS("post-RA-sched", PostRASchedulerPass(TM)) MACHINE_FUNCTION_PASS("postmisched", PostMachineSchedulerPass(TM)) +MACHINE_FUNCTION_PASS("postra-machine-sink", PostRAMachineSinkingPass()) MACHINE_FUNCTION_PASS("post-ra-pseudos", ExpandPostRAPseudosPass()) MACHINE_FUNCTION_PASS("print", PrintMIRPass()) MACHINE_FUNCTION_PASS("print", LiveDebugVariablesPrinterPass(errs())) @@ -315,7 +316,6 @@ DUMMY_MACHINE_FUNCTION_PASS("static-data-splitter", StaticDataSplitter) DUMMY_MACHINE_FUNCTION_PASS("machine-function-splitter", MachineFunctionSplitterPass) DUMMY_MACHINE_FUNCTION_PASS("machineinstr-printer", MachineFunctionPrinterPass) DUMMY_MACHINE_FUNCTION_PASS("mirfs-discriminators", MIRAddFSDiscriminatorsPass) -DUMMY_MACHINE_FUNCTION_PASS("postra-machine-sink", PostRAMachineSinkingPass) DUMMY_MACHINE_FUNCTION_PASS("processimpdefs", ProcessImplicitDefsPass) DUMMY_MACHINE_FUNCTION_PASS("prologepilog-code", PrologEpilogCodeInserterPass) DUMMY_MACHINE_FUNCTION_PASS("ra-basic", RABasicPass) diff --git a/llvm/lib/CodeGen/CodeGen.cpp b/llvm/lib/CodeGen/CodeGen.cpp index aa3591cb6be58..065fd4704ccfb 100644 --- a/llvm/lib/CodeGen/CodeGen.cpp +++ b/llvm/lib/CodeGen/CodeGen.cpp @@ -107,7 +107,7 @@ void llvm::initializeCodeGen(PassRegistry &Registry) { initializePeepholeOptimizerLegacyPass(Registry); 
initializePostMachineSchedulerLegacyPass(Registry); initializePostRAHazardRecognizerLegacyPass(Registry); - initializePostRAMachineSinkingPass(Registry); + initializePostRAMachineSinkingLegacyPass(Registry); initializePostRASchedulerLegacyPass(Registry); initializePreISelIntrinsicLoweringLegacyPassPass(Registry); initializeProcessImplicitDefsPass(Registry); diff --git a/llvm/lib/CodeGen/MachineSink.cpp b/llvm/lib/CodeGen/MachineSink.cpp index aa2987b6710a3..be1a3ac125c65 100644 --- a/llvm/lib/CodeGen/MachineSink.cpp +++ b/llvm/lib/CodeGen/MachineSink.cpp @@ -2068,12 +2068,12 @@ void MachineSinking::SalvageUnsunkDebugUsersOfCopy( //===--===// namespace { -class PostRAMachineSinking : public MachineFunctionPass {
[llvm-branch-commits] [llvm] [CodeGen][NPM] Port PostRAMachineSinking to NPM (PR #138497)
https://github.com/optimisan updated https://github.com/llvm/llvm-project/pull/138497 >From 72ba54e7458fa1b63a9deb01ae3b5131222f516b Mon Sep 17 00:00:00 2001 From: Akshat Oke Date: Mon, 5 May 2025 09:17:40 + Subject: [PATCH] [CodeGen][NPM] Port PostRAMachineSinking to NPM --- llvm/include/llvm/CodeGen/MachineSink.h | 11 +++ llvm/include/llvm/InitializePasses.h | 2 +- .../llvm/Passes/MachinePassRegistry.def | 2 +- llvm/lib/CodeGen/CodeGen.cpp | 2 +- llvm/lib/CodeGen/MachineSink.cpp | 31 +++ .../AArch64/bisect-post-ra-machine-sink.mir | 1 + llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll | 4 +-- .../CodeGen/AMDGPU/postra-machine-sink.mir| 1 + llvm/test/CodeGen/X86/pr38952.mir | 1 + 9 files changed, 44 insertions(+), 11 deletions(-) diff --git a/llvm/include/llvm/CodeGen/MachineSink.h b/llvm/include/llvm/CodeGen/MachineSink.h index 71bd7229b7598..eb9548dc82250 100644 --- a/llvm/include/llvm/CodeGen/MachineSink.h +++ b/llvm/include/llvm/CodeGen/MachineSink.h @@ -26,5 +26,16 @@ class MachineSinkingPass : public PassInfoMixin { function_ref MapClassName2PassName); }; +class PostRAMachineSinkingPass +: public PassInfoMixin { +public: + PreservedAnalyses run(MachineFunction &MF, MachineFunctionAnalysisManager &); + + MachineFunctionProperties getRequiredProperties() const { +return MachineFunctionProperties().set( +MachineFunctionProperties::Property::NoVRegs); + } +}; + } // namespace llvm #endif // LLVM_CODEGEN_MACHINESINK_H diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index 07dc86c6fccf2..e75f9c7a2cfe8 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -241,7 +241,7 @@ void initializePostDominatorTreeWrapperPassPass(PassRegistry &); void initializePostInlineEntryExitInstrumenterPass(PassRegistry &); void initializePostMachineSchedulerLegacyPass(PassRegistry &); void initializePostRAHazardRecognizerLegacyPass(PassRegistry &); -void initializePostRAMachineSinkingPass(PassRegistry &); 
+void initializePostRAMachineSinkingLegacyPass(PassRegistry &); void initializePostRASchedulerLegacyPass(PassRegistry &); void initializePreISelIntrinsicLoweringLegacyPassPass(PassRegistry &); void initializePrintFunctionPassWrapperPass(PassRegistry &); diff --git a/llvm/include/llvm/Passes/MachinePassRegistry.def b/llvm/include/llvm/Passes/MachinePassRegistry.def index 436b26852ce90..c6c00e8f25882 100644 --- a/llvm/include/llvm/Passes/MachinePassRegistry.def +++ b/llvm/include/llvm/Passes/MachinePassRegistry.def @@ -164,6 +164,7 @@ MACHINE_FUNCTION_PASS("phi-node-elimination", PHIEliminationPass()) MACHINE_FUNCTION_PASS("post-RA-hazard-rec", PostRAHazardRecognizerPass()) MACHINE_FUNCTION_PASS("post-RA-sched", PostRASchedulerPass(TM)) MACHINE_FUNCTION_PASS("postmisched", PostMachineSchedulerPass(TM)) +MACHINE_FUNCTION_PASS("postra-machine-sink", PostRAMachineSinkingPass()) MACHINE_FUNCTION_PASS("post-ra-pseudos", ExpandPostRAPseudosPass()) MACHINE_FUNCTION_PASS("print", PrintMIRPass()) MACHINE_FUNCTION_PASS("print", LiveDebugVariablesPrinterPass(errs())) @@ -315,7 +316,6 @@ DUMMY_MACHINE_FUNCTION_PASS("static-data-splitter", StaticDataSplitter) DUMMY_MACHINE_FUNCTION_PASS("machine-function-splitter", MachineFunctionSplitterPass) DUMMY_MACHINE_FUNCTION_PASS("machineinstr-printer", MachineFunctionPrinterPass) DUMMY_MACHINE_FUNCTION_PASS("mirfs-discriminators", MIRAddFSDiscriminatorsPass) -DUMMY_MACHINE_FUNCTION_PASS("postra-machine-sink", PostRAMachineSinkingPass) DUMMY_MACHINE_FUNCTION_PASS("processimpdefs", ProcessImplicitDefsPass) DUMMY_MACHINE_FUNCTION_PASS("prologepilog-code", PrologEpilogCodeInserterPass) DUMMY_MACHINE_FUNCTION_PASS("ra-basic", RABasicPass) diff --git a/llvm/lib/CodeGen/CodeGen.cpp b/llvm/lib/CodeGen/CodeGen.cpp index aa3591cb6be58..065fd4704ccfb 100644 --- a/llvm/lib/CodeGen/CodeGen.cpp +++ b/llvm/lib/CodeGen/CodeGen.cpp @@ -107,7 +107,7 @@ void llvm::initializeCodeGen(PassRegistry &Registry) { initializePeepholeOptimizerLegacyPass(Registry); 
initializePostMachineSchedulerLegacyPass(Registry); initializePostRAHazardRecognizerLegacyPass(Registry); - initializePostRAMachineSinkingPass(Registry); + initializePostRAMachineSinkingLegacyPass(Registry); initializePostRASchedulerLegacyPass(Registry); initializePreISelIntrinsicLoweringLegacyPassPass(Registry); initializeProcessImplicitDefsPass(Registry); diff --git a/llvm/lib/CodeGen/MachineSink.cpp b/llvm/lib/CodeGen/MachineSink.cpp index aa2987b6710a3..be1a3ac125c65 100644 --- a/llvm/lib/CodeGen/MachineSink.cpp +++ b/llvm/lib/CodeGen/MachineSink.cpp @@ -2068,12 +2068,12 @@ void MachineSinking::SalvageUnsunkDebugUsersOfCopy( //===--===// namespace { -class PostRAMachineSinking : public MachineFunctionPass {
[llvm-branch-commits] [llvm] [AMDGPU][NPM] Register AMDGPUWaitSGPRHazards pass (PR #138496)
https://github.com/optimisan updated https://github.com/llvm/llvm-project/pull/138496 >From 7b10c2ced561a9693f505e187bfb64f593e81562 Mon Sep 17 00:00:00 2001 From: Akshat Oke Date: Mon, 5 May 2025 08:58:58 + Subject: [PATCH] [AMDGPU][NPM] Register AMDGPUWaitSGPRHazards pass --- llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def | 1 + llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll | 6 +++--- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def index 98a1147ef6d66..f408ced020543 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def +++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def @@ -106,6 +106,7 @@ MACHINE_FUNCTION_PASS("amdgpu-set-wave-priority", AMDGPUSetWavePriorityPass()) MACHINE_FUNCTION_PASS("amdgpu-pre-ra-optimizations", GCNPreRAOptimizationsPass()) MACHINE_FUNCTION_PASS("amdgpu-preload-kern-arg-prolog", AMDGPUPreloadKernArgPrologPass()) MACHINE_FUNCTION_PASS("amdgpu-nsa-reassign", GCNNSAReassignPass()) +MACHINE_FUNCTION_PASS("amdgpu-wait-sgpr-hazards", AMDGPUWaitSGPRHazardsPass()) MACHINE_FUNCTION_PASS("gcn-create-vopd", GCNCreateVOPDPass()) MACHINE_FUNCTION_PASS("gcn-dpp-combine", GCNDPPCombinePass()) MACHINE_FUNCTION_PASS("si-fix-sgpr-copies", SIFixSGPRCopiesPass()) diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll index e00b7ff83e322..468e4208c510a 100644 --- a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll +++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll @@ -7,11 +7,11 @@ ; RUN: llc -O3 -enable-new-pm -mtriple=amdgcn--amdhsa -print-pipeline-passes < %s 2>&1 \ ; RUN: | FileCheck -check-prefix=GCN-O3 %s -; GCN-O0: 
require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,UnreachableBlockElimPass,ee-instrument,scalarize-masked-mem-intrin,ExpandReductionsPass,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,cgscc(function(lower-switch,lower-invoke,UnreachableBlockElimPass,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-late-branch-lowering,post-RA-hazard-rec,AMDGPUWaitSGPRHazardsPass,branch-relaxation,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),invalidate)) +; GCN-O0: 
require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,UnreachableBlockElimPass,ee-instrument,scalarize-masked-mem-intrin,ExpandReductionsPass,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,cgscc(function(lower-switch,lower-invoke,UnreachableBlockElimPass,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-late-branch-lowering,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,branch-relaxation,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),invalidate)) -; GCN-O2: require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,function(amdgpu-image-intrinsic-opt),expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(infer-address-spaces,amdgpu-atomic-optimizer,atomic-expand,amdgpu-promote-alloca,separate-const-offset-
[llvm-branch-commits] [llvm] [CodeGen][NPM] Port InitUndef to NPM (PR #138495)
https://github.com/optimisan updated https://github.com/llvm/llvm-project/pull/138495 >From 2db3af07bf3894df69e0336e2c71c4704fd4fca8 Mon Sep 17 00:00:00 2001 From: Akshat Oke Date: Mon, 5 May 2025 08:47:42 + Subject: [PATCH 1/2] [CodeGen][NPM] Port InitUndef to NPM --- llvm/include/llvm/CodeGen/InitUndef.h | 24 + llvm/include/llvm/InitializePasses.h | 2 +- llvm/include/llvm/Passes/CodeGenPassBuilder.h | 1 + .../llvm/Passes/MachinePassRegistry.def | 2 +- llvm/lib/CodeGen/CodeGen.cpp | 2 +- llvm/lib/CodeGen/InitUndef.cpp| 50 +-- llvm/lib/Passes/PassBuilder.cpp | 1 + llvm/test/CodeGen/AArch64/init-undef.mir | 3 ++ llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll | 4 +- .../rvv/handle-noreg-with-implicit-def.mir| 2 + .../rvv/subregister-undef-early-clobber.mir | 1 + .../RISCV/rvv/undef-earlyclobber-chain.mir| 1 + 12 files changed, 73 insertions(+), 20 deletions(-) create mode 100644 llvm/include/llvm/CodeGen/InitUndef.h diff --git a/llvm/include/llvm/CodeGen/InitUndef.h b/llvm/include/llvm/CodeGen/InitUndef.h new file mode 100644 index 0..7274824a74905 --- /dev/null +++ b/llvm/include/llvm/CodeGen/InitUndef.h @@ -0,0 +1,24 @@ +//===- llvm/CodeGen/InitUndef.h *- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef LLVM_CODEGEN_INITUNDEF_H +#define LLVM_CODEGEN_INITUNDEF_H + +#include "llvm/CodeGen/MachinePassManager.h" + +namespace llvm { + +class InitUndefPass : public PassInfoMixin { +public: + PreservedAnalyses run(MachineFunction &MF, +MachineFunctionAnalysisManager &MFAM); +}; + +} // namespace llvm + +#endif // LLVM_CODEGEN_INITUNDEF_H diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index bff0526d4177a..07dc86c6fccf2 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -311,7 +311,7 @@ void initializeTargetTransformInfoWrapperPassPass(PassRegistry &); void initializeTwoAddressInstructionLegacyPassPass(PassRegistry &); void initializeTypeBasedAAWrapperPassPass(PassRegistry &); void initializeTypePromotionLegacyPass(PassRegistry &); -void initializeInitUndefPass(PassRegistry &); +void initializeInitUndefLegacyPass(PassRegistry &); void initializeUniformityInfoWrapperPassPass(PassRegistry &); void initializeUnifyLoopExitsLegacyPassPass(PassRegistry &); void initializeUnpackMachineBundlesPass(PassRegistry &); diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h index 982bb16e71eab..351ef63af05c0 100644 --- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h +++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h @@ -43,6 +43,7 @@ #include "llvm/CodeGen/GlobalMerge.h" #include "llvm/CodeGen/GlobalMergeFunctions.h" #include "llvm/CodeGen/IndirectBrExpand.h" +#include "llvm/CodeGen/InitUndef.h" #include "llvm/CodeGen/InterleavedAccess.h" #include "llvm/CodeGen/InterleavedLoadCombine.h" #include "llvm/CodeGen/JMCInstrumenter.h" diff --git a/llvm/include/llvm/Passes/MachinePassRegistry.def b/llvm/include/llvm/Passes/MachinePassRegistry.def index c69573ee3ed97..436b26852ce90 100644 --- a/llvm/include/llvm/Passes/MachinePassRegistry.def +++ 
b/llvm/include/llvm/Passes/MachinePassRegistry.def @@ -148,6 +148,7 @@ MACHINE_FUNCTION_PASS("early-tailduplication", EarlyTailDuplicatePass()) MACHINE_FUNCTION_PASS("fentry-insert", FEntryInserterPass()) MACHINE_FUNCTION_PASS("finalize-isel", FinalizeISelPass()) MACHINE_FUNCTION_PASS("fixup-statepoint-caller-saved", FixupStatepointCallerSavedPass()) +MACHINE_FUNCTION_PASS("init-undef", InitUndefPass()) MACHINE_FUNCTION_PASS("localstackalloc", LocalStackSlotAllocationPass()) MACHINE_FUNCTION_PASS("machine-cp", MachineCopyPropagationPass()) MACHINE_FUNCTION_PASS("machine-cse", MachineCSEPass()) @@ -304,7 +305,6 @@ DUMMY_MACHINE_FUNCTION_PASS("fs-profile-loader", MIRProfileLoaderNewPass) DUMMY_MACHINE_FUNCTION_PASS("funclet-layout", FuncletLayoutPass) DUMMY_MACHINE_FUNCTION_PASS("gc-empty-basic-blocks", GCEmptyBasicBlocksPass) DUMMY_MACHINE_FUNCTION_PASS("implicit-null-checks", ImplicitNullChecksPass) -DUMMY_MACHINE_FUNCTION_PASS("init-undef-pass", InitUndefPass) DUMMY_MACHINE_FUNCTION_PASS("instruction-select", InstructionSelectPass) DUMMY_MACHINE_FUNCTION_PASS("irtranslator", IRTranslatorPass) DUMMY_MACHINE_FUNCTION_PASS("kcfi", MachineKCFIPass) diff --git a/llvm/lib/CodeGen/CodeGen.cpp b/llvm/lib/CodeGen/CodeGen.cpp index 5250534d8a4e4..aa3591cb6be58 100644 --- a/llvm/lib/CodeGen/CodeGen.cpp +++ b/llvm/lib/CodeGen/CodeGen.cpp @@ -54,7 +54,7 @@ void llvm::initializeC
[llvm-branch-commits] [llvm] [CodeGen][NPM] Port InitUndef to NPM (PR #138495)
https://github.com/optimisan ready_for_review https://github.com/llvm/llvm-project/pull/138495 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][NPM] Register AMDGPUWaitSGPRHazards pass (PR #138496)
https://github.com/optimisan ready_for_review https://github.com/llvm/llvm-project/pull/138496
[llvm-branch-commits] [llvm] [CodeGen][NPM] Port InitUndef to NPM (PR #138495)
llvmbot wrote: @llvm/pr-subscribers-backend-aarch64 Author: Akshat Oke (optimisan) Changes --- Patch is 24.47 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/138495.diff 12 Files Affected: - (added) llvm/include/llvm/CodeGen/InitUndef.h (+24) - (modified) llvm/include/llvm/InitializePasses.h (+1-1) - (modified) llvm/include/llvm/Passes/CodeGenPassBuilder.h (+1) - (modified) llvm/include/llvm/Passes/MachinePassRegistry.def (+1-1) - (modified) llvm/lib/CodeGen/CodeGen.cpp (+1-1) - (modified) llvm/lib/CodeGen/InitUndef.cpp (+35-15) - (modified) llvm/lib/Passes/PassBuilder.cpp (+1) - (modified) llvm/test/CodeGen/AArch64/init-undef.mir (+3) - (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll (+2-2) - (modified) llvm/test/CodeGen/RISCV/rvv/handle-noreg-with-implicit-def.mir (+2) - (modified) llvm/test/CodeGen/RISCV/rvv/subregister-undef-early-clobber.mir (+1) - (modified) llvm/test/CodeGen/RISCV/rvv/undef-earlyclobber-chain.mir (+1) ``diff diff --git a/llvm/include/llvm/CodeGen/InitUndef.h b/llvm/include/llvm/CodeGen/InitUndef.h new file mode 100644 index 0..be1cf4bfc9872 --- /dev/null +++ b/llvm/include/llvm/CodeGen/InitUndef.h @@ -0,0 +1,24 @@ +//===- llvm/CodeGen/InitUndef.h -*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef LLVM_CODEGEN_INITUNDEF_H +#define LLVM_CODEGEN_INITUNDEF_H + +#include "llvm/CodeGen/MachinePassManager.h" + +namespace llvm { + +class InitUndefPass : public PassInfoMixin { +public: + PreservedAnalyses run(MachineFunction &MF, +MachineFunctionAnalysisManager &MFAM); +}; + +} // namespace llvm + +#endif // LLVM_CODEGEN_INITUNDEF_H diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index bff0526d4177a..07dc86c6fccf2 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -311,7 +311,7 @@ void initializeTargetTransformInfoWrapperPassPass(PassRegistry &); void initializeTwoAddressInstructionLegacyPassPass(PassRegistry &); void initializeTypeBasedAAWrapperPassPass(PassRegistry &); void initializeTypePromotionLegacyPass(PassRegistry &); -void initializeInitUndefPass(PassRegistry &); +void initializeInitUndefLegacyPass(PassRegistry &); void initializeUniformityInfoWrapperPassPass(PassRegistry &); void initializeUnifyLoopExitsLegacyPassPass(PassRegistry &); void initializeUnpackMachineBundlesPass(PassRegistry &); diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h index 982bb16e71eab..351ef63af05c0 100644 --- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h +++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h @@ -43,6 +43,7 @@ #include "llvm/CodeGen/GlobalMerge.h" #include "llvm/CodeGen/GlobalMergeFunctions.h" #include "llvm/CodeGen/IndirectBrExpand.h" +#include "llvm/CodeGen/InitUndef.h" #include "llvm/CodeGen/InterleavedAccess.h" #include "llvm/CodeGen/InterleavedLoadCombine.h" #include "llvm/CodeGen/JMCInstrumenter.h" diff --git a/llvm/include/llvm/Passes/MachinePassRegistry.def b/llvm/include/llvm/Passes/MachinePassRegistry.def index c69573ee3ed97..436b26852ce90 100644 --- a/llvm/include/llvm/Passes/MachinePassRegistry.def +++ 
b/llvm/include/llvm/Passes/MachinePassRegistry.def @@ -148,6 +148,7 @@ MACHINE_FUNCTION_PASS("early-tailduplication", EarlyTailDuplicatePass()) MACHINE_FUNCTION_PASS("fentry-insert", FEntryInserterPass()) MACHINE_FUNCTION_PASS("finalize-isel", FinalizeISelPass()) MACHINE_FUNCTION_PASS("fixup-statepoint-caller-saved", FixupStatepointCallerSavedPass()) +MACHINE_FUNCTION_PASS("init-undef", InitUndefPass()) MACHINE_FUNCTION_PASS("localstackalloc", LocalStackSlotAllocationPass()) MACHINE_FUNCTION_PASS("machine-cp", MachineCopyPropagationPass()) MACHINE_FUNCTION_PASS("machine-cse", MachineCSEPass()) @@ -304,7 +305,6 @@ DUMMY_MACHINE_FUNCTION_PASS("fs-profile-loader", MIRProfileLoaderNewPass) DUMMY_MACHINE_FUNCTION_PASS("funclet-layout", FuncletLayoutPass) DUMMY_MACHINE_FUNCTION_PASS("gc-empty-basic-blocks", GCEmptyBasicBlocksPass) DUMMY_MACHINE_FUNCTION_PASS("implicit-null-checks", ImplicitNullChecksPass) -DUMMY_MACHINE_FUNCTION_PASS("init-undef-pass", InitUndefPass) DUMMY_MACHINE_FUNCTION_PASS("instruction-select", InstructionSelectPass) DUMMY_MACHINE_FUNCTION_PASS("irtranslator", IRTranslatorPass) DUMMY_MACHINE_FUNCTION_PASS("kcfi", MachineKCFIPass) diff --git a/llvm/lib/CodeGen/CodeGen.cpp b/llvm/lib/CodeGen/CodeGen.cpp index 5250534d8a4e4..aa3591cb6be58 100644 --- a/llvm/lib/CodeGen/CodeGen.cpp +++ b/llvm/lib/CodeGen/CodeGen.cpp @@ -54,7 +54,7 @@ void llvm::initializeCodeGen(PassRegistry
[llvm-branch-commits] [llvm] [CodeGen][NPM] Port PostRAMachineSinking to NPM (PR #138497)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Akshat Oke (optimisan) Changes --- Patch is 22.22 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/138497.diff 9 Files Affected: - (modified) llvm/include/llvm/CodeGen/MachineSink.h (+11) - (modified) llvm/include/llvm/InitializePasses.h (+1-1) - (modified) llvm/include/llvm/Passes/MachinePassRegistry.def (+1-1) - (modified) llvm/lib/CodeGen/CodeGen.cpp (+1-1) - (modified) llvm/lib/CodeGen/MachineSink.cpp (+25-6) - (modified) llvm/test/CodeGen/AArch64/bisect-post-ra-machine-sink.mir (+1) - (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll (+2-2) - (modified) llvm/test/CodeGen/AMDGPU/postra-machine-sink.mir (+1) - (modified) llvm/test/CodeGen/X86/pr38952.mir (+1) ``diff diff --git a/llvm/include/llvm/CodeGen/MachineSink.h b/llvm/include/llvm/CodeGen/MachineSink.h index 71bd7229b7598..eb9548dc82250 100644 --- a/llvm/include/llvm/CodeGen/MachineSink.h +++ b/llvm/include/llvm/CodeGen/MachineSink.h @@ -26,5 +26,16 @@ class MachineSinkingPass : public PassInfoMixin { function_ref MapClassName2PassName); }; +class PostRAMachineSinkingPass +: public PassInfoMixin { +public: + PreservedAnalyses run(MachineFunction &MF, MachineFunctionAnalysisManager &); + + MachineFunctionProperties getRequiredProperties() const { +return MachineFunctionProperties().set( +MachineFunctionProperties::Property::NoVRegs); + } +}; + } // namespace llvm #endif // LLVM_CODEGEN_MACHINESINK_H diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index 07dc86c6fccf2..e75f9c7a2cfe8 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -241,7 +241,7 @@ void initializePostDominatorTreeWrapperPassPass(PassRegistry &); void initializePostInlineEntryExitInstrumenterPass(PassRegistry &); void initializePostMachineSchedulerLegacyPass(PassRegistry &); void initializePostRAHazardRecognizerLegacyPass(PassRegistry &); -void 
initializePostRAMachineSinkingPass(PassRegistry &); +void initializePostRAMachineSinkingLegacyPass(PassRegistry &); void initializePostRASchedulerLegacyPass(PassRegistry &); void initializePreISelIntrinsicLoweringLegacyPassPass(PassRegistry &); void initializePrintFunctionPassWrapperPass(PassRegistry &); diff --git a/llvm/include/llvm/Passes/MachinePassRegistry.def b/llvm/include/llvm/Passes/MachinePassRegistry.def index 436b26852ce90..c6c00e8f25882 100644 --- a/llvm/include/llvm/Passes/MachinePassRegistry.def +++ b/llvm/include/llvm/Passes/MachinePassRegistry.def @@ -164,6 +164,7 @@ MACHINE_FUNCTION_PASS("phi-node-elimination", PHIEliminationPass()) MACHINE_FUNCTION_PASS("post-RA-hazard-rec", PostRAHazardRecognizerPass()) MACHINE_FUNCTION_PASS("post-RA-sched", PostRASchedulerPass(TM)) MACHINE_FUNCTION_PASS("postmisched", PostMachineSchedulerPass(TM)) +MACHINE_FUNCTION_PASS("postra-machine-sink", PostRAMachineSinkingPass()) MACHINE_FUNCTION_PASS("post-ra-pseudos", ExpandPostRAPseudosPass()) MACHINE_FUNCTION_PASS("print", PrintMIRPass()) MACHINE_FUNCTION_PASS("print", LiveDebugVariablesPrinterPass(errs())) @@ -315,7 +316,6 @@ DUMMY_MACHINE_FUNCTION_PASS("static-data-splitter", StaticDataSplitter) DUMMY_MACHINE_FUNCTION_PASS("machine-function-splitter", MachineFunctionSplitterPass) DUMMY_MACHINE_FUNCTION_PASS("machineinstr-printer", MachineFunctionPrinterPass) DUMMY_MACHINE_FUNCTION_PASS("mirfs-discriminators", MIRAddFSDiscriminatorsPass) -DUMMY_MACHINE_FUNCTION_PASS("postra-machine-sink", PostRAMachineSinkingPass) DUMMY_MACHINE_FUNCTION_PASS("processimpdefs", ProcessImplicitDefsPass) DUMMY_MACHINE_FUNCTION_PASS("prologepilog-code", PrologEpilogCodeInserterPass) DUMMY_MACHINE_FUNCTION_PASS("ra-basic", RABasicPass) diff --git a/llvm/lib/CodeGen/CodeGen.cpp b/llvm/lib/CodeGen/CodeGen.cpp index aa3591cb6be58..065fd4704ccfb 100644 --- a/llvm/lib/CodeGen/CodeGen.cpp +++ b/llvm/lib/CodeGen/CodeGen.cpp @@ -107,7 +107,7 @@ void llvm::initializeCodeGen(PassRegistry &Registry) 
{ initializePeepholeOptimizerLegacyPass(Registry); initializePostMachineSchedulerLegacyPass(Registry); initializePostRAHazardRecognizerLegacyPass(Registry); - initializePostRAMachineSinkingPass(Registry); + initializePostRAMachineSinkingLegacyPass(Registry); initializePostRASchedulerLegacyPass(Registry); initializePreISelIntrinsicLoweringLegacyPassPass(Registry); initializeProcessImplicitDefsPass(Registry); diff --git a/llvm/lib/CodeGen/MachineSink.cpp b/llvm/lib/CodeGen/MachineSink.cpp index aa2987b6710a3..be1a3ac125c65 100644 --- a/llvm/lib/CodeGen/MachineSink.cpp +++ b/llvm/lib/CodeGen/MachineSink.cpp @@ -2068,12 +2068,12 @@ void MachineSinking::SalvageUnsunkDebugUsersOfCopy( //===--===// namespace { -class PostRAMachineSinking : public MachineFunctionPass { +class PostRAMachineSinkingLegacy
[llvm-branch-commits] [llvm] [AMDGPU][NPM] Register AMDGPUWaitSGPRHazards pass (PR #138496)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Akshat Oke (optimisan) Changes --- Full diff: https://github.com/llvm/llvm-project/pull/138496.diff 2 Files Affected: - (modified) llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def (+1) - (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll (+3-3) ``diff diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def index 98a1147ef6d66..f408ced020543 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def +++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def @@ -106,6 +106,7 @@ MACHINE_FUNCTION_PASS("amdgpu-set-wave-priority", AMDGPUSetWavePriorityPass()) MACHINE_FUNCTION_PASS("amdgpu-pre-ra-optimizations", GCNPreRAOptimizationsPass()) MACHINE_FUNCTION_PASS("amdgpu-preload-kern-arg-prolog", AMDGPUPreloadKernArgPrologPass()) MACHINE_FUNCTION_PASS("amdgpu-nsa-reassign", GCNNSAReassignPass()) +MACHINE_FUNCTION_PASS("amdgpu-wait-sgpr-hazards", AMDGPUWaitSGPRHazardsPass()) MACHINE_FUNCTION_PASS("gcn-create-vopd", GCNCreateVOPDPass()) MACHINE_FUNCTION_PASS("gcn-dpp-combine", GCNDPPCombinePass()) MACHINE_FUNCTION_PASS("si-fix-sgpr-copies", SIFixSGPRCopiesPass()) diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll index e00b7ff83e322..468e4208c510a 100644 --- a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll +++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll @@ -7,11 +7,11 @@ ; RUN: llc -O3 -enable-new-pm -mtriple=amdgcn--amdhsa -print-pipeline-passes < %s 2>&1 \ ; RUN: | FileCheck -check-prefix=GCN-O3 %s -; GCN-O0: 
require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,UnreachableBlockElimPass,ee-instrument,scalarize-masked-mem-intrin,ExpandReductionsPass,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,cgscc(function(lower-switch,lower-invoke,UnreachableBlockElimPass,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-late-branch-lowering,post-RA-hazard-rec,AMDGPUWaitSGPRHazardsPass,branch-relaxation,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),invalidate)) +; GCN-O0: 
require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,UnreachableBlockElimPass,ee-instrument,scalarize-masked-mem-intrin,ExpandReductionsPass,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,cgscc(function(lower-switch,lower-invoke,UnreachableBlockElimPass,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-late-branch-lowering,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,branch-relaxation,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),invalidate)) -; GCN-O2: require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,function(amdgpu-image-intrinsic-opt),expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(infer-address-spaces,amdgpu-atomic-optimizer,atomic-expand,amdgpu-promote-alloca,separate-const-offset-from-gep<>,slsr,early-cse<>,nary-reassociate,early-cse<>,amdgpu-codegenprepare,verify,loop-mssa(loop-reduce),me
[llvm-branch-commits] [llvm] [CodeGen][NPM] Port PostRAMachineSinking to NPM (PR #138497)
https://github.com/optimisan ready_for_review https://github.com/llvm/llvm-project/pull/138497 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][NPM] Complete optimized regalloc pipeline (PR #138491)
https://github.com/optimisan updated https://github.com/llvm/llvm-project/pull/138491 >From fec3016e7cfaefd3fb66a7ec3e9c9a09085e2d49 Mon Sep 17 00:00:00 2001 From: Akshat Oke Date: Mon, 5 May 2025 06:30:03 + Subject: [PATCH] [AMDGPU][NPM] Complete optimized regalloc pipeline Also fill in some other passes. --- llvm/include/llvm/Passes/CodeGenPassBuilder.h | 2 +- .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 41 +-- llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h | 1 + llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll | 7 +--- 4 files changed, 42 insertions(+), 9 deletions(-) diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h index ddd258c21f593..982bb16e71eab 100644 --- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h +++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h @@ -574,7 +574,7 @@ template class CodeGenPassBuilder { /// Insert InsertedPass pass after TargetPass pass. /// Only machine function passes are supported. template - void insertPass(InsertedPassT &&Pass) { + void insertPass(InsertedPassT &&Pass) const { AfterCallbacks.emplace_back( [&](StringRef Name, MachineFunctionPassManager &MFPM) mutable { if (Name == TargetPassT::name()) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp index 56f808a553388..076440f869cd0 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp @@ -2162,7 +2162,44 @@ void AMDGPUCodeGenPassBuilder::addMachineSSAOptimization( addPass(SIShrinkInstructionsPass()); } +void AMDGPUCodeGenPassBuilder::addOptimizedRegAlloc( +AddMachinePass &addPass) const { + if (EnableDCEInRA) +insertPass(DeadMachineInstructionElimPass()); + + // FIXME: when an instruction has a Killed operand, and the instruction is + // inside a bundle, seems only the BUNDLE instruction appears as the Kills of + // the register in LiveVariables, this would trigger a failure in verifier, + // we should fix 
it and enable the verifier. + if (OptVGPRLiveRange) +insertPass>( +SIOptimizeVGPRLiveRangePass()); + + // This must be run immediately after phi elimination and before + // TwoAddressInstructions, otherwise the processing of the tied operand of + // SI_ELSE will introduce a copy of the tied operand source after the else. + insertPass(SILowerControlFlowPass()); + + if (EnableRewritePartialRegUses) +insertPass(GCNRewritePartialRegUsesPass()); + + if (isPassEnabled(EnablePreRAOptimizations)) +insertPass(GCNPreRAOptimizationsPass()); + // Allow the scheduler to run before SIWholeQuadMode inserts exec manipulation + // instructions that cause scheduling barriers. + insertPass(SIWholeQuadModePass()); + + if (OptExecMaskPreRA) +insertPass(SIOptimizeExecMaskingPreRAPass()); + + // This is not an essential optimization and it has a noticeable impact on + // compilation time, so we only enable it from O2. + if (TM.getOptLevel() > CodeGenOptLevel::Less) +insertPass(SIFormMemoryClausesPass()); + + Base::addOptimizedRegAlloc(addPass); +} Error AMDGPUCodeGenPassBuilder::addRegAssignmentOptimized( AddMachinePass &addPass) const { @@ -2190,21 +2227,19 @@ Error AMDGPUCodeGenPassBuilder::addRegAssignmentOptimized( addPass(SIPreAllocateWWMRegsPass()); // For allocating other wwm register operands. - // addRegAlloc(addPass, RegAllocPhase::WWM); addPass(RAGreedyPass({onlyAllocateWWMRegs, "wwm"})); addPass(SILowerWWMCopiesPass()); addPass(VirtRegRewriterPass(false)); addPass(AMDGPUReserveWWMRegsPass()); // For allocating per-thread VGPRs. 
- // addRegAlloc(addPass, RegAllocPhase::VGPR); addPass(RAGreedyPass({onlyAllocateVGPRs, "vgpr"})); addPreRewrite(addPass); addPass(VirtRegRewriterPass(true)); - // TODO: addPass(AMDGPUMarkLastScratchLoadPass()); + addPass(AMDGPUMarkLastScratchLoadPass()); return Error::success(); } diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h index 589123274d0f5..3c62cd19c6e57 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h @@ -182,6 +182,7 @@ class AMDGPUCodeGenPassBuilder void addPostRegAlloc(AddMachinePass &) const; void addPreEmitPass(AddMachinePass &) const; Error addRegAssignmentOptimized(AddMachinePass &) const; + void addOptimizedRegAlloc(AddMachinePass &) const; /// Check if a pass is enabled given \p Opt option. The option always /// overrides defaults if explicitly used. Otherwise its default will be used diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll index e9b57515e71e0..91c15565762de 100644 --- a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll +++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll @@ -7,14 +7,11 @@ ; RUN: llc -O3 -enable-new-pm -mtrip
[llvm-branch-commits] [llvm] [AMDGPU][NPM] Register AMDGPUWaitSGPRHazards pass (PR #138496)
https://github.com/optimisan created https://github.com/llvm/llvm-project/pull/138496 None >From bd51fd929097dba2593a6a0f3e7cb7982ff75b57 Mon Sep 17 00:00:00 2001 From: Akshat Oke Date: Mon, 5 May 2025 08:58:58 + Subject: [PATCH] [AMDGPU][NPM] Register AMDGPUWaitSGPRHazards pass --- llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def | 1 + llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll | 6 +++--- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def index 98a1147ef6d66..f408ced020543 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def +++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def @@ -106,6 +106,7 @@ MACHINE_FUNCTION_PASS("amdgpu-set-wave-priority", AMDGPUSetWavePriorityPass()) MACHINE_FUNCTION_PASS("amdgpu-pre-ra-optimizations", GCNPreRAOptimizationsPass()) MACHINE_FUNCTION_PASS("amdgpu-preload-kern-arg-prolog", AMDGPUPreloadKernArgPrologPass()) MACHINE_FUNCTION_PASS("amdgpu-nsa-reassign", GCNNSAReassignPass()) +MACHINE_FUNCTION_PASS("amdgpu-wait-sgpr-hazards", AMDGPUWaitSGPRHazardsPass()) MACHINE_FUNCTION_PASS("gcn-create-vopd", GCNCreateVOPDPass()) MACHINE_FUNCTION_PASS("gcn-dpp-combine", GCNDPPCombinePass()) MACHINE_FUNCTION_PASS("si-fix-sgpr-copies", SIFixSGPRCopiesPass()) diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll index e00b7ff83e322..468e4208c510a 100644 --- a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll +++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll @@ -7,11 +7,11 @@ ; RUN: llc -O3 -enable-new-pm -mtriple=amdgcn--amdhsa -print-pipeline-passes < %s 2>&1 \ ; RUN: | FileCheck -check-prefix=GCN-O3 %s -; GCN-O0: 
require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,UnreachableBlockElimPass,ee-instrument,scalarize-masked-mem-intrin,ExpandReductionsPass,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,cgscc(function(lower-switch,lower-invoke,UnreachableBlockElimPass,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-late-branch-lowering,post-RA-hazard-rec,AMDGPUWaitSGPRHazardsPass,branch-relaxation,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),invalidate)) +; GCN-O0: 
require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,UnreachableBlockElimPass,ee-instrument,scalarize-masked-mem-intrin,ExpandReductionsPass,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,cgscc(function(lower-switch,lower-invoke,UnreachableBlockElimPass,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-late-branch-lowering,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,branch-relaxation,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),invalidate)) -; GCN-O2: require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,function(amdgpu-image-intrinsic-opt),expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(infer-address-spaces,amdgpu-atomic-optimizer,atomic-expand,amdgpu-promote-alloca,separate-const-o
[llvm-branch-commits] [llvm] [CodeGen][NPM] Port PostRAMachineSinking to NPM (PR #138497)
https://github.com/optimisan created https://github.com/llvm/llvm-project/pull/138497 None

[llvm-branch-commits] [llvm] [CodeGen][NPM] Port InitUndef to NPM (PR #138495)
https://github.com/optimisan created https://github.com/llvm/llvm-project/pull/138495 None >From 2db3af07bf3894df69e0336e2c71c4704fd4fca8 Mon Sep 17 00:00:00 2001 From: Akshat Oke Date: Mon, 5 May 2025 08:47:42 + Subject: [PATCH] [CodeGen][NPM] Port InitUndef to NPM --- llvm/include/llvm/CodeGen/InitUndef.h | 24 + llvm/include/llvm/InitializePasses.h | 2 +- llvm/include/llvm/Passes/CodeGenPassBuilder.h | 1 + .../llvm/Passes/MachinePassRegistry.def | 2 +- llvm/lib/CodeGen/CodeGen.cpp | 2 +- llvm/lib/CodeGen/InitUndef.cpp| 50 +-- llvm/lib/Passes/PassBuilder.cpp | 1 + llvm/test/CodeGen/AArch64/init-undef.mir | 3 ++ llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll | 4 +- .../rvv/handle-noreg-with-implicit-def.mir| 2 + .../rvv/subregister-undef-early-clobber.mir | 1 + .../RISCV/rvv/undef-earlyclobber-chain.mir| 1 + 12 files changed, 73 insertions(+), 20 deletions(-) create mode 100644 llvm/include/llvm/CodeGen/InitUndef.h diff --git a/llvm/include/llvm/CodeGen/InitUndef.h b/llvm/include/llvm/CodeGen/InitUndef.h new file mode 100644 index 0..7274824a74905 --- /dev/null +++ b/llvm/include/llvm/CodeGen/InitUndef.h @@ -0,0 +1,24 @@ +//===- llvm/CodeGen/InitUndef.h *- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef LLVM_CODEGEN_INITUNDEF_H +#define LLVM_CODEGEN_INITUNDEF_H + +#include "llvm/CodeGen/MachinePassManager.h" + +namespace llvm { + +class InitUndefPass : public PassInfoMixin { +public: + PreservedAnalyses run(MachineFunction &MF, +MachineFunctionAnalysisManager &MFAM); +}; + +} // namespace llvm + +#endif // LLVM_CODEGEN_INITUNDEF_H diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index bff0526d4177a..07dc86c6fccf2 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -311,7 +311,7 @@ void initializeTargetTransformInfoWrapperPassPass(PassRegistry &); void initializeTwoAddressInstructionLegacyPassPass(PassRegistry &); void initializeTypeBasedAAWrapperPassPass(PassRegistry &); void initializeTypePromotionLegacyPass(PassRegistry &); -void initializeInitUndefPass(PassRegistry &); +void initializeInitUndefLegacyPass(PassRegistry &); void initializeUniformityInfoWrapperPassPass(PassRegistry &); void initializeUnifyLoopExitsLegacyPassPass(PassRegistry &); void initializeUnpackMachineBundlesPass(PassRegistry &); diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h index 982bb16e71eab..351ef63af05c0 100644 --- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h +++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h @@ -43,6 +43,7 @@ #include "llvm/CodeGen/GlobalMerge.h" #include "llvm/CodeGen/GlobalMergeFunctions.h" #include "llvm/CodeGen/IndirectBrExpand.h" +#include "llvm/CodeGen/InitUndef.h" #include "llvm/CodeGen/InterleavedAccess.h" #include "llvm/CodeGen/InterleavedLoadCombine.h" #include "llvm/CodeGen/JMCInstrumenter.h" diff --git a/llvm/include/llvm/Passes/MachinePassRegistry.def b/llvm/include/llvm/Passes/MachinePassRegistry.def index c69573ee3ed97..436b26852ce90 100644 --- a/llvm/include/llvm/Passes/MachinePassRegistry.def +++ 
b/llvm/include/llvm/Passes/MachinePassRegistry.def @@ -148,6 +148,7 @@ MACHINE_FUNCTION_PASS("early-tailduplication", EarlyTailDuplicatePass()) MACHINE_FUNCTION_PASS("fentry-insert", FEntryInserterPass()) MACHINE_FUNCTION_PASS("finalize-isel", FinalizeISelPass()) MACHINE_FUNCTION_PASS("fixup-statepoint-caller-saved", FixupStatepointCallerSavedPass()) +MACHINE_FUNCTION_PASS("init-undef", InitUndefPass()) MACHINE_FUNCTION_PASS("localstackalloc", LocalStackSlotAllocationPass()) MACHINE_FUNCTION_PASS("machine-cp", MachineCopyPropagationPass()) MACHINE_FUNCTION_PASS("machine-cse", MachineCSEPass()) @@ -304,7 +305,6 @@ DUMMY_MACHINE_FUNCTION_PASS("fs-profile-loader", MIRProfileLoaderNewPass) DUMMY_MACHINE_FUNCTION_PASS("funclet-layout", FuncletLayoutPass) DUMMY_MACHINE_FUNCTION_PASS("gc-empty-basic-blocks", GCEmptyBasicBlocksPass) DUMMY_MACHINE_FUNCTION_PASS("implicit-null-checks", ImplicitNullChecksPass) -DUMMY_MACHINE_FUNCTION_PASS("init-undef-pass", InitUndefPass) DUMMY_MACHINE_FUNCTION_PASS("instruction-select", InstructionSelectPass) DUMMY_MACHINE_FUNCTION_PASS("irtranslator", IRTranslatorPass) DUMMY_MACHINE_FUNCTION_PASS("kcfi", MachineKCFIPass) diff --git a/llvm/lib/CodeGen/CodeGen.cpp b/llvm/lib/CodeGen/CodeGen.cpp index 5250534d8a4e4..aa3591cb6be58 100644 --- a/llvm/lib/CodeGen/CodeGen.cpp +++ b/llvm/lib/CodeGen/CodeGen.cpp @@ -54,7 +54,7 @@ void llvm::initializ
[llvm-branch-commits] [llvm] [AMDGPU][NPM] Register AMDGPUWaitSGPRHazards pass (PR #138496)
optimisan wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. > Learn more

* **#138497**
* **#138496** 👈 (View in Graphite)
* **#138495**
* **#138491**
* **#136818**
* `main`

This stack of pull requests is managed by Graphite. Learn more about stacking.

https://github.com/llvm/llvm-project/pull/138496
[llvm-branch-commits] [llvm] [CodeGen][NPM] Port InitUndef to NPM (PR #138495)
optimisan wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. > Learn more

* **#138497**
* **#138496**
* **#138495** 👈 (View in Graphite)
* **#138491**
* **#136818**
* `main`

This stack of pull requests is managed by Graphite. Learn more about stacking.

https://github.com/llvm/llvm-project/pull/138495
[llvm-branch-commits] [llvm] [CodeGen][NPM] Port PostRAMachineSinking to NPM (PR #138497)
optimisan wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. > Learn more

* **#138497** 👈 (View in Graphite)
* **#138496**
* **#138495**
* **#138491**
* **#136818**
* `main`

This stack of pull requests is managed by Graphite. Learn more about stacking.

https://github.com/llvm/llvm-project/pull/138497