[llvm-branch-commits] [flang] [flang][OpenMP] Update `do concurrent` mapping pass to use `fir.do_concurrent` op (PR #138489)
llvmbot wrote: @llvm/pr-subscribers-flang-fir-hlfir Author: Kareem Ergawy (ergawy) Changes This PR updates the `do concurrent` to OpenMP mapping pass to use the newly added `fir.do_concurrent` ops that were recently added upstream instead of handling nests of `fir.do_loop ... unordered` ops. Parent PR: https://github.com/llvm/llvm-project/pull/137928. --- Patch is 34.21 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/138489.diff 9 Files Affected: - (modified) flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp (+84-278) - (modified) flang/test/Transforms/DoConcurrent/basic_device.mlir (+13-11) - (modified) flang/test/Transforms/DoConcurrent/basic_host.f90 (-3) - (modified) flang/test/Transforms/DoConcurrent/basic_host.mlir (+14-12) - (modified) flang/test/Transforms/DoConcurrent/locally_destroyed_temp.f90 (-3) - (removed) flang/test/Transforms/DoConcurrent/loop_nest_test.f90 (-92) - (modified) flang/test/Transforms/DoConcurrent/multiple_iteration_ranges.f90 (-3) - (modified) flang/test/Transforms/DoConcurrent/non_const_bounds.f90 (-3) - (modified) flang/test/Transforms/DoConcurrent/not_perfectly_nested.f90 (+11-13) ``diff diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp index 2c069860ffdca..0fdb302fe10ca 100644 --- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp +++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp @@ -6,6 +6,7 @@ // //===--===// +#include "flang/Optimizer/Builder/FIRBuilder.h" #include "flang/Optimizer/Dialect/FIROps.h" #include "flang/Optimizer/OpenMP/Passes.h" #include "flang/Optimizer/OpenMP/Utils.h" @@ -28,8 +29,10 @@ namespace looputils { /// Stores info needed about the induction/iteration variable for each `do /// concurrent` in a loop nest. 
struct InductionVariableInfo { - InductionVariableInfo(fir::DoLoopOp doLoop) { populateInfo(doLoop); } - + InductionVariableInfo(fir::DoConcurrentLoopOp loop, +mlir::Value inductionVar) { +populateInfo(loop, inductionVar); + } /// The operation allocating memory for iteration variable. mlir::Operation *iterVarMemDef; /// the operation(s) updating the iteration variable with the current @@ -45,7 +48,7 @@ struct InductionVariableInfo { /// ... /// %i:2 = hlfir.declare %0 {uniq_name = "_QFEi"} : ... /// ... - /// fir.do_loop %ind_var = %lb to %ub step %s unordered { + /// fir.do_concurrent.loop (%ind_var) = (%lb) to (%ub) step (%s) { /// %ind_var_conv = fir.convert %ind_var : (index) -> i32 /// fir.store %ind_var_conv to %i#1 : !fir.ref /// ... @@ -62,14 +65,14 @@ struct InductionVariableInfo { /// Note: The current implementation is dependent on how flang emits loop /// bodies; which is sufficient for the current simple test/use cases. If this /// proves to be insufficient, this should be made more generic. - void populateInfo(fir::DoLoopOp doLoop) { + void populateInfo(fir::DoConcurrentLoopOp loop, mlir::Value inductionVar) { mlir::Value result = nullptr; // Checks if a StoreOp is updating the memref of the loop's iteration // variable. auto isStoringIV = [&](fir::StoreOp storeOp) { // Direct store into the IV memref. - if (storeOp.getValue() == doLoop.getInductionVar()) { + if (storeOp.getValue() == inductionVar) { indVarUpdateOps.push_back(storeOp); return true; } @@ -77,7 +80,7 @@ struct InductionVariableInfo { // Indirect store into the IV memref. 
if (auto convertOp = mlir::dyn_cast( storeOp.getValue().getDefiningOp())) { -if (convertOp.getOperand() == doLoop.getInductionVar()) { +if (convertOp.getOperand() == inductionVar) { indVarUpdateOps.push_back(convertOp); indVarUpdateOps.push_back(storeOp); return true; @@ -87,7 +90,7 @@ struct InductionVariableInfo { return false; }; -for (mlir::Operation &op : doLoop) { +for (mlir::Operation &op : loop) { if (auto storeOp = mlir::dyn_cast(op)) if (isStoringIV(storeOp)) { result = storeOp.getMemref(); @@ -100,219 +103,7 @@ struct InductionVariableInfo { } }; -using LoopNestToIndVarMap = -llvm::MapVector; - -/// Loop \p innerLoop is considered perfectly-nested inside \p outerLoop iff -/// there are no operations in \p outerloop's body other than: -/// -/// 1. the operations needed to assign/update \p outerLoop's induction variable. -/// 2. \p innerLoop itself. -/// -/// \p return true if \p innerLoop is perfectly nested inside \p outerLoop -/// according to the above definition. -bool isPerfectlyNested(fir::DoLoopOp outerLoop, fir::DoLoopOp innerLoop) { - mlir::ForwardSliceOptions forwardSliceOptions; - forwardSliceOptions.inclusive = true; - // The following
[llvm-branch-commits] [flang] [flang][OpenMP] Update `do concurrent` mapping pass to use `fir.do_concurrent` op (PR #138489)
https://github.com/ergawy created https://github.com/llvm/llvm-project/pull/138489 This PR updates the `do concurrent` to OpenMP mapping pass to use the newly added `fir.do_concurrent` ops that were recently added upstream instead of handling nests of `fir.do_loop ... unordered` ops. Parent PR: https://github.com/llvm/llvm-project/pull/137928.
___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][NPM] Complete optimized regalloc pipeline (PR #138491)
https://github.com/optimisan created https://github.com/llvm/llvm-project/pull/138491 Also fill in some other passes. >From d415354c8ef7e761a8bcbc83420501bca8abc2f3 Mon Sep 17 00:00:00 2001 From: Akshat Oke Date: Mon, 5 May 2025 06:30:03 + Subject: [PATCH] [AMDGPU][NPM] Complete optimized regalloc pipeline Also fill in some other passes. --- llvm/include/llvm/Passes/CodeGenPassBuilder.h | 2 +- .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 41 +-- llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h | 1 + llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll | 8 +--- 4 files changed, 42 insertions(+), 10 deletions(-) diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h index ddd258c21f593..982bb16e71eab 100644 --- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h +++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h @@ -574,7 +574,7 @@ template class CodeGenPassBuilder { /// Insert InsertedPass pass after TargetPass pass. /// Only machine function passes are supported. 
template - void insertPass(InsertedPassT &&Pass) { + void insertPass(InsertedPassT &&Pass) const { AfterCallbacks.emplace_back( [&](StringRef Name, MachineFunctionPassManager &MFPM) mutable { if (Name == TargetPassT::name()) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp index 56f808a553388..076440f869cd0 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp @@ -2162,7 +2162,44 @@ void AMDGPUCodeGenPassBuilder::addMachineSSAOptimization( addPass(SIShrinkInstructionsPass()); } +void AMDGPUCodeGenPassBuilder::addOptimizedRegAlloc( +AddMachinePass &addPass) const { + if (EnableDCEInRA) +insertPass(DeadMachineInstructionElimPass()); + + // FIXME: when an instruction has a Killed operand, and the instruction is + // inside a bundle, seems only the BUNDLE instruction appears as the Kills of + // the register in LiveVariables, this would trigger a failure in verifier, + // we should fix it and enable the verifier. + if (OptVGPRLiveRange) +insertPass>( +SIOptimizeVGPRLiveRangePass()); + + // This must be run immediately after phi elimination and before + // TwoAddressInstructions, otherwise the processing of the tied operand of + // SI_ELSE will introduce a copy of the tied operand source after the else. + insertPass(SILowerControlFlowPass()); + + if (EnableRewritePartialRegUses) +insertPass(GCNRewritePartialRegUsesPass()); + + if (isPassEnabled(EnablePreRAOptimizations)) +insertPass(GCNPreRAOptimizationsPass()); + // Allow the scheduler to run before SIWholeQuadMode inserts exec manipulation + // instructions that cause scheduling barriers. + insertPass(SIWholeQuadModePass()); + + if (OptExecMaskPreRA) +insertPass(SIOptimizeExecMaskingPreRAPass()); + + // This is not an essential optimization and it has a noticeable impact on + // compilation time, so we only enable it from O2. 
+ if (TM.getOptLevel() > CodeGenOptLevel::Less) +insertPass(SIFormMemoryClausesPass()); + + Base::addOptimizedRegAlloc(addPass); +} Error AMDGPUCodeGenPassBuilder::addRegAssignmentOptimized( AddMachinePass &addPass) const { @@ -2190,21 +2227,19 @@ Error AMDGPUCodeGenPassBuilder::addRegAssignmentOptimized( addPass(SIPreAllocateWWMRegsPass()); // For allocating other wwm register operands. - // addRegAlloc(addPass, RegAllocPhase::WWM); addPass(RAGreedyPass({onlyAllocateWWMRegs, "wwm"})); addPass(SILowerWWMCopiesPass()); addPass(VirtRegRewriterPass(false)); addPass(AMDGPUReserveWWMRegsPass()); // For allocating per-thread VGPRs. - // addRegAlloc(addPass, RegAllocPhase::VGPR); addPass(RAGreedyPass({onlyAllocateVGPRs, "vgpr"})); addPreRewrite(addPass); addPass(VirtRegRewriterPass(true)); - // TODO: addPass(AMDGPUMarkLastScratchLoadPass()); + addPass(AMDGPUMarkLastScratchLoadPass()); return Error::success(); } diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h index 589123274d0f5..3c62cd19c6e57 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h @@ -182,6 +182,7 @@ class AMDGPUCodeGenPassBuilder void addPostRegAlloc(AddMachinePass &) const; void addPreEmitPass(AddMachinePass &) const; Error addRegAssignmentOptimized(AddMachinePass &) const; + void addOptimizedRegAlloc(AddMachinePass &) const; /// Check if a pass is enabled given \p Opt option. The option always /// overrides defaults if explicitly used. Otherwise its default will be used diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll index e9b57515e71e0..e0b9e2c0500e6 100644 --- a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll +++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll @@ -7,15 +7,11 @@ ;
[llvm-branch-commits] [llvm] [AMDGPU][NPM] Complete optimized regalloc pipeline (PR #138491)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Akshat Oke (optimisan) Changes Also fill in some other passes. --- Full diff: https://github.com/llvm/llvm-project/pull/138491.diff 4 Files Affected: - (modified) llvm/include/llvm/Passes/CodeGenPassBuilder.h (+1-1) - (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+38-3) - (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h (+1) - (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll (+2-6) ``diff diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h index ddd258c21f593..982bb16e71eab 100644 --- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h +++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h @@ -574,7 +574,7 @@ template class CodeGenPassBuilder { /// Insert InsertedPass pass after TargetPass pass. /// Only machine function passes are supported. template - void insertPass(InsertedPassT &&Pass) { + void insertPass(InsertedPassT &&Pass) const { AfterCallbacks.emplace_back( [&](StringRef Name, MachineFunctionPassManager &MFPM) mutable { if (Name == TargetPassT::name()) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp index 56f808a553388..076440f869cd0 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp @@ -2162,7 +2162,44 @@ void AMDGPUCodeGenPassBuilder::addMachineSSAOptimization( addPass(SIShrinkInstructionsPass()); } +void AMDGPUCodeGenPassBuilder::addOptimizedRegAlloc( +AddMachinePass &addPass) const { + if (EnableDCEInRA) +insertPass(DeadMachineInstructionElimPass()); + + // FIXME: when an instruction has a Killed operand, and the instruction is + // inside a bundle, seems only the BUNDLE instruction appears as the Kills of + // the register in LiveVariables, this would trigger a failure in verifier, + // we should fix it and enable the verifier. 
+ if (OptVGPRLiveRange) +insertPass>( +SIOptimizeVGPRLiveRangePass()); + + // This must be run immediately after phi elimination and before + // TwoAddressInstructions, otherwise the processing of the tied operand of + // SI_ELSE will introduce a copy of the tied operand source after the else. + insertPass(SILowerControlFlowPass()); + + if (EnableRewritePartialRegUses) +insertPass(GCNRewritePartialRegUsesPass()); + + if (isPassEnabled(EnablePreRAOptimizations)) +insertPass(GCNPreRAOptimizationsPass()); + // Allow the scheduler to run before SIWholeQuadMode inserts exec manipulation + // instructions that cause scheduling barriers. + insertPass(SIWholeQuadModePass()); + + if (OptExecMaskPreRA) +insertPass(SIOptimizeExecMaskingPreRAPass()); + + // This is not an essential optimization and it has a noticeable impact on + // compilation time, so we only enable it from O2. + if (TM.getOptLevel() > CodeGenOptLevel::Less) +insertPass(SIFormMemoryClausesPass()); + + Base::addOptimizedRegAlloc(addPass); +} Error AMDGPUCodeGenPassBuilder::addRegAssignmentOptimized( AddMachinePass &addPass) const { @@ -2190,21 +2227,19 @@ Error AMDGPUCodeGenPassBuilder::addRegAssignmentOptimized( addPass(SIPreAllocateWWMRegsPass()); // For allocating other wwm register operands. - // addRegAlloc(addPass, RegAllocPhase::WWM); addPass(RAGreedyPass({onlyAllocateWWMRegs, "wwm"})); addPass(SILowerWWMCopiesPass()); addPass(VirtRegRewriterPass(false)); addPass(AMDGPUReserveWWMRegsPass()); // For allocating per-thread VGPRs. 
- // addRegAlloc(addPass, RegAllocPhase::VGPR); addPass(RAGreedyPass({onlyAllocateVGPRs, "vgpr"})); addPreRewrite(addPass); addPass(VirtRegRewriterPass(true)); - // TODO: addPass(AMDGPUMarkLastScratchLoadPass()); + addPass(AMDGPUMarkLastScratchLoadPass()); return Error::success(); } diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h index 589123274d0f5..3c62cd19c6e57 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h @@ -182,6 +182,7 @@ class AMDGPUCodeGenPassBuilder void addPostRegAlloc(AddMachinePass &) const; void addPreEmitPass(AddMachinePass &) const; Error addRegAssignmentOptimized(AddMachinePass &) const; + void addOptimizedRegAlloc(AddMachinePass &) const; /// Check if a pass is enabled given \p Opt option. The option always /// overrides defaults if explicitly used. Otherwise its default will be used diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll index e9b57515e71e0..e0b9e2c0500e6 100644 --- a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll +++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll @@ -7,15 +7,11 @@ ; RUN: llc -O3 -enable-new-pm -mtriple=amdgcn--amdhsa -print-pipeline-passes < %s 2>&1 \ ; RUN: | FileCheck -check-prefix=GCN-O3 %s - ;
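The `insertPass` calls in the diff above rely on a deferred-callback mechanism in `CodeGenPassBuilder`: each request is recorded as a callback keyed on the target pass's name and fires while the pipeline is assembled, so passes can be spliced in after a target pass regardless of where it ends up. The following stand-alone sketch models that mechanism with string names instead of real pass types; it is an assumption-laden toy, not the actual LLVM API.

```cpp
#include <functional>
#include <string>
#include <vector>

// Minimal model of an "insert after" request registry, in the spirit of
// CodeGenPassBuilder's AfterCallbacks.
struct PassManager {
  std::vector<std::string> Passes;
  void addPass(const std::string &Name) { Passes.push_back(Name); }
};

class PipelineBuilder {
  // Each callback runs after every pass is added and may append more.
  std::vector<std::function<void(const std::string &, PassManager &)>>
      AfterCallbacks;

public:
  // Request that InsertedName run right after TargetName, whenever (and
  // only if) TargetName appears in the pipeline.
  void insertPass(const std::string &TargetName,
                  const std::string &InsertedName) {
    AfterCallbacks.emplace_back(
        [TargetName, InsertedName](const std::string &Name, PassManager &PM) {
          if (Name == TargetName)
            PM.addPass(InsertedName);
        });
  }

  void buildPipeline(PassManager &PM,
                     const std::vector<std::string> &BasePasses) {
    for (const std::string &P : BasePasses) {
      PM.addPass(P);
      for (auto &CB : AfterCallbacks)
        CB(P, PM);  // fire deferred insertions at their anchor point
    }
  }
};
```

In this model, requesting `si-lower-control-flow` after `phi-node-elimination` yields the order phi-elim, lowered control flow, two-address — mirroring how the patch anchors `SILowerControlFlowPass` between phi elimination and TwoAddressInstructions.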
[llvm-branch-commits] [llvm] [AMDGPU][NPM] Complete optimized regalloc pipeline (PR #138491)
https://github.com/optimisan ready_for_review https://github.com/llvm/llvm-project/pull/138491
[llvm-branch-commits] [llvm] [AMDGPU][NPM] Complete optimized regalloc pipeline (PR #138491)
optimisan wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests * **#138491** (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/138491) * **#136818** * `main` This stack of pull requests is managed by Graphite. Learn more about stacking: https://stacking.dev https://github.com/llvm/llvm-project/pull/138491
[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)
@@ -0,0 +1,19 @@ +;; Test if the callee_type metadata is dropped when it is attached +;; to a direct function call during instcombine. + +; RUN: opt < %s -passes="instcombine" -disable-verify -S | FileCheck %s arsenm wrote: ```suggestion ; RUN: opt -S -passes=instcombine -S < %s | FileCheck %s ``` Definitely do not disable the verifier https://github.com/llvm/llvm-project/pull/87573
[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)
@@ -0,0 +1,19 @@ +;; Test if the callee_type metadata is dropped when it is attached +;; to a direct function call during instcombine. + +; RUN: opt < %s -passes="instcombine" -disable-verify -S | FileCheck %s + +define i32 @_Z3barv() local_unnamed_addr !type !3 { +entry: + ; CHECK: %call = call i32 @_Z3fooc(i8 97) + ; CHECK-NOT: %call = call i32 @_Z3fooc(i8 97), !callee_type !1 arsenm wrote: NOT checks are fragile, and this is far too specific, Generate full checks https://github.com/llvm/llvm-project/pull/87573
[llvm-branch-commits] [lld] release/20.x: [wasm-ld] Refactor WasmSym from static globals to per-link context (#134970) (PR #137620)
https://github.com/vgvassilev milestoned https://github.com/llvm/llvm-project/pull/137620
[llvm-branch-commits] [flang] [flang][OpenMP] Update `do concurrent` mapping pass to use `fir.do_concurrent` op (PR #138489)
https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/138489 >From ac0cde7576c49a00e3591254d6891879b52bee81 Mon Sep 17 00:00:00 2001 From: ergawy Date: Mon, 5 May 2025 02:23:04 -0500 Subject: [PATCH] [flang][OpenMP] Update `do concurrent` mapping pass to use `fir.do_concurrent` op This PR updates the `do concurrent` to OpenMP mapping pass to use the newly added `fir.do_concurrent` ops that were recently added upstream instead of handling nests of `fir.do_loop ... unordered` ops. Parent PR: https://github.com/llvm/llvm-project/pull/137928. --- .../OpenMP/DoConcurrentConversion.cpp | 362 -- .../Transforms/DoConcurrent/basic_device.mlir | 24 +- .../Transforms/DoConcurrent/basic_host.f90| 3 - .../Transforms/DoConcurrent/basic_host.mlir | 26 +- .../DoConcurrent/locally_destroyed_temp.f90 | 3 - .../DoConcurrent/loop_nest_test.f90 | 92 - .../multiple_iteration_ranges.f90 | 3 - .../DoConcurrent/non_const_bounds.f90 | 3 - .../DoConcurrent/not_perfectly_nested.f90 | 24 +- 9 files changed, 122 insertions(+), 418 deletions(-) delete mode 100644 flang/test/Transforms/DoConcurrent/loop_nest_test.f90 diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp index 2c069860ffdca..0fdb302fe10ca 100644 --- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp +++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp @@ -6,6 +6,7 @@ // //===--===// +#include "flang/Optimizer/Builder/FIRBuilder.h" #include "flang/Optimizer/Dialect/FIROps.h" #include "flang/Optimizer/OpenMP/Passes.h" #include "flang/Optimizer/OpenMP/Utils.h" @@ -28,8 +29,10 @@ namespace looputils { /// Stores info needed about the induction/iteration variable for each `do /// concurrent` in a loop nest. 
struct InductionVariableInfo { - InductionVariableInfo(fir::DoLoopOp doLoop) { populateInfo(doLoop); } - + InductionVariableInfo(fir::DoConcurrentLoopOp loop, +mlir::Value inductionVar) { +populateInfo(loop, inductionVar); + } /// The operation allocating memory for iteration variable. mlir::Operation *iterVarMemDef; /// the operation(s) updating the iteration variable with the current @@ -45,7 +48,7 @@ struct InductionVariableInfo { /// ... /// %i:2 = hlfir.declare %0 {uniq_name = "_QFEi"} : ... /// ... - /// fir.do_loop %ind_var = %lb to %ub step %s unordered { + /// fir.do_concurrent.loop (%ind_var) = (%lb) to (%ub) step (%s) { /// %ind_var_conv = fir.convert %ind_var : (index) -> i32 /// fir.store %ind_var_conv to %i#1 : !fir.ref /// ... @@ -62,14 +65,14 @@ struct InductionVariableInfo { /// Note: The current implementation is dependent on how flang emits loop /// bodies; which is sufficient for the current simple test/use cases. If this /// proves to be insufficient, this should be made more generic. - void populateInfo(fir::DoLoopOp doLoop) { + void populateInfo(fir::DoConcurrentLoopOp loop, mlir::Value inductionVar) { mlir::Value result = nullptr; // Checks if a StoreOp is updating the memref of the loop's iteration // variable. auto isStoringIV = [&](fir::StoreOp storeOp) { // Direct store into the IV memref. - if (storeOp.getValue() == doLoop.getInductionVar()) { + if (storeOp.getValue() == inductionVar) { indVarUpdateOps.push_back(storeOp); return true; } @@ -77,7 +80,7 @@ struct InductionVariableInfo { // Indirect store into the IV memref. 
if (auto convertOp = mlir::dyn_cast( storeOp.getValue().getDefiningOp())) { -if (convertOp.getOperand() == doLoop.getInductionVar()) { +if (convertOp.getOperand() == inductionVar) { indVarUpdateOps.push_back(convertOp); indVarUpdateOps.push_back(storeOp); return true; @@ -87,7 +90,7 @@ struct InductionVariableInfo { return false; }; -for (mlir::Operation &op : doLoop) { +for (mlir::Operation &op : loop) { if (auto storeOp = mlir::dyn_cast(op)) if (isStoringIV(storeOp)) { result = storeOp.getMemref(); @@ -100,219 +103,7 @@ struct InductionVariableInfo { } }; -using LoopNestToIndVarMap = -llvm::MapVector; - -/// Loop \p innerLoop is considered perfectly-nested inside \p outerLoop iff -/// there are no operations in \p outerloop's body other than: -/// -/// 1. the operations needed to assign/update \p outerLoop's induction variable. -/// 2. \p innerLoop itself. -/// -/// \p return true if \p innerLoop is perfectly nested inside \p outerLoop -/// according to the above definition. -bool isPerfectlyNested(fir::DoLoopOp outerLoop, fir::DoLoopOp innerLoop) { - mlir::ForwardSliceOptions forwardSliceOptions; - forwardSliceOptions.inclusive = true; - // The follow
[llvm-branch-commits] [SPARC] Use op-then-neg instructions when we have VIS3 (PR #135717)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/135717
[llvm-branch-commits] [SPARC] Use op-then-neg instructions when we have VIS3 (PR #135717)
https://github.com/arsenm approved this pull request. Not sure isFNegFree is doing anything in this patch, would be best to drop it and separately test the combines it enables https://github.com/llvm/llvm-project/pull/135717
[llvm-branch-commits] [SPARC] Use op-then-neg instructions when we have VIS3 (PR #135717)
@@ -3605,6 +3605,12 @@ bool SparcTargetLowering::useLoadStackGuardNode(const Module &M) const { return true; } +bool SparcTargetLowering::isFNegFree(EVT VT) const { + if (Subtarget->isVIS3()) +return VT == MVT::f32 || VT == MVT::f64; + return false; +} arsenm wrote: This ideally would be completely separate patch from the patterns. This enables independent combines https://github.com/llvm/llvm-project/pull/135717
[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #120640)
@@ -1421,6 +1424,40 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SetSplitVector(SDValue(N, ResNo), Lo, Hi); } +void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, + SDValue &Hi) { + EVT LoVT, HiVT; + SDLoc dl(LD); + std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(LD->getValueType(0)); + + ISD::LoadExtType ExtType = LD->getExtensionType(); + SDValue Ch = LD->getChain(); + SDValue Ptr = LD->getBasePtr(); + EVT MemoryVT = LD->getMemoryVT(); + + EVT LoMemVT, HiMemVT; + std::tie(LoMemVT, HiMemVT) = DAG.GetSplitDestVTs(MemoryVT); + + EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), LD->getValueType(0).getSizeInBits()); + EVT MemIntVT = EVT::getIntegerVT(*DAG.getContext(), LD->getMemoryVT().getSizeInBits()); + SDValue ALD = DAG.getAtomicLoad(ExtType, dl, MemIntVT, IntVT, Ch, Ptr, + LD->getMemOperand()); jofrn wrote: This will fail for some reason, on at least atomic_vec2_ptr270 and atomic_vec2_half. https://github.com/llvm/llvm-project/pull/120640
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120716 >From d600ac360ed1e19693cc99f7785fd3f4cef579da Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 20 Dec 2024 06:14:28 -0500 Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector AtomicExpand fails for aligned `load atomic ` because it does not find a compatible library call. This change adds appropriate bitcasts so that the call can be lowered. commit-id:f430c1af --- llvm/lib/CodeGen/AtomicExpandPass.cpp | 20 +- llvm/test/CodeGen/ARM/atomic-load-store.ll| 51 +++ llvm/test/CodeGen/X86/atomic-load-store.ll| 30 + .../X86/expand-atomic-non-integer.ll | 65 +++ 4 files changed, 163 insertions(+), 3 deletions(-) diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp index c376de877ac7d..b6f1e9db9ce35 100644 --- a/llvm/lib/CodeGen/AtomicExpandPass.cpp +++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp @@ -2066,9 +2066,23 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall( I->replaceAllUsesWith(V); } else if (HasResult) { Value *V; -if (UseSizedLibcall) - V = Builder.CreateBitOrPointerCast(Result, I->getType()); -else { +if (UseSizedLibcall) { + // Add bitcasts from Result's scalar type to I's vector type + if (I->getType()->getScalarType()->isPointerTy() && + I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) { +unsigned AS = + cast(I->getType()->getScalarType())->getAddressSpace(); +ElementCount EC = cast(I->getType())->getElementCount(); +Value *BC = Builder.CreateBitCast( +Result, +VectorType::get(IntegerType::get(Ctx, DL.getPointerSizeInBits(AS)), +EC)); +Value *IntToPtr = Builder.CreateIntToPtr( +BC, VectorType::get(PointerType::get(Ctx, AS), EC)); +V = Builder.CreateBitOrPointerCast(IntToPtr, I->getType()); + } else +V = Builder.CreateBitOrPointerCast(Result, I->getType()); +} else { V = Builder.CreateAlignedLoad(I->getType(), AllocaResult, AllocaAlignment); Builder.CreateLifetimeEnd(AllocaResult, SizeVal64); diff --git 
a/llvm/test/CodeGen/ARM/atomic-load-store.ll b/llvm/test/CodeGen/ARM/atomic-load-store.ll index 560dfde356c29..36c1305a7c5df 100644 --- a/llvm/test/CodeGen/ARM/atomic-load-store.ll +++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll @@ -983,3 +983,54 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double %val1) { store atomic double %val1, ptr %ptr seq_cst, align 8 ret void } + +define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 { +; ARM-LABEL: atomic_vec1_ptr: +; ARM: @ %bb.0: +; ARM-NEXT:ldr r0, [r0] +; ARM-NEXT:dmb ish +; ARM-NEXT:bx lr +; +; ARMOPTNONE-LABEL: atomic_vec1_ptr: +; ARMOPTNONE: @ %bb.0: +; ARMOPTNONE-NEXT:ldr r0, [r0] +; ARMOPTNONE-NEXT:dmb ish +; ARMOPTNONE-NEXT:bx lr +; +; THUMBTWO-LABEL: atomic_vec1_ptr: +; THUMBTWO: @ %bb.0: +; THUMBTWO-NEXT:ldr r0, [r0] +; THUMBTWO-NEXT:dmb ish +; THUMBTWO-NEXT:bx lr +; +; THUMBONE-LABEL: atomic_vec1_ptr: +; THUMBONE: @ %bb.0: +; THUMBONE-NEXT:push {r7, lr} +; THUMBONE-NEXT:movs r1, #0 +; THUMBONE-NEXT:mov r2, r1 +; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4 +; THUMBONE-NEXT:pop {r7, pc} +; +; ARMV4-LABEL: atomic_vec1_ptr: +; ARMV4: @ %bb.0: +; ARMV4-NEXT:push {r11, lr} +; ARMV4-NEXT:mov r1, #2 +; ARMV4-NEXT:bl __atomic_load_4 +; ARMV4-NEXT:pop {r11, lr} +; ARMV4-NEXT:mov pc, lr +; +; ARMV6-LABEL: atomic_vec1_ptr: +; ARMV6: @ %bb.0: +; ARMV6-NEXT:mov r1, #0 +; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5 +; ARMV6-NEXT:ldr r0, [r0] +; ARMV6-NEXT:bx lr +; +; THUMBM-LABEL: atomic_vec1_ptr: +; THUMBM: @ %bb.0: +; THUMBM-NEXT:ldr r0, [r0] +; THUMBM-NEXT:dmb sy +; THUMBM-NEXT:bx lr + %ret = load atomic <1 x ptr>, ptr %x acquire, align 4 + ret <1 x ptr> %ret +} diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 08d0405345f57..4293df8c13571 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -371,6 +371,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { ret <2 x i32> %ret } +define <2 x ptr> 
@atomic_vec2_ptr_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec2_ptr_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:pushq %rax +; CHECK-NEXT:movl $2, %esi +; CHECK-NEXT:callq ___atomic_load_16 +; CHECK-NEXT:movq %rdx, %xmm1 +; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT:popq %rax +; CHECK-NEXT:retq + %ret = load atomic <2 x ptr>, ptr %x acquire, align 16 + ret <2 x ptr> %ret +} + define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind { ; CHECK3-LAB
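The AtomicExpand patch above lowers `load atomic <2 x ptr>` through a sized libcall that returns one wide integer, then rebuilds the vector with a bitcast to a vector of pointer-sized integers followed by a per-lane inttoptr. A rough host-side model of that cast chain, assuming a 64-bit target (the 16-byte buffer stands in for the `__atomic_load_16` result; this is not the IR builder code itself):

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Model of: i128 result --bitcast--> <2 x i64> --inttoptr--> <2 x ptr>.
std::array<void *, 2> castResultToPtrVec(const unsigned char raw[16]) {
  uint64_t lanes[2];
  std::memcpy(lanes, raw, 16);  // "bitcast" the wide integer to two lanes
  std::array<void *, 2> v;
  for (int i = 0; i < 2; ++i)   // "inttoptr" on each lane
    v[i] = reinterpret_cast<void *>(static_cast<uintptr_t>(lanes[i]));
  return v;
}
```

Round-tripping two real addresses through the buffer recovers the same pointers, which is the property the added casts preserve.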
[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #120640)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120640 >From 28f6bf33268da6361d71070d5bf75f93a85b03eb Mon Sep 17 00:00:00 2001 From: jofrn Date: Thu, 19 Dec 2024 16:25:55 -0500 Subject: [PATCH] [SelectionDAG] Split vector types for atomic load Vector types that aren't widened are split so that a single ATOMIC_LOAD is issued for the entire vector at once. This change utilizes the load vectorization infrastructure in SelectionDAG in order to group the vectors. This enables SelectionDAG to translate vectors with type bfloat,half. commit-id:3a045357 --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 1 + .../SelectionDAG/LegalizeVectorTypes.cpp | 39 llvm/test/CodeGen/X86/atomic-load-store.ll| 171 ++ 3 files changed, 211 insertions(+) diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h index bdfa5f7741ad3..d8f402f529632 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -960,6 +960,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer { void SplitVecRes_FPOp_MultiType(SDNode *N, SDValue &Lo, SDValue &Hi); void SplitVecRes_IS_FPCLASS(SDNode *N, SDValue &Lo, SDValue &Hi); void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi); + void SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, SDValue &Hi); void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi); void SplitVecRes_VP_LOAD(VPLoadSDNode *LD, SDValue &Lo, SDValue &Hi); void SplitVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *SLD, SDValue &Lo, diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index 0edbbe8af623a..3e364276d1bcf 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -1172,6 +1172,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SplitVecRes_STEP_VECTOR(N, Lo, Hi); 
break; case ISD::SIGN_EXTEND_INREG: SplitVecRes_InregOp(N, Lo, Hi); break; + case ISD::ATOMIC_LOAD: +SplitVecRes_ATOMIC_LOAD(cast(N), Lo, Hi); +break; case ISD::LOAD: SplitVecRes_LOAD(cast(N), Lo, Hi); break; @@ -1421,6 +1424,42 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SetSplitVector(SDValue(N, ResNo), Lo, Hi); } +void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, + SDValue &Hi) { + EVT LoVT, HiVT; + SDLoc dl(LD); + std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(LD->getValueType(0)); + + ISD::LoadExtType ExtType = LD->getExtensionType(); + SDValue Ch = LD->getChain(); + SDValue Ptr = LD->getBasePtr(); + EVT MemoryVT = LD->getMemoryVT(); + + EVT LoMemVT, HiMemVT; + std::tie(LoMemVT, HiMemVT) = DAG.GetSplitDestVTs(MemoryVT); + + EVT IntVT = + EVT::getIntegerVT(*DAG.getContext(), LD->getValueType(0).getSizeInBits()); + EVT MemIntVT = + EVT::getIntegerVT(*DAG.getContext(), 2 * LoMemVT.getSizeInBits()); + SDValue ALD = DAG.getAtomicLoad(ExtType, dl, MemIntVT, IntVT, Ch, Ptr, + LD->getMemOperand()); + + EVT LoIntVT = EVT::getIntegerVT(*DAG.getContext(), LoVT.getSizeInBits()); + EVT HiIntVT = EVT::getIntegerVT(*DAG.getContext(), HiVT.getSizeInBits()); + SDValue ExtractLo = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, LoIntVT, ALD, + DAG.getIntPtrConstant(0, dl)); + SDValue ExtractHi = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, HiIntVT, ALD, + DAG.getIntPtrConstant(1, dl)); + + Lo = DAG.getBitcast(LoVT, ExtractLo); + Hi = DAG.getBitcast(HiVT, ExtractHi); + + // Legalize the chain result - switch anything that used the old chain to + // use the new one. 
+ ReplaceValueWith(SDValue(LD, 1), ALD.getValue(1)); +} + void DAGTypeLegalizer::IncrementPointer(MemSDNode *N, EVT MemVT, MachinePointerInfo &MPI, SDValue &Ptr, uint64_t *ScaledOffset) { diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 935d058a52f8f..42b0955824293 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -204,6 +204,68 @@ define <2 x float> @atomic_vec2_float_align(ptr %x) { ret <2 x float> %ret } +define <2 x half> @atomic_vec2_half(ptr %x) { +; CHECK3-LABEL: atomic_vec2_half: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movl (%rdi), %eax +; CHECK3-NEXT:pinsrw $0, %eax, %xmm0 +; CHECK3-NEXT:shrl $16, %eax +; CHECK3-NEXT:pinsrw $0, %eax, %xmm1 +; CHECK3-NEXT:punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3] +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec2_half: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movl
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/125432 >From 6897d31ad46d5be2a46ef5c852d82675ddd65cd2 Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 31 Jan 2025 13:12:56 -0500 Subject: [PATCH] [SelectionDAG][X86] Remove unused elements from atomic vector. After splitting, all elements are created. The two components must be found by looking at the upper and lower half of EXTRACT_ELEMENT. This change extends EltsFromConsecutiveLoads to understand AtomicSDNode so that unused elements can be removed. commit-id:b83937a8 --- llvm/include/llvm/CodeGen/SelectionDAG.h | 4 +- .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 20 ++- .../SelectionDAGAddressAnalysis.cpp | 30 ++-- .../SelectionDAG/SelectionDAGBuilder.cpp | 6 +- llvm/lib/Target/X86/X86ISelLowering.cpp | 43 +++-- llvm/test/CodeGen/X86/atomic-load-store.ll| 167 ++ 6 files changed, 83 insertions(+), 187 deletions(-) diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h index ba11ddbb5b731..d3cd81c146280 100644 --- a/llvm/include/llvm/CodeGen/SelectionDAG.h +++ b/llvm/include/llvm/CodeGen/SelectionDAG.h @@ -1843,7 +1843,7 @@ class SelectionDAG { /// chain to the token factor. This ensures that the new memory node will have /// the same relative memory dependency position as the old load. Returns the /// new merged load chain. - SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp); + SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp); /// Topological-sort the AllNodes list and a /// assign a unique node id for each node in the DAG based on their @@ -2281,7 +2281,7 @@ class SelectionDAG { /// merged. Check that both are nonvolatile and if LD is loading /// 'Bytes' bytes from a location that is 'Dist' units away from the /// location that the 'Base' load is loading from. 
- bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base, + bool areNonVolatileConsecutiveLoads(MemSDNode *LD, MemSDNode *Base, unsigned Bytes, int Dist) const; /// Infer alignment of a load / store address. Return std::nullopt if it diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp index 2a68903c34cef..8e77a542ab029 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -12218,7 +12218,7 @@ SDValue SelectionDAG::makeEquivalentMemoryOrdering(SDValue OldChain, return TokenFactor; } -SDValue SelectionDAG::makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, +SDValue SelectionDAG::makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp) { assert(isa(NewMemOp.getNode()) && "Expected a memop node"); SDValue OldChain = SDValue(OldLoad, 1); @@ -12911,17 +12911,21 @@ std::pair SelectionDAG::UnrollVectorOverflowOp( getBuildVector(NewOvVT, dl, OvScalars)); } -bool SelectionDAG::areNonVolatileConsecutiveLoads(LoadSDNode *LD, - LoadSDNode *Base, +bool SelectionDAG::areNonVolatileConsecutiveLoads(MemSDNode *LD, + MemSDNode *Base, unsigned Bytes, int Dist) const { if (LD->isVolatile() || Base->isVolatile()) return false; - // TODO: probably too restrictive for atomics, revisit - if (!LD->isSimple()) -return false; - if (LD->isIndexed() || Base->isIndexed()) -return false; + if (auto Ld = dyn_cast(LD)) { +if (!Ld->isSimple()) + return false; +if (Ld->isIndexed()) + return false; + } + if (auto Ld = dyn_cast(Base)) +if (Ld->isIndexed()) + return false; if (LD->getChain() != Base->getChain()) return false; EVT VT = LD->getMemoryVT(); diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp index f2ab88851b780..c29cb424c7a4c 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp +++ 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp @@ -195,8 +195,8 @@ bool BaseIndexOffset::contains(const SelectionDAG &DAG, int64_t BitSize, } /// Parses tree in Ptr for base, index, offset addresses. -static BaseIndexOffset matchLSNode(const LSBaseSDNode *N, - const SelectionDAG &DAG) { +template +static BaseIndexOffset matchSDNode(const T *N, const SelectionDAG &DAG) { SDValue Ptr = N->getBasePtr(); // (((B + I*M) + c)) + c ... @@ -206,16 +206,18 @@ static BaseIndexOffset matchLSNode(const LSBaseSDNode *N, bool IsIndexSignExt = false; // pre-inc/pre-dec ops are components of EA. - if (N->getAddressingMode() == ISD::P
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120716 >From d600ac360ed1e19693cc99f7785fd3f4cef579da Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 20 Dec 2024 06:14:28 -0500 Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector AtomicExpand fails for aligned `load atomic ` because it does not find a compatible library call. This change adds appropriate bitcasts so that the call can be lowered. commit-id:f430c1af --- llvm/lib/CodeGen/AtomicExpandPass.cpp | 20 +- llvm/test/CodeGen/ARM/atomic-load-store.ll| 51 +++ llvm/test/CodeGen/X86/atomic-load-store.ll| 30 + .../X86/expand-atomic-non-integer.ll | 65 +++ 4 files changed, 163 insertions(+), 3 deletions(-) diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp index c376de877ac7d..b6f1e9db9ce35 100644 --- a/llvm/lib/CodeGen/AtomicExpandPass.cpp +++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp @@ -2066,9 +2066,23 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall( I->replaceAllUsesWith(V); } else if (HasResult) { Value *V; -if (UseSizedLibcall) - V = Builder.CreateBitOrPointerCast(Result, I->getType()); -else { +if (UseSizedLibcall) { + // Add bitcasts from Result's scalar type to I's vector type + if (I->getType()->getScalarType()->isPointerTy() && + I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) { +unsigned AS = + cast(I->getType()->getScalarType())->getAddressSpace(); +ElementCount EC = cast(I->getType())->getElementCount(); +Value *BC = Builder.CreateBitCast( +Result, +VectorType::get(IntegerType::get(Ctx, DL.getPointerSizeInBits(AS)), +EC)); +Value *IntToPtr = Builder.CreateIntToPtr( +BC, VectorType::get(PointerType::get(Ctx, AS), EC)); +V = Builder.CreateBitOrPointerCast(IntToPtr, I->getType()); + } else +V = Builder.CreateBitOrPointerCast(Result, I->getType()); +} else { V = Builder.CreateAlignedLoad(I->getType(), AllocaResult, AllocaAlignment); Builder.CreateLifetimeEnd(AllocaResult, SizeVal64); diff --git 
a/llvm/test/CodeGen/ARM/atomic-load-store.ll b/llvm/test/CodeGen/ARM/atomic-load-store.ll index 560dfde356c29..36c1305a7c5df 100644 --- a/llvm/test/CodeGen/ARM/atomic-load-store.ll +++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll @@ -983,3 +983,54 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double %val1) { store atomic double %val1, ptr %ptr seq_cst, align 8 ret void } + +define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 { +; ARM-LABEL: atomic_vec1_ptr: +; ARM: @ %bb.0: +; ARM-NEXT:ldr r0, [r0] +; ARM-NEXT:dmb ish +; ARM-NEXT:bx lr +; +; ARMOPTNONE-LABEL: atomic_vec1_ptr: +; ARMOPTNONE: @ %bb.0: +; ARMOPTNONE-NEXT:ldr r0, [r0] +; ARMOPTNONE-NEXT:dmb ish +; ARMOPTNONE-NEXT:bx lr +; +; THUMBTWO-LABEL: atomic_vec1_ptr: +; THUMBTWO: @ %bb.0: +; THUMBTWO-NEXT:ldr r0, [r0] +; THUMBTWO-NEXT:dmb ish +; THUMBTWO-NEXT:bx lr +; +; THUMBONE-LABEL: atomic_vec1_ptr: +; THUMBONE: @ %bb.0: +; THUMBONE-NEXT:push {r7, lr} +; THUMBONE-NEXT:movs r1, #0 +; THUMBONE-NEXT:mov r2, r1 +; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4 +; THUMBONE-NEXT:pop {r7, pc} +; +; ARMV4-LABEL: atomic_vec1_ptr: +; ARMV4: @ %bb.0: +; ARMV4-NEXT:push {r11, lr} +; ARMV4-NEXT:mov r1, #2 +; ARMV4-NEXT:bl __atomic_load_4 +; ARMV4-NEXT:pop {r11, lr} +; ARMV4-NEXT:mov pc, lr +; +; ARMV6-LABEL: atomic_vec1_ptr: +; ARMV6: @ %bb.0: +; ARMV6-NEXT:mov r1, #0 +; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5 +; ARMV6-NEXT:ldr r0, [r0] +; ARMV6-NEXT:bx lr +; +; THUMBM-LABEL: atomic_vec1_ptr: +; THUMBM: @ %bb.0: +; THUMBM-NEXT:ldr r0, [r0] +; THUMBM-NEXT:dmb sy +; THUMBM-NEXT:bx lr + %ret = load atomic <1 x ptr>, ptr %x acquire, align 4 + ret <1 x ptr> %ret +} diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 08d0405345f57..4293df8c13571 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -371,6 +371,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { ret <2 x i32> %ret } +define <2 x ptr> 
@atomic_vec2_ptr_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec2_ptr_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:pushq %rax +; CHECK-NEXT:movl $2, %esi +; CHECK-NEXT:callq ___atomic_load_16 +; CHECK-NEXT:movq %rdx, %xmm1 +; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT:popq %rax +; CHECK-NEXT:retq + %ret = load atomic <2 x ptr>, ptr %x acquire, align 16 + ret <2 x ptr> %ret +} + define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind { ; CHECK3-LAB
[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #120640)
@@ -1421,6 +1424,40 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SetSplitVector(SDValue(N, ResNo), Lo, Hi); } +void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, + SDValue &Hi) { + EVT LoVT, HiVT; + SDLoc dl(LD); + std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(LD->getValueType(0)); + + ISD::LoadExtType ExtType = LD->getExtensionType(); + SDValue Ch = LD->getChain(); + SDValue Ptr = LD->getBasePtr(); + EVT MemoryVT = LD->getMemoryVT(); + + EVT LoMemVT, HiMemVT; + std::tie(LoMemVT, HiMemVT) = DAG.GetSplitDestVTs(MemoryVT); jofrn wrote: Only one has to be used to get double that size. May we just remove it all and use `MemoryVT.getSizeInBits()`? https://github.com/llvm/llvm-project/pull/120640 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
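The review comment above rests on a simple identity: GetSplitDestVTs halves an even-width type, so doubling the low half's width just recovers the original memory width. A toy stand-in for EVT (not the real SelectionDAG class, names are placeholders) makes the equivalence concrete:

```cpp
#include <cassert>
#include <utility>

// Toy stand-in for SelectionDAG's EVT: splitting an even-width type halves
// its bit width, so 2 * LoMemVT.getSizeInBits() == MemoryVT.getSizeInBits().
// This is the equivalence behind the suggestion to drop the extra
// GetSplitDestVTs call and use MemoryVT's size directly.
struct SimpleVT {
  unsigned Bits;
  unsigned getSizeInBits() const { return Bits; }
};

// Mirrors GetSplitDestVTs for power-of-two vector widths.
inline std::pair<SimpleVT, SimpleVT> getSplitDestVTs(SimpleVT VT) {
  return {SimpleVT{VT.Bits / 2}, SimpleVT{VT.Bits / 2}};
}
```

Under that assumption, `2 * LoMemVT.getSizeInBits()` and `MemoryVT.getSizeInBits()` are interchangeable, which is what the comment proposes.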
[llvm-branch-commits] [llvm] IR: Remove reference counts from ConstantData (PR #137314)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/137314 >From bdeb993a7b252399949a004ff722da3f30f56b2a Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sat, 19 Apr 2025 21:11:23 +0200 Subject: [PATCH 1/2] IR: Remove reference counts from ConstantData This is a follow up change to eliminating uselists for ConstantData. In the previous revision, ConstantData had a replacement reference count instead of a uselist. This reference count was misleading, and not useful in the same way as it would be for another value. The references may not have even been in the current module, since these are shared throughout the LLVMContext. This doesn't space leak any more than we previously did; nothing was attempting to garbage collect unused constants. Previously the use_empty, and hasNUses type of APIs were supported through the reference count. These now behave as if the uses are always empty. Ideally it would be illegal to inspect these, but this forces API complexity into quite a few places. It may be doable to make it illegal to check these counts, but I would like there to be a targeted fuzzing effort to make sure every transform properly deals with a constant in every operand position. All tests pass if I turn the hasNUses* and getNumUses queries into assertions, only hasOneUse in particular appears to hit in some set of contexts. 
I've added unit tests to ensure logical consistency between these cases --- llvm/docs/ReleaseNotes.md | 4 +- llvm/include/llvm/IR/Constants.h | 3 +- llvm/include/llvm/IR/Use.h | 9 +- llvm/include/llvm/IR/Value.h | 118 +++-- llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp | 2 +- llvm/lib/IR/AsmWriter.cpp | 3 +- llvm/lib/IR/Instruction.cpp| 4 +- llvm/lib/IR/Value.cpp | 28 +++-- llvm/unittests/IR/ConstantsTest.cpp| 36 +++ 9 files changed, 100 insertions(+), 107 deletions(-) diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md index 504db733308c1..05318362b99c9 100644 --- a/llvm/docs/ReleaseNotes.md +++ b/llvm/docs/ReleaseNotes.md @@ -56,7 +56,9 @@ Makes programs 10x faster by doing Special New Thing. Changes to the LLVM IR -- -* It is no longer permitted to inspect the uses of ConstantData +* It is no longer permitted to inspect the uses of ConstantData. Use + count APIs will behave as if they have no uses (i.e. use_empty() is + always true). * The `nocapture` attribute has been replaced by `captures(none)`. * The constant expression variants of the following instructions have been diff --git a/llvm/include/llvm/IR/Constants.h b/llvm/include/llvm/IR/Constants.h index ff51f59b6ec68..76efa9bd63522 100644 --- a/llvm/include/llvm/IR/Constants.h +++ b/llvm/include/llvm/IR/Constants.h @@ -51,7 +51,8 @@ template struct ConstantAggrKeyType; /// Since they can be in use by unrelated modules (and are never based on /// GlobalValues), it never makes sense to RAUW them. /// -/// These do not have use lists. It is illegal to inspect the uses. +/// These do not have use lists. It is illegal to inspect the uses. These behave +/// as if they have no uses (i.e. use_empty() is always true). 
class ConstantData : public Constant { constexpr static IntrusiveOperandsAllocMarker AllocMarker{0}; diff --git a/llvm/include/llvm/IR/Use.h b/llvm/include/llvm/IR/Use.h index bcd1fd6677497..dc22d69ba561d 100644 --- a/llvm/include/llvm/IR/Use.h +++ b/llvm/include/llvm/IR/Use.h @@ -23,7 +23,6 @@ namespace llvm { template struct simplify_type; -class ConstantData; class User; class Value; @@ -43,7 +42,7 @@ class Use { private: /// Destructor - Only for zap() - ~Use(); + ~Use() { removeFromList(); } /// Constructor Use(User *Parent) : Parent(Parent) {} @@ -85,10 +84,8 @@ class Use { Use **Prev = nullptr; User *Parent = nullptr; - inline void addToList(unsigned &Count); - inline void addToList(Use *&List); - inline void removeFromList(unsigned &Count); - inline void removeFromList(Use *&List); + inline void addToList(Use **List); + inline void removeFromList(); }; /// Allow clients to treat uses just like values when using diff --git a/llvm/include/llvm/IR/Value.h b/llvm/include/llvm/IR/Value.h index 180b6238eda6c..ae874304c4316 100644 --- a/llvm/include/llvm/IR/Value.h +++ b/llvm/include/llvm/IR/Value.h @@ -116,10 +116,7 @@ class Value { private: Type *VTy; - union { -Use *List = nullptr; -unsigned Count; - } Uses; + Use *UseList = nullptr; friend class ValueAsMetadata; // Allow access to IsUsedByMD. friend class ValueHandleBase; // Allow access to HasValueHandle. @@ -347,23 +344,21 @@ class Value { bool use_empty() const { assertModuleIsMaterialized(); -return hasUseList() ? Uses.List == nullptr : Uses.Count == 0; +return UseList == nullptr; } - bool materialized_use_empty() const { -
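The patch keeps an intrusive use list (with a `Use **Prev` back-pointer) for all values except ConstantData. A minimal standalone analog — not LLVM's actual classes, just the same linked-list shape — shows why `removeFromList()` needs no list head and why `use_empty()` reduces to a null check on `UseList`:

```cpp
// Illustrative analog of LLVM's intrusive use list: each Use threads into
// a singly linked list through a Use** back-pointer, so removal is O(1)
// without knowing which Value's list it is on. A Use with a null Prev is
// on no list at all (the ConstantData case after this patch).
struct Use {
  Use *Next = nullptr;
  Use **Prev = nullptr;

  void addToList(Use **List) {
    Next = *List;
    if (Next)
      Next->Prev = &Next;
    Prev = List;
    *List = this;
  }

  void removeFromList() {
    if (!Prev)
      return; // never added to a list, e.g. an operand of ConstantData
    *Prev = Next;
    if (Next)
      Next->Prev = Prev;
    Next = nullptr;
    Prev = nullptr;
  }
};

struct Value {
  Use *UseList = nullptr;
  bool use_empty() const { return UseList == nullptr; }
};
```

With no reference count to maintain, a Value whose uses were never linked (ConstantData) reports `use_empty() == true` unconditionally, matching the new documented behavior.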
[llvm-branch-commits] [llvm] IR: Reorder ConstantData enum values (PR #138638)
arsenm wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests * **#138638** (this PR, view in Graphite) * **#137314** * **#137313** * `main` This stack of pull requests is managed by Graphite. Learn more about stacking: https://stacking.dev/ https://github.com/llvm/llvm-project/pull/138638
[llvm-branch-commits] [llvm] IR: Reorder ConstantData enum values (PR #138638)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/138638
[llvm-branch-commits] [llvm] IR: Reorder ConstantData enum values (PR #138638)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/138638 This sorts ConstantData to the low values, so we can perform a hasUseList check in a single compare instead of requiring 2 compares plus an and for the range check.
[llvm-branch-commits] [llvm] IR: Reorder ConstantData enum values (PR #138638)
llvmbot wrote: @llvm/pr-subscribers-llvm-ir Author: Matt Arsenault (arsenm) Changes This sorts ConstantData to the low values, so we can perform a hasUseList check in a single compare instead of requiring 2 compares plus an and for the range check. --- Full diff: https://github.com/llvm/llvm-project/pull/138638.diff 1 Files Affected: - (modified) llvm/include/llvm/IR/Value.def (+20-18) ``diff diff --git a/llvm/include/llvm/IR/Value.def b/llvm/include/llvm/IR/Value.def index 160e0f8513e2a..34b8d4967b28a 100644 --- a/llvm/include/llvm/IR/Value.def +++ b/llvm/include/llvm/IR/Value.def @@ -69,24 +69,11 @@ #define HANDLE_CONSTANT_EXCLUDE_LLVM_C_API(ValueName) #endif -// Having constant first makes the range check for isa faster -// and smaller by one operation. +// Having constant first makes the range check for isa faster and +// smaller by one operation. Further, keep ConstantData as the first subset so +// it's also as fast. // Constant -HANDLE_GLOBAL_VALUE(Function) -HANDLE_GLOBAL_VALUE(GlobalAlias) -HANDLE_GLOBAL_VALUE(GlobalIFunc) -HANDLE_GLOBAL_VALUE(GlobalVariable) -HANDLE_CONSTANT(BlockAddress) -HANDLE_CONSTANT(ConstantExpr) -HANDLE_CONSTANT_EXCLUDE_LLVM_C_API(DSOLocalEquivalent) -HANDLE_CONSTANT_EXCLUDE_LLVM_C_API(NoCFIValue) -HANDLE_CONSTANT(ConstantPtrAuth) - -// ConstantAggregate. -HANDLE_CONSTANT(ConstantArray) -HANDLE_CONSTANT(ConstantStruct) -HANDLE_CONSTANT(ConstantVector) // ConstantData. HANDLE_CONSTANT(UndefValue) @@ -100,8 +87,23 @@ HANDLE_CONSTANT(ConstantTargetNone) HANDLE_CONSTANT(ConstantPointerNull) HANDLE_CONSTANT(ConstantTokenNone) -HANDLE_CONSTANT_MARKER(ConstantFirstVal, Function) -HANDLE_CONSTANT_MARKER(ConstantLastVal, ConstantTokenNone) +// ConstantAggregate. 
+HANDLE_CONSTANT(ConstantArray) +HANDLE_CONSTANT(ConstantStruct) +HANDLE_CONSTANT(ConstantVector) + +HANDLE_GLOBAL_VALUE(Function) +HANDLE_GLOBAL_VALUE(GlobalAlias) +HANDLE_GLOBAL_VALUE(GlobalIFunc) +HANDLE_GLOBAL_VALUE(GlobalVariable) +HANDLE_CONSTANT(BlockAddress) +HANDLE_CONSTANT(ConstantExpr) +HANDLE_CONSTANT_EXCLUDE_LLVM_C_API(DSOLocalEquivalent) +HANDLE_CONSTANT_EXCLUDE_LLVM_C_API(NoCFIValue) +HANDLE_CONSTANT(ConstantPtrAuth) + +HANDLE_CONSTANT_MARKER(ConstantFirstVal, UndefValue) +HANDLE_CONSTANT_MARKER(ConstantLastVal, ConstantPtrAuth) HANDLE_CONSTANT_MARKER(ConstantDataFirstVal, UndefValue) HANDLE_CONSTANT_MARKER(ConstantDataLastVal, ConstantTokenNone) HANDLE_CONSTANT_MARKER(ConstantAggregateFirstVal, ConstantArray) `` https://github.com/llvm/llvm-project/pull/138638 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
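The payoff of the reordering can be sketched with a placeholder enum (these are not LLVM's real value IDs): once every ConstantData kind occupies the lowest values, "does this value have a use list?" is a single unsigned comparison rather than a two-sided range check:

```cpp
// Placeholder value-ID enum illustrating the reordered layout: all
// ConstantData kinds packed at the low end, everything else above them.
enum ValueID : unsigned {
  UndefValue,
  PoisonValue,
  ConstantInt,
  ConstantFP,
  ConstantDataLastVal = ConstantFP,
  ConstantArray, // first non-ConstantData constant
  Function,
  GlobalVariable,
};

// One compare, versus (ID >= First && ID <= Last) before the reorder.
inline bool hasUseList(unsigned ID) { return ID > ConstantDataLastVal; }
```

The same trick is already used for the Constant range itself ("having constant first makes the range check for isa faster"); this change extends it to the ConstantData subset.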
[llvm-branch-commits] [llvm] release/20.x: [OpenMP] Add pre sm_70 load hack back in (#138589) (PR #138626)
llvmbot wrote: @shiltian What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/138626
[llvm-branch-commits] [llvm] release/20.x: [OpenMP] Add pre sm_70 load hack back in (#138589) (PR #138626)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/138626 Backport dfcb8cb Requested by: @ye-luo >From dd2f2eeb12fe3349944a12caf30ec874752dea34 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Mon, 5 May 2025 16:33:41 -0500 Subject: [PATCH] [OpenMP] Add pre sm_70 load hack back in (#138589) Summary: Different ordering modes aren't supported for an atomic load, so we just do an add of zero as the same thing. It's less efficient, but it works. Fixes https://github.com/llvm/llvm-project/issues/138560 (cherry picked from commit dfcb8cb2a92c9f72ddde5ea08dadf2f640197d32) --- offload/DeviceRTL/include/Synchronization.h | 4 1 file changed, 4 insertions(+) diff --git a/offload/DeviceRTL/include/Synchronization.h b/offload/DeviceRTL/include/Synchronization.h index 5a789441b9d35..c510fbf0774c2 100644 --- a/offload/DeviceRTL/include/Synchronization.h +++ b/offload/DeviceRTL/include/Synchronization.h @@ -61,7 +61,11 @@ V add(Ty *Address, V Val, atomic::OrderingTy Ordering, template > V load(Ty *Address, atomic::OrderingTy Ordering, MemScopeTy MemScope = MemScopeTy::device) { +#ifdef __NVPTX__ + return __scoped_atomic_fetch_add(Address, V(0), Ordering, MemScope); +#else return __scoped_atomic_load_n(Address, Ordering, MemScope); +#endif } template > ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: [OpenMP] Add pre sm_70 load hack back in (#138589) (PR #138626)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/138626
[llvm-branch-commits] [llvm] release/20.x: [OpenMP] Add pre sm_70 load hack back in (#138589) (PR #138626)
llvmbot wrote: @llvm/pr-subscribers-offload Author: None (llvmbot) Changes Backport dfcb8cb Requested by: @ye-luo --- Full diff: https://github.com/llvm/llvm-project/pull/138626.diff 1 Files Affected: - (modified) offload/DeviceRTL/include/Synchronization.h (+4) ``diff diff --git a/offload/DeviceRTL/include/Synchronization.h b/offload/DeviceRTL/include/Synchronization.h index 5a789441b9d35..c510fbf0774c2 100644 --- a/offload/DeviceRTL/include/Synchronization.h +++ b/offload/DeviceRTL/include/Synchronization.h @@ -61,7 +61,11 @@ V add(Ty *Address, V Val, atomic::OrderingTy Ordering, template > V load(Ty *Address, atomic::OrderingTy Ordering, MemScopeTy MemScope = MemScopeTy::device) { +#ifdef __NVPTX__ + return __scoped_atomic_fetch_add(Address, V(0), Ordering, MemScope); +#else return __scoped_atomic_load_n(Address, Ordering, MemScope); +#endif } template > `` https://github.com/llvm/llvm-project/pull/138626
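The workaround this backport restores can be sketched in terms of the C++ memory model rather than the DeviceRTL's `__scoped_atomic_*` builtins: on targets where an ordered atomic load is unavailable, a fetch-add of zero with the same ordering returns the current value with equivalent semantics, at the cost of a read-modify-write:

```cpp
#include <atomic>

// Sketch of the pre-sm_70 hack: emulate an atomic load with an atomic
// add of zero. Correct for any ordering, but slower than a plain atomic
// load because it performs a read-modify-write on the location.
template <typename T>
T atomicLoadViaAdd(std::atomic<T> &Addr, std::memory_order Order) {
  return Addr.fetch_add(T(0), Order);
}
```

The value observed is the one the add operated on, and adding zero leaves the location unchanged, so callers see exactly the semantics of `load` with the requested ordering.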
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120716 >From 681d443e5e9a069a73eaa0ce50684b41b95c48fd Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 20 Dec 2024 06:14:28 -0500 Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector AtomicExpand fails for aligned `load atomic ` because it does not find a compatible library call. This change adds appropriate bitcasts so that the call can be lowered. commit-id:f430c1af --- llvm/lib/CodeGen/AtomicExpandPass.cpp | 20 +- llvm/test/CodeGen/ARM/atomic-load-store.ll| 51 +++ llvm/test/CodeGen/X86/atomic-load-store.ll| 30 + .../X86/expand-atomic-non-integer.ll | 65 +++ 4 files changed, 163 insertions(+), 3 deletions(-) diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp index c376de877ac7d..b6f1e9db9ce35 100644 --- a/llvm/lib/CodeGen/AtomicExpandPass.cpp +++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp @@ -2066,9 +2066,23 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall( I->replaceAllUsesWith(V); } else if (HasResult) { Value *V; -if (UseSizedLibcall) - V = Builder.CreateBitOrPointerCast(Result, I->getType()); -else { +if (UseSizedLibcall) { + // Add bitcasts from Result's scalar type to I's vector type + if (I->getType()->getScalarType()->isPointerTy() && + I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) { +unsigned AS = + cast(I->getType()->getScalarType())->getAddressSpace(); +ElementCount EC = cast(I->getType())->getElementCount(); +Value *BC = Builder.CreateBitCast( +Result, +VectorType::get(IntegerType::get(Ctx, DL.getPointerSizeInBits(AS)), +EC)); +Value *IntToPtr = Builder.CreateIntToPtr( +BC, VectorType::get(PointerType::get(Ctx, AS), EC)); +V = Builder.CreateBitOrPointerCast(IntToPtr, I->getType()); + } else +V = Builder.CreateBitOrPointerCast(Result, I->getType()); +} else { V = Builder.CreateAlignedLoad(I->getType(), AllocaResult, AllocaAlignment); Builder.CreateLifetimeEnd(AllocaResult, SizeVal64); diff --git 
a/llvm/test/CodeGen/ARM/atomic-load-store.ll b/llvm/test/CodeGen/ARM/atomic-load-store.ll index 560dfde356c29..36c1305a7c5df 100644 --- a/llvm/test/CodeGen/ARM/atomic-load-store.ll +++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll @@ -983,3 +983,54 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double %val1) { store atomic double %val1, ptr %ptr seq_cst, align 8 ret void } + +define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 { +; ARM-LABEL: atomic_vec1_ptr: +; ARM: @ %bb.0: +; ARM-NEXT:ldr r0, [r0] +; ARM-NEXT:dmb ish +; ARM-NEXT:bx lr +; +; ARMOPTNONE-LABEL: atomic_vec1_ptr: +; ARMOPTNONE: @ %bb.0: +; ARMOPTNONE-NEXT:ldr r0, [r0] +; ARMOPTNONE-NEXT:dmb ish +; ARMOPTNONE-NEXT:bx lr +; +; THUMBTWO-LABEL: atomic_vec1_ptr: +; THUMBTWO: @ %bb.0: +; THUMBTWO-NEXT:ldr r0, [r0] +; THUMBTWO-NEXT:dmb ish +; THUMBTWO-NEXT:bx lr +; +; THUMBONE-LABEL: atomic_vec1_ptr: +; THUMBONE: @ %bb.0: +; THUMBONE-NEXT:push {r7, lr} +; THUMBONE-NEXT:movs r1, #0 +; THUMBONE-NEXT:mov r2, r1 +; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4 +; THUMBONE-NEXT:pop {r7, pc} +; +; ARMV4-LABEL: atomic_vec1_ptr: +; ARMV4: @ %bb.0: +; ARMV4-NEXT:push {r11, lr} +; ARMV4-NEXT:mov r1, #2 +; ARMV4-NEXT:bl __atomic_load_4 +; ARMV4-NEXT:pop {r11, lr} +; ARMV4-NEXT:mov pc, lr +; +; ARMV6-LABEL: atomic_vec1_ptr: +; ARMV6: @ %bb.0: +; ARMV6-NEXT:mov r1, #0 +; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5 +; ARMV6-NEXT:ldr r0, [r0] +; ARMV6-NEXT:bx lr +; +; THUMBM-LABEL: atomic_vec1_ptr: +; THUMBM: @ %bb.0: +; THUMBM-NEXT:ldr r0, [r0] +; THUMBM-NEXT:dmb sy +; THUMBM-NEXT:bx lr + %ret = load atomic <1 x ptr>, ptr %x acquire, align 4 + ret <1 x ptr> %ret +} diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 08d0405345f57..4293df8c13571 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -371,6 +371,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { ret <2 x i32> %ret } +define <2 x ptr> 
@atomic_vec2_ptr_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec2_ptr_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:pushq %rax +; CHECK-NEXT:movl $2, %esi +; CHECK-NEXT:callq ___atomic_load_16 +; CHECK-NEXT:movq %rdx, %xmm1 +; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT:popq %rax +; CHECK-NEXT:retq + %ret = load atomic <2 x ptr>, ptr %x acquire, align 16 + ret <2 x ptr> %ret +} + define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind { ; CHECK3-LAB
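The cast chain the AtomicExpand patch inserts (wide integer result → `<N x iPtr>` → `<N x ptr>`) has a simple scalar analog: the sized `__atomic_load` libcall yields one pointer-sized integer per lane, and each lane is reinterpreted as a pointer. In this sketch, `memcpy` stands in for the IR-level bitcast/inttoptr, and 64-bit pointers are assumed:

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Reinterpret two pointer-sized integer lanes as a <2 x ptr>-style array.
// memcpy models the bitcast + inttoptr pair the pass emits; this sketch
// assumes 64-bit pointers.
inline std::array<void *, 2> intLanesToPtrVec(std::uint64_t Lo,
                                              std::uint64_t Hi) {
  static_assert(sizeof(void *) == 8, "sketch assumes 64-bit pointers");
  std::array<void *, 2> Vec;
  std::memcpy(&Vec[0], &Lo, sizeof(void *));
  std::memcpy(&Vec[1], &Hi, sizeof(void *));
  return Vec;
}
```

This mirrors why the X86 test above moves `%rax`/`%rdx` (the libcall's 128-bit integer result) into XMM lanes: the bits are unchanged, only the type is.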
[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #120640)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120640 >From db5b862de7faed37cb9c40c170d4cd1e9612b489 Mon Sep 17 00:00:00 2001 From: jofrn Date: Thu, 19 Dec 2024 16:25:55 -0500 Subject: [PATCH] [SelectionDAG] Split vector types for atomic load Vector types that aren't widened are split so that a single ATOMIC_LOAD is issued for the entire vector at once. This change utilizes the load vectorization infrastructure in SelectionDAG in order to group the vectors. This enables SelectionDAG to translate vectors with type bfloat,half. commit-id:3a045357 --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 1 + .../SelectionDAG/LegalizeVectorTypes.cpp | 37 llvm/test/CodeGen/X86/atomic-load-store.ll| 171 ++ 3 files changed, 209 insertions(+) diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h index bdfa5f7741ad3..d8f402f529632 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -960,6 +960,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer { void SplitVecRes_FPOp_MultiType(SDNode *N, SDValue &Lo, SDValue &Hi); void SplitVecRes_IS_FPCLASS(SDNode *N, SDValue &Lo, SDValue &Hi); void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi); + void SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, SDValue &Hi); void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi); void SplitVecRes_VP_LOAD(VPLoadSDNode *LD, SDValue &Lo, SDValue &Hi); void SplitVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *SLD, SDValue &Lo, diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index 0edbbe8af623a..895d3c51e0e1e 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -1172,6 +1172,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SplitVecRes_STEP_VECTOR(N, Lo, Hi); 
break; case ISD::SIGN_EXTEND_INREG: SplitVecRes_InregOp(N, Lo, Hi); break; + case ISD::ATOMIC_LOAD: +SplitVecRes_ATOMIC_LOAD(cast(N), Lo, Hi); +break; case ISD::LOAD: SplitVecRes_LOAD(cast(N), Lo, Hi); break; @@ -1421,6 +1424,40 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SetSplitVector(SDValue(N, ResNo), Lo, Hi); } +void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, + SDValue &Hi) { + EVT LoVT, HiVT; + SDLoc dl(LD); + std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(LD->getValueType(0)); + + ISD::LoadExtType ExtType = LD->getExtensionType(); + SDValue Ch = LD->getChain(); + SDValue Ptr = LD->getBasePtr(); + EVT MemoryVT = LD->getMemoryVT(); + + EVT LoMemVT, HiMemVT; + std::tie(LoMemVT, HiMemVT) = DAG.GetSplitDestVTs(MemoryVT); + + EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), LD->getValueType(0).getSizeInBits()); + EVT MemIntVT = EVT::getIntegerVT(*DAG.getContext(), 2 * LoMemVT.getSizeInBits()); + SDValue ALD = DAG.getAtomicLoad(ExtType, dl, MemIntVT, IntVT, Ch, Ptr, + LD->getMemOperand()); + + EVT LoIntVT = EVT::getIntegerVT(*DAG.getContext(), LoVT.getSizeInBits()); + EVT HiIntVT = EVT::getIntegerVT(*DAG.getContext(), HiVT.getSizeInBits()); + SDValue ExtractLo = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, LoIntVT, ALD, + DAG.getIntPtrConstant(0, dl)); + SDValue ExtractHi = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, HiIntVT, ALD, + DAG.getIntPtrConstant(1, dl)); + + Lo = DAG.getBitcast(LoVT, ExtractLo); + Hi = DAG.getBitcast(HiVT, ExtractHi); + + // Legalize the chain result - switch anything that used the old chain to + // use the new one. 
+ ReplaceValueWith(SDValue(LD, 1), ALD.getValue(1)); +} + void DAGTypeLegalizer::IncrementPointer(MemSDNode *N, EVT MemVT, MachinePointerInfo &MPI, SDValue &Ptr, uint64_t *ScaledOffset) { diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 935d058a52f8f..42b0955824293 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -204,6 +204,68 @@ define <2 x float> @atomic_vec2_float_align(ptr %x) { ret <2 x float> %ret } +define <2 x half> @atomic_vec2_half(ptr %x) { +; CHECK3-LABEL: atomic_vec2_half: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movl (%rdi), %eax +; CHECK3-NEXT:pinsrw $0, %eax, %xmm0 +; CHECK3-NEXT:shrl $16, %eax +; CHECK3-NEXT:pinsrw $0, %eax, %xmm1 +; CHECK3-NEXT:punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3] +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec2_half: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movl (%rdi), %eax
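The strategy in `SplitVecRes_ATOMIC_LOAD` — issue one atomic load of an integer wide enough for the whole vector, then peel off the low and high halves with the `EXTRACT_ELEMENT` pair — has a direct scalar analog. Here a `<2 x i32>`-style payload held in a 64-bit atomic stands in for the vector (a simplified illustration, not the legalizer itself):

```cpp
#include <atomic>
#include <cstdint>

// One atomic load covers the whole "vector"; the two halves are then
// extracted and would be bitcast back to LoVT/HiVT in the real pass.
struct Halves {
  std::uint32_t Lo, Hi;
};

inline Halves splitAtomicLoad(std::atomic<std::uint64_t> &Addr) {
  std::uint64_t Wide = Addr.load(std::memory_order_acquire); // single atomic op
  return {static_cast<std::uint32_t>(Wide),        // EXTRACT_ELEMENT 0
          static_cast<std::uint32_t>(Wide >> 32)}; // EXTRACT_ELEMENT 1
}
```

Because the whole value comes from a single atomic operation, the two halves are guaranteed to be mutually consistent — the property that splitting into two separate atomic loads would lose.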
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/125432 >From fc2debee17c4ded2edbe2f1803f3184cea78bfdc Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 31 Jan 2025 13:12:56 -0500 Subject: [PATCH] [SelectionDAG][X86] Remove unused elements from atomic vector. After splitting, all elements are created. The two components must be found by looking at the upper and lower half of EXTRACT_ELEMENT. This change extends EltsFromConsecutiveLoads to understand AtomicSDNode so that unused elements can be removed. commit-id:b83937a8 --- llvm/include/llvm/CodeGen/SelectionDAG.h | 4 +- .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 20 ++- .../SelectionDAGAddressAnalysis.cpp | 30 ++-- .../SelectionDAG/SelectionDAGBuilder.cpp | 6 +- llvm/lib/Target/X86/X86ISelLowering.cpp | 43 +++-- llvm/test/CodeGen/X86/atomic-load-store.ll| 167 ++ 6 files changed, 83 insertions(+), 187 deletions(-) diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h index ba11ddbb5b731..d3cd81c146280 100644 --- a/llvm/include/llvm/CodeGen/SelectionDAG.h +++ b/llvm/include/llvm/CodeGen/SelectionDAG.h @@ -1843,7 +1843,7 @@ class SelectionDAG { /// chain to the token factor. This ensures that the new memory node will have /// the same relative memory dependency position as the old load. Returns the /// new merged load chain. - SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp); + SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp); /// Topological-sort the AllNodes list and a /// assign a unique node id for each node in the DAG based on their @@ -2281,7 +2281,7 @@ class SelectionDAG { /// merged. Check that both are nonvolatile and if LD is loading /// 'Bytes' bytes from a location that is 'Dist' units away from the /// location that the 'Base' load is loading from. 
- bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base, + bool areNonVolatileConsecutiveLoads(MemSDNode *LD, MemSDNode *Base, unsigned Bytes, int Dist) const; /// Infer alignment of a load / store address. Return std::nullopt if it diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp index 2a68903c34cef..8e77a542ab029 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -12218,7 +12218,7 @@ SDValue SelectionDAG::makeEquivalentMemoryOrdering(SDValue OldChain, return TokenFactor; } -SDValue SelectionDAG::makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, +SDValue SelectionDAG::makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp) { assert(isa(NewMemOp.getNode()) && "Expected a memop node"); SDValue OldChain = SDValue(OldLoad, 1); @@ -12911,17 +12911,21 @@ std::pair SelectionDAG::UnrollVectorOverflowOp( getBuildVector(NewOvVT, dl, OvScalars)); } -bool SelectionDAG::areNonVolatileConsecutiveLoads(LoadSDNode *LD, - LoadSDNode *Base, +bool SelectionDAG::areNonVolatileConsecutiveLoads(MemSDNode *LD, + MemSDNode *Base, unsigned Bytes, int Dist) const { if (LD->isVolatile() || Base->isVolatile()) return false; - // TODO: probably too restrictive for atomics, revisit - if (!LD->isSimple()) -return false; - if (LD->isIndexed() || Base->isIndexed()) -return false; + if (auto Ld = dyn_cast(LD)) { +if (!Ld->isSimple()) + return false; +if (Ld->isIndexed()) + return false; + } + if (auto Ld = dyn_cast(Base)) +if (Ld->isIndexed()) + return false; if (LD->getChain() != Base->getChain()) return false; EVT VT = LD->getMemoryVT(); diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp index f2ab88851b780..c29cb424c7a4c 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp +++ 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp @@ -195,8 +195,8 @@ bool BaseIndexOffset::contains(const SelectionDAG &DAG, int64_t BitSize, } /// Parses tree in Ptr for base, index, offset addresses. -static BaseIndexOffset matchLSNode(const LSBaseSDNode *N, - const SelectionDAG &DAG) { +template +static BaseIndexOffset matchSDNode(const T *N, const SelectionDAG &DAG) { SDValue Ptr = N->getBasePtr(); // (((B + I*M) + c)) + c ... @@ -206,16 +206,18 @@ static BaseIndexOffset matchLSNode(const LSBaseSDNode *N, bool IsIndexSignExt = false; // pre-inc/pre-dec ops are components of EA. - if (N->getAddressingMode() == ISD::P
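The relaxed predicate in the patch above keeps the same address contract: `LD` and `Base` are consecutive only when `LD` reads `Bytes` bytes from a location `Dist` units away from `Base`'s address, with volatile and indexed nodes still rejected and only non-atomic loads additionally required to be simple. A minimal Python model of that logic (the dict fields are illustrative stand-ins, not LLVM's API):

```python
def are_consecutive_loads(ld, base, num_bytes, dist):
    """Conceptual sketch of SelectionDAG::areNonVolatileConsecutiveLoads
    after the change: AtomicSDNodes are no longer rejected outright."""
    # Volatile nodes are never merged.
    if ld["volatile"] or base["volatile"]:
        return False
    # Only plain (non-atomic) loads must also be simple and unindexed;
    # atomics have no addressing modes to worry about in this model.
    for node in (ld, base):
        if not node["atomic"]:
            if not node["simple"] or node["indexed"]:
                return False
    # Both loads must hang off the same chain.
    if node_chain(ld) != node_chain(base):
        return False
    # LD must read 'num_bytes' bytes, 'dist' units from Base's address.
    return ld["addr"] == base["addr"] + dist * num_bytes

def node_chain(node):
    return node["chain"]
```

A quick usage check of the sketch: two 8-byte atomic loads at offsets 0 and 16 on the same chain are consecutive with `dist=2`, but not with `dist=1`.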
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120598 >From 218ce15319d641c89ce8a5ea7e770fd0c2e5223d Mon Sep 17 00:00:00 2001 From: jofrn Date: Thu, 19 Dec 2024 11:19:39 -0500 Subject: [PATCH] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load Vector types of 2 elements must be widened. This change does this for vector types of atomic load in SelectionDAG so that it can translate aligned vectors of >1 size. It also adds Pats to remove an extra MOV. commit-id:2894ccd1 --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 1 + .../SelectionDAG/LegalizeVectorTypes.cpp | 104 ++ llvm/lib/Target/X86/X86InstrCompiler.td | 7 ++ llvm/test/CodeGen/X86/atomic-load-store.ll| 81 ++ llvm/test/CodeGen/X86/atomic-unordered.ll | 3 +- 5 files changed, 173 insertions(+), 23 deletions(-) diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h index 89ea7ef4dbe89..bdfa5f7741ad3 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -1062,6 +1062,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer { SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N); SDValue WidenVecRes_INSERT_SUBVECTOR(SDNode *N); SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N); + SDValue WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N); SDValue WidenVecRes_LOAD(SDNode* N); SDValue WidenVecRes_VP_LOAD(VPLoadSDNode *N); SDValue WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N); diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index 8eee7a4c61fe6..0edbbe8af623a 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -4625,6 +4625,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, unsigned ResNo) { break; case ISD::EXTRACT_SUBVECTOR: Res = WidenVecRes_EXTRACT_SUBVECTOR(N); break; case ISD::INSERT_VECTOR_ELT: Res = 
WidenVecRes_INSERT_VECTOR_ELT(N); break; + case ISD::ATOMIC_LOAD: +Res = WidenVecRes_ATOMIC_LOAD(cast(N)); +break; case ISD::LOAD: Res = WidenVecRes_LOAD(N); break; case ISD::STEP_VECTOR: case ISD::SPLAT_VECTOR: @@ -6014,6 +6017,85 @@ SDValue DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) { N->getOperand(1), N->getOperand(2)); } +/// Either return the same load or provide appropriate casts +/// from the load and return that. +static SDValue loadElement(SDValue LdOp, EVT FirstVT, EVT WidenVT, + TypeSize LdWidth, TypeSize FirstVTWidth, SDLoc dl, + SelectionDAG &DAG) { + assert(TypeSize::isKnownLE(LdWidth, FirstVTWidth)); + TypeSize WidenWidth = WidenVT.getSizeInBits(); + if (!FirstVT.isVector()) { +unsigned NumElts = +WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue(); +EVT NewVecVT = EVT::getVectorVT(*DAG.getContext(), FirstVT, NumElts); +SDValue VecOp = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, NewVecVT, LdOp); +return DAG.getNode(ISD::BITCAST, dl, WidenVT, VecOp); + } else if (FirstVT == WidenVT) +return LdOp; + else { +// TODO: We don't currently have any tests that exercise this code path. 
+assert(WidenWidth.getFixedValue() % FirstVTWidth.getFixedValue() == 0); +unsigned NumConcat = +WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue(); +SmallVector ConcatOps(NumConcat); +SDValue UndefVal = DAG.getUNDEF(FirstVT); +ConcatOps[0] = LdOp; +for (unsigned i = 1; i != NumConcat; ++i) + ConcatOps[i] = UndefVal; +return DAG.getNode(ISD::CONCAT_VECTORS, dl, WidenVT, ConcatOps); + } +} + +static std::optional findMemType(SelectionDAG &DAG, + const TargetLowering &TLI, unsigned Width, + EVT WidenVT, unsigned Align, + unsigned WidenEx); + +SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *LD) { + EVT WidenVT = + TLI.getTypeToTransformTo(*DAG.getContext(), LD->getValueType(0)); + EVT LdVT = LD->getMemoryVT(); + SDLoc dl(LD); + assert(LdVT.isVector() && WidenVT.isVector() && "Expected vectors"); + assert(LdVT.isScalableVector() == WidenVT.isScalableVector() && + "Must be scalable"); + assert(LdVT.getVectorElementType() == WidenVT.getVectorElementType() && + "Expected equivalent element types"); + + // Load information + SDValue Chain = LD->getChain(); + SDValue BasePtr = LD->getBasePtr(); + MachineMemOperand::Flags MMOFlags = LD->getMemOperand()->getFlags(); + AAMDNodes AAInfo = LD->getAAInfo(); + + TypeSize LdWidth = LdVT.getSizeInBits(); + TypeSize WidenWidth = WidenVT.getSizeInBits(); + TypeSize WidthDiff = WidenWidth - LdWidth; + + // Find the vector type that can load from. + std::optional FirstVT = +
[llvm-branch-commits] [llvm] [X86] Add atomic vector tests for unaligned >1 sizes. (PR #120387)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120387 >From acfcbcc08b856f6e55a7065f28df2691f940ad76 Mon Sep 17 00:00:00 2001 From: jofrn Date: Wed, 18 Dec 2024 03:40:32 -0500 Subject: [PATCH] [X86] Add atomic vector tests for unaligned >1 sizes. Unaligned atomic vectors with size >1 are lowered to calls. Adding their tests separately here. commit-id:a06a5cc6 --- llvm/test/CodeGen/X86/atomic-load-store.ll | 253 + 1 file changed, 253 insertions(+) diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 6efcbb80c0ce6..39e9fdfa5e62b 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -146,6 +146,34 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind { ret <1 x i64> %ret } +define <1 x ptr> @atomic_vec1_ptr(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec1_ptr: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movq (%rsp), %rax +; CHECK3-NEXT:popq %rcx +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_ptr: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq %rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movq (%rsp), %rax +; CHECK0-NEXT:popq %rcx +; CHECK0-NEXT:retq + %ret = load atomic <1 x ptr>, ptr %x acquire, align 4 + ret <1 x ptr> %ret +} + define <1 x half> @atomic_vec1_half(ptr %x) { ; CHECK3-LABEL: atomic_vec1_half: ; CHECK3: ## %bb.0: @@ -182,3 +210,228 @@ define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind { %ret = load atomic <1 x double>, ptr %x acquire, align 8 ret <1 x double> %ret } + +define <1 x i64> @atomic_vec1_i64(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec1_i64: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; 
CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movq (%rsp), %rax +; CHECK3-NEXT:popq %rcx +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_i64: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq %rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movq (%rsp), %rax +; CHECK0-NEXT:popq %rcx +; CHECK0-NEXT:retq + %ret = load atomic <1 x i64>, ptr %x acquire, align 4 + ret <1 x i64> %ret +} + +define <1 x double> @atomic_vec1_double(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec1_double: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK3-NEXT:popq %rax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_double: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq %rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK0-NEXT:popq %rax +; CHECK0-NEXT:retq + %ret = load atomic <1 x double>, ptr %x acquire, align 4 + ret <1 x double> %ret +} + +define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec2_i32: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK3-NEXT:popq %rax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec2_i32: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq 
%rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movq {{.*#+}} xmm0 = mem[0],zero +; CHECK0-NEXT:popq %rax +; CHECK0-NEXT:retq + %ret = load atomic <2 x i32>, ptr %x acquire, align 4 + ret <2 x i32> %ret +} + +define <4 x float> @atomic_vec4_float_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec4_float_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:pushq %rax +; CHECK-NEXT:movl $2, %esi +; CHECK-NEXT:callq ___atomic_load_16 +; CHECK-NEXT:movq %rdx, %xmm1 +; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[
[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120385 >From e7805ff5855e1b5117c143e700e83ab7dd1557d6 Mon Sep 17 00:00:00 2001 From: jofrn Date: Wed, 18 Dec 2024 03:37:17 -0500 Subject: [PATCH] [SelectionDAG] Legalize <1 x T> vector types for atomic load `load atomic <1 x T>` is not valid. This change legalizes vector types of atomic load via scalarization in SelectionDAG so that it can, for example, translate from `v1i32` to `i32`. commit-id:5c36cc8c --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 1 + .../SelectionDAG/LegalizeVectorTypes.cpp | 15 +++ llvm/test/CodeGen/X86/atomic-load-store.ll| 121 +- 3 files changed, 135 insertions(+), 2 deletions(-) diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h index 720393158aa5e..89ea7ef4dbe89 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -874,6 +874,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer { SDValue ScalarizeVecRes_UnaryOpWithExtraInput(SDNode *N); SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N); SDValue ScalarizeVecRes_LOAD(LoadSDNode *N); + SDValue ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N); SDValue ScalarizeVecRes_SCALAR_TO_VECTOR(SDNode *N); SDValue ScalarizeVecRes_VSELECT(SDNode *N); SDValue ScalarizeVecRes_SELECT(SDNode *N); diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index d0b69b88748a9..8eee7a4c61fe6 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -64,6 +64,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, unsigned ResNo) { R = ScalarizeVecRes_UnaryOpWithExtraInput(N); break; case ISD::INSERT_VECTOR_ELT: R = ScalarizeVecRes_INSERT_VECTOR_ELT(N); break; + case ISD::ATOMIC_LOAD: +R = ScalarizeVecRes_ATOMIC_LOAD(cast(N)); +break; case ISD::LOAD: R = 
ScalarizeVecRes_LOAD(cast(N));break; case ISD::SCALAR_TO_VECTOR: R = ScalarizeVecRes_SCALAR_TO_VECTOR(N); break; case ISD::SIGN_EXTEND_INREG: R = ScalarizeVecRes_InregOp(N); break; @@ -458,6 +461,18 @@ SDValue DAGTypeLegalizer::ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N) { return Op; } +SDValue DAGTypeLegalizer::ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N) { + SDValue Result = DAG.getAtomicLoad( + ISD::NON_EXTLOAD, SDLoc(N), N->getMemoryVT().getVectorElementType(), + N->getValueType(0).getVectorElementType(), N->getChain(), N->getBasePtr(), + N->getMemOperand()); + + // Legalize the chain result - switch anything that used the old chain to + // use the new one. + ReplaceValueWith(SDValue(N, 1), Result.getValue(1)); + return Result; +} + SDValue DAGTypeLegalizer::ScalarizeVecRes_LOAD(LoadSDNode *N) { assert(N->isUnindexed() && "Indexed vector load?"); diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 5bce4401f7bdb..d23cfb89f9fc8 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -1,6 +1,6 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | FileCheck %s -; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | FileCheck %s +; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | FileCheck %s --check-prefixes=CHECK,CHECK3 +; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | FileCheck %s --check-prefixes=CHECK,CHECK0 define void @test1(ptr %ptr, i32 %val1) { ; CHECK-LABEL: test1: @@ -28,3 +28,120 @@ define i32 @test3(ptr %ptr) { %val = load atomic i32, ptr %ptr seq_cst, align 4 ret i32 %val } + +define <1 x i32> @atomic_vec1_i32(ptr %x) { +; CHECK-LABEL: atomic_vec1_i32: +; CHECK: ## %bb.0: +; CHECK-NEXT:movl (%rdi), %eax +; CHECK-NEXT:retq + %ret = load atomic <1 x i32>, ptr %x 
acquire, align 4 + ret <1 x i32> %ret +} + +define <1 x i8> @atomic_vec1_i8(ptr %x) { +; CHECK3-LABEL: atomic_vec1_i8: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movzbl (%rdi), %eax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_i8: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movb (%rdi), %al +; CHECK0-NEXT:retq + %ret = load atomic <1 x i8>, ptr %x acquire, align 1 + ret <1 x i8> %ret +} + +define <1 x i16> @atomic_vec1_i16(ptr %x) { +; CHECK3-LABEL: atomic_vec1_i16: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movzwl (%rdi), %eax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_i16: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movw (%rdi), %ax +; CHECK0-NEXT:retq + %ret = load atomic <1 x i16>, ptr %x acquire, align 2 + ret <1 x i16> %ret +} + +define <1 x i32> @atomic_vec1_i8_zext(ptr %x) { +; CHECK3-LABEL: atomic_ve
[llvm-branch-commits] [llvm] [X86] Manage atomic load of fp -> int promotion in DAG (PR #120386)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120386 >From fdc21070689116f6f220f29686b09c93314ad075 Mon Sep 17 00:00:00 2001 From: jofrn Date: Wed, 18 Dec 2024 03:38:23 -0500 Subject: [PATCH] [X86] Manage atomic load of fp -> int promotion in DAG When lowering atomic <1 x T> vector types with floats, selection can fail since this pattern is unsupported. To support this, floats can be casted to an integer type of the same size. commit-id:f9d761c5 --- llvm/lib/Target/X86/X86ISelLowering.cpp| 4 +++ llvm/test/CodeGen/X86/atomic-load-store.ll | 37 ++ 2 files changed, 41 insertions(+) diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 3d9c76f3d05f5..4e59a3fb16369 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -2651,6 +2651,10 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM, setOperationAction(Op, MVT::f32, Promote); } + setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f16, MVT::i16); + setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f32, MVT::i32); + setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f64, MVT::i64); + // We have target-specific dag combine patterns for the following nodes: setTargetDAGCombine({ISD::VECTOR_SHUFFLE, ISD::SCALAR_TO_VECTOR, diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index d23cfb89f9fc8..6efcbb80c0ce6 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -145,3 +145,40 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind { %ret = load atomic <1 x i64>, ptr %x acquire, align 8 ret <1 x i64> %ret } + +define <1 x half> @atomic_vec1_half(ptr %x) { +; CHECK3-LABEL: atomic_vec1_half: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movzwl (%rdi), %eax +; CHECK3-NEXT:pinsrw $0, %eax, %xmm0 +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_half: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movw 
(%rdi), %cx +; CHECK0-NEXT:## implicit-def: $eax +; CHECK0-NEXT:movw %cx, %ax +; CHECK0-NEXT:## implicit-def: $xmm0 +; CHECK0-NEXT:pinsrw $0, %eax, %xmm0 +; CHECK0-NEXT:retq + %ret = load atomic <1 x half>, ptr %x acquire, align 2 + ret <1 x half> %ret +} + +define <1 x float> @atomic_vec1_float(ptr %x) { +; CHECK-LABEL: atomic_vec1_float: +; CHECK: ## %bb.0: +; CHECK-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK-NEXT:retq + %ret = load atomic <1 x float>, ptr %x acquire, align 4 + ret <1 x float> %ret +} + +define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec1_double_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK-NEXT:retq + %ret = load atomic <1 x double>, ptr %x acquire, align 8 + ret <1 x double> %ret +} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
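The promotion table added in the patch above (`setOperationPromotedToType(ISD::ATOMIC_LOAD, fp, int)`) can be pictured as a simple type mapping: an atomic load of a float type is performed as the same-width integer load and the result is bitcast back. A hedged sketch in Python (the string type names stand in for MVTs; `lower_atomic_fp_load` is an illustrative helper, not LLVM's API):

```python
# Conceptual model of the fp -> int promotion for ISD::ATOMIC_LOAD.
ATOMIC_LOAD_PROMOTIONS = {"f16": "i16", "f32": "i32", "f64": "i64"}

def lower_atomic_fp_load(value_type):
    """Return (load_type, needs_bitcast): the type the atomic load is
    actually issued with, and whether a bitcast back is required."""
    promoted = ATOMIC_LOAD_PROMOTIONS.get(value_type)
    if promoted is None:
        # Integer (or already-legal) types load directly.
        return value_type, False
    # Float types load as the same-width integer, then bitcast back.
    return promoted, True
```

For example, an atomic `f32` load is issued as an `i32` load plus a bitcast, while an `i64` load is left untouched.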
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120716 >From 681d443e5e9a069a73eaa0ce50684b41b95c48fd Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 20 Dec 2024 06:14:28 -0500 Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector AtomicExpand fails for aligned `load atomic ` because it does not find a compatible library call. This change adds appropriate bitcasts so that the call can be lowered. commit-id:f430c1af --- llvm/lib/CodeGen/AtomicExpandPass.cpp | 20 +- llvm/test/CodeGen/ARM/atomic-load-store.ll| 51 +++ llvm/test/CodeGen/X86/atomic-load-store.ll| 30 + .../X86/expand-atomic-non-integer.ll | 65 +++ 4 files changed, 163 insertions(+), 3 deletions(-) diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp index c376de877ac7d..b6f1e9db9ce35 100644 --- a/llvm/lib/CodeGen/AtomicExpandPass.cpp +++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp @@ -2066,9 +2066,23 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall( I->replaceAllUsesWith(V); } else if (HasResult) { Value *V; -if (UseSizedLibcall) - V = Builder.CreateBitOrPointerCast(Result, I->getType()); -else { +if (UseSizedLibcall) { + // Add bitcasts from Result's scalar type to I's vector type + if (I->getType()->getScalarType()->isPointerTy() && + I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) { +unsigned AS = + cast(I->getType()->getScalarType())->getAddressSpace(); +ElementCount EC = cast(I->getType())->getElementCount(); +Value *BC = Builder.CreateBitCast( +Result, +VectorType::get(IntegerType::get(Ctx, DL.getPointerSizeInBits(AS)), +EC)); +Value *IntToPtr = Builder.CreateIntToPtr( +BC, VectorType::get(PointerType::get(Ctx, AS), EC)); +V = Builder.CreateBitOrPointerCast(IntToPtr, I->getType()); + } else +V = Builder.CreateBitOrPointerCast(Result, I->getType()); +} else { V = Builder.CreateAlignedLoad(I->getType(), AllocaResult, AllocaAlignment); Builder.CreateLifetimeEnd(AllocaResult, SizeVal64); diff --git 
a/llvm/test/CodeGen/ARM/atomic-load-store.ll b/llvm/test/CodeGen/ARM/atomic-load-store.ll index 560dfde356c29..36c1305a7c5df 100644 --- a/llvm/test/CodeGen/ARM/atomic-load-store.ll +++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll @@ -983,3 +983,54 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double %val1) { store atomic double %val1, ptr %ptr seq_cst, align 8 ret void } + +define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 { +; ARM-LABEL: atomic_vec1_ptr: +; ARM: @ %bb.0: +; ARM-NEXT:ldr r0, [r0] +; ARM-NEXT:dmb ish +; ARM-NEXT:bx lr +; +; ARMOPTNONE-LABEL: atomic_vec1_ptr: +; ARMOPTNONE: @ %bb.0: +; ARMOPTNONE-NEXT:ldr r0, [r0] +; ARMOPTNONE-NEXT:dmb ish +; ARMOPTNONE-NEXT:bx lr +; +; THUMBTWO-LABEL: atomic_vec1_ptr: +; THUMBTWO: @ %bb.0: +; THUMBTWO-NEXT:ldr r0, [r0] +; THUMBTWO-NEXT:dmb ish +; THUMBTWO-NEXT:bx lr +; +; THUMBONE-LABEL: atomic_vec1_ptr: +; THUMBONE: @ %bb.0: +; THUMBONE-NEXT:push {r7, lr} +; THUMBONE-NEXT:movs r1, #0 +; THUMBONE-NEXT:mov r2, r1 +; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4 +; THUMBONE-NEXT:pop {r7, pc} +; +; ARMV4-LABEL: atomic_vec1_ptr: +; ARMV4: @ %bb.0: +; ARMV4-NEXT:push {r11, lr} +; ARMV4-NEXT:mov r1, #2 +; ARMV4-NEXT:bl __atomic_load_4 +; ARMV4-NEXT:pop {r11, lr} +; ARMV4-NEXT:mov pc, lr +; +; ARMV6-LABEL: atomic_vec1_ptr: +; ARMV6: @ %bb.0: +; ARMV6-NEXT:mov r1, #0 +; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5 +; ARMV6-NEXT:ldr r0, [r0] +; ARMV6-NEXT:bx lr +; +; THUMBM-LABEL: atomic_vec1_ptr: +; THUMBM: @ %bb.0: +; THUMBM-NEXT:ldr r0, [r0] +; THUMBM-NEXT:dmb sy +; THUMBM-NEXT:bx lr + %ret = load atomic <1 x ptr>, ptr %x acquire, align 4 + ret <1 x ptr> %ret +} diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 08d0405345f57..4293df8c13571 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -371,6 +371,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { ret <2 x i32> %ret } +define <2 x ptr> 
@atomic_vec2_ptr_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec2_ptr_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:pushq %rax +; CHECK-NEXT:movl $2, %esi +; CHECK-NEXT:callq ___atomic_load_16 +; CHECK-NEXT:movq %rdx, %xmm1 +; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT:popq %rax +; CHECK-NEXT:retq + %ret = load atomic <2 x ptr>, ptr %x acquire, align 16 + ret <2 x ptr> %ret +} + define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind { ; CHECK3-LAB
+assert(WidenWidth.getFixedValue() % FirstVTWidth.getFixedValue() == 0); +unsigned NumConcat = +WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue(); +SmallVector ConcatOps(NumConcat); +SDValue UndefVal = DAG.getUNDEF(FirstVT); +ConcatOps[0] = LdOp; +for (unsigned i = 1; i != NumConcat; ++i) + ConcatOps[i] = UndefVal; +return DAG.getNode(ISD::CONCAT_VECTORS, dl, WidenVT, ConcatOps); + } +} + +static std::optional findMemType(SelectionDAG &DAG, + const TargetLowering &TLI, unsigned Width, + EVT WidenVT, unsigned Align, + unsigned WidenEx); + +SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *LD) { + EVT WidenVT = + TLI.getTypeToTransformTo(*DAG.getContext(), LD->getValueType(0)); + EVT LdVT = LD->getMemoryVT(); + SDLoc dl(LD); + assert(LdVT.isVector() && WidenVT.isVector() && "Expected vectors"); + assert(LdVT.isScalableVector() == WidenVT.isScalableVector() && + "Must be scalable"); + assert(LdVT.getVectorElementType() == WidenVT.getVectorElementType() && + "Expected equivalent element types"); + + // Load information + SDValue Chain = LD->getChain(); + SDValue BasePtr = LD->getBasePtr(); + MachineMemOperand::Flags MMOFlags = LD->getMemOperand()->getFlags(); + AAMDNodes AAInfo = LD->getAAInfo(); + + TypeSize LdWidth = LdVT.getSizeInBits(); + TypeSize WidenWidth = WidenVT.getSizeInBits(); + TypeSize WidthDiff = WidenWidth - LdWidth; + + // Find the vector type that can load from. + std::optional FirstVT = +
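The commit message above describes widening a `<2 x T>` atomic load: the whole vector is fetched by a single sufficiently wide atomic load, and the result is placed into a wider register whose extra lanes are don't-care padding. A minimal C++ model of that idea (this is not the SelectionDAG code; the function name and the zero-padding of the upper lanes are illustrative, and little-endian lane order is assumed):

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Model of widening: a <2 x i32> atomic load is performed as one 64-bit
// atomic load, then the two lanes are bitcast into a wider 4-lane value.
// Lanes 2..3 are undefined in the real lowering; here they are zeroed.
std::array<int32_t, 4> widened_atomic_vec2_i32(const std::atomic<uint64_t> *p) {
    uint64_t bits = p->load(std::memory_order_acquire); // one atomic load
    std::array<int32_t, 4> widened{};                   // upper lanes: padding
    std::memcpy(widened.data(), &bits, sizeof(bits));   // bitcast i64 -> <2 x i32>
    return widened;
}
```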
[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #120640)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120640 >From db5b862de7faed37cb9c40c170d4cd1e9612b489 Mon Sep 17 00:00:00 2001 From: jofrn Date: Thu, 19 Dec 2024 16:25:55 -0500 Subject: [PATCH] [SelectionDAG] Split vector types for atomic load Vector types that aren't widened are split so that a single ATOMIC_LOAD is issued for the entire vector at once. This change utilizes the load vectorization infrastructure in SelectionDAG in order to group the vectors. This enables SelectionDAG to translate vectors with type bfloat,half. commit-id:3a045357 --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 1 + .../SelectionDAG/LegalizeVectorTypes.cpp | 37 llvm/test/CodeGen/X86/atomic-load-store.ll| 171 ++ 3 files changed, 209 insertions(+) diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h index bdfa5f7741ad3..d8f402f529632 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -960,6 +960,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer { void SplitVecRes_FPOp_MultiType(SDNode *N, SDValue &Lo, SDValue &Hi); void SplitVecRes_IS_FPCLASS(SDNode *N, SDValue &Lo, SDValue &Hi); void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi); + void SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, SDValue &Hi); void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi); void SplitVecRes_VP_LOAD(VPLoadSDNode *LD, SDValue &Lo, SDValue &Hi); void SplitVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *SLD, SDValue &Lo, diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index 0edbbe8af623a..895d3c51e0e1e 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -1172,6 +1172,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SplitVecRes_STEP_VECTOR(N, Lo, Hi); 
break; case ISD::SIGN_EXTEND_INREG: SplitVecRes_InregOp(N, Lo, Hi); break; + case ISD::ATOMIC_LOAD: +SplitVecRes_ATOMIC_LOAD(cast(N), Lo, Hi); +break; case ISD::LOAD: SplitVecRes_LOAD(cast(N), Lo, Hi); break; @@ -1421,6 +1424,40 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SetSplitVector(SDValue(N, ResNo), Lo, Hi); } +void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, + SDValue &Hi) { + EVT LoVT, HiVT; + SDLoc dl(LD); + std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(LD->getValueType(0)); + + ISD::LoadExtType ExtType = LD->getExtensionType(); + SDValue Ch = LD->getChain(); + SDValue Ptr = LD->getBasePtr(); + EVT MemoryVT = LD->getMemoryVT(); + + EVT LoMemVT, HiMemVT; + std::tie(LoMemVT, HiMemVT) = DAG.GetSplitDestVTs(MemoryVT); + + EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), LD->getValueType(0).getSizeInBits()); + EVT MemIntVT = EVT::getIntegerVT(*DAG.getContext(), 2 * LoMemVT.getSizeInBits()); + SDValue ALD = DAG.getAtomicLoad(ExtType, dl, MemIntVT, IntVT, Ch, Ptr, + LD->getMemOperand()); + + EVT LoIntVT = EVT::getIntegerVT(*DAG.getContext(), LoVT.getSizeInBits()); + EVT HiIntVT = EVT::getIntegerVT(*DAG.getContext(), HiVT.getSizeInBits()); + SDValue ExtractLo = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, LoIntVT, ALD, + DAG.getIntPtrConstant(0, dl)); + SDValue ExtractHi = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, HiIntVT, ALD, + DAG.getIntPtrConstant(1, dl)); + + Lo = DAG.getBitcast(LoVT, ExtractLo); + Hi = DAG.getBitcast(HiVT, ExtractHi); + + // Legalize the chain result - switch anything that used the old chain to + // use the new one. 
+ ReplaceValueWith(SDValue(LD, 1), ALD.getValue(1)); +} + void DAGTypeLegalizer::IncrementPointer(MemSDNode *N, EVT MemVT, MachinePointerInfo &MPI, SDValue &Ptr, uint64_t *ScaledOffset) { diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 935d058a52f8f..42b0955824293 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -204,6 +204,68 @@ define <2 x float> @atomic_vec2_float_align(ptr %x) { ret <2 x float> %ret } +define <2 x half> @atomic_vec2_half(ptr %x) { +; CHECK3-LABEL: atomic_vec2_half: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movl (%rdi), %eax +; CHECK3-NEXT:pinsrw $0, %eax, %xmm0 +; CHECK3-NEXT:shrl $16, %eax +; CHECK3-NEXT:pinsrw $0, %eax, %xmm1 +; CHECK3-NEXT:punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3] +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec2_half: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movl (%rdi), %eax
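`SplitVecRes_ATOMIC_LOAD` above issues one wide integer atomic load for the whole vector, extracts the low and high halves with `EXTRACT_ELEMENT`, and bitcasts each half back to a sub-vector. A minimal C++ sketch of the same data flow (illustrative names only, not the actual legalizer; a 64-bit `<4 x i16>` stands in for the wider cases, and little-endian lane order is assumed):

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

struct SplitHalves {
    std::array<int16_t, 2> lo, hi;
};

// One 64-bit atomic load covers the whole <4 x i16>; the two 32-bit
// halves are then extracted and bitcast back to two <2 x i16> parts.
SplitHalves split_atomic_vec4_i16(const std::atomic<uint64_t> *p) {
    uint64_t bits = p->load(std::memory_order_acquire); // single atomic load
    uint32_t lo32 = static_cast<uint32_t>(bits);        // EXTRACT_ELEMENT 0
    uint32_t hi32 = static_cast<uint32_t>(bits >> 32);  // EXTRACT_ELEMENT 1
    SplitHalves out;
    std::memcpy(out.lo.data(), &lo32, sizeof lo32);     // bitcast i32 -> <2 x i16>
    std::memcpy(out.hi.data(), &hi32, sizeof hi32);
    return out;
}
```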
[llvm-branch-commits] [llvm] [X86] Add atomic vector tests for unaligned >1 sizes. (PR #120387)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120387 >From acfcbcc08b856f6e55a7065f28df2691f940ad76 Mon Sep 17 00:00:00 2001 From: jofrn Date: Wed, 18 Dec 2024 03:40:32 -0500 Subject: [PATCH] [X86] Add atomic vector tests for unaligned >1 sizes. Unaligned atomic vectors with size >1 are lowered to calls. Adding their tests separately here. commit-id:a06a5cc6 --- llvm/test/CodeGen/X86/atomic-load-store.ll | 253 + 1 file changed, 253 insertions(+) diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 6efcbb80c0ce6..39e9fdfa5e62b 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -146,6 +146,34 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind { ret <1 x i64> %ret } +define <1 x ptr> @atomic_vec1_ptr(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec1_ptr: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movq (%rsp), %rax +; CHECK3-NEXT:popq %rcx +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_ptr: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq %rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movq (%rsp), %rax +; CHECK0-NEXT:popq %rcx +; CHECK0-NEXT:retq + %ret = load atomic <1 x ptr>, ptr %x acquire, align 4 + ret <1 x ptr> %ret +} + define <1 x half> @atomic_vec1_half(ptr %x) { ; CHECK3-LABEL: atomic_vec1_half: ; CHECK3: ## %bb.0: @@ -182,3 +210,228 @@ define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind { %ret = load atomic <1 x double>, ptr %x acquire, align 8 ret <1 x double> %ret } + +define <1 x i64> @atomic_vec1_i64(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec1_i64: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; 
CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movq (%rsp), %rax +; CHECK3-NEXT:popq %rcx +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_i64: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq %rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movq (%rsp), %rax +; CHECK0-NEXT:popq %rcx +; CHECK0-NEXT:retq + %ret = load atomic <1 x i64>, ptr %x acquire, align 4 + ret <1 x i64> %ret +} + +define <1 x double> @atomic_vec1_double(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec1_double: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK3-NEXT:popq %rax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_double: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq %rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK0-NEXT:popq %rax +; CHECK0-NEXT:retq + %ret = load atomic <1 x double>, ptr %x acquire, align 4 + ret <1 x double> %ret +} + +define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec2_i32: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK3-NEXT:popq %rax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec2_i32: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq 
%rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movq {{.*#+}} xmm0 = mem[0],zero +; CHECK0-NEXT:popq %rax +; CHECK0-NEXT:retq + %ret = load atomic <2 x i32>, ptr %x acquire, align 4 + ret <2 x i32> %ret +} + +define <4 x float> @atomic_vec4_float_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec4_float_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:pushq %rax +; CHECK-NEXT:movl $2, %esi +; CHECK-NEXT:callq ___atomic_load_16 +; CHECK-NEXT:movq %rdx, %xmm1 +; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[
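The tests above exercise the fallback path the commit message describes: an atomic vector load whose size or alignment has no native instruction is lowered to the generic `__atomic_load(size, src, dest, memorder)` libcall. A C-level analogue using the compiler's type-generic builtin (GCC/Clang only; the struct is a hypothetical stand-in for `<1 x ptr>`):

```cpp
#include <cassert>

struct Vec1Ptr { void *p; };  // stand-in for <1 x ptr>

// The builtin may expand inline or to the __atomic_load libcall,
// depending on size, alignment, and target support.
Vec1Ptr load_vec1_ptr_acquire(Vec1Ptr *x) {
    Vec1Ptr ret;
    __atomic_load(x, &ret, __ATOMIC_ACQUIRE);
    return ret;
}
```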
[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120385 >From e7805ff5855e1b5117c143e700e83ab7dd1557d6 Mon Sep 17 00:00:00 2001 From: jofrn Date: Wed, 18 Dec 2024 03:37:17 -0500 Subject: [PATCH] [SelectionDAG] Legalize <1 x T> vector types for atomic load `load atomic <1 x T>` is not valid. This change legalizes vector types of atomic load via scalarization in SelectionDAG so that it can, for example, translate from `v1i32` to `i32`. commit-id:5c36cc8c --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 1 + .../SelectionDAG/LegalizeVectorTypes.cpp | 15 +++ llvm/test/CodeGen/X86/atomic-load-store.ll| 121 +- 3 files changed, 135 insertions(+), 2 deletions(-) diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h index 720393158aa5e..89ea7ef4dbe89 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -874,6 +874,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer { SDValue ScalarizeVecRes_UnaryOpWithExtraInput(SDNode *N); SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N); SDValue ScalarizeVecRes_LOAD(LoadSDNode *N); + SDValue ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N); SDValue ScalarizeVecRes_SCALAR_TO_VECTOR(SDNode *N); SDValue ScalarizeVecRes_VSELECT(SDNode *N); SDValue ScalarizeVecRes_SELECT(SDNode *N); diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index d0b69b88748a9..8eee7a4c61fe6 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -64,6 +64,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, unsigned ResNo) { R = ScalarizeVecRes_UnaryOpWithExtraInput(N); break; case ISD::INSERT_VECTOR_ELT: R = ScalarizeVecRes_INSERT_VECTOR_ELT(N); break; + case ISD::ATOMIC_LOAD: +R = ScalarizeVecRes_ATOMIC_LOAD(cast(N)); +break; case ISD::LOAD: R = 
ScalarizeVecRes_LOAD(cast(N));break; case ISD::SCALAR_TO_VECTOR: R = ScalarizeVecRes_SCALAR_TO_VECTOR(N); break; case ISD::SIGN_EXTEND_INREG: R = ScalarizeVecRes_InregOp(N); break; @@ -458,6 +461,18 @@ SDValue DAGTypeLegalizer::ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N) { return Op; } +SDValue DAGTypeLegalizer::ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N) { + SDValue Result = DAG.getAtomicLoad( + ISD::NON_EXTLOAD, SDLoc(N), N->getMemoryVT().getVectorElementType(), + N->getValueType(0).getVectorElementType(), N->getChain(), N->getBasePtr(), + N->getMemOperand()); + + // Legalize the chain result - switch anything that used the old chain to + // use the new one. + ReplaceValueWith(SDValue(N, 1), Result.getValue(1)); + return Result; +} + SDValue DAGTypeLegalizer::ScalarizeVecRes_LOAD(LoadSDNode *N) { assert(N->isUnindexed() && "Indexed vector load?"); diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 5bce4401f7bdb..d23cfb89f9fc8 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -1,6 +1,6 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | FileCheck %s -; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | FileCheck %s +; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | FileCheck %s --check-prefixes=CHECK,CHECK3 +; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | FileCheck %s --check-prefixes=CHECK,CHECK0 define void @test1(ptr %ptr, i32 %val1) { ; CHECK-LABEL: test1: @@ -28,3 +28,120 @@ define i32 @test3(ptr %ptr) { %val = load atomic i32, ptr %ptr seq_cst, align 4 ret i32 %val } + +define <1 x i32> @atomic_vec1_i32(ptr %x) { +; CHECK-LABEL: atomic_vec1_i32: +; CHECK: ## %bb.0: +; CHECK-NEXT:movl (%rdi), %eax +; CHECK-NEXT:retq + %ret = load atomic <1 x i32>, ptr %x 
acquire, align 4 + ret <1 x i32> %ret +} + +define <1 x i8> @atomic_vec1_i8(ptr %x) { +; CHECK3-LABEL: atomic_vec1_i8: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movzbl (%rdi), %eax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_i8: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movb (%rdi), %al +; CHECK0-NEXT:retq + %ret = load atomic <1 x i8>, ptr %x acquire, align 1 + ret <1 x i8> %ret +} + +define <1 x i16> @atomic_vec1_i16(ptr %x) { +; CHECK3-LABEL: atomic_vec1_i16: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movzwl (%rdi), %eax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_i16: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movw (%rdi), %ax +; CHECK0-NEXT:retq + %ret = load atomic <1 x i16>, ptr %x acquire, align 2 + ret <1 x i16> %ret +} + +define <1 x i32> @atomic_vec1_i8_zext(ptr %x) { +; CHECK3-LABEL: atomic_ve
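Scalarization of `<1 x T>` is the simplest of the three legalization strategies in this patch stack: a one-element vector atomic load is just the scalar atomic load of its single element, as the `movl (%rdi), %eax` check lines above show. A trivial C++ analogue (hypothetical function name mirroring the test):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// <1 x i32> atomic load == scalar i32 atomic load of the lone element.
int32_t atomic_vec1_i32(const std::atomic<int32_t> *x) {
    return x->load(std::memory_order_acquire);
}
```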
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/125432 >From fc2debee17c4ded2edbe2f1803f3184cea78bfdc Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 31 Jan 2025 13:12:56 -0500 Subject: [PATCH] [SelectionDAG][X86] Remove unused elements from atomic vector. After splitting, all elements are created. The two components must be found by looking at the upper and lower half of EXTRACT_ELEMENT. This change extends EltsFromConsecutiveLoads to understand AtomicSDNode so that unused elements can be removed. commit-id:b83937a8 --- llvm/include/llvm/CodeGen/SelectionDAG.h | 4 +- .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 20 ++- .../SelectionDAGAddressAnalysis.cpp | 30 ++-- .../SelectionDAG/SelectionDAGBuilder.cpp | 6 +- llvm/lib/Target/X86/X86ISelLowering.cpp | 43 +++-- llvm/test/CodeGen/X86/atomic-load-store.ll| 167 ++ 6 files changed, 83 insertions(+), 187 deletions(-) diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h index ba11ddbb5b731..d3cd81c146280 100644 --- a/llvm/include/llvm/CodeGen/SelectionDAG.h +++ b/llvm/include/llvm/CodeGen/SelectionDAG.h @@ -1843,7 +1843,7 @@ class SelectionDAG { /// chain to the token factor. This ensures that the new memory node will have /// the same relative memory dependency position as the old load. Returns the /// new merged load chain. - SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp); + SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp); /// Topological-sort the AllNodes list and a /// assign a unique node id for each node in the DAG based on their @@ -2281,7 +2281,7 @@ class SelectionDAG { /// merged. Check that both are nonvolatile and if LD is loading /// 'Bytes' bytes from a location that is 'Dist' units away from the /// location that the 'Base' load is loading from. 
- bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base, + bool areNonVolatileConsecutiveLoads(MemSDNode *LD, MemSDNode *Base, unsigned Bytes, int Dist) const; /// Infer alignment of a load / store address. Return std::nullopt if it diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp index 2a68903c34cef..8e77a542ab029 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -12218,7 +12218,7 @@ SDValue SelectionDAG::makeEquivalentMemoryOrdering(SDValue OldChain, return TokenFactor; } -SDValue SelectionDAG::makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, +SDValue SelectionDAG::makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp) { assert(isa(NewMemOp.getNode()) && "Expected a memop node"); SDValue OldChain = SDValue(OldLoad, 1); @@ -12911,17 +12911,21 @@ std::pair SelectionDAG::UnrollVectorOverflowOp( getBuildVector(NewOvVT, dl, OvScalars)); } -bool SelectionDAG::areNonVolatileConsecutiveLoads(LoadSDNode *LD, - LoadSDNode *Base, +bool SelectionDAG::areNonVolatileConsecutiveLoads(MemSDNode *LD, + MemSDNode *Base, unsigned Bytes, int Dist) const { if (LD->isVolatile() || Base->isVolatile()) return false; - // TODO: probably too restrictive for atomics, revisit - if (!LD->isSimple()) -return false; - if (LD->isIndexed() || Base->isIndexed()) -return false; + if (auto Ld = dyn_cast(LD)) { +if (!Ld->isSimple()) + return false; +if (Ld->isIndexed()) + return false; + } + if (auto Ld = dyn_cast(Base)) +if (Ld->isIndexed()) + return false; if (LD->getChain() != Base->getChain()) return false; EVT VT = LD->getMemoryVT(); diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp index f2ab88851b780..c29cb424c7a4c 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp +++ 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp @@ -195,8 +195,8 @@ bool BaseIndexOffset::contains(const SelectionDAG &DAG, int64_t BitSize, } /// Parses tree in Ptr for base, index, offset addresses. -static BaseIndexOffset matchLSNode(const LSBaseSDNode *N, - const SelectionDAG &DAG) { +template +static BaseIndexOffset matchSDNode(const T *N, const SelectionDAG &DAG) { SDValue Ptr = N->getBasePtr(); // (((B + I*M) + c)) + c ... @@ -206,16 +206,18 @@ static BaseIndexOffset matchLSNode(const LSBaseSDNode *N, bool IsIndexSignExt = false; // pre-inc/pre-dec ops are components of EA. - if (N->getAddressingMode() == ISD::P
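The `areNonVolatileConsecutiveLoads` change above generalizes a consecutiveness check from `LoadSDNode` to `MemSDNode`: two accesses are merge candidates when they share a base and the second sits exactly `Dist * Bytes` bytes away. A simplified sketch of that predicate (illustrative types only; the real code decomposes addresses via `BaseIndexOffset`):

```cpp
#include <cassert>
#include <cstdint>

struct MemAccess {
    const void *base;  // symbolic base object
    int64_t offset;    // byte offset from base
};

// ld is "consecutive" with baseLd when it reads bytes bytes from a
// location dist units (of bytes each) past baseLd's location.
bool areConsecutive(const MemAccess &ld, const MemAccess &baseLd,
                    unsigned bytes, int dist) {
    if (ld.base != baseLd.base)
        return false;  // different base objects: cannot prove adjacency
    return ld.offset == baseLd.offset + int64_t(dist) * bytes;
}
```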
[llvm-branch-commits] [llvm] release/20.x: [OpenMP] Add pre sm_70 load hack back in (#138589) (PR #138626)
ye-luo wrote: I manually verified the effectiveness of this patch on 20.x release branch. https://github.com/llvm/llvm-project/pull/138626 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120716 >From bdf9a55d6beecbee24114d60e922ae2c360fc8b3 Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 20 Dec 2024 06:14:28 -0500 Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector AtomicExpand fails for aligned `load atomic ` because it does not find a compatible library call. This change adds appropriate bitcasts so that the call can be lowered. commit-id:f430c1af --- llvm/lib/CodeGen/AtomicExpandPass.cpp | 16 - llvm/test/CodeGen/ARM/atomic-load-store.ll| 51 +++ llvm/test/CodeGen/X86/atomic-load-store.ll| 30 + .../X86/expand-atomic-non-integer.ll | 65 +++ 4 files changed, 159 insertions(+), 3 deletions(-) diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp index c376de877ac7d..2a266bd773a72 100644 --- a/llvm/lib/CodeGen/AtomicExpandPass.cpp +++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp @@ -2066,9 +2066,19 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall( I->replaceAllUsesWith(V); } else if (HasResult) { Value *V; -if (UseSizedLibcall) - V = Builder.CreateBitOrPointerCast(Result, I->getType()); -else { +if (UseSizedLibcall) { + // Add bitcasts from Result's scalar type to I's vector type + auto PtrTy = dyn_cast(I->getType()->getScalarType()); + auto VTy = dyn_cast(I->getType()); + if (VTy && PtrTy && !Result->getType()->isVectorTy()) { +unsigned AS = PtrTy->getAddressSpace(); +Value *BC = Builder.CreateBitCast( +Result, VTy->getWithNewType(DL.getIntPtrType(Ctx, AS))); +V = Builder.CreateIntToPtr( +BC, VTy->getWithNewType(PointerType::get(Ctx, AS))); + } else +V = Builder.CreateBitOrPointerCast(Result, I->getType()); +} else { V = Builder.CreateAlignedLoad(I->getType(), AllocaResult, AllocaAlignment); Builder.CreateLifetimeEnd(AllocaResult, SizeVal64); diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll b/llvm/test/CodeGen/ARM/atomic-load-store.ll index 560dfde356c29..36c1305a7c5df 100644 --- 
a/llvm/test/CodeGen/ARM/atomic-load-store.ll +++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll @@ -983,3 +983,54 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double %val1) { store atomic double %val1, ptr %ptr seq_cst, align 8 ret void } + +define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 { +; ARM-LABEL: atomic_vec1_ptr: +; ARM: @ %bb.0: +; ARM-NEXT:ldr r0, [r0] +; ARM-NEXT:dmb ish +; ARM-NEXT:bx lr +; +; ARMOPTNONE-LABEL: atomic_vec1_ptr: +; ARMOPTNONE: @ %bb.0: +; ARMOPTNONE-NEXT:ldr r0, [r0] +; ARMOPTNONE-NEXT:dmb ish +; ARMOPTNONE-NEXT:bx lr +; +; THUMBTWO-LABEL: atomic_vec1_ptr: +; THUMBTWO: @ %bb.0: +; THUMBTWO-NEXT:ldr r0, [r0] +; THUMBTWO-NEXT:dmb ish +; THUMBTWO-NEXT:bx lr +; +; THUMBONE-LABEL: atomic_vec1_ptr: +; THUMBONE: @ %bb.0: +; THUMBONE-NEXT:push {r7, lr} +; THUMBONE-NEXT:movs r1, #0 +; THUMBONE-NEXT:mov r2, r1 +; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4 +; THUMBONE-NEXT:pop {r7, pc} +; +; ARMV4-LABEL: atomic_vec1_ptr: +; ARMV4: @ %bb.0: +; ARMV4-NEXT:push {r11, lr} +; ARMV4-NEXT:mov r1, #2 +; ARMV4-NEXT:bl __atomic_load_4 +; ARMV4-NEXT:pop {r11, lr} +; ARMV4-NEXT:mov pc, lr +; +; ARMV6-LABEL: atomic_vec1_ptr: +; ARMV6: @ %bb.0: +; ARMV6-NEXT:mov r1, #0 +; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5 +; ARMV6-NEXT:ldr r0, [r0] +; ARMV6-NEXT:bx lr +; +; THUMBM-LABEL: atomic_vec1_ptr: +; THUMBM: @ %bb.0: +; THUMBM-NEXT:ldr r0, [r0] +; THUMBM-NEXT:dmb sy +; THUMBM-NEXT:bx lr + %ret = load atomic <1 x ptr>, ptr %x acquire, align 4 + ret <1 x ptr> %ret +} diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 08d0405345f57..4293df8c13571 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -371,6 +371,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { ret <2 x i32> %ret } +define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec2_ptr_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:pushq %rax +; CHECK-NEXT:movl $2, 
%esi +; CHECK-NEXT:callq ___atomic_load_16 +; CHECK-NEXT:movq %rdx, %xmm1 +; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT:popq %rax +; CHECK-NEXT:retq + %ret = load atomic <2 x ptr>, ptr %x acquire, align 16 + ret <2 x ptr> %ret +} + define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind { ; CHECK3-LABEL: atomic_vec4_i8: ; CHECK3: ## %bb.0: @@ -394,6 +409,21 @@ define <4 x i16> @atomic_vec4_i16(ptr %x) nounwind { ret <4 x i16> %ret } +define <4 x ptr addrspace(270)> @atomic_vec4_ptr270(ptr %x) nounwind { +; CHECK-LA
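A minimal IR sketch of the cast sequence this patch emits after a sized libcall (value names are illustrative, not taken from the patch; the pass builds these casts with IRBuilder as in the hunk above):

```llvm
; The sized libcall returns a plain integer (i128 for a 16-byte load).
; The new code bitcasts it to a vector of pointer-sized integers, then
; uses inttoptr to produce the requested pointer-vector type.
%r  = call i128 @__atomic_load_16(ptr %x, i32 2)
%bc = bitcast i128 %r to <2 x i64>
%v  = inttoptr <2 x i64> %bc to <2 x ptr>
```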
[llvm-branch-commits] [llvm] release/20.x: [OpenMP] Add pre sm_70 load hack back in (#138589) (PR #138626)
https://github.com/shiltian approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/138626 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [X86] Add atomic vector tests for unaligned >1 sizes. (PR #120387)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120387 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [X86] Manage atomic load of fp -> int promotion in DAG (PR #120386)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120386 >From 5005b94e624695e209a41c2f82f438b9bf1b1bb8 Mon Sep 17 00:00:00 2001 From: jofrn Date: Wed, 18 Dec 2024 03:38:23 -0500 Subject: [PATCH] [X86] Manage atomic load of fp -> int promotion in DAG When lowering atomic <1 x T> vector types with floats, selection can fail since this pattern is unsupported. To support this, floats can be casted to an integer type of the same size. commit-id:f9d761c5 --- llvm/lib/Target/X86/X86ISelLowering.cpp| 4 +++ llvm/test/CodeGen/X86/atomic-load-store.ll | 37 ++ 2 files changed, 41 insertions(+) diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 3d9c76f3d05f5..4e59a3fb16369 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -2651,6 +2651,10 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM, setOperationAction(Op, MVT::f32, Promote); } + setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f16, MVT::i16); + setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f32, MVT::i32); + setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f64, MVT::i64); + // We have target-specific dag combine patterns for the following nodes: setTargetDAGCombine({ISD::VECTOR_SHUFFLE, ISD::SCALAR_TO_VECTOR, diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index d23cfb89f9fc8..6efcbb80c0ce6 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -145,3 +145,40 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind { %ret = load atomic <1 x i64>, ptr %x acquire, align 8 ret <1 x i64> %ret } + +define <1 x half> @atomic_vec1_half(ptr %x) { +; CHECK3-LABEL: atomic_vec1_half: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movzwl (%rdi), %eax +; CHECK3-NEXT:pinsrw $0, %eax, %xmm0 +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_half: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movw 
(%rdi), %cx +; CHECK0-NEXT:## implicit-def: $eax +; CHECK0-NEXT:movw %cx, %ax +; CHECK0-NEXT:## implicit-def: $xmm0 +; CHECK0-NEXT:pinsrw $0, %eax, %xmm0 +; CHECK0-NEXT:retq + %ret = load atomic <1 x half>, ptr %x acquire, align 2 + ret <1 x half> %ret +} + +define <1 x float> @atomic_vec1_float(ptr %x) { +; CHECK-LABEL: atomic_vec1_float: +; CHECK: ## %bb.0: +; CHECK-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK-NEXT:retq + %ret = load atomic <1 x float>, ptr %x acquire, align 4 + ret <1 x float> %ret +} + +define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec1_double_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK-NEXT:retq + %ret = load atomic <1 x double>, ptr %x acquire, align 8 + ret <1 x double> %ret +} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
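As a rough illustration of what this promotion amounts to (selection operates on DAG nodes, not IR, and this function name is made up): an atomic float load is handled as an integer atomic load of the same width plus a bitcast:

```llvm
define float @load_atomic_f32(ptr %p) {
  ; ATOMIC_LOAD f32 promoted to i32: load the bits as an integer...
  %bits = load atomic i32, ptr %p acquire, align 4
  ; ...then reinterpret them as a float.
  %f = bitcast i32 %bits to float
  ret float %f
}
```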
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/125432 >From 18fe5c781d1b128edcfef06a71152846fd7d2bec Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 31 Jan 2025 13:12:56 -0500 Subject: [PATCH] [SelectionDAG][X86] Remove unused elements from atomic vector. After splitting, all elements are created. The two components must be found by looking at the upper and lower half of EXTRACT_ELEMENT. This change extends EltsFromConsecutiveLoads to understand AtomicSDNode so that unused elements can be removed. commit-id:b83937a8 --- llvm/include/llvm/CodeGen/SelectionDAG.h | 4 +- .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 20 ++- .../SelectionDAGAddressAnalysis.cpp | 30 ++-- .../SelectionDAG/SelectionDAGBuilder.cpp | 6 +- llvm/lib/Target/X86/X86ISelLowering.cpp | 43 +++-- llvm/test/CodeGen/X86/atomic-load-store.ll| 167 ++ 6 files changed, 83 insertions(+), 187 deletions(-) diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h index ba11ddbb5b731..d3cd81c146280 100644 --- a/llvm/include/llvm/CodeGen/SelectionDAG.h +++ b/llvm/include/llvm/CodeGen/SelectionDAG.h @@ -1843,7 +1843,7 @@ class SelectionDAG { /// chain to the token factor. This ensures that the new memory node will have /// the same relative memory dependency position as the old load. Returns the /// new merged load chain. - SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp); + SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp); /// Topological-sort the AllNodes list and a /// assign a unique node id for each node in the DAG based on their @@ -2281,7 +2281,7 @@ class SelectionDAG { /// merged. Check that both are nonvolatile and if LD is loading /// 'Bytes' bytes from a location that is 'Dist' units away from the /// location that the 'Base' load is loading from. 
- bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base, + bool areNonVolatileConsecutiveLoads(MemSDNode *LD, MemSDNode *Base, unsigned Bytes, int Dist) const; /// Infer alignment of a load / store address. Return std::nullopt if it diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp index 2a68903c34cef..8e77a542ab029 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -12218,7 +12218,7 @@ SDValue SelectionDAG::makeEquivalentMemoryOrdering(SDValue OldChain, return TokenFactor; } -SDValue SelectionDAG::makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, +SDValue SelectionDAG::makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp) { assert(isa(NewMemOp.getNode()) && "Expected a memop node"); SDValue OldChain = SDValue(OldLoad, 1); @@ -12911,17 +12911,21 @@ std::pair SelectionDAG::UnrollVectorOverflowOp( getBuildVector(NewOvVT, dl, OvScalars)); } -bool SelectionDAG::areNonVolatileConsecutiveLoads(LoadSDNode *LD, - LoadSDNode *Base, +bool SelectionDAG::areNonVolatileConsecutiveLoads(MemSDNode *LD, + MemSDNode *Base, unsigned Bytes, int Dist) const { if (LD->isVolatile() || Base->isVolatile()) return false; - // TODO: probably too restrictive for atomics, revisit - if (!LD->isSimple()) -return false; - if (LD->isIndexed() || Base->isIndexed()) -return false; + if (auto Ld = dyn_cast(LD)) { +if (!Ld->isSimple()) + return false; +if (Ld->isIndexed()) + return false; + } + if (auto Ld = dyn_cast(Base)) +if (Ld->isIndexed()) + return false; if (LD->getChain() != Base->getChain()) return false; EVT VT = LD->getMemoryVT(); diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp index f2ab88851b780..c29cb424c7a4c 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp +++ 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp @@ -195,8 +195,8 @@ bool BaseIndexOffset::contains(const SelectionDAG &DAG, int64_t BitSize, } /// Parses tree in Ptr for base, index, offset addresses. -static BaseIndexOffset matchLSNode(const LSBaseSDNode *N, - const SelectionDAG &DAG) { +template +static BaseIndexOffset matchSDNode(const T *N, const SelectionDAG &DAG) { SDValue Ptr = N->getBasePtr(); // (((B + I*M) + c)) + c ... @@ -206,16 +206,18 @@ static BaseIndexOffset matchLSNode(const LSBaseSDNode *N, bool IsIndexSignExt = false; // pre-inc/pre-dec ops are components of EA. - if (N->getAddressingMode() == ISD::P
[llvm-branch-commits] [llvm] [X86] Remove extra MOV after widening atomic load (PR #138635)
https://github.com/jofrn created https://github.com/llvm/llvm-project/pull/138635 This change adds patterns to optimize out an extra MOV present after widening the atomic load. --- **Stack**: - #120716 - #125432 - #120640 - #138635 ⬅ - #120598 - #120387 - #120386 - #120385 - #120384 ⚠️ *Part of a stack created by [spr](https://github.com/ejoffe/spr). Do not merge manually using the UI - doing so may have unexpected results.* >From 438373241048d37fc2ee11419ceae9b53821fcaf Mon Sep 17 00:00:00 2001 From: jofernau_amdeng Date: Tue, 6 May 2025 01:48:11 -0400 Subject: [PATCH] [X86] Remove extra MOV after widening atomic load This change adds patterns to optimize out an extra MOV present after widening the atomic load. commit-id:45989503 --- llvm/lib/Target/X86/X86InstrCompiler.td| 7 llvm/test/CodeGen/X86/atomic-load-store.ll | 43 -- 2 files changed, 30 insertions(+), 20 deletions(-) diff --git a/llvm/lib/Target/X86/X86InstrCompiler.td b/llvm/lib/Target/X86/X86InstrCompiler.td index 167e27eddd71e..8ad8a0a6194d6 100644 --- a/llvm/lib/Target/X86/X86InstrCompiler.td +++ b/llvm/lib/Target/X86/X86InstrCompiler.td @@ -1200,6 +1200,13 @@ def : Pat<(i16 (atomic_load_nonext_16 addr:$src)), (MOV16rm addr:$src)>; def : Pat<(i32 (atomic_load_nonext_32 addr:$src)), (MOV32rm addr:$src)>; def : Pat<(i64 (atomic_load_nonext_64 addr:$src)), (MOV64rm addr:$src)>; +def : Pat<(v4i32 (scalar_to_vector (i32 (anyext (i16 (atomic_load_16 addr:$src)), + (MOVDI2PDIrm addr:$src)>; // load atomic <2 x i8> +def : Pat<(v4i32 (scalar_to_vector (i32 (atomic_load_32 addr:$src, + (MOVDI2PDIrm addr:$src)>; // load atomic <2 x i16> +def : Pat<(v2i64 (scalar_to_vector (i64 (atomic_load_64 addr:$src, + (MOV64toPQIrm addr:$src)>; // load atomic <2 x i32,float> + // Floating point loads/stores.
def : Pat<(atomic_store_32 (i32 (bitconvert (f32 FR32:$src))), addr:$dst), (MOVSSmr addr:$dst, FR32:$src)>, Requires<[UseSSE1]>; diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 9ee8b4fc5ac7f..935d058a52f8f 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -149,8 +149,7 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind { define <2 x i8> @atomic_vec2_i8(ptr %x) { ; CHECK3-LABEL: atomic_vec2_i8: ; CHECK3: ## %bb.0: -; CHECK3-NEXT:movzwl (%rdi), %eax -; CHECK3-NEXT:movd %eax, %xmm0 +; CHECK3-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero ; CHECK3-NEXT:retq ; ; CHECK0-LABEL: atomic_vec2_i8: @@ -165,11 +164,15 @@ define <2 x i8> @atomic_vec2_i8(ptr %x) { } define <2 x i16> @atomic_vec2_i16(ptr %x) { -; CHECK-LABEL: atomic_vec2_i16: -; CHECK: ## %bb.0: -; CHECK-NEXT:movl (%rdi), %eax -; CHECK-NEXT:movd %eax, %xmm0 -; CHECK-NEXT:retq +; CHECK3-LABEL: atomic_vec2_i16: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec2_i16: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movd {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK0-NEXT:retq %ret = load atomic <2 x i16>, ptr %x acquire, align 4 ret <2 x i16> %ret } @@ -177,8 +180,7 @@ define <2 x i16> @atomic_vec2_i16(ptr %x) { define <2 x ptr addrspace(270)> @atomic_vec2_ptr270(ptr %x) { ; CHECK-LABEL: atomic_vec2_ptr270: ; CHECK: ## %bb.0: -; CHECK-NEXT:movq (%rdi), %rax -; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:movq (%rdi), %xmm0 ; CHECK-NEXT:retq %ret = load atomic <2 x ptr addrspace(270)>, ptr %x acquire, align 8 ret <2 x ptr addrspace(270)> %ret @@ -187,8 +189,7 @@ define <2 x ptr addrspace(270)> @atomic_vec2_ptr270(ptr %x) { define <2 x i32> @atomic_vec2_i32_align(ptr %x) { ; CHECK-LABEL: atomic_vec2_i32_align: ; CHECK: ## %bb.0: -; CHECK-NEXT:movq (%rdi), %rax -; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:movq (%rdi), %xmm0 ; 
CHECK-NEXT:retq %ret = load atomic <2 x i32>, ptr %x acquire, align 8 ret <2 x i32> %ret @@ -197,8 +198,7 @@ define <2 x i32> @atomic_vec2_i32_align(ptr %x) { define <2 x float> @atomic_vec2_float_align(ptr %x) { ; CHECK-LABEL: atomic_vec2_float_align: ; CHECK: ## %bb.0: -; CHECK-NEXT:movq (%rdi), %rax -; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:movq (%rdi), %xmm0 ; CHECK-NEXT:retq %ret = load atomic <2 x float>, ptr %x acquire, align 8 ret <2 x float> %ret @@ -354,11 +354,15 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { } define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind { -; CHECK-LABEL: atomic_vec4_i8: -; CHECK: ## %bb.0: -; CHECK-NEXT:movl (%rdi), %eax -; CHECK-NEXT:movd %eax, %xmm0 -; CHECK-NEXT:retq +; CHECK3-LABEL: atomic_vec4_i8: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec4_i8
[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)
https://github.com/jofrn edited https://github.com/llvm/llvm-project/pull/120385 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120716 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120385 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [X86] Manage atomic load of fp -> int promotion in DAG (PR #120386)
https://github.com/jofrn edited https://github.com/llvm/llvm-project/pull/120386 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #120640)
https://github.com/jofrn edited https://github.com/llvm/llvm-project/pull/120640 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SelectionDAG] Widen <2 x T> vector types for atomic load (PR #120598)
https://github.com/jofrn edited https://github.com/llvm/llvm-project/pull/120598 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [X86] Add atomic vector tests for unaligned >1 sizes. (PR #120387)
https://github.com/jofrn edited https://github.com/llvm/llvm-project/pull/120387 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [X86] Remove extra MOV after widening atomic load (PR #138635)
llvmbot wrote: @llvm/pr-subscribers-backend-x86 Author: None (jofrn) Changes This change adds patterns to optimize out an extra MOV present after widening the atomic load. --- **Stack**: - #120716 - #125432 - #120640 - #138635 ⬅ - #120598 - #120387 - #120386 - #120385 - #120384 ⚠️ *Part of a stack created by [spr](https://github.com/ejoffe/spr). Do not merge manually using the UI - doing so may have unexpected results.* --- Full diff: https://github.com/llvm/llvm-project/pull/138635.diff 2 Files Affected: - (modified) llvm/lib/Target/X86/X86InstrCompiler.td (+7) - (modified) llvm/test/CodeGen/X86/atomic-load-store.ll (+23-20) ``diff diff --git a/llvm/lib/Target/X86/X86InstrCompiler.td b/llvm/lib/Target/X86/X86InstrCompiler.td index 167e27eddd71e..8ad8a0a6194d6 100644 --- a/llvm/lib/Target/X86/X86InstrCompiler.td +++ b/llvm/lib/Target/X86/X86InstrCompiler.td @@ -1200,6 +1200,13 @@ def : Pat<(i16 (atomic_load_nonext_16 addr:$src)), (MOV16rm addr:$src)>; def : Pat<(i32 (atomic_load_nonext_32 addr:$src)), (MOV32rm addr:$src)>; def : Pat<(i64 (atomic_load_nonext_64 addr:$src)), (MOV64rm addr:$src)>; +def : Pat<(v4i32 (scalar_to_vector (i32 (anyext (i16 (atomic_load_16 addr:$src)), + (MOVDI2PDIrm addr:$src)>; // load atomic <2 x i8> +def : Pat<(v4i32 (scalar_to_vector (i32 (atomic_load_32 addr:$src, + (MOVDI2PDIrm addr:$src)>; // load atomic <2 x i16> +def : Pat<(v2i64 (scalar_to_vector (i64 (atomic_load_64 addr:$src, + (MOV64toPQIrm addr:$src)>; // load atomic <2 x i32,float> + // Floating point loads/stores.
def : Pat<(atomic_store_32 (i32 (bitconvert (f32 FR32:$src))), addr:$dst), (MOVSSmr addr:$dst, FR32:$src)>, Requires<[UseSSE1]>; diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 9ee8b4fc5ac7f..935d058a52f8f 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -149,8 +149,7 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind { define <2 x i8> @atomic_vec2_i8(ptr %x) { ; CHECK3-LABEL: atomic_vec2_i8: ; CHECK3: ## %bb.0: -; CHECK3-NEXT:movzwl (%rdi), %eax -; CHECK3-NEXT:movd %eax, %xmm0 +; CHECK3-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero ; CHECK3-NEXT:retq ; ; CHECK0-LABEL: atomic_vec2_i8: @@ -165,11 +164,15 @@ define <2 x i8> @atomic_vec2_i8(ptr %x) { } define <2 x i16> @atomic_vec2_i16(ptr %x) { -; CHECK-LABEL: atomic_vec2_i16: -; CHECK: ## %bb.0: -; CHECK-NEXT:movl (%rdi), %eax -; CHECK-NEXT:movd %eax, %xmm0 -; CHECK-NEXT:retq +; CHECK3-LABEL: atomic_vec2_i16: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec2_i16: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movd {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK0-NEXT:retq %ret = load atomic <2 x i16>, ptr %x acquire, align 4 ret <2 x i16> %ret } @@ -177,8 +180,7 @@ define <2 x i16> @atomic_vec2_i16(ptr %x) { define <2 x ptr addrspace(270)> @atomic_vec2_ptr270(ptr %x) { ; CHECK-LABEL: atomic_vec2_ptr270: ; CHECK: ## %bb.0: -; CHECK-NEXT:movq (%rdi), %rax -; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:movq (%rdi), %xmm0 ; CHECK-NEXT:retq %ret = load atomic <2 x ptr addrspace(270)>, ptr %x acquire, align 8 ret <2 x ptr addrspace(270)> %ret @@ -187,8 +189,7 @@ define <2 x ptr addrspace(270)> @atomic_vec2_ptr270(ptr %x) { define <2 x i32> @atomic_vec2_i32_align(ptr %x) { ; CHECK-LABEL: atomic_vec2_i32_align: ; CHECK: ## %bb.0: -; CHECK-NEXT:movq (%rdi), %rax -; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:movq (%rdi), %xmm0 ; 
CHECK-NEXT:retq %ret = load atomic <2 x i32>, ptr %x acquire, align 8 ret <2 x i32> %ret @@ -197,8 +198,7 @@ define <2 x i32> @atomic_vec2_i32_align(ptr %x) { define <2 x float> @atomic_vec2_float_align(ptr %x) { ; CHECK-LABEL: atomic_vec2_float_align: ; CHECK: ## %bb.0: -; CHECK-NEXT:movq (%rdi), %rax -; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:movq (%rdi), %xmm0 ; CHECK-NEXT:retq %ret = load atomic <2 x float>, ptr %x acquire, align 8 ret <2 x float> %ret @@ -354,11 +354,15 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { } define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind { -; CHECK-LABEL: atomic_vec4_i8: -; CHECK: ## %bb.0: -; CHECK-NEXT:movl (%rdi), %eax -; CHECK-NEXT:movd %eax, %xmm0 -; CHECK-NEXT:retq +; CHECK3-LABEL: atomic_vec4_i8: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec4_i8: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movd {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK0-NEXT:retq %ret = load atomic <4 x i8>, ptr %x acquire, align 4 ret <4 x i8> %ret } @@ -366,8 +370,7 @@ define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind {
[llvm-branch-commits] [flang] [flang][fir] Basic PFT to MLIR lowering for do concurrent locality specifiers (PR #138534)
https://github.com/ergawy edited https://github.com/llvm/llvm-project/pull/138534 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][fir] Add `fir.local` op for locality specifiers (PR #138505)
https://github.com/ergawy edited https://github.com/llvm/llvm-project/pull/138505 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Improve StructurizeCFG pass performance by using SSAUpdaterBulk. (PR #135181)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/135181 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] Add missing imports (PR #138550)
https://github.com/FilipLaurentiu edited https://github.com/llvm/llvm-project/pull/138550 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] Add missing imports (PR #138550)
github-actions[bot] wrote: Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using `@` followed by their GitHub username. If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment "Ping". The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the [LLVM GitHub User Guide](https://llvm.org/docs/GitHub.html). You can also ask questions in a comment on this PR, on the [LLVM Discord](https://discord.com/invite/xS7Z362) or on the [forums](https://discourse.llvm.org/). https://github.com/llvm/llvm-project/pull/138550 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] Add missing imports (PR #138550)
https://github.com/FilipLaurentiu created https://github.com/llvm/llvm-project/pull/138550 Build fails without these imports >From 7a5195fd542f71142e4524f4d4720305bb14c2bb Mon Sep 17 00:00:00 2001 From: Filip Laurentiu Date: Mon, 5 May 2025 18:56:30 +0300 Subject: [PATCH] Add missing imports --- mlir/include/mlir/Dialect/Affine/IR/ValueBoundsOpInterfaceImpl.h | 1 + mlir/include/mlir/Target/SPIRV/Deserialization.h | 1 + mlir/include/mlir/Target/SPIRV/Serialization.h | 1 + 3 files changed, 3 insertions(+) diff --git a/mlir/include/mlir/Dialect/Affine/IR/ValueBoundsOpInterfaceImpl.h b/mlir/include/mlir/Dialect/Affine/IR/ValueBoundsOpInterfaceImpl.h index 451c466fa0c95..642e99d963ef6 100644 --- a/mlir/include/mlir/Dialect/Affine/IR/ValueBoundsOpInterfaceImpl.h +++ b/mlir/include/mlir/Dialect/Affine/IR/ValueBoundsOpInterfaceImpl.h @@ -10,6 +10,7 @@ #define MLIR_DIALECT_AFFINE_IR_VALUEBOUNDSOPINTERFACEIMPL_H #include "mlir/Support/LLVM.h" +#include namespace mlir { class DialectRegistry; diff --git a/mlir/include/mlir/Target/SPIRV/Deserialization.h b/mlir/include/mlir/Target/SPIRV/Deserialization.h index e39258beeaac8..a346a7fd1e5f7 100644 --- a/mlir/include/mlir/Target/SPIRV/Deserialization.h +++ b/mlir/include/mlir/Target/SPIRV/Deserialization.h @@ -15,6 +15,7 @@ #include "mlir/IR/OwningOpRef.h" #include "mlir/Support/LLVM.h" +#include namespace mlir { class MLIRContext; diff --git a/mlir/include/mlir/Target/SPIRV/Serialization.h b/mlir/include/mlir/Target/SPIRV/Serialization.h index 613f0a423f9f8..225777e25d607 100644 --- a/mlir/include/mlir/Target/SPIRV/Serialization.h +++ b/mlir/include/mlir/Target/SPIRV/Serialization.h @@ -14,6 +14,7 @@ #define MLIR_TARGET_SPIRV_SERIALIZATION_H #include "mlir/Support/LLVM.h" +#include namespace mlir { class MLIRContext; ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] Add missing imports (PR #138550)
llvmbot wrote: @llvm/pr-subscribers-mlir @llvm/pr-subscribers-mlir-affine Author: Filip Laurentiu (FilipLaurentiu) Changes Build fail without these imports --- Full diff: https://github.com/llvm/llvm-project/pull/138550.diff 3 Files Affected: - (modified) mlir/include/mlir/Dialect/Affine/IR/ValueBoundsOpInterfaceImpl.h (+1) - (modified) mlir/include/mlir/Target/SPIRV/Deserialization.h (+1) - (modified) mlir/include/mlir/Target/SPIRV/Serialization.h (+1) ``diff diff --git a/mlir/include/mlir/Dialect/Affine/IR/ValueBoundsOpInterfaceImpl.h b/mlir/include/mlir/Dialect/Affine/IR/ValueBoundsOpInterfaceImpl.h index 451c466fa0c95..642e99d963ef6 100644 --- a/mlir/include/mlir/Dialect/Affine/IR/ValueBoundsOpInterfaceImpl.h +++ b/mlir/include/mlir/Dialect/Affine/IR/ValueBoundsOpInterfaceImpl.h @@ -10,6 +10,7 @@ #define MLIR_DIALECT_AFFINE_IR_VALUEBOUNDSOPINTERFACEIMPL_H #include "mlir/Support/LLVM.h" +#include namespace mlir { class DialectRegistry; diff --git a/mlir/include/mlir/Target/SPIRV/Deserialization.h b/mlir/include/mlir/Target/SPIRV/Deserialization.h index e39258beeaac8..a346a7fd1e5f7 100644 --- a/mlir/include/mlir/Target/SPIRV/Deserialization.h +++ b/mlir/include/mlir/Target/SPIRV/Deserialization.h @@ -15,6 +15,7 @@ #include "mlir/IR/OwningOpRef.h" #include "mlir/Support/LLVM.h" +#include namespace mlir { class MLIRContext; diff --git a/mlir/include/mlir/Target/SPIRV/Serialization.h b/mlir/include/mlir/Target/SPIRV/Serialization.h index 613f0a423f9f8..225777e25d607 100644 --- a/mlir/include/mlir/Target/SPIRV/Serialization.h +++ b/mlir/include/mlir/Target/SPIRV/Serialization.h @@ -14,6 +14,7 @@ #define MLIR_TARGET_SPIRV_SERIALIZATION_H #include "mlir/Support/LLVM.h" +#include namespace mlir { class MLIRContext; `` https://github.com/llvm/llvm-project/pull/138550 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SelectionDAG] Adaptation for FP operation lowering (PR #138553)
https://github.com/spavloff created https://github.com/llvm/llvm-project/pull/138553 FP operations listed in FloatingPointOps.def are now lowered to DAG in the same way as constrained intrinsics, using special DAG nodes like STRICT_NEARBYINT. This is a temporary solution. Existing nodes like STRICT_NEARBYINT cannot carry information about rounding or other control modes, so they cannot implement static rounding, for example. However they can serve as a first step toward a solution based on the FP operand bundles. >From ef14a9b6151e66ac2b7452d8d7958f55731f35ab Mon Sep 17 00:00:00 2001 From: Serge Pavlov Date: Mon, 28 Apr 2025 00:21:56 +0700 Subject: [PATCH] [SelectionDAG] Adaptation for FP operation lowering FP operations listed in FloatingPointOps.def are now lowered to DAG in the same way as constrained intrinsics, using special DAG nodes like STRICT_NEARBYINT. This is a temporary solution. Existing nodes like STRICT_NEARBYINT cannot carry information about rounding or other control modes, so they cannot implement static rounding, for example. However they can serve as a first step toward a solution based on the FP operand bundles. --- llvm/include/llvm/IR/FloatingPointOps.def | 7 ++- .../SelectionDAG/SelectionDAGBuilder.cpp | 62 +-- .../SelectionDAG/SelectionDAGBuilder.h| 1 + 3 files changed, 65 insertions(+), 5 deletions(-) diff --git a/llvm/include/llvm/IR/FloatingPointOps.def b/llvm/include/llvm/IR/FloatingPointOps.def index 468227e648300..6d653668fe340 100644 --- a/llvm/include/llvm/IR/FloatingPointOps.def +++ b/llvm/include/llvm/IR/FloatingPointOps.def @@ -14,12 +14,17 @@ #define FUNCTION(N,D) #endif +#ifndef CONSTRAINED +#define CONSTRAINED(N,D) FUNCTION(N,D) +#endif + // Arguments of the entries are: // - intrinsic function name, // - DAG node corresponding to the intrinsic. 
-FUNCTION(experimental_constrained_fadd, FADD) +CONSTRAINED(experimental_constrained_fadd, FADD) FUNCTION(nearbyint, FNEARBYINT) FUNCTION(trunc, FTRUNC) #undef FUNCTION +#undef CONSTRAINED diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp index 8cae34d06c8ba..cd97338cfaa93 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp @@ -6802,9 +6802,7 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, case Intrinsic::exp10: case Intrinsic::floor: case Intrinsic::ceil: - case Intrinsic::trunc: case Intrinsic::rint: - case Intrinsic::nearbyint: case Intrinsic::round: case Intrinsic::roundeven: case Intrinsic::canonicalize: { @@ -6826,9 +6824,7 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, case Intrinsic::exp10:Opcode = ISD::FEXP10;break; case Intrinsic::floor:Opcode = ISD::FFLOOR;break; case Intrinsic::ceil: Opcode = ISD::FCEIL; break; -case Intrinsic::trunc:Opcode = ISD::FTRUNC;break; case Intrinsic::rint: Opcode = ISD::FRINT; break; -case Intrinsic::nearbyint:Opcode = ISD::FNEARBYINT;break; case Intrinsic::round:Opcode = ISD::FROUND;break; case Intrinsic::roundeven:Opcode = ISD::FROUNDEVEN;break; case Intrinsic::canonicalize: Opcode = ISD::FCANONICALIZE; break; @@ -6959,6 +6955,11 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, #include "llvm/IR/ConstrainedOps.def" visitConstrainedFPIntrinsic(cast(I)); return; +#define CONSTRAINED(INTRINSIC, DAGN) +#define FUNCTION(INTRINSIC, DAGN) case Intrinsic::INTRINSIC: +#include "llvm/IR/FloatingPointOps.def" +visitFPOperationIntrinsic(I, Intrinsic); +return; #define BEGIN_REGISTER_VP_INTRINSIC(VPID, ...) 
case Intrinsic::VPID: #include "llvm/IR/VPIntrinsics.def" visitVectorPredicationIntrinsic(cast(I)); @@ -8350,6 +8351,59 @@ void SelectionDAGBuilder::visitConstrainedFPIntrinsic( setValue(&FPI, FPResult); } +void SelectionDAGBuilder::visitFPOperationIntrinsic(const CallInst &CI, +unsigned Intrinsic) { + SDLoc sdl = getCurSDLoc(); + bool StrictFP = + FuncInfo.Fn->getAttributes().hasFnAttr(llvm::Attribute::StrictFP); + + int Opcode = -1; + switch (Intrinsic) { +#define CONSTRAINED(NAME, DAGN) +#define FUNCTION(NAME, DAGN) \ + case Intrinsic::NAME: \ +Opcode = StrictFP ? ISD::STRICT_##DAGN : ISD::DAGN; \ +break; +#include "llvm/IR/FloatingPointOps.def" + } + + SDNodeFlags Flags; + if (CI.getExceptionBehavior() == fp::ExceptionBehavior::ebIgnore) +Flags.setNoFPExcept(true); + if (auto *FPOp = dyn_cast(&CI)) +Flags.copyFMF(*FPOp); + + SmallVector Operands;
[llvm-branch-commits] [llvm] [SelectionDAG] Adaptation for FP operation lowering (PR #138553)
llvmbot wrote: @llvm/pr-subscribers-llvm-selectiondag Author: Serge Pavlov (spavloff) Changes FP operations listed in FloatingPointOps.def are now lowered to DAG in the same way as constrained intrinsics, using special DAG nodes like STRICT_NEARBYINT. This is a temporary solution. Existing nodes like STRICT_NEARBYINT cannot carry information about rounding or other control modes, so they cannot implement static rounding, for example. However they can serve as a first step toward a solution based on the FP operand bundles. --- Full diff: https://github.com/llvm/llvm-project/pull/138553.diff 3 Files Affected: - (modified) llvm/include/llvm/IR/FloatingPointOps.def (+6-1) - (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+58-4) - (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h (+1) ``diff diff --git a/llvm/include/llvm/IR/FloatingPointOps.def b/llvm/include/llvm/IR/FloatingPointOps.def index 468227e648300..6d653668fe340 100644 --- a/llvm/include/llvm/IR/FloatingPointOps.def +++ b/llvm/include/llvm/IR/FloatingPointOps.def @@ -14,12 +14,17 @@ #define FUNCTION(N,D) #endif +#ifndef CONSTRAINED +#define CONSTRAINED(N,D) FUNCTION(N,D) +#endif + // Arguments of the entries are: // - intrinsic function name, // - DAG node corresponding to the intrinsic. 
-FUNCTION(experimental_constrained_fadd, FADD) +CONSTRAINED(experimental_constrained_fadd, FADD) FUNCTION(nearbyint, FNEARBYINT) FUNCTION(trunc, FTRUNC) #undef FUNCTION +#undef CONSTRAINED diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp index 8cae34d06c8ba..cd97338cfaa93 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp @@ -6802,9 +6802,7 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, case Intrinsic::exp10: case Intrinsic::floor: case Intrinsic::ceil: - case Intrinsic::trunc: case Intrinsic::rint: - case Intrinsic::nearbyint: case Intrinsic::round: case Intrinsic::roundeven: case Intrinsic::canonicalize: { @@ -6826,9 +6824,7 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, case Intrinsic::exp10:Opcode = ISD::FEXP10;break; case Intrinsic::floor:Opcode = ISD::FFLOOR;break; case Intrinsic::ceil: Opcode = ISD::FCEIL; break; -case Intrinsic::trunc:Opcode = ISD::FTRUNC;break; case Intrinsic::rint: Opcode = ISD::FRINT; break; -case Intrinsic::nearbyint:Opcode = ISD::FNEARBYINT;break; case Intrinsic::round:Opcode = ISD::FROUND;break; case Intrinsic::roundeven:Opcode = ISD::FROUNDEVEN;break; case Intrinsic::canonicalize: Opcode = ISD::FCANONICALIZE; break; @@ -6959,6 +6955,11 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, #include "llvm/IR/ConstrainedOps.def" visitConstrainedFPIntrinsic(cast(I)); return; +#define CONSTRAINED(INTRINSIC, DAGN) +#define FUNCTION(INTRINSIC, DAGN) case Intrinsic::INTRINSIC: +#include "llvm/IR/FloatingPointOps.def" +visitFPOperationIntrinsic(I, Intrinsic); +return; #define BEGIN_REGISTER_VP_INTRINSIC(VPID, ...) 
case Intrinsic::VPID: #include "llvm/IR/VPIntrinsics.def" visitVectorPredicationIntrinsic(cast(I)); @@ -8350,6 +8351,59 @@ void SelectionDAGBuilder::visitConstrainedFPIntrinsic( setValue(&FPI, FPResult); } +void SelectionDAGBuilder::visitFPOperationIntrinsic(const CallInst &CI, +unsigned Intrinsic) { + SDLoc sdl = getCurSDLoc(); + bool StrictFP = + FuncInfo.Fn->getAttributes().hasFnAttr(llvm::Attribute::StrictFP); + + int Opcode = -1; + switch (Intrinsic) { +#define CONSTRAINED(NAME, DAGN) +#define FUNCTION(NAME, DAGN) \ + case Intrinsic::NAME: \ +Opcode = StrictFP ? ISD::STRICT_##DAGN : ISD::DAGN; \ +break; +#include "llvm/IR/FloatingPointOps.def" + } + + SDNodeFlags Flags; + if (CI.getExceptionBehavior() == fp::ExceptionBehavior::ebIgnore) +Flags.setNoFPExcept(true); + if (auto *FPOp = dyn_cast(&CI)) +Flags.copyFMF(*FPOp); + + SmallVector Operands; + if (StrictFP) +Operands.push_back(DAG.getRoot()); + for (unsigned I = 0, E = CI.arg_size(); I != E; ++I) +Operands.push_back(getValue(CI.getArgOperand(I))); + + const TargetLowering &TLI = DAG.getTargetLoweringInfo(); + EVT VT = TLI.getValueType(DAG.getDataLayout(), CI.getType()); + SDVTList VTs = StrictFP ? DAG.getVTList(VT, MVT::Other) : DAG.getVTList(VT); + + SDValue Result = DAG.getNode(Opcode, sdl, VTs, Operands, Flags); + + SDValue OutChain; + if (StrictFP) { +OutChain = Result.getValue(1); +switch (CI.
[llvm-branch-commits] [flang] [Flang][OpenMP] Minimize host ops remaining in device compilation (PR #137200)
https://github.com/skatrak updated https://github.com/llvm/llvm-project/pull/137200
[llvm-branch-commits] [flang] [mlir] [MLIR][OpenMP] Simplify OpenMP device codegen (PR #137201)
https://github.com/skatrak updated https://github.com/llvm/llvm-project/pull/137201 >From 22f22aa0ca2c98dfcc48a70f2f7e0a5b68d7b1d9 Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Tue, 22 Apr 2025 12:04:45 +0100 Subject: [PATCH] [MLIR][OpenMP] Simplify OpenMP device codegen After removing host operations from the device MLIR module, it is no longer necessary to provide special codegen logic to prevent these operations from causing compiler crashes or miscompilations. This patch removes these now unnecessary code paths to simplify codegen logic. Some MLIR tests are now replaced with Flang tests, since the responsibility of dealing with host operations has been moved earlier in the compilation flow. MLIR tests holding target device modules are updated to no longer include now unsupported host operations. --- .../OpenMP/target-nesting-in-host-ops.f90 | 87 .../Integration/OpenMP/task-target-device.f90 | 37 ++ .../OpenMP/threadprivate-target-device.f90| 40 ++ .../OpenMP/OpenMPToLLVMIRTranslation.cpp | 423 +++--- ...arget-constant-indexing-device-region.mlir | 25 +- .../Target/LLVMIR/omptarget-debug-var-1.mlir | 19 +- .../omptarget-memcpy-align-metadata.mlir | 61 +-- .../LLVMIR/omptarget-target-inside-task.mlir | 43 -- ...ptarget-threadprivate-device-lowering.mlir | 31 -- .../Target/LLVMIR/openmp-llvm-invalid.mlir| 45 ++ .../openmp-target-nesting-in-host-ops.mlir| 160 --- .../LLVMIR/openmp-task-target-device.mlir | 26 -- 12 files changed, 409 insertions(+), 588 deletions(-) create mode 100644 flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90 create mode 100644 flang/test/Integration/OpenMP/task-target-device.f90 create mode 100644 flang/test/Integration/OpenMP/threadprivate-target-device.f90 delete mode 100644 mlir/test/Target/LLVMIR/omptarget-target-inside-task.mlir delete mode 100644 mlir/test/Target/LLVMIR/omptarget-threadprivate-device-lowering.mlir delete mode 100644 mlir/test/Target/LLVMIR/openmp-target-nesting-in-host-ops.mlir delete mode 100644 
mlir/test/Target/LLVMIR/openmp-task-target-device.mlir diff --git a/flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90 b/flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90 new file mode 100644 index 0..8c85a3c1784ed --- /dev/null +++ b/flang/test/Integration/OpenMP/target-nesting-in-host-ops.f90 @@ -0,0 +1,87 @@ +!===--===! +! This directory can be used to add Integration tests involving multiple +! stages of the compiler (for eg. from Fortran to LLVM IR). It should not +! contain executable tests. We should only add tests here sparingly and only +! if there is no other way to test. Repeat this message in each test that is +! added to this directory and sub-directories. +!===--===! + +!REQUIRES: amdgpu-registered-target +!RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -emit-llvm -fopenmp -fopenmp-version=50 -fopenmp-is-target-device %s -o - | FileCheck %s + +! CHECK-NOT: define void @nested_target_in_parallel +! CHECK: define weak_odr protected amdgpu_kernel void @__omp_offloading_{{.*}}_nested_target_in_parallel_{{.*}}(ptr %{{.*}}, ptr %{{.*}}) +subroutine nested_target_in_parallel(v) + implicit none + integer, intent(inout) :: v(10) + + !$omp parallel +!$omp target map(tofrom: v) +!$omp end target + !$omp end parallel +end subroutine + +! CHECK-NOT: define void @nested_target_in_wsloop +! CHECK: define weak_odr protected amdgpu_kernel void @__omp_offloading_{{.*}}_nested_target_in_wsloop_{{.*}}(ptr %{{.*}}, ptr %{{.*}}) +subroutine nested_target_in_wsloop(v) + implicit none + integer, intent(inout) :: v(10) + integer :: i + + !$omp do + do i=1, 10 +!$omp target map(tofrom: v) +!$omp end target + end do +end subroutine + +! CHECK-NOT: define void @nested_target_in_parallel_with_private +! 
CHECK: define weak_odr protected amdgpu_kernel void @__omp_offloading_{{.*}}_nested_target_in_parallel_with_private_{{.*}}(ptr %{{.*}}, ptr %{{.*}}, ptr %{{.*}}) +subroutine nested_target_in_parallel_with_private(v) + implicit none + integer, intent(inout) :: v(10) + integer :: x + x = 10 + + !$omp parallel firstprivate(x) +!$omp target map(tofrom: v(1:x)) +!$omp end target + !$omp end parallel +end subroutine + +! CHECK-NOT: define void @nested_target_in_task_with_private +! CHECK: define weak_odr protected amdgpu_kernel void @__omp_offloading_{{.*}}_nested_target_in_task_with_private_{{.*}}(ptr %{{.*}}, ptr %{{.*}}, ptr %{{.*}}) +subroutine nested_target_in_task_with_private(v) + implicit none + integer, intent(inout) :: v(10) + integer :: x + x = 10 + + !$omp task firstprivate(x) +!$omp target map(tofrom: v(1:x)) +!$omp end target + !$omp end task +end subroutine + +! CHECK-NOT: define void @target_and_atomic_update +! CHECK: define weak_odr protec
[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #120640)
@@ -1421,6 +1424,40 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SetSplitVector(SDValue(N, ResNo), Lo, Hi); } +void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, + SDValue &Hi) { + EVT LoVT, HiVT; + SDLoc dl(LD); + std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(LD->getValueType(0)); + + ISD::LoadExtType ExtType = LD->getExtensionType(); + SDValue Ch = LD->getChain(); + SDValue Ptr = LD->getBasePtr(); + EVT MemoryVT = LD->getMemoryVT(); + + EVT LoMemVT, HiMemVT; + std::tie(LoMemVT, HiMemVT) = DAG.GetSplitDestVTs(MemoryVT); arsenm wrote: These are unused. You at most should be bit casting the in memory type to the target integer memory type, there's no splitting https://github.com/llvm/llvm-project/pull/120640
[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #120640)
@@ -1421,6 +1424,40 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SetSplitVector(SDValue(N, ResNo), Lo, Hi); } +void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, + SDValue &Hi) { + EVT LoVT, HiVT; + SDLoc dl(LD); + std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(LD->getValueType(0)); + + ISD::LoadExtType ExtType = LD->getExtensionType(); + SDValue Ch = LD->getChain(); + SDValue Ptr = LD->getBasePtr(); + EVT MemoryVT = LD->getMemoryVT(); + + EVT LoMemVT, HiMemVT; + std::tie(LoMemVT, HiMemVT) = DAG.GetSplitDestVTs(MemoryVT); + + EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), LD->getValueType(0).getSizeInBits()); + EVT MemIntVT = EVT::getIntegerVT(*DAG.getContext(), LD->getMemoryVT().getSizeInBits()); + SDValue ALD = DAG.getAtomicLoad(ExtType, dl, MemIntVT, IntVT, Ch, Ptr, + LD->getMemOperand()); + + EVT LoIntVT = EVT::getIntegerVT(*DAG.getContext(), LoVT.getSizeInBits()); + EVT HiIntVT = EVT::getIntegerVT(*DAG.getContext(), HiVT.getSizeInBits()); + SDValue ExtractLo = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, LoIntVT, ALD, + DAG.getVectorIdxConstant(0, dl)); arsenm wrote: This isn't a vector extract, so don't use getVectorIdxConstant. I'm not actually sure what the type for the second operand is supposed to be for extract_element https://github.com/llvm/llvm-project/pull/120640
[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #120640)
@@ -1421,6 +1424,40 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SetSplitVector(SDValue(N, ResNo), Lo, Hi); } +void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, + SDValue &Hi) { + EVT LoVT, HiVT; + SDLoc dl(LD); + std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(LD->getValueType(0)); + + ISD::LoadExtType ExtType = LD->getExtensionType(); + SDValue Ch = LD->getChain(); + SDValue Ptr = LD->getBasePtr(); + EVT MemoryVT = LD->getMemoryVT(); + + EVT LoMemVT, HiMemVT; + std::tie(LoMemVT, HiMemVT) = DAG.GetSplitDestVTs(MemoryVT); + + EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), LD->getValueType(0).getSizeInBits()); + EVT MemIntVT = EVT::getIntegerVT(*DAG.getContext(), LD->getMemoryVT().getSizeInBits()); + SDValue ALD = DAG.getAtomicLoad(ExtType, dl, MemIntVT, IntVT, Ch, Ptr, + LD->getMemOperand()); arsenm wrote: Maybe should assert this isn't an extending load, although you could in principle handle it https://github.com/llvm/llvm-project/pull/120640
[llvm-branch-commits] [flang] [flang][fir] Add locality specifiers modeling to `fir.do_concurrent.loop` (PR #138506)
https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/138506
[llvm-branch-commits] [flang] [flang][fir] Basic lowering of `fir.do_concurrent` locality specs to `fir.do_loop ... unordered` (PR #138512)
https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/138512
[llvm-branch-commits] [flang] [flang][fir] Basic PFT to MLIR lowering for do concurrent locality specifiers (PR #138534)
https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/138534 >From 098df5a40cae3f7d514bcdb6579c7107ef74c18e Mon Sep 17 00:00:00 2001 From: ergawy Date: Mon, 5 May 2025 07:15:52 -0500 Subject: [PATCH] [flang][fir] Basic PFT to MLIR lowering for do concurrent locality specifiers Extends support for `fir.do_concurrent` locality specifiers to the PFT to MLIR level. This adds code-gen for generating the newly added `fir.local` ops and referencing these ops from `fir.do_concurrent.loop` ops that have locality specifiers attached to them. This reuses the `DataSharingProcessor` component and generalizes it a bit more to allow for handling `omp.private` ops and `fir.local` ops as well. --- flang/include/flang/Lower/AbstractConverter.h | 4 + .../include/flang/Optimizer/Dialect/FIROps.h | 4 + .../include/flang/Optimizer/Dialect/FIROps.td | 15 +++ flang/lib/Lower/Bridge.cpp| 59 -- .../lib/Lower/OpenMP/DataSharingProcessor.cpp | 104 +- flang/lib/Lower/OpenMP/DataSharingProcessor.h | 14 ++- .../Lower/do_concurrent_delayed_locality.f90 | 59 ++ 7 files changed, 219 insertions(+), 40 deletions(-) create mode 100644 flang/test/Lower/do_concurrent_delayed_locality.f90 diff --git a/flang/include/flang/Lower/AbstractConverter.h b/flang/include/flang/Lower/AbstractConverter.h index 1d1323642bf9c..8ae68e143cd2f 100644 --- a/flang/include/flang/Lower/AbstractConverter.h +++ b/flang/include/flang/Lower/AbstractConverter.h @@ -348,6 +348,10 @@ class AbstractConverter { virtual Fortran::lower::SymbolBox lookupOneLevelUpSymbol(const Fortran::semantics::Symbol &sym) = 0; + /// Find the symbol in the inner-most level of the local map or return null. + virtual Fortran::lower::SymbolBox + shallowLookupSymbol(const Fortran::semantics::Symbol &sym) = 0; + /// Return the mlir::SymbolTable associated to the ModuleOp. 
/// Look-ups are faster using it than using module.lookup<>, /// but the module op should be queried in case of failure diff --git a/flang/include/flang/Optimizer/Dialect/FIROps.h b/flang/include/flang/Optimizer/Dialect/FIROps.h index 1bed227afb50d..62ef8b4b502f2 100644 --- a/flang/include/flang/Optimizer/Dialect/FIROps.h +++ b/flang/include/flang/Optimizer/Dialect/FIROps.h @@ -147,6 +147,10 @@ class CoordinateIndicesAdaptor { mlir::ValueRange values; }; +struct LocalitySpecifierOperands { + llvm::SmallVector<::mlir::Value> privateVars; + llvm::SmallVector<::mlir::Attribute> privateSyms; +}; } // namespace fir #endif // FORTRAN_OPTIMIZER_DIALECT_FIROPS_H diff --git a/flang/include/flang/Optimizer/Dialect/FIROps.td b/flang/include/flang/Optimizer/Dialect/FIROps.td index 01248aa0095ec..5c0fc95a47a8e 100644 --- a/flang/include/flang/Optimizer/Dialect/FIROps.td +++ b/flang/include/flang/Optimizer/Dialect/FIROps.td @@ -3600,6 +3600,21 @@ def fir_LocalitySpecifierOp : fir_Op<"local", [IsolatedFromAbove]> { ]; let extraClassDeclaration = [{ +mlir::BlockArgument getInitMoldArg() { + auto ยฎion = getInitRegion(); + return region.empty() ? nullptr : region.getArgument(0); +} +mlir::BlockArgument getInitPrivateArg() { + auto ยฎion = getInitRegion(); + return region.empty() ? nullptr : region.getArgument(1); +} + +/// Returns true if the init region might read from the mold argument +bool initReadsFromMold() { + mlir::BlockArgument moldArg = getInitMoldArg(); + return moldArg && !moldArg.use_empty(); +} + /// Get the type for arguments to nested regions. This should /// generally be either the same as getType() or some pointer /// type (pointing to the type allocated by this op). 
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp index 0a61f61ab8f75..bf55402ec4714 100644 --- a/flang/lib/Lower/Bridge.cpp +++ b/flang/lib/Lower/Bridge.cpp @@ -12,6 +12,8 @@ #include "flang/Lower/Bridge.h" +#include "OpenMP/DataSharingProcessor.h" +#include "OpenMP/Utils.h" #include "flang/Lower/Allocatable.h" #include "flang/Lower/CallInterface.h" #include "flang/Lower/Coarray.h" @@ -1144,6 +1146,14 @@ class FirConverter : public Fortran::lower::AbstractConverter { return name; } + /// Find the symbol in the inner-most level of the local map or return null. + Fortran::lower::SymbolBox + shallowLookupSymbol(const Fortran::semantics::Symbol &sym) override { +if (Fortran::lower::SymbolBox v = localSymbols.shallowLookupSymbol(sym)) + return v; +return {}; + } + private: FirConverter() = delete; FirConverter(const FirConverter &) = delete; @@ -1218,14 +1228,6 @@ class FirConverter : public Fortran::lower::AbstractConverter { return {}; } - /// Find the symbol in the inner-most level of the local map or return null. - Fortran::lower::SymbolBox - shallowLookupSymbol(const Fortran::semantics::Symbol &sym) { -if (Fortran::lower::SymbolBox v = localSymbols.shallowLookupSymbol(sym))
[llvm-branch-commits] [llvm] [CodeGen][NPM] Port PostRAMachineSinking to NPM (PR #138497)
https://github.com/optimisan updated https://github.com/llvm/llvm-project/pull/138497 >From 72ba54e7458fa1b63a9deb01ae3b5131222f516b Mon Sep 17 00:00:00 2001 From: Akshat Oke Date: Mon, 5 May 2025 09:17:40 + Subject: [PATCH] [CodeGen][NPM] Port PostRAMachineSinking to NPM --- llvm/include/llvm/CodeGen/MachineSink.h | 11 +++ llvm/include/llvm/InitializePasses.h | 2 +- .../llvm/Passes/MachinePassRegistry.def | 2 +- llvm/lib/CodeGen/CodeGen.cpp | 2 +- llvm/lib/CodeGen/MachineSink.cpp | 31 +++ .../AArch64/bisect-post-ra-machine-sink.mir | 1 + llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll | 4 +-- .../CodeGen/AMDGPU/postra-machine-sink.mir| 1 + llvm/test/CodeGen/X86/pr38952.mir | 1 + 9 files changed, 44 insertions(+), 11 deletions(-) diff --git a/llvm/include/llvm/CodeGen/MachineSink.h b/llvm/include/llvm/CodeGen/MachineSink.h index 71bd7229b7598..eb9548dc82250 100644 --- a/llvm/include/llvm/CodeGen/MachineSink.h +++ b/llvm/include/llvm/CodeGen/MachineSink.h @@ -26,5 +26,16 @@ class MachineSinkingPass : public PassInfoMixin { function_ref MapClassName2PassName); }; +class PostRAMachineSinkingPass +: public PassInfoMixin { +public: + PreservedAnalyses run(MachineFunction &MF, MachineFunctionAnalysisManager &); + + MachineFunctionProperties getRequiredProperties() const { +return MachineFunctionProperties().set( +MachineFunctionProperties::Property::NoVRegs); + } +}; + } // namespace llvm #endif // LLVM_CODEGEN_MACHINESINK_H diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index 07dc86c6fccf2..e75f9c7a2cfe8 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -241,7 +241,7 @@ void initializePostDominatorTreeWrapperPassPass(PassRegistry &); void initializePostInlineEntryExitInstrumenterPass(PassRegistry &); void initializePostMachineSchedulerLegacyPass(PassRegistry &); void initializePostRAHazardRecognizerLegacyPass(PassRegistry &); -void initializePostRAMachineSinkingPass(PassRegistry &); 
+void initializePostRAMachineSinkingLegacyPass(PassRegistry &); void initializePostRASchedulerLegacyPass(PassRegistry &); void initializePreISelIntrinsicLoweringLegacyPassPass(PassRegistry &); void initializePrintFunctionPassWrapperPass(PassRegistry &); diff --git a/llvm/include/llvm/Passes/MachinePassRegistry.def b/llvm/include/llvm/Passes/MachinePassRegistry.def index 436b26852ce90..c6c00e8f25882 100644 --- a/llvm/include/llvm/Passes/MachinePassRegistry.def +++ b/llvm/include/llvm/Passes/MachinePassRegistry.def @@ -164,6 +164,7 @@ MACHINE_FUNCTION_PASS("phi-node-elimination", PHIEliminationPass()) MACHINE_FUNCTION_PASS("post-RA-hazard-rec", PostRAHazardRecognizerPass()) MACHINE_FUNCTION_PASS("post-RA-sched", PostRASchedulerPass(TM)) MACHINE_FUNCTION_PASS("postmisched", PostMachineSchedulerPass(TM)) +MACHINE_FUNCTION_PASS("postra-machine-sink", PostRAMachineSinkingPass()) MACHINE_FUNCTION_PASS("post-ra-pseudos", ExpandPostRAPseudosPass()) MACHINE_FUNCTION_PASS("print", PrintMIRPass()) MACHINE_FUNCTION_PASS("print", LiveDebugVariablesPrinterPass(errs())) @@ -315,7 +316,6 @@ DUMMY_MACHINE_FUNCTION_PASS("static-data-splitter", StaticDataSplitter) DUMMY_MACHINE_FUNCTION_PASS("machine-function-splitter", MachineFunctionSplitterPass) DUMMY_MACHINE_FUNCTION_PASS("machineinstr-printer", MachineFunctionPrinterPass) DUMMY_MACHINE_FUNCTION_PASS("mirfs-discriminators", MIRAddFSDiscriminatorsPass) -DUMMY_MACHINE_FUNCTION_PASS("postra-machine-sink", PostRAMachineSinkingPass) DUMMY_MACHINE_FUNCTION_PASS("processimpdefs", ProcessImplicitDefsPass) DUMMY_MACHINE_FUNCTION_PASS("prologepilog-code", PrologEpilogCodeInserterPass) DUMMY_MACHINE_FUNCTION_PASS("ra-basic", RABasicPass) diff --git a/llvm/lib/CodeGen/CodeGen.cpp b/llvm/lib/CodeGen/CodeGen.cpp index aa3591cb6be58..065fd4704ccfb 100644 --- a/llvm/lib/CodeGen/CodeGen.cpp +++ b/llvm/lib/CodeGen/CodeGen.cpp @@ -107,7 +107,7 @@ void llvm::initializeCodeGen(PassRegistry &Registry) { initializePeepholeOptimizerLegacyPass(Registry); 
initializePostMachineSchedulerLegacyPass(Registry); initializePostRAHazardRecognizerLegacyPass(Registry); - initializePostRAMachineSinkingPass(Registry); + initializePostRAMachineSinkingLegacyPass(Registry); initializePostRASchedulerLegacyPass(Registry); initializePreISelIntrinsicLoweringLegacyPassPass(Registry); initializeProcessImplicitDefsPass(Registry); diff --git a/llvm/lib/CodeGen/MachineSink.cpp b/llvm/lib/CodeGen/MachineSink.cpp index aa2987b6710a3..be1a3ac125c65 100644 --- a/llvm/lib/CodeGen/MachineSink.cpp +++ b/llvm/lib/CodeGen/MachineSink.cpp @@ -2068,12 +2068,12 @@ void MachineSinking::SalvageUnsunkDebugUsersOfCopy( //===--===// namespace { -class PostRAMachineSinking : public MachineFunctionPass {
[llvm-branch-commits] [llvm] [CodeGen][NPM] Port PostRAMachineSinking to NPM (PR #138497)
https://github.com/optimisan updated https://github.com/llvm/llvm-project/pull/138497 >From 72ba54e7458fa1b63a9deb01ae3b5131222f516b Mon Sep 17 00:00:00 2001 From: Akshat Oke Date: Mon, 5 May 2025 09:17:40 + Subject: [PATCH] [CodeGen][NPM] Port PostRAMachineSinking to NPM --- llvm/include/llvm/CodeGen/MachineSink.h | 11 +++ llvm/include/llvm/InitializePasses.h | 2 +- .../llvm/Passes/MachinePassRegistry.def | 2 +- llvm/lib/CodeGen/CodeGen.cpp | 2 +- llvm/lib/CodeGen/MachineSink.cpp | 31 +++ .../AArch64/bisect-post-ra-machine-sink.mir | 1 + llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll | 4 +-- .../CodeGen/AMDGPU/postra-machine-sink.mir| 1 + llvm/test/CodeGen/X86/pr38952.mir | 1 + 9 files changed, 44 insertions(+), 11 deletions(-) diff --git a/llvm/include/llvm/CodeGen/MachineSink.h b/llvm/include/llvm/CodeGen/MachineSink.h index 71bd7229b7598..eb9548dc82250 100644 --- a/llvm/include/llvm/CodeGen/MachineSink.h +++ b/llvm/include/llvm/CodeGen/MachineSink.h @@ -26,5 +26,16 @@ class MachineSinkingPass : public PassInfoMixin { function_ref MapClassName2PassName); }; +class PostRAMachineSinkingPass +: public PassInfoMixin { +public: + PreservedAnalyses run(MachineFunction &MF, MachineFunctionAnalysisManager &); + + MachineFunctionProperties getRequiredProperties() const { +return MachineFunctionProperties().set( +MachineFunctionProperties::Property::NoVRegs); + } +}; + } // namespace llvm #endif // LLVM_CODEGEN_MACHINESINK_H diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index 07dc86c6fccf2..e75f9c7a2cfe8 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -241,7 +241,7 @@ void initializePostDominatorTreeWrapperPassPass(PassRegistry &); void initializePostInlineEntryExitInstrumenterPass(PassRegistry &); void initializePostMachineSchedulerLegacyPass(PassRegistry &); void initializePostRAHazardRecognizerLegacyPass(PassRegistry &); -void initializePostRAMachineSinkingPass(PassRegistry &); 
+void initializePostRAMachineSinkingLegacyPass(PassRegistry &); void initializePostRASchedulerLegacyPass(PassRegistry &); void initializePreISelIntrinsicLoweringLegacyPassPass(PassRegistry &); void initializePrintFunctionPassWrapperPass(PassRegistry &); diff --git a/llvm/include/llvm/Passes/MachinePassRegistry.def b/llvm/include/llvm/Passes/MachinePassRegistry.def index 436b26852ce90..c6c00e8f25882 100644 --- a/llvm/include/llvm/Passes/MachinePassRegistry.def +++ b/llvm/include/llvm/Passes/MachinePassRegistry.def @@ -164,6 +164,7 @@ MACHINE_FUNCTION_PASS("phi-node-elimination", PHIEliminationPass()) MACHINE_FUNCTION_PASS("post-RA-hazard-rec", PostRAHazardRecognizerPass()) MACHINE_FUNCTION_PASS("post-RA-sched", PostRASchedulerPass(TM)) MACHINE_FUNCTION_PASS("postmisched", PostMachineSchedulerPass(TM)) +MACHINE_FUNCTION_PASS("postra-machine-sink", PostRAMachineSinkingPass()) MACHINE_FUNCTION_PASS("post-ra-pseudos", ExpandPostRAPseudosPass()) MACHINE_FUNCTION_PASS("print", PrintMIRPass()) MACHINE_FUNCTION_PASS("print", LiveDebugVariablesPrinterPass(errs())) @@ -315,7 +316,6 @@ DUMMY_MACHINE_FUNCTION_PASS("static-data-splitter", StaticDataSplitter) DUMMY_MACHINE_FUNCTION_PASS("machine-function-splitter", MachineFunctionSplitterPass) DUMMY_MACHINE_FUNCTION_PASS("machineinstr-printer", MachineFunctionPrinterPass) DUMMY_MACHINE_FUNCTION_PASS("mirfs-discriminators", MIRAddFSDiscriminatorsPass) -DUMMY_MACHINE_FUNCTION_PASS("postra-machine-sink", PostRAMachineSinkingPass) DUMMY_MACHINE_FUNCTION_PASS("processimpdefs", ProcessImplicitDefsPass) DUMMY_MACHINE_FUNCTION_PASS("prologepilog-code", PrologEpilogCodeInserterPass) DUMMY_MACHINE_FUNCTION_PASS("ra-basic", RABasicPass) diff --git a/llvm/lib/CodeGen/CodeGen.cpp b/llvm/lib/CodeGen/CodeGen.cpp index aa3591cb6be58..065fd4704ccfb 100644 --- a/llvm/lib/CodeGen/CodeGen.cpp +++ b/llvm/lib/CodeGen/CodeGen.cpp @@ -107,7 +107,7 @@ void llvm::initializeCodeGen(PassRegistry &Registry) { initializePeepholeOptimizerLegacyPass(Registry); 
initializePostMachineSchedulerLegacyPass(Registry); initializePostRAHazardRecognizerLegacyPass(Registry); - initializePostRAMachineSinkingPass(Registry); + initializePostRAMachineSinkingLegacyPass(Registry); initializePostRASchedulerLegacyPass(Registry); initializePreISelIntrinsicLoweringLegacyPassPass(Registry); initializeProcessImplicitDefsPass(Registry); diff --git a/llvm/lib/CodeGen/MachineSink.cpp b/llvm/lib/CodeGen/MachineSink.cpp index aa2987b6710a3..be1a3ac125c65 100644 --- a/llvm/lib/CodeGen/MachineSink.cpp +++ b/llvm/lib/CodeGen/MachineSink.cpp @@ -2068,12 +2068,12 @@ void MachineSinking::SalvageUnsunkDebugUsersOfCopy( //===--===// namespace { -class PostRAMachineSinking : public MachineFunctionPass {
[llvm-branch-commits] [llvm] [AMDGPU][NPM] Register AMDGPUWaitSGPRHazards pass (PR #138496)
https://github.com/optimisan updated https://github.com/llvm/llvm-project/pull/138496 >From 7b10c2ced561a9693f505e187bfb64f593e81562 Mon Sep 17 00:00:00 2001 From: Akshat Oke Date: Mon, 5 May 2025 08:58:58 + Subject: [PATCH] [AMDGPU][NPM] Register AMDGPUWaitSGPRHazards pass --- llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def | 1 + llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll | 6 +++--- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def index 98a1147ef6d66..f408ced020543 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def +++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def @@ -106,6 +106,7 @@ MACHINE_FUNCTION_PASS("amdgpu-set-wave-priority", AMDGPUSetWavePriorityPass()) MACHINE_FUNCTION_PASS("amdgpu-pre-ra-optimizations", GCNPreRAOptimizationsPass()) MACHINE_FUNCTION_PASS("amdgpu-preload-kern-arg-prolog", AMDGPUPreloadKernArgPrologPass()) MACHINE_FUNCTION_PASS("amdgpu-nsa-reassign", GCNNSAReassignPass()) +MACHINE_FUNCTION_PASS("amdgpu-wait-sgpr-hazards", AMDGPUWaitSGPRHazardsPass()) MACHINE_FUNCTION_PASS("gcn-create-vopd", GCNCreateVOPDPass()) MACHINE_FUNCTION_PASS("gcn-dpp-combine", GCNDPPCombinePass()) MACHINE_FUNCTION_PASS("si-fix-sgpr-copies", SIFixSGPRCopiesPass()) diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll index e00b7ff83e322..468e4208c510a 100644 --- a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll +++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll @@ -7,11 +7,11 @@ ; RUN: llc -O3 -enable-new-pm -mtriple=amdgcn--amdhsa -print-pipeline-passes < %s 2>&1 \ ; RUN: | FileCheck -check-prefix=GCN-O3 %s -; GCN-O0: 
require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,UnreachableBlockElimPass,ee-instrument,scalarize-masked-mem-intrin,ExpandReductionsPass,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,cgscc(function(lower-switch,lower-invoke,UnreachableBlockElimPass,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-late-branch-lowering,post-RA-hazard-rec,AMDGPUWaitSGPRHazardsPass,branch-relaxation,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),invalidate)) +; GCN-O0: 
require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,UnreachableBlockElimPass,ee-instrument,scalarize-masked-mem-intrin,ExpandReductionsPass,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,cgscc(function(lower-switch,lower-invoke,UnreachableBlockElimPass,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-late-branch-lowering,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,branch-relaxation,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),invalidate)) -; GCN-O2: require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,function(amdgpu-image-intrinsic-opt),expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(infer-address-spaces,amdgpu-atomic-optimizer,atomic-expand,amdgpu-promote-alloca,separate-const-offset-
[llvm-branch-commits] [llvm] [CodeGen][NPM] Port InitUndef to NPM (PR #138495)
https://github.com/optimisan updated https://github.com/llvm/llvm-project/pull/138495 >From 2db3af07bf3894df69e0336e2c71c4704fd4fca8 Mon Sep 17 00:00:00 2001 From: Akshat Oke Date: Mon, 5 May 2025 08:47:42 + Subject: [PATCH 1/2] [CodeGen][NPM] Port InitUndef to NPM --- llvm/include/llvm/CodeGen/InitUndef.h | 24 + llvm/include/llvm/InitializePasses.h | 2 +- llvm/include/llvm/Passes/CodeGenPassBuilder.h | 1 + .../llvm/Passes/MachinePassRegistry.def | 2 +- llvm/lib/CodeGen/CodeGen.cpp | 2 +- llvm/lib/CodeGen/InitUndef.cpp| 50 +-- llvm/lib/Passes/PassBuilder.cpp | 1 + llvm/test/CodeGen/AArch64/init-undef.mir | 3 ++ llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll | 4 +- .../rvv/handle-noreg-with-implicit-def.mir| 2 + .../rvv/subregister-undef-early-clobber.mir | 1 + .../RISCV/rvv/undef-earlyclobber-chain.mir| 1 + 12 files changed, 73 insertions(+), 20 deletions(-) create mode 100644 llvm/include/llvm/CodeGen/InitUndef.h diff --git a/llvm/include/llvm/CodeGen/InitUndef.h b/llvm/include/llvm/CodeGen/InitUndef.h new file mode 100644 index 0..7274824a74905 --- /dev/null +++ b/llvm/include/llvm/CodeGen/InitUndef.h @@ -0,0 +1,24 @@ +//===- llvm/CodeGen/InitUndef.h *- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef LLVM_CODEGEN_INITUNDEF_H +#define LLVM_CODEGEN_INITUNDEF_H + +#include "llvm/CodeGen/MachinePassManager.h" + +namespace llvm { + +class InitUndefPass : public PassInfoMixin { +public: + PreservedAnalyses run(MachineFunction &MF, +MachineFunctionAnalysisManager &MFAM); +}; + +} // namespace llvm + +#endif // LLVM_CODEGEN_INITUNDEF_H diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index bff0526d4177a..07dc86c6fccf2 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -311,7 +311,7 @@ void initializeTargetTransformInfoWrapperPassPass(PassRegistry &); void initializeTwoAddressInstructionLegacyPassPass(PassRegistry &); void initializeTypeBasedAAWrapperPassPass(PassRegistry &); void initializeTypePromotionLegacyPass(PassRegistry &); -void initializeInitUndefPass(PassRegistry &); +void initializeInitUndefLegacyPass(PassRegistry &); void initializeUniformityInfoWrapperPassPass(PassRegistry &); void initializeUnifyLoopExitsLegacyPassPass(PassRegistry &); void initializeUnpackMachineBundlesPass(PassRegistry &); diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h index 982bb16e71eab..351ef63af05c0 100644 --- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h +++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h @@ -43,6 +43,7 @@ #include "llvm/CodeGen/GlobalMerge.h" #include "llvm/CodeGen/GlobalMergeFunctions.h" #include "llvm/CodeGen/IndirectBrExpand.h" +#include "llvm/CodeGen/InitUndef.h" #include "llvm/CodeGen/InterleavedAccess.h" #include "llvm/CodeGen/InterleavedLoadCombine.h" #include "llvm/CodeGen/JMCInstrumenter.h" diff --git a/llvm/include/llvm/Passes/MachinePassRegistry.def b/llvm/include/llvm/Passes/MachinePassRegistry.def index c69573ee3ed97..436b26852ce90 100644 --- a/llvm/include/llvm/Passes/MachinePassRegistry.def +++ 
b/llvm/include/llvm/Passes/MachinePassRegistry.def @@ -148,6 +148,7 @@ MACHINE_FUNCTION_PASS("early-tailduplication", EarlyTailDuplicatePass()) MACHINE_FUNCTION_PASS("fentry-insert", FEntryInserterPass()) MACHINE_FUNCTION_PASS("finalize-isel", FinalizeISelPass()) MACHINE_FUNCTION_PASS("fixup-statepoint-caller-saved", FixupStatepointCallerSavedPass()) +MACHINE_FUNCTION_PASS("init-undef", InitUndefPass()) MACHINE_FUNCTION_PASS("localstackalloc", LocalStackSlotAllocationPass()) MACHINE_FUNCTION_PASS("machine-cp", MachineCopyPropagationPass()) MACHINE_FUNCTION_PASS("machine-cse", MachineCSEPass()) @@ -304,7 +305,6 @@ DUMMY_MACHINE_FUNCTION_PASS("fs-profile-loader", MIRProfileLoaderNewPass) DUMMY_MACHINE_FUNCTION_PASS("funclet-layout", FuncletLayoutPass) DUMMY_MACHINE_FUNCTION_PASS("gc-empty-basic-blocks", GCEmptyBasicBlocksPass) DUMMY_MACHINE_FUNCTION_PASS("implicit-null-checks", ImplicitNullChecksPass) -DUMMY_MACHINE_FUNCTION_PASS("init-undef-pass", InitUndefPass) DUMMY_MACHINE_FUNCTION_PASS("instruction-select", InstructionSelectPass) DUMMY_MACHINE_FUNCTION_PASS("irtranslator", IRTranslatorPass) DUMMY_MACHINE_FUNCTION_PASS("kcfi", MachineKCFIPass) diff --git a/llvm/lib/CodeGen/CodeGen.cpp b/llvm/lib/CodeGen/CodeGen.cpp index 5250534d8a4e4..aa3591cb6be58 100644 --- a/llvm/lib/CodeGen/CodeGen.cpp +++ b/llvm/lib/CodeGen/CodeGen.cpp @@ -54,7 +54,7 @@ void llvm::initializeC
[llvm-branch-commits] [llvm] [CodeGen][NPM] Port InitUndef to NPM (PR #138495)
https://github.com/optimisan ready_for_review https://github.com/llvm/llvm-project/pull/138495 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][NPM] Register AMDGPUWaitSGPRHazards pass (PR #138496)
https://github.com/optimisan ready_for_review https://github.com/llvm/llvm-project/pull/138496
[llvm-branch-commits] [llvm] [CodeGen][NPM] Port InitUndef to NPM (PR #138495)
llvmbot wrote: @llvm/pr-subscribers-backend-aarch64 Author: Akshat Oke (optimisan) Changes --- Patch is 24.47 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/138495.diff 12 Files Affected: - (added) llvm/include/llvm/CodeGen/InitUndef.h (+24) - (modified) llvm/include/llvm/InitializePasses.h (+1-1) - (modified) llvm/include/llvm/Passes/CodeGenPassBuilder.h (+1) - (modified) llvm/include/llvm/Passes/MachinePassRegistry.def (+1-1) - (modified) llvm/lib/CodeGen/CodeGen.cpp (+1-1) - (modified) llvm/lib/CodeGen/InitUndef.cpp (+35-15) - (modified) llvm/lib/Passes/PassBuilder.cpp (+1) - (modified) llvm/test/CodeGen/AArch64/init-undef.mir (+3) - (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll (+2-2) - (modified) llvm/test/CodeGen/RISCV/rvv/handle-noreg-with-implicit-def.mir (+2) - (modified) llvm/test/CodeGen/RISCV/rvv/subregister-undef-early-clobber.mir (+1) - (modified) llvm/test/CodeGen/RISCV/rvv/undef-earlyclobber-chain.mir (+1) ``diff diff --git a/llvm/include/llvm/CodeGen/InitUndef.h b/llvm/include/llvm/CodeGen/InitUndef.h new file mode 100644 index 0..be1cf4bfc9872 --- /dev/null +++ b/llvm/include/llvm/CodeGen/InitUndef.h @@ -0,0 +1,24 @@ +//===- llvm/CodeGen/InitUndef.h -*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef LLVM_CODEGEN_INITUNDEF_H +#define LLVM_CODEGEN_INITUNDEF_H + +#include "llvm/CodeGen/MachinePassManager.h" + +namespace llvm { + +class InitUndefPass : public PassInfoMixin { +public: + PreservedAnalyses run(MachineFunction &MF, +MachineFunctionAnalysisManager &MFAM); +}; + +} // namespace llvm + +#endif // LLVM_CODEGEN_INITUNDEF_H diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index bff0526d4177a..07dc86c6fccf2 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -311,7 +311,7 @@ void initializeTargetTransformInfoWrapperPassPass(PassRegistry &); void initializeTwoAddressInstructionLegacyPassPass(PassRegistry &); void initializeTypeBasedAAWrapperPassPass(PassRegistry &); void initializeTypePromotionLegacyPass(PassRegistry &); -void initializeInitUndefPass(PassRegistry &); +void initializeInitUndefLegacyPass(PassRegistry &); void initializeUniformityInfoWrapperPassPass(PassRegistry &); void initializeUnifyLoopExitsLegacyPassPass(PassRegistry &); void initializeUnpackMachineBundlesPass(PassRegistry &); diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h index 982bb16e71eab..351ef63af05c0 100644 --- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h +++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h @@ -43,6 +43,7 @@ #include "llvm/CodeGen/GlobalMerge.h" #include "llvm/CodeGen/GlobalMergeFunctions.h" #include "llvm/CodeGen/IndirectBrExpand.h" +#include "llvm/CodeGen/InitUndef.h" #include "llvm/CodeGen/InterleavedAccess.h" #include "llvm/CodeGen/InterleavedLoadCombine.h" #include "llvm/CodeGen/JMCInstrumenter.h" diff --git a/llvm/include/llvm/Passes/MachinePassRegistry.def b/llvm/include/llvm/Passes/MachinePassRegistry.def index c69573ee3ed97..436b26852ce90 100644 --- a/llvm/include/llvm/Passes/MachinePassRegistry.def +++ 
b/llvm/include/llvm/Passes/MachinePassRegistry.def @@ -148,6 +148,7 @@ MACHINE_FUNCTION_PASS("early-tailduplication", EarlyTailDuplicatePass()) MACHINE_FUNCTION_PASS("fentry-insert", FEntryInserterPass()) MACHINE_FUNCTION_PASS("finalize-isel", FinalizeISelPass()) MACHINE_FUNCTION_PASS("fixup-statepoint-caller-saved", FixupStatepointCallerSavedPass()) +MACHINE_FUNCTION_PASS("init-undef", InitUndefPass()) MACHINE_FUNCTION_PASS("localstackalloc", LocalStackSlotAllocationPass()) MACHINE_FUNCTION_PASS("machine-cp", MachineCopyPropagationPass()) MACHINE_FUNCTION_PASS("machine-cse", MachineCSEPass()) @@ -304,7 +305,6 @@ DUMMY_MACHINE_FUNCTION_PASS("fs-profile-loader", MIRProfileLoaderNewPass) DUMMY_MACHINE_FUNCTION_PASS("funclet-layout", FuncletLayoutPass) DUMMY_MACHINE_FUNCTION_PASS("gc-empty-basic-blocks", GCEmptyBasicBlocksPass) DUMMY_MACHINE_FUNCTION_PASS("implicit-null-checks", ImplicitNullChecksPass) -DUMMY_MACHINE_FUNCTION_PASS("init-undef-pass", InitUndefPass) DUMMY_MACHINE_FUNCTION_PASS("instruction-select", InstructionSelectPass) DUMMY_MACHINE_FUNCTION_PASS("irtranslator", IRTranslatorPass) DUMMY_MACHINE_FUNCTION_PASS("kcfi", MachineKCFIPass) diff --git a/llvm/lib/CodeGen/CodeGen.cpp b/llvm/lib/CodeGen/CodeGen.cpp index 5250534d8a4e4..aa3591cb6be58 100644 --- a/llvm/lib/CodeGen/CodeGen.cpp +++ b/llvm/lib/CodeGen/CodeGen.cpp @@ -54,7 +54,7 @@ void llvm::initializeCodeGen(PassRegistry
[llvm-branch-commits] [llvm] [CodeGen][NPM] Port PostRAMachineSinking to NPM (PR #138497)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Akshat Oke (optimisan) Changes --- Patch is 22.22 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/138497.diff 9 Files Affected: - (modified) llvm/include/llvm/CodeGen/MachineSink.h (+11) - (modified) llvm/include/llvm/InitializePasses.h (+1-1) - (modified) llvm/include/llvm/Passes/MachinePassRegistry.def (+1-1) - (modified) llvm/lib/CodeGen/CodeGen.cpp (+1-1) - (modified) llvm/lib/CodeGen/MachineSink.cpp (+25-6) - (modified) llvm/test/CodeGen/AArch64/bisect-post-ra-machine-sink.mir (+1) - (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll (+2-2) - (modified) llvm/test/CodeGen/AMDGPU/postra-machine-sink.mir (+1) - (modified) llvm/test/CodeGen/X86/pr38952.mir (+1) ``diff diff --git a/llvm/include/llvm/CodeGen/MachineSink.h b/llvm/include/llvm/CodeGen/MachineSink.h index 71bd7229b7598..eb9548dc82250 100644 --- a/llvm/include/llvm/CodeGen/MachineSink.h +++ b/llvm/include/llvm/CodeGen/MachineSink.h @@ -26,5 +26,16 @@ class MachineSinkingPass : public PassInfoMixin { function_ref MapClassName2PassName); }; +class PostRAMachineSinkingPass +: public PassInfoMixin { +public: + PreservedAnalyses run(MachineFunction &MF, MachineFunctionAnalysisManager &); + + MachineFunctionProperties getRequiredProperties() const { +return MachineFunctionProperties().set( +MachineFunctionProperties::Property::NoVRegs); + } +}; + } // namespace llvm #endif // LLVM_CODEGEN_MACHINESINK_H diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index 07dc86c6fccf2..e75f9c7a2cfe8 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -241,7 +241,7 @@ void initializePostDominatorTreeWrapperPassPass(PassRegistry &); void initializePostInlineEntryExitInstrumenterPass(PassRegistry &); void initializePostMachineSchedulerLegacyPass(PassRegistry &); void initializePostRAHazardRecognizerLegacyPass(PassRegistry &); -void 
initializePostRAMachineSinkingPass(PassRegistry &); +void initializePostRAMachineSinkingLegacyPass(PassRegistry &); void initializePostRASchedulerLegacyPass(PassRegistry &); void initializePreISelIntrinsicLoweringLegacyPassPass(PassRegistry &); void initializePrintFunctionPassWrapperPass(PassRegistry &); diff --git a/llvm/include/llvm/Passes/MachinePassRegistry.def b/llvm/include/llvm/Passes/MachinePassRegistry.def index 436b26852ce90..c6c00e8f25882 100644 --- a/llvm/include/llvm/Passes/MachinePassRegistry.def +++ b/llvm/include/llvm/Passes/MachinePassRegistry.def @@ -164,6 +164,7 @@ MACHINE_FUNCTION_PASS("phi-node-elimination", PHIEliminationPass()) MACHINE_FUNCTION_PASS("post-RA-hazard-rec", PostRAHazardRecognizerPass()) MACHINE_FUNCTION_PASS("post-RA-sched", PostRASchedulerPass(TM)) MACHINE_FUNCTION_PASS("postmisched", PostMachineSchedulerPass(TM)) +MACHINE_FUNCTION_PASS("postra-machine-sink", PostRAMachineSinkingPass()) MACHINE_FUNCTION_PASS("post-ra-pseudos", ExpandPostRAPseudosPass()) MACHINE_FUNCTION_PASS("print", PrintMIRPass()) MACHINE_FUNCTION_PASS("print", LiveDebugVariablesPrinterPass(errs())) @@ -315,7 +316,6 @@ DUMMY_MACHINE_FUNCTION_PASS("static-data-splitter", StaticDataSplitter) DUMMY_MACHINE_FUNCTION_PASS("machine-function-splitter", MachineFunctionSplitterPass) DUMMY_MACHINE_FUNCTION_PASS("machineinstr-printer", MachineFunctionPrinterPass) DUMMY_MACHINE_FUNCTION_PASS("mirfs-discriminators", MIRAddFSDiscriminatorsPass) -DUMMY_MACHINE_FUNCTION_PASS("postra-machine-sink", PostRAMachineSinkingPass) DUMMY_MACHINE_FUNCTION_PASS("processimpdefs", ProcessImplicitDefsPass) DUMMY_MACHINE_FUNCTION_PASS("prologepilog-code", PrologEpilogCodeInserterPass) DUMMY_MACHINE_FUNCTION_PASS("ra-basic", RABasicPass) diff --git a/llvm/lib/CodeGen/CodeGen.cpp b/llvm/lib/CodeGen/CodeGen.cpp index aa3591cb6be58..065fd4704ccfb 100644 --- a/llvm/lib/CodeGen/CodeGen.cpp +++ b/llvm/lib/CodeGen/CodeGen.cpp @@ -107,7 +107,7 @@ void llvm::initializeCodeGen(PassRegistry &Registry) 
{ initializePeepholeOptimizerLegacyPass(Registry); initializePostMachineSchedulerLegacyPass(Registry); initializePostRAHazardRecognizerLegacyPass(Registry); - initializePostRAMachineSinkingPass(Registry); + initializePostRAMachineSinkingLegacyPass(Registry); initializePostRASchedulerLegacyPass(Registry); initializePreISelIntrinsicLoweringLegacyPassPass(Registry); initializeProcessImplicitDefsPass(Registry); diff --git a/llvm/lib/CodeGen/MachineSink.cpp b/llvm/lib/CodeGen/MachineSink.cpp index aa2987b6710a3..be1a3ac125c65 100644 --- a/llvm/lib/CodeGen/MachineSink.cpp +++ b/llvm/lib/CodeGen/MachineSink.cpp @@ -2068,12 +2068,12 @@ void MachineSinking::SalvageUnsunkDebugUsersOfCopy( //===--===// namespace { -class PostRAMachineSinking : public MachineFunctionPass { +class PostRAMachineSinkingLegacy
[llvm-branch-commits] [llvm] [AMDGPU][NPM] Register AMDGPUWaitSGPRHazards pass (PR #138496)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Akshat Oke (optimisan) Changes --- Full diff: https://github.com/llvm/llvm-project/pull/138496.diff 2 Files Affected: - (modified) llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def (+1) - (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll (+3-3) ``diff diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def index 98a1147ef6d66..f408ced020543 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def +++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def @@ -106,6 +106,7 @@ MACHINE_FUNCTION_PASS("amdgpu-set-wave-priority", AMDGPUSetWavePriorityPass()) MACHINE_FUNCTION_PASS("amdgpu-pre-ra-optimizations", GCNPreRAOptimizationsPass()) MACHINE_FUNCTION_PASS("amdgpu-preload-kern-arg-prolog", AMDGPUPreloadKernArgPrologPass()) MACHINE_FUNCTION_PASS("amdgpu-nsa-reassign", GCNNSAReassignPass()) +MACHINE_FUNCTION_PASS("amdgpu-wait-sgpr-hazards", AMDGPUWaitSGPRHazardsPass()) MACHINE_FUNCTION_PASS("gcn-create-vopd", GCNCreateVOPDPass()) MACHINE_FUNCTION_PASS("gcn-dpp-combine", GCNDPPCombinePass()) MACHINE_FUNCTION_PASS("si-fix-sgpr-copies", SIFixSGPRCopiesPass()) diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll index e00b7ff83e322..468e4208c510a 100644 --- a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll +++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll @@ -7,11 +7,11 @@ ; RUN: llc -O3 -enable-new-pm -mtriple=amdgcn--amdhsa -print-pipeline-passes < %s 2>&1 \ ; RUN: | FileCheck -check-prefix=GCN-O3 %s -; GCN-O0: 
require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,UnreachableBlockElimPass,ee-instrument,scalarize-masked-mem-intrin,ExpandReductionsPass,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,cgscc(function(lower-switch,lower-invoke,UnreachableBlockElimPass,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-late-branch-lowering,post-RA-hazard-rec,AMDGPUWaitSGPRHazardsPass,branch-relaxation,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),invalidate)) +; GCN-O0: 
require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,UnreachableBlockElimPass,ee-instrument,scalarize-masked-mem-intrin,ExpandReductionsPass,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,cgscc(function(lower-switch,lower-invoke,UnreachableBlockElimPass,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-late-branch-lowering,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,branch-relaxation,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),invalidate)) -; GCN-O2: require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,function(amdgpu-image-intrinsic-opt),expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(infer-address-spaces,amdgpu-atomic-optimizer,atomic-expand,amdgpu-promote-alloca,separate-const-offset-from-gep<>,slsr,early-cse<>,nary-reassociate,early-cse<>,amdgpu-codegenprepare,verify,loop-mssa(loop-reduce),me
[llvm-branch-commits] [llvm] [CodeGen][NPM] Port PostRAMachineSinking to NPM (PR #138497)
https://github.com/optimisan ready_for_review https://github.com/llvm/llvm-project/pull/138497 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][NPM] Complete optimized regalloc pipeline (PR #138491)
https://github.com/optimisan updated https://github.com/llvm/llvm-project/pull/138491 >From fec3016e7cfaefd3fb66a7ec3e9c9a09085e2d49 Mon Sep 17 00:00:00 2001 From: Akshat Oke Date: Mon, 5 May 2025 06:30:03 + Subject: [PATCH] [AMDGPU][NPM] Complete optimized regalloc pipeline Also fill in some other passes. --- llvm/include/llvm/Passes/CodeGenPassBuilder.h | 2 +- .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 41 +-- llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h | 1 + llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll | 7 +--- 4 files changed, 42 insertions(+), 9 deletions(-) diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h index ddd258c21f593..982bb16e71eab 100644 --- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h +++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h @@ -574,7 +574,7 @@ template class CodeGenPassBuilder { /// Insert InsertedPass pass after TargetPass pass. /// Only machine function passes are supported. template - void insertPass(InsertedPassT &&Pass) { + void insertPass(InsertedPassT &&Pass) const { AfterCallbacks.emplace_back( [&](StringRef Name, MachineFunctionPassManager &MFPM) mutable { if (Name == TargetPassT::name()) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp index 56f808a553388..076440f869cd0 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp @@ -2162,7 +2162,44 @@ void AMDGPUCodeGenPassBuilder::addMachineSSAOptimization( addPass(SIShrinkInstructionsPass()); } +void AMDGPUCodeGenPassBuilder::addOptimizedRegAlloc( +AddMachinePass &addPass) const { + if (EnableDCEInRA) +insertPass(DeadMachineInstructionElimPass()); + + // FIXME: when an instruction has a Killed operand, and the instruction is + // inside a bundle, seems only the BUNDLE instruction appears as the Kills of + // the register in LiveVariables, this would trigger a failure in verifier, + // we should fix 
it and enable the verifier. + if (OptVGPRLiveRange) +insertPass>( +SIOptimizeVGPRLiveRangePass()); + + // This must be run immediately after phi elimination and before + // TwoAddressInstructions, otherwise the processing of the tied operand of + // SI_ELSE will introduce a copy of the tied operand source after the else. + insertPass(SILowerControlFlowPass()); + + if (EnableRewritePartialRegUses) +insertPass(GCNRewritePartialRegUsesPass()); + + if (isPassEnabled(EnablePreRAOptimizations)) +insertPass(GCNPreRAOptimizationsPass()); + // Allow the scheduler to run before SIWholeQuadMode inserts exec manipulation + // instructions that cause scheduling barriers. + insertPass(SIWholeQuadModePass()); + + if (OptExecMaskPreRA) +insertPass(SIOptimizeExecMaskingPreRAPass()); + + // This is not an essential optimization and it has a noticeable impact on + // compilation time, so we only enable it from O2. + if (TM.getOptLevel() > CodeGenOptLevel::Less) +insertPass(SIFormMemoryClausesPass()); + + Base::addOptimizedRegAlloc(addPass); +} Error AMDGPUCodeGenPassBuilder::addRegAssignmentOptimized( AddMachinePass &addPass) const { @@ -2190,21 +2227,19 @@ Error AMDGPUCodeGenPassBuilder::addRegAssignmentOptimized( addPass(SIPreAllocateWWMRegsPass()); // For allocating other wwm register operands. - // addRegAlloc(addPass, RegAllocPhase::WWM); addPass(RAGreedyPass({onlyAllocateWWMRegs, "wwm"})); addPass(SILowerWWMCopiesPass()); addPass(VirtRegRewriterPass(false)); addPass(AMDGPUReserveWWMRegsPass()); // For allocating per-thread VGPRs. 
- // addRegAlloc(addPass, RegAllocPhase::VGPR); addPass(RAGreedyPass({onlyAllocateVGPRs, "vgpr"})); addPreRewrite(addPass); addPass(VirtRegRewriterPass(true)); - // TODO: addPass(AMDGPUMarkLastScratchLoadPass()); + addPass(AMDGPUMarkLastScratchLoadPass()); return Error::success(); } diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h index 589123274d0f5..3c62cd19c6e57 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h @@ -182,6 +182,7 @@ class AMDGPUCodeGenPassBuilder void addPostRegAlloc(AddMachinePass &) const; void addPreEmitPass(AddMachinePass &) const; Error addRegAssignmentOptimized(AddMachinePass &) const; + void addOptimizedRegAlloc(AddMachinePass &) const; /// Check if a pass is enabled given \p Opt option. The option always /// overrides defaults if explicitly used. Otherwise its default will be used diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll index e9b57515e71e0..91c15565762de 100644 --- a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll +++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll @@ -7,14 +7,11 @@ ; RUN: llc -O3 -enable-new-pm -mtrip
[llvm-branch-commits] [llvm] [AMDGPU][NPM] Register AMDGPUWaitSGPRHazards pass (PR #138496)
https://github.com/optimisan created https://github.com/llvm/llvm-project/pull/138496 None >From bd51fd929097dba2593a6a0f3e7cb7982ff75b57 Mon Sep 17 00:00:00 2001 From: Akshat Oke Date: Mon, 5 May 2025 08:58:58 + Subject: [PATCH] [AMDGPU][NPM] Register AMDGPUWaitSGPRHazards pass --- llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def | 1 + llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll | 6 +++--- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def index 98a1147ef6d66..f408ced020543 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def +++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def @@ -106,6 +106,7 @@ MACHINE_FUNCTION_PASS("amdgpu-set-wave-priority", AMDGPUSetWavePriorityPass()) MACHINE_FUNCTION_PASS("amdgpu-pre-ra-optimizations", GCNPreRAOptimizationsPass()) MACHINE_FUNCTION_PASS("amdgpu-preload-kern-arg-prolog", AMDGPUPreloadKernArgPrologPass()) MACHINE_FUNCTION_PASS("amdgpu-nsa-reassign", GCNNSAReassignPass()) +MACHINE_FUNCTION_PASS("amdgpu-wait-sgpr-hazards", AMDGPUWaitSGPRHazardsPass()) MACHINE_FUNCTION_PASS("gcn-create-vopd", GCNCreateVOPDPass()) MACHINE_FUNCTION_PASS("gcn-dpp-combine", GCNDPPCombinePass()) MACHINE_FUNCTION_PASS("si-fix-sgpr-copies", SIFixSGPRCopiesPass()) diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll index e00b7ff83e322..468e4208c510a 100644 --- a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll +++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll @@ -7,11 +7,11 @@ ; RUN: llc -O3 -enable-new-pm -mtriple=amdgcn--amdhsa -print-pipeline-passes < %s 2>&1 \ ; RUN: | FileCheck -check-prefix=GCN-O3 %s -; GCN-O0: 
require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,UnreachableBlockElimPass,ee-instrument,scalarize-masked-mem-intrin,ExpandReductionsPass,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,cgscc(function(lower-switch,lower-invoke,UnreachableBlockElimPass,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-late-branch-lowering,post-RA-hazard-rec,AMDGPUWaitSGPRHazardsPass,branch-relaxation,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),invalidate)) +; GCN-O0: 
require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,UnreachableBlockElimPass,ee-instrument,scalarize-masked-mem-intrin,ExpandReductionsPass,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,cgscc(function(lower-switch,lower-invoke,UnreachableBlockElimPass,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-late-branch-lowering,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,branch-relaxation,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),invalidate)) -; GCN-O2: require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,function(amdgpu-image-intrinsic-opt),expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(infer-address-spaces,amdgpu-atomic-optimizer,atomic-expand,amdgpu-promote-alloca,separate-const-o
[llvm-branch-commits] [llvm] [CodeGen][NPM] Port PostRAMachineSinking to NPM (PR #138497)
https://github.com/optimisan created https://github.com/llvm/llvm-project/pull/138497 None

[llvm-branch-commits] [llvm] [CodeGen][NPM] Port InitUndef to NPM (PR #138495)
https://github.com/optimisan created https://github.com/llvm/llvm-project/pull/138495 None >From 2db3af07bf3894df69e0336e2c71c4704fd4fca8 Mon Sep 17 00:00:00 2001 From: Akshat Oke Date: Mon, 5 May 2025 08:47:42 + Subject: [PATCH] [CodeGen][NPM] Port InitUndef to NPM --- llvm/include/llvm/CodeGen/InitUndef.h | 24 + llvm/include/llvm/InitializePasses.h | 2 +- llvm/include/llvm/Passes/CodeGenPassBuilder.h | 1 + .../llvm/Passes/MachinePassRegistry.def | 2 +- llvm/lib/CodeGen/CodeGen.cpp | 2 +- llvm/lib/CodeGen/InitUndef.cpp| 50 +-- llvm/lib/Passes/PassBuilder.cpp | 1 + llvm/test/CodeGen/AArch64/init-undef.mir | 3 ++ llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll | 4 +- .../rvv/handle-noreg-with-implicit-def.mir| 2 + .../rvv/subregister-undef-early-clobber.mir | 1 + .../RISCV/rvv/undef-earlyclobber-chain.mir| 1 + 12 files changed, 73 insertions(+), 20 deletions(-) create mode 100644 llvm/include/llvm/CodeGen/InitUndef.h diff --git a/llvm/include/llvm/CodeGen/InitUndef.h b/llvm/include/llvm/CodeGen/InitUndef.h new file mode 100644 index 0..7274824a74905 --- /dev/null +++ b/llvm/include/llvm/CodeGen/InitUndef.h @@ -0,0 +1,24 @@ +//===- llvm/CodeGen/InitUndef.h *- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef LLVM_CODEGEN_INITUNDEF_H +#define LLVM_CODEGEN_INITUNDEF_H + +#include "llvm/CodeGen/MachinePassManager.h" + +namespace llvm { + +class InitUndefPass : public PassInfoMixin { +public: + PreservedAnalyses run(MachineFunction &MF, +MachineFunctionAnalysisManager &MFAM); +}; + +} // namespace llvm + +#endif // LLVM_CODEGEN_INITUNDEF_H diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index bff0526d4177a..07dc86c6fccf2 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -311,7 +311,7 @@ void initializeTargetTransformInfoWrapperPassPass(PassRegistry &); void initializeTwoAddressInstructionLegacyPassPass(PassRegistry &); void initializeTypeBasedAAWrapperPassPass(PassRegistry &); void initializeTypePromotionLegacyPass(PassRegistry &); -void initializeInitUndefPass(PassRegistry &); +void initializeInitUndefLegacyPass(PassRegistry &); void initializeUniformityInfoWrapperPassPass(PassRegistry &); void initializeUnifyLoopExitsLegacyPassPass(PassRegistry &); void initializeUnpackMachineBundlesPass(PassRegistry &); diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h index 982bb16e71eab..351ef63af05c0 100644 --- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h +++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h @@ -43,6 +43,7 @@ #include "llvm/CodeGen/GlobalMerge.h" #include "llvm/CodeGen/GlobalMergeFunctions.h" #include "llvm/CodeGen/IndirectBrExpand.h" +#include "llvm/CodeGen/InitUndef.h" #include "llvm/CodeGen/InterleavedAccess.h" #include "llvm/CodeGen/InterleavedLoadCombine.h" #include "llvm/CodeGen/JMCInstrumenter.h" diff --git a/llvm/include/llvm/Passes/MachinePassRegistry.def b/llvm/include/llvm/Passes/MachinePassRegistry.def index c69573ee3ed97..436b26852ce90 100644 --- a/llvm/include/llvm/Passes/MachinePassRegistry.def +++ 
b/llvm/include/llvm/Passes/MachinePassRegistry.def @@ -148,6 +148,7 @@ MACHINE_FUNCTION_PASS("early-tailduplication", EarlyTailDuplicatePass()) MACHINE_FUNCTION_PASS("fentry-insert", FEntryInserterPass()) MACHINE_FUNCTION_PASS("finalize-isel", FinalizeISelPass()) MACHINE_FUNCTION_PASS("fixup-statepoint-caller-saved", FixupStatepointCallerSavedPass()) +MACHINE_FUNCTION_PASS("init-undef", InitUndefPass()) MACHINE_FUNCTION_PASS("localstackalloc", LocalStackSlotAllocationPass()) MACHINE_FUNCTION_PASS("machine-cp", MachineCopyPropagationPass()) MACHINE_FUNCTION_PASS("machine-cse", MachineCSEPass()) @@ -304,7 +305,6 @@ DUMMY_MACHINE_FUNCTION_PASS("fs-profile-loader", MIRProfileLoaderNewPass) DUMMY_MACHINE_FUNCTION_PASS("funclet-layout", FuncletLayoutPass) DUMMY_MACHINE_FUNCTION_PASS("gc-empty-basic-blocks", GCEmptyBasicBlocksPass) DUMMY_MACHINE_FUNCTION_PASS("implicit-null-checks", ImplicitNullChecksPass) -DUMMY_MACHINE_FUNCTION_PASS("init-undef-pass", InitUndefPass) DUMMY_MACHINE_FUNCTION_PASS("instruction-select", InstructionSelectPass) DUMMY_MACHINE_FUNCTION_PASS("irtranslator", IRTranslatorPass) DUMMY_MACHINE_FUNCTION_PASS("kcfi", MachineKCFIPass) diff --git a/llvm/lib/CodeGen/CodeGen.cpp b/llvm/lib/CodeGen/CodeGen.cpp index 5250534d8a4e4..aa3591cb6be58 100644 --- a/llvm/lib/CodeGen/CodeGen.cpp +++ b/llvm/lib/CodeGen/CodeGen.cpp @@ -54,7 +54,7 @@ void llvm::initializ
[llvm-branch-commits] [llvm] [AMDGPU][NPM] Register AMDGPUWaitSGPRHazards pass (PR #138496)
optimisan wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. > Learn more

* **#138497**
* **#138496** 👈 (View in Graphite)
* **#138495**
* **#138491**
* **#136818**
* `main`

This stack of pull requests is managed by Graphite. Learn more about stacking.

https://github.com/llvm/llvm-project/pull/138496
[llvm-branch-commits] [llvm] [CodeGen][NPM] Port InitUndef to NPM (PR #138495)
optimisan wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. > Learn more

* **#138497**
* **#138496**
* **#138495** 👈 (View in Graphite)
* **#138491**
* **#136818**
* `main`

This stack of pull requests is managed by Graphite. Learn more about stacking.

https://github.com/llvm/llvm-project/pull/138495
[llvm-branch-commits] [llvm] [CodeGen][NPM] Port PostRAMachineSinking to NPM (PR #138497)
optimisan wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. > Learn more

* **#138497** 👈 (View in Graphite)
* **#138496**
* **#138495**
* **#138491**
* **#136818**
* `main`

This stack of pull requests is managed by Graphite. Learn more about stacking.

https://github.com/llvm/llvm-project/pull/138497