[llvm-branch-commits] [flang] [llvm] [flang][OpenMP] Parse WHEN, OTHERWISE, MATCH clauses plus METADIRECTIVE (PR #121817)

2025-01-15 Thread Kiran Chandramohan via llvm-branch-commits

https://github.com/kiranchandramohan edited https://github.com/llvm/llvm-project/pull/121817


[llvm-branch-commits] [flang] [llvm] [flang][OpenMP] Parse WHEN, OTHERWISE, MATCH clauses plus METADIRECTIVE (PR #121817)

2025-01-15 Thread Kiran Chandramohan via llvm-branch-commits


@@ -214,6 +214,11 @@ class AssociatedLoopChecker {
 };
 
 bool OmpStructureChecker::CheckAllowedClause(llvmOmpClause clause) {
+  // Do not do clause checks while processing METADIRECTIVE.

kiranchandramohan wrote:

Could you add the reason also?

https://github.com/llvm/llvm-project/pull/121817


[llvm-branch-commits] [flang] [llvm] [flang][OpenMP] Parse WHEN, OTHERWISE, MATCH clauses plus METADIRECTIVE (PR #121817)

2025-01-15 Thread Kiran Chandramohan via llvm-branch-commits


@@ -845,7 +851,8 @@ def OMP_Metadirective : Directive<"metadirective"> {
 VersionedClause,
   ];
   let allowedOnceClauses = [
-VersionedClause,
+VersionedClause,
+VersionedClause,

kiranchandramohan wrote:

Is this tested in Flang?

https://github.com/llvm/llvm-project/pull/121817


[llvm-branch-commits] [flang] [llvm] [flang][OpenMP] Parse WHEN, OTHERWISE, MATCH clauses plus METADIRECTIVE (PR #121817)

2025-01-15 Thread Kiran Chandramohan via llvm-branch-commits

https://github.com/kiranchandramohan commented:

Minor comments.

https://github.com/llvm/llvm-project/pull/121817


[llvm-branch-commits] [flang] [llvm] [flang][OpenMP] Parse WHEN, OTHERWISE, MATCH clauses plus METADIRECTIVE (PR #121817)

2025-01-15 Thread Kiran Chandramohan via llvm-branch-commits


@@ -265,6 +265,7 @@ def OMPC_Map : Clause<"map"> {
   let flangClass = "OmpMapClause";
 }
 def OMPC_Match : Clause<"match"> {
+  let flangClass = "OmpMatchClause";

kiranchandramohan wrote:

Is there a test for Match?

https://github.com/llvm/llvm-project/pull/121817


[llvm-branch-commits] [llvm] [AMDGPU][NewPM] Port SILowerControlFlow pass into NPM. (PR #123045)

2025-01-15 Thread Christudasan Devadasan via llvm-branch-commits

https://github.com/cdevadas created https://github.com/llvm/llvm-project/pull/123045

None

From 740d759821a443fdeb9d093cb9fc4fae81c0da4c Mon Sep 17 00:00:00 2001
From: Christudasan Devadasan 
Date: Thu, 5 Dec 2024 12:07:40 +0530
Subject: [PATCH] [AMDGPU][NewPM] Port SILowerControlFlow pass into NPM.

---
 llvm/lib/Target/AMDGPU/AMDGPU.h   |  4 +-
 llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def |  1 +
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp |  7 ++-
 llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp | 63 ++-
 llvm/lib/Target/AMDGPU/SILowerControlFlow.h   | 22 +++
 llvm/test/CodeGen/AMDGPU/collapse-endcf.mir   |  1 +
 .../lower-control-flow-live-intervals.mir |  1 +
 .../lower-control-flow-other-terminators.mir  |  1 +
 8 files changed, 80 insertions(+), 20 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/SILowerControlFlow.h

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 400c5f219cc702..89356df39724a4 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -207,8 +207,8 @@ extern char &SILoadStoreOptimizerLegacyID;
 void initializeSIWholeQuadModePass(PassRegistry &);
 extern char &SIWholeQuadModeID;
 
-void initializeSILowerControlFlowPass(PassRegistry &);
-extern char &SILowerControlFlowID;
+void initializeSILowerControlFlowLegacyPass(PassRegistry &);
+extern char &SILowerControlFlowLegacyID;
 
 void initializeSIPreEmitPeepholePass(PassRegistry &);
 extern char &SIPreEmitPeepholeID;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def 
b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
index 6f322074ba74cc..fbcf83e2fdd60b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
@@ -102,6 +102,7 @@ MACHINE_FUNCTION_PASS("si-i1-copies", SILowerI1CopiesPass())
 MACHINE_FUNCTION_PASS("si-fold-operands", SIFoldOperandsPass());
 MACHINE_FUNCTION_PASS("gcn-dpp-combine", GCNDPPCombinePass())
 MACHINE_FUNCTION_PASS("si-load-store-opt", SILoadStoreOptimizerPass())
+MACHINE_FUNCTION_PASS("si-lower-control-flow", SILowerControlFlowPass())
 MACHINE_FUNCTION_PASS("si-lower-sgpr-spills", SILowerSGPRSpillsPass())
 MACHINE_FUNCTION_PASS("si-opt-vgpr-liverange", SIOptimizeVGPRLiveRangePass())
 MACHINE_FUNCTION_PASS("si-peephole-sdwa", SIPeepholeSDWAPass())
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 6d4547dbc82c3b..98268b848f5ce6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -38,6 +38,7 @@
 #include "SIFixSGPRCopies.h"
 #include "SIFoldOperands.h"
 #include "SILoadStoreOptimizer.h"
+#include "SILowerControlFlow.h"
 #include "SILowerSGPRSpills.h"
 #include "SIMachineFunctionInfo.h"
 #include "SIMachineScheduler.h"
@@ -523,7 +524,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void 
LLVMInitializeAMDGPUTarget() {
   initializeSIInsertWaitcntsPass(*PR);
   initializeSIModeRegisterPass(*PR);
   initializeSIWholeQuadModePass(*PR);
-  initializeSILowerControlFlowPass(*PR);
+  initializeSILowerControlFlowLegacyPass(*PR);
   initializeSIPreEmitPeepholePass(*PR);
   initializeSILateBranchLoweringPass(*PR);
   initializeSIMemoryLegalizerPass(*PR);
@@ -1459,7 +1460,7 @@ void GCNPassConfig::addFastRegAlloc() {
   // This must be run immediately after phi elimination and before
   // TwoAddressInstructions, otherwise the processing of the tied operand of
   // SI_ELSE will introduce a copy of the tied operand source after the else.
-  insertPass(&PHIEliminationID, &SILowerControlFlowID);
+  insertPass(&PHIEliminationID, &SILowerControlFlowLegacyID);
 
   insertPass(&TwoAddressInstructionPassID, &SIWholeQuadModeID);
 
@@ -1480,7 +1481,7 @@ void GCNPassConfig::addOptimizedRegAlloc() {
   // This must be run immediately after phi elimination and before
   // TwoAddressInstructions, otherwise the processing of the tied operand of
   // SI_ELSE will introduce a copy of the tied operand source after the else.
-  insertPass(&PHIEliminationID, &SILowerControlFlowID);
+  insertPass(&PHIEliminationID, &SILowerControlFlowLegacyID);
 
   if (EnableRewritePartialRegUses)
 insertPass(&RenameIndependentSubregsID, &GCNRewritePartialRegUsesID);
diff --git a/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp 
b/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
index ed5390b96ed4b5..f8878f32f829d1 100644
--- a/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
+++ b/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
@@ -48,6 +48,7 @@
 /// %exec = S_OR_B64 %exec, %sgpr0 // Re-enable saved exec mask bits
 
 //===----------------------------------------------------------------===//
 
+#include "SILowerControlFlow.h"
 #include "AMDGPU.h"
 #include "GCNSubtarget.h"
 #include "MCTargetDesc/AMDGPUMCTargetDesc.h"
@@ -68,7 +69,7 @@ RemoveRedundantEndcf("amdgpu-remove-redundant-endcf",
 
 namespace {
 
-class SILowerControlFlow : public MachineFunctionPass {
+class S

[llvm-branch-commits] [llvm] [AMDGPU][NewPM] Port SILowerControlFlow pass into NPM. (PR #123045)

2025-01-15 Thread Christudasan Devadasan via llvm-branch-commits

cdevadas wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/123045).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#123045** šŸ‘ˆ (this PR, view in Graphite)
* **#123044**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev).
Learn more about stacking: https://stacking.dev/
https://github.com/llvm/llvm-project/pull/123045


[llvm-branch-commits] [llvm] [AMDGPU][NewPM] Port SILowerControlFlow pass into NPM. (PR #123045)

2025-01-15 Thread Christudasan Devadasan via llvm-branch-commits

https://github.com/cdevadas ready_for_review https://github.com/llvm/llvm-project/pull/123045


[llvm-branch-commits] [llvm] [AMDGPU][NewPM] Port SILowerControlFlow pass into NPM. (PR #123045)

2025-01-15 Thread Christudasan Devadasan via llvm-branch-commits

https://github.com/cdevadas updated https://github.com/llvm/llvm-project/pull/123045

From 740d759821a443fdeb9d093cb9fc4fae81c0da4c Mon Sep 17 00:00:00 2001
From: Christudasan Devadasan 
Date: Thu, 5 Dec 2024 12:07:40 +0530
Subject: [PATCH 1/2] [AMDGPU][NewPM] Port SILowerControlFlow pass into NPM.

---
 llvm/lib/Target/AMDGPU/AMDGPU.h   |  4 +-
 llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def |  1 +
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp |  7 ++-
 llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp | 63 ++-
 llvm/lib/Target/AMDGPU/SILowerControlFlow.h   | 22 +++
 llvm/test/CodeGen/AMDGPU/collapse-endcf.mir   |  1 +
 .../lower-control-flow-live-intervals.mir |  1 +
 .../lower-control-flow-other-terminators.mir  |  1 +
 8 files changed, 80 insertions(+), 20 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/SILowerControlFlow.h

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 400c5f219cc702..89356df39724a4 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -207,8 +207,8 @@ extern char &SILoadStoreOptimizerLegacyID;
 void initializeSIWholeQuadModePass(PassRegistry &);
 extern char &SIWholeQuadModeID;
 
-void initializeSILowerControlFlowPass(PassRegistry &);
-extern char &SILowerControlFlowID;
+void initializeSILowerControlFlowLegacyPass(PassRegistry &);
+extern char &SILowerControlFlowLegacyID;
 
 void initializeSIPreEmitPeepholePass(PassRegistry &);
 extern char &SIPreEmitPeepholeID;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def 
b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
index 6f322074ba74cc..fbcf83e2fdd60b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
@@ -102,6 +102,7 @@ MACHINE_FUNCTION_PASS("si-i1-copies", SILowerI1CopiesPass())
 MACHINE_FUNCTION_PASS("si-fold-operands", SIFoldOperandsPass());
 MACHINE_FUNCTION_PASS("gcn-dpp-combine", GCNDPPCombinePass())
 MACHINE_FUNCTION_PASS("si-load-store-opt", SILoadStoreOptimizerPass())
+MACHINE_FUNCTION_PASS("si-lower-control-flow", SILowerControlFlowPass())
 MACHINE_FUNCTION_PASS("si-lower-sgpr-spills", SILowerSGPRSpillsPass())
 MACHINE_FUNCTION_PASS("si-opt-vgpr-liverange", SIOptimizeVGPRLiveRangePass())
 MACHINE_FUNCTION_PASS("si-peephole-sdwa", SIPeepholeSDWAPass())
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 6d4547dbc82c3b..98268b848f5ce6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -38,6 +38,7 @@
 #include "SIFixSGPRCopies.h"
 #include "SIFoldOperands.h"
 #include "SILoadStoreOptimizer.h"
+#include "SILowerControlFlow.h"
 #include "SILowerSGPRSpills.h"
 #include "SIMachineFunctionInfo.h"
 #include "SIMachineScheduler.h"
@@ -523,7 +524,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void 
LLVMInitializeAMDGPUTarget() {
   initializeSIInsertWaitcntsPass(*PR);
   initializeSIModeRegisterPass(*PR);
   initializeSIWholeQuadModePass(*PR);
-  initializeSILowerControlFlowPass(*PR);
+  initializeSILowerControlFlowLegacyPass(*PR);
   initializeSIPreEmitPeepholePass(*PR);
   initializeSILateBranchLoweringPass(*PR);
   initializeSIMemoryLegalizerPass(*PR);
@@ -1459,7 +1460,7 @@ void GCNPassConfig::addFastRegAlloc() {
   // This must be run immediately after phi elimination and before
   // TwoAddressInstructions, otherwise the processing of the tied operand of
   // SI_ELSE will introduce a copy of the tied operand source after the else.
-  insertPass(&PHIEliminationID, &SILowerControlFlowID);
+  insertPass(&PHIEliminationID, &SILowerControlFlowLegacyID);
 
   insertPass(&TwoAddressInstructionPassID, &SIWholeQuadModeID);
 
@@ -1480,7 +1481,7 @@ void GCNPassConfig::addOptimizedRegAlloc() {
   // This must be run immediately after phi elimination and before
   // TwoAddressInstructions, otherwise the processing of the tied operand of
   // SI_ELSE will introduce a copy of the tied operand source after the else.
-  insertPass(&PHIEliminationID, &SILowerControlFlowID);
+  insertPass(&PHIEliminationID, &SILowerControlFlowLegacyID);
 
   if (EnableRewritePartialRegUses)
 insertPass(&RenameIndependentSubregsID, &GCNRewritePartialRegUsesID);
diff --git a/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp 
b/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
index ed5390b96ed4b5..f8878f32f829d1 100644
--- a/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
+++ b/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
@@ -48,6 +48,7 @@
 /// %exec = S_OR_B64 %exec, %sgpr0 // Re-enable saved exec mask bits
 
 //===----------------------------------------------------------------===//
 
+#include "SILowerControlFlow.h"
 #include "AMDGPU.h"
 #include "GCNSubtarget.h"
 #include "MCTargetDesc/AMDGPUMCTargetDesc.h"
@@ -68,7 +69,7 @@ RemoveRedundantEndcf("amdgpu-remove-redundant-endcf",
 
 namespace {
 
-class SILowerControlFlow : public MachineFunctionPass {
+class SIL

[llvm-branch-commits] [llvm] [AMDGPU][NewPM] Port SILowerControlFlow pass into NPM. (PR #123045)

2025-01-15 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/123045


[llvm-branch-commits] [llvm] [X86] Add atomic vector tests for unaligned >1 sizes. (PR #120387)

2025-01-15 Thread via llvm-branch-commits

https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120387

From 84a167203da1ceb4f7c2e74afce06c74b9fc3b0d Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:40:32 -0500
Subject: [PATCH] [X86] Add atomic vector tests for unaligned >1 sizes.

Unaligned atomic vectors with size >1 are lowered to calls.
Adding their tests separately here.

commit-id:a06a5cc6
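
For reference, a minimal IR sketch of the pattern these tests cover (the
function name is illustrative, not from the patch): an atomic vector load
whose alignment (4) is smaller than its size (8 bytes) cannot be lowered
inline, so it is expanded to an __atomic_load libcall.

define <2 x i32> @underaligned_example(ptr %x) {
  ; align 4 < 8 bytes, so this load becomes a call to __atomic_load.
  %ret = load atomic <2 x i32>, ptr %x acquire, align 4
  ret <2 x i32> %ret
}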
---
 llvm/test/CodeGen/X86/atomic-load-store.ll | 253 +
 1 file changed, 253 insertions(+)

diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 6efcbb80c0ce6d..39e9fdfa5e62b0 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -146,6 +146,34 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   ret <1 x i64> %ret
 }
 
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_ptr:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movq (%rsp), %rax
+; CHECK3-NEXT:popq %rcx
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_ptr:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq (%rsp), %rax
+; CHECK0-NEXT:popq %rcx
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
+
 define <1 x half> @atomic_vec1_half(ptr %x) {
 ; CHECK3-LABEL: atomic_vec1_half:
 ; CHECK3:   ## %bb.0:
@@ -182,3 +210,228 @@ define <1 x double> @atomic_vec1_double_align(ptr %x) 
nounwind {
   %ret = load atomic <1 x double>, ptr %x acquire, align 8
   ret <1 x double> %ret
 }
+
+define <1 x i64> @atomic_vec1_i64(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_i64:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movq (%rsp), %rax
+; CHECK3-NEXT:popq %rcx
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i64:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq (%rsp), %rax
+; CHECK0-NEXT:popq %rcx
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i64>, ptr %x acquire, align 4
+  ret <1 x i64> %ret
+}
+
+define <1 x double> @atomic_vec1_double(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_double:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK3-NEXT:popq %rax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_double:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK0-NEXT:popq %rax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 4
+  ret <1 x double> %ret
+}
+
+define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec2_i32:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK3-NEXT:popq %rax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec2_i32:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq {{.*#+}} xmm0 = mem[0],zero
+; CHECK0-NEXT:popq %rax
+; CHECK0-NEXT:retq
+  %ret = load atomic <2 x i32>, ptr %x acquire, align 4
+  ret <2 x i32> %ret
+}
+
+define <4 x float> @atomic_vec4_float_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec4_float_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEXT:movl $2, %esi
+; CHECK-NEXT:callq ___atomic_load_16
+; CHECK-NEXT:movq %rdx, %xmm1
+; CHECK-NEXT:movq %rax, %xmm0
+; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm

[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)

2025-01-15 Thread via llvm-branch-commits

https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120716

From 5d4bb0bfa7ab38a6d0f792e26192be1c857095ff Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Fri, 20 Dec 2024 06:14:28 -0500
Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector

AtomicExpand fails for an aligned `load atomic` of a vector type because
it does not find a compatible library call. This change adds appropriate
bitcasts so that the call can be lowered.

commit-id:f430c1af
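
As a hedged sketch of what the new expansion produces for a sized libcall
(types and names are illustrative, not taken from the patch's tests): for a
<2 x i16> atomic load, the i32 libcall result is split into halves with
trunc/lshr and rebuilt into a vector, mirroring the CreateTrunc /
CreateLShr / CreateInsertElement sequence in the diff below.

define <2 x i16> @expanded_example(ptr %x) {
  ; i32 result of the sized libcall (the constant 2 encodes acquire).
  %r = call i32 @__atomic_load_4(ptr %x, i32 2)
  ; Split the scalar result into two i16 halves.
  %lo = trunc i32 %r to i16
  %shr = lshr i32 %r, 16
  %hi = trunc i32 %shr to i16
  ; Reassemble the <2 x i16> vector from the halves.
  %v0 = insertelement <2 x i16> poison, i16 %lo, i32 0
  %v1 = insertelement <2 x i16> %v0, i16 %hi, i32 1
  ret <2 x i16> %v1
}

declare i32 @__atomic_load_4(ptr, i32)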
---
 llvm/lib/CodeGen/AtomicExpandPass.cpp | 25 +++--
 llvm/test/CodeGen/ARM/atomic-load-store.ll| 52 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 15 ++
 .../X86/expand-atomic-non-integer.ll  | 14 +
 4 files changed, 103 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp 
b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index a75fa688d87a8d..490cc9aa4c3c80 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -2060,9 +2060,28 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall(
 I->replaceAllUsesWith(V);
   } else if (HasResult) {
 Value *V;
-if (UseSizedLibcall)
-  V = Builder.CreateBitOrPointerCast(Result, I->getType());
-else {
+if (UseSizedLibcall) {
+  // Add bitcasts from Result's T scalar type to I's <2 x T/2> vector type
+  if (I->getType()->getScalarType()->isIntOrPtrTy() &&
+  I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) {
+TypeSize Size = Result->getType()->getPrimitiveSizeInBits();
+assert((unsigned)Size % 2 == 0);
+unsigned HalfSize = (unsigned)Size / 2;
+Value *Lo =
+Builder.CreateTrunc(Result, IntegerType::get(Ctx, HalfSize));
+Value *RS = Builder.CreateLShr(
+Result, ConstantInt::get(IntegerType::get(Ctx, Size), HalfSize));
+Value *Hi = Builder.CreateTrunc(RS, IntegerType::get(Ctx, HalfSize));
+Value *Vec = Builder.CreateInsertElement(
+VectorType::get(IntegerType::get(Ctx, HalfSize),
+cast(I->getType())->getElementCount()),
+Lo, ConstantInt::get(IntegerType::get(Ctx, 32), 0));
+Vec = Builder.CreateInsertElement(
+Vec, Hi, ConstantInt::get(IntegerType::get(Ctx, 32), 1));
+V = Builder.CreateBitOrPointerCast(Vec, I->getType());
+  } else
+V = Builder.CreateBitOrPointerCast(Result, I->getType());
+} else {
   V = Builder.CreateAlignedLoad(I->getType(), AllocaResult,
 AllocaAlignment);
   Builder.CreateLifetimeEnd(AllocaResult, SizeVal64);
diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll 
b/llvm/test/CodeGen/ARM/atomic-load-store.ll
index 560dfde356c29d..8736c5bd9a9451 100644
--- a/llvm/test/CodeGen/ARM/atomic-load-store.ll
+++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll
@@ -983,3 +983,55 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double 
%val1) {
   store atomic double %val1, ptr %ptr seq_cst, align 8
   ret void
 }
+
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 {
+; ARM-LABEL: atomic_vec1_ptr:
+; ARM:   @ %bb.0:
+; ARM-NEXT:ldr r0, [r0]
+; ARM-NEXT:dmb ish
+; ARM-NEXT:bx lr
+;
+; ARMOPTNONE-LABEL: atomic_vec1_ptr:
+; ARMOPTNONE:   @ %bb.0:
+; ARMOPTNONE-NEXT:ldr r0, [r0]
+; ARMOPTNONE-NEXT:dmb ish
+; ARMOPTNONE-NEXT:bx lr
+;
+; THUMBTWO-LABEL: atomic_vec1_ptr:
+; THUMBTWO:   @ %bb.0:
+; THUMBTWO-NEXT:ldr r0, [r0]
+; THUMBTWO-NEXT:dmb ish
+; THUMBTWO-NEXT:bx lr
+;
+; THUMBONE-LABEL: atomic_vec1_ptr:
+; THUMBONE:   @ %bb.0:
+; THUMBONE-NEXT:push {r7, lr}
+; THUMBONE-NEXT:movs r1, #0
+; THUMBONE-NEXT:mov r2, r1
+; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4
+; THUMBONE-NEXT:pop {r7, pc}
+;
+; ARMV4-LABEL: atomic_vec1_ptr:
+; ARMV4:   @ %bb.0:
+; ARMV4-NEXT:push {r11, lr}
+; ARMV4-NEXT:mov r1, #2
+; ARMV4-NEXT:bl __atomic_load_4
+; ARMV4-NEXT:mov r0, #0
+; ARMV4-NEXT:pop {r11, lr}
+; ARMV4-NEXT:mov pc, lr
+;
+; ARMV6-LABEL: atomic_vec1_ptr:
+; ARMV6:   @ %bb.0:
+; ARMV6-NEXT:mov r1, #0
+; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5
+; ARMV6-NEXT:ldr r0, [r0]
+; ARMV6-NEXT:bx lr
+;
+; THUMBM-LABEL: atomic_vec1_ptr:
+; THUMBM:   @ %bb.0:
+; THUMBM-NEXT:ldr r0, [r0]
+; THUMBM-NEXT:dmb sy
+; THUMBM-NEXT:bx lr
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 2d8cb14cf5fcde..3ce05f75fdcfe5 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -362,6 +362,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
   ret <2 x i32> %ret
 }
 
+define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec2_ptr_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEX

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Split <2 x T> vector types for atomic load (PR #120640)

2025-01-15 Thread via llvm-branch-commits

https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120640

From f129a3c92a4be2c7263f6838317330ca76a17117 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 16:25:55 -0500
Subject: [PATCH] [SelectionDAG][X86] Split <2 x T> vector types for atomic
 load

Vector types of 2 elements that aren't widened are split
so that they can be vectorized within SelectionDAG. This change
utilizes the load vectorization infrastructure to regroup the
split elements. This enables SelectionDAG to translate vectors
with element types such as bfloat and half.

commit-id:3a045357
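
A hedged example of a load this path enables (illustrative, not from the
patch's tests): the two bfloat elements are fetched with a single atomic
load and then regrouped into the vector, per SplitVecRes_ATOMIC_LOAD in
the diff below.

define <2 x bfloat> @split_example(ptr %x) {
  ; Loaded as one atomic value; each i16 half is then bitcast back to
  ; bfloat and reassembled into the vector.
  %ret = load atomic <2 x bfloat>, ptr %x acquire, align 4
  ret <2 x bfloat> %ret
}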
---
 llvm/include/llvm/CodeGen/SelectionDAG.h  |  4 +-
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |  1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 31 
 .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 24 -
 .../SelectionDAGAddressAnalysis.cpp   | 30 ++-
 .../SelectionDAG/SelectionDAGBuilder.cpp  |  6 ++-
 llvm/lib/Target/X86/X86ISelLowering.cpp   | 50 ---
 llvm/test/CodeGen/X86/atomic-load-store.ll| 36 +
 8 files changed, 147 insertions(+), 35 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h 
b/llvm/include/llvm/CodeGen/SelectionDAG.h
index ba0538f7084eec..648ad14a67cb6c 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1837,7 +1837,7 @@ class SelectionDAG {
   /// chain to the token factor. This ensures that the new memory node will 
have
   /// the same relative memory dependency position as the old load. Returns the
   /// new merged load chain.
-  SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp);
+  SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp);
 
   /// Topological-sort the AllNodes list and a
   /// assign a unique node id for each node in the DAG based on their
@@ -2263,6 +2263,8 @@ class SelectionDAG {
   /// location that the 'Base' load is loading from.
   bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base,
   unsigned Bytes, int Dist) const;
+  bool areNonVolatileConsecutiveLoads(AtomicSDNode *LD, AtomicSDNode *Base,
+  unsigned Bytes, int Dist) const;
 
   /// Infer alignment of a load / store address. Return std::nullopt if it
   /// cannot be inferred.
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index e45df5c40953c0..569777d3dfaab7 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -946,6 +946,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   void SplitVecRes_FPOp_MultiType(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_IS_FPCLASS(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi);
+  void SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD);
   void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_LOAD(VPLoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *SLD, SDValue &Lo,
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index d412d8eafda4e5..0fa318fcdca2b0 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -1152,6 +1152,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, 
unsigned ResNo) {
 SplitVecRes_STEP_VECTOR(N, Lo, Hi);
 break;
   case ISD::SIGN_EXTEND_INREG: SplitVecRes_InregOp(N, Lo, Hi); break;
+  case ISD::ATOMIC_LOAD:
+SplitVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:
 SplitVecRes_LOAD(cast(N), Lo, Hi);
 break;
@@ -1395,6 +1398,34 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, 
unsigned ResNo) {
 SetSplitVector(SDValue(N, ResNo), Lo, Hi);
 }
 
+void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD) {
+  SDLoc dl(LD);
+
+  EVT MemoryVT = LD->getMemoryVT();
+  unsigned NumElts = MemoryVT.getVectorMinNumElements();
+
+  EVT IntMemoryVT = EVT::getVectorVT(*DAG.getContext(), MVT::i16, NumElts);
+  EVT ElemVT = EVT::getVectorVT(*DAG.getContext(),
+MemoryVT.getVectorElementType(), 1);
+
+  // Create a single atomic to load all the elements at once.
+  SDValue Atomic = DAG.getAtomic(ISD::ATOMIC_LOAD, dl, IntMemoryVT, 
IntMemoryVT,
+ LD->getChain(), LD->getBasePtr(),
+ LD->getMemOperand());
+
+  // Instead of splitting, put all the elements back into a vector.
+  SmallVector Ops;
+  for (unsigned i = 0; i < NumElts; ++i) {
+SDValue Elt = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::i16, Atomic,
+  DAG.getVectorIdxConstant(i, dl));
+Elt = DAG.getBitcast(ElemVT, Elt);
+Ops.push_back(Elt);
+ 

[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)

2025-01-15 Thread via llvm-branch-commits

https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120385

From 11174c52a0543aa16bb3da0dcd4c65aea44bcb2c Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:37:17 -0500
Subject: [PATCH] [SelectionDAG] Legalize <1 x T> vector types for atomic load

`load atomic <1 x T>` is not valid. This change legalizes
vector types of atomic load via scalarization in SelectionDAG
so that it can, for example, translate from `v1i32` to `i32`.

commit-id:5c36cc8c
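
A minimal example of the legalization (function name illustrative): the
<1 x i32> atomic load is rewritten to a plain i32 atomic load, so on x86
it lowers to a single scalar mov, as the test updates below show.

define <1 x i32> @scalarize_example(ptr %x) {
  ; Scalarized to: %ret = load atomic i32, ptr %x acquire, align 4
  %ret = load atomic <1 x i32>, ptr %x acquire, align 4
  ret <1 x i32> %ret
}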
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  |  15 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 121 +-
 3 files changed, 135 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index caaa40a64c7e1d..4117da0f31c10b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -861,6 +861,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue ScalarizeVecRes_UnaryOpWithExtraInput(SDNode *N);
   SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N);
   SDValue ScalarizeVecRes_LOAD(LoadSDNode *N);
+  SDValue ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue ScalarizeVecRes_SCALAR_TO_VECTOR(SDNode *N);
   SDValue ScalarizeVecRes_VSELECT(SDNode *N);
   SDValue ScalarizeVecRes_SELECT(SDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index f39d9ca15496a9..980c157fe6f3f1 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -64,6 +64,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, 
unsigned ResNo) {
 R = ScalarizeVecRes_UnaryOpWithExtraInput(N);
 break;
   case ISD::INSERT_VECTOR_ELT: R = ScalarizeVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+R = ScalarizeVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:   R = 
ScalarizeVecRes_LOAD(cast(N));break;
   case ISD::SCALAR_TO_VECTOR:  R = ScalarizeVecRes_SCALAR_TO_VECTOR(N); break;
   case ISD::SIGN_EXTEND_INREG: R = ScalarizeVecRes_InregOp(N); break;
@@ -457,6 +460,18 @@ SDValue 
DAGTypeLegalizer::ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N) {
   return Op;
 }
 
+SDValue DAGTypeLegalizer::ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SDValue Result = DAG.getAtomic(
+  ISD::ATOMIC_LOAD, SDLoc(N), N->getMemoryVT().getVectorElementType(),
+  N->getValueType(0).getVectorElementType(), N->getChain(), 
N->getBasePtr(),
+  N->getMemOperand());
+
+  // Legalize the chain result - switch anything that used the old chain to
+  // use the new one.
+  ReplaceValueWith(SDValue(N, 1), Result.getValue(1));
+  return Result;
+}
+
 SDValue DAGTypeLegalizer::ScalarizeVecRes_LOAD(LoadSDNode *N) {
   assert(N->isUnindexed() && "Indexed vector load?");
 
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 5bce4401f7bdb0..d23cfb89f9fc87 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | 
FileCheck %s
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | 
FileCheck %s
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | 
FileCheck %s --check-prefixes=CHECK,CHECK3
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | 
FileCheck %s --check-prefixes=CHECK,CHECK0
 
 define void @test1(ptr %ptr, i32 %val1) {
 ; CHECK-LABEL: test1:
@@ -28,3 +28,120 @@ define i32 @test3(ptr %ptr) {
   %val = load atomic i32, ptr %ptr seq_cst, align 4
   ret i32 %val
 }
+
+define <1 x i32> @atomic_vec1_i32(ptr %x) {
+; CHECK-LABEL: atomic_vec1_i32:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movl (%rdi), %eax
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x i32>, ptr %x acquire, align 4
+  ret <1 x i32> %ret
+}
+
+define <1 x i8> @atomic_vec1_i8(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i8:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzbl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i8:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movb (%rdi), %al
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i8>, ptr %x acquire, align 1
+  ret <1 x i8> %ret
+}
+
+define <1 x i16> @atomic_vec1_i16(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i16:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i16:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %ax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i16>, ptr %x acquire, align 2
+  ret <1 x i16> %ret
+}
+
+define <1 x i32> @atomic_vec1_i8_zext(ptr %x) {
+; CHECK3-LABEL: atomic_

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)

2025-01-15 Thread via llvm-branch-commits

https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120598

From e07e225cd8454b4421d852ff7d0e98e4a14800b0 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 11:19:39 -0500
Subject: [PATCH] [SelectionDAG][X86] Widen <2 x T> vector types for atomic
 load

Vector types of 2 elements must be widened. This change does this
for vector types of atomic load in SelectionDAG so that it can
translate aligned vectors of size >1. It also adds Pats to remove
an extra MOV.

commit-id:2894ccd1
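
A hedged sketch of a load the widened path handles (illustrative, not
from the patch's tests): a naturally aligned <2 x i32> atomic load can
now be lowered directly instead of through a libcall.

define <2 x i32> @widen_example(ptr %x) {
  ; Naturally aligned (8 bytes), so the widened load stays inline.
  %ret = load atomic <2 x i32>, ptr %x acquire, align 8
  ret <2 x i32> %ret
}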
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |  5 +-
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 38 +-
 llvm/lib/Target/X86/X86InstrCompiler.td   |  7 ++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 72 +++
 llvm/test/CodeGen/X86/atomic-unordered.ll |  3 +-
 5 files changed, 118 insertions(+), 7 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 4117da0f31c10b..e45df5c40953c0 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -1046,6 +1046,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N);
   SDValue WidenVecRes_INSERT_SUBVECTOR(SDNode *N);
   SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N);
+  SDValue WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue WidenVecRes_LOAD(SDNode* N);
   SDValue WidenVecRes_VP_LOAD(VPLoadSDNode *N);
   SDValue WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N);
@@ -1129,8 +1130,8 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   /// resulting wider type. It takes:
   ///   LdChain: list of chains for the load to be generated.
   ///   Ld:  load to widen
-  SDValue GenWidenVectorLoads(SmallVectorImpl &LdChain,
-  LoadSDNode *LD);
+  SDValue GenWidenVectorLoads(SmallVectorImpl &LdChain, MemSDNode *LD,
+  bool IsAtomic = false);
 
   /// Helper function to generate a set of extension loads to load a vector 
with
   /// a resulting wider type. It takes:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 980c157fe6f3f1..d412d8eafda4e5 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -4521,6 +4521,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, 
unsigned ResNo) {
 break;
   case ISD::EXTRACT_SUBVECTOR: Res = WidenVecRes_EXTRACT_SUBVECTOR(N); break;
   case ISD::INSERT_VECTOR_ELT: Res = WidenVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+Res = WidenVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:  Res = WidenVecRes_LOAD(N); break;
   case ISD::STEP_VECTOR:
   case ISD::SPLAT_VECTOR:
@@ -5907,6 +5910,30 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) {
  N->getOperand(1), N->getOperand(2));
 }
 
+SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SmallVector LdChain; // Chain for the series of load
+  SDValue Result = GenWidenVectorLoads(LdChain, N, /*IsAtomic=*/true);
+
+  if (Result) {
+// If we generate a single load, we can use that for the chain.  Otherwise,
+// build a factor node to remember the multiple loads are independent and
+// chain to that.
+SDValue NewChain;
+if (LdChain.size() == 1)
+  NewChain = LdChain[0];
+else
+  NewChain = DAG.getNode(ISD::TokenFactor, SDLoc(N), MVT::Other, LdChain);
+
+// Modified the chain - switch anything that used the old chain to use
+// the new one.
+ReplaceValueWith(SDValue(N, 1), NewChain);
+
+return Result;
+  }
+
+  report_fatal_error("Unable to widen atomic vector load");
+}
+
 SDValue DAGTypeLegalizer::WidenVecRes_LOAD(SDNode *N) {
   LoadSDNode *LD = cast(N);
   ISD::LoadExtType ExtType = LD->getExtensionType();
@@ -7706,7 +7733,7 @@ static SDValue BuildVectorFromScalar(SelectionDAG& DAG, 
EVT VecTy,
 }
 
 SDValue DAGTypeLegalizer::GenWidenVectorLoads(SmallVectorImpl 
&LdChain,
-  LoadSDNode *LD) {
+  MemSDNode *LD, bool IsAtomic) {
   // The strategy assumes that we can efficiently load power-of-two widths.
   // The routine chops the vector into the largest vector loads with the same
   // element type or scalar loads and then recombines it to the widen vector
@@ -7763,8 +7790,13 @@ SDValue 
DAGTypeLegalizer::GenWidenVectorLoads(SmallVectorImpl &LdChain,
 } while (TypeSize::isKnownGT(RemainingWidth, NewVTWidth));
   }
 
-  SDValue LdOp = DAG.getLoad(*FirstVT, dl, Chain, BasePtr, 
LD->getPointerInfo(),
- LD->getOriginalAlign(), MMOFlags, AAInfo);
+  SDValue LdOp;
+  if (IsAtomic)
+LdOp = DAG.getAtomic(ISD::ATOMIC_LOAD, dl, *FirstVT, *FirstVT, Chain,
+  

[llvm-branch-commits] [llvm] [X86] Add atomic vector tests for unaligned >1 sizes. (PR #120387)

2025-01-15 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120387

>From 84a167203da1ceb4f7c2e74afce06c74b9fc3b0d Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:40:32 -0500
Subject: [PATCH] [X86] Add atomic vector tests for unaligned >1 sizes.

Unaligned atomic vectors with size >1 are lowered to calls.
Adding their tests separately here.

commit-id:a06a5cc6
---
 llvm/test/CodeGen/X86/atomic-load-store.ll | 253 +
 1 file changed, 253 insertions(+)

diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 6efcbb80c0ce6d..39e9fdfa5e62b0 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -146,6 +146,34 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   ret <1 x i64> %ret
 }
 
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_ptr:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movq (%rsp), %rax
+; CHECK3-NEXT:popq %rcx
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_ptr:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq (%rsp), %rax
+; CHECK0-NEXT:popq %rcx
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
+
 define <1 x half> @atomic_vec1_half(ptr %x) {
 ; CHECK3-LABEL: atomic_vec1_half:
 ; CHECK3:   ## %bb.0:
@@ -182,3 +210,228 @@ define <1 x double> @atomic_vec1_double_align(ptr %x) 
nounwind {
   %ret = load atomic <1 x double>, ptr %x acquire, align 8
   ret <1 x double> %ret
 }
+
+define <1 x i64> @atomic_vec1_i64(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_i64:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movq (%rsp), %rax
+; CHECK3-NEXT:popq %rcx
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i64:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq (%rsp), %rax
+; CHECK0-NEXT:popq %rcx
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i64>, ptr %x acquire, align 4
+  ret <1 x i64> %ret
+}
+
+define <1 x double> @atomic_vec1_double(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_double:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK3-NEXT:popq %rax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_double:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK0-NEXT:popq %rax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 4
+  ret <1 x double> %ret
+}
+
+define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec2_i32:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK3-NEXT:popq %rax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec2_i32:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq {{.*#+}} xmm0 = mem[0],zero
+; CHECK0-NEXT:popq %rax
+; CHECK0-NEXT:retq
+  %ret = load atomic <2 x i32>, ptr %x acquire, align 4
+  ret <2 x i32> %ret
+}
+
+define <4 x float> @atomic_vec4_float_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec4_float_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEXT:movl $2, %esi
+; CHECK-NEXT:callq ___atomic_load_16
+; CHECK-NEXT:movq %rdx, %xmm1
+; CHECK-NEXT:movq %rax, %xmm0
+; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm

[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)

2025-01-15 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120716

>From 5d4bb0bfa7ab38a6d0f792e26192be1c857095ff Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Fri, 20 Dec 2024 06:14:28 -0500
Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector

AtomicExpand fails for aligned `load atomic ` because it
does not find a compatible library call. This change adds appropriate
bitcasts so that the call can be lowered.

commit-id:f430c1af
---
 llvm/lib/CodeGen/AtomicExpandPass.cpp | 25 +++--
 llvm/test/CodeGen/ARM/atomic-load-store.ll| 52 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 15 ++
 .../X86/expand-atomic-non-integer.ll  | 14 +
 4 files changed, 103 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp 
b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index a75fa688d87a8d..490cc9aa4c3c80 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -2060,9 +2060,28 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall(
 I->replaceAllUsesWith(V);
   } else if (HasResult) {
 Value *V;
-if (UseSizedLibcall)
-  V = Builder.CreateBitOrPointerCast(Result, I->getType());
-else {
+if (UseSizedLibcall) {
+  // Add bitcasts from Result's T scalar type to I's <2 x T/2> vector type
+  if (I->getType()->getScalarType()->isIntOrPtrTy() &&
+  I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) {
+TypeSize Size = Result->getType()->getPrimitiveSizeInBits();
+assert((unsigned)Size % 2 == 0);
+unsigned HalfSize = (unsigned)Size / 2;
+Value *Lo =
+Builder.CreateTrunc(Result, IntegerType::get(Ctx, HalfSize));
+Value *RS = Builder.CreateLShr(
+Result, ConstantInt::get(IntegerType::get(Ctx, Size), HalfSize));
+Value *Hi = Builder.CreateTrunc(RS, IntegerType::get(Ctx, HalfSize));
+Value *Vec = Builder.CreateInsertElement(
+VectorType::get(IntegerType::get(Ctx, HalfSize),
+cast(I->getType())->getElementCount()),
+Lo, ConstantInt::get(IntegerType::get(Ctx, 32), 0));
+Vec = Builder.CreateInsertElement(
+Vec, Hi, ConstantInt::get(IntegerType::get(Ctx, 32), 1));
+V = Builder.CreateBitOrPointerCast(Vec, I->getType());
+  } else
+V = Builder.CreateBitOrPointerCast(Result, I->getType());
+} else {
   V = Builder.CreateAlignedLoad(I->getType(), AllocaResult,
 AllocaAlignment);
   Builder.CreateLifetimeEnd(AllocaResult, SizeVal64);
diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll 
b/llvm/test/CodeGen/ARM/atomic-load-store.ll
index 560dfde356c29d..8736c5bd9a9451 100644
--- a/llvm/test/CodeGen/ARM/atomic-load-store.ll
+++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll
@@ -983,3 +983,55 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double 
%val1) {
   store atomic double %val1, ptr %ptr seq_cst, align 8
   ret void
 }
+
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 {
+; ARM-LABEL: atomic_vec1_ptr:
+; ARM:   @ %bb.0:
+; ARM-NEXT:ldr r0, [r0]
+; ARM-NEXT:dmb ish
+; ARM-NEXT:bx lr
+;
+; ARMOPTNONE-LABEL: atomic_vec1_ptr:
+; ARMOPTNONE:   @ %bb.0:
+; ARMOPTNONE-NEXT:ldr r0, [r0]
+; ARMOPTNONE-NEXT:dmb ish
+; ARMOPTNONE-NEXT:bx lr
+;
+; THUMBTWO-LABEL: atomic_vec1_ptr:
+; THUMBTWO:   @ %bb.0:
+; THUMBTWO-NEXT:ldr r0, [r0]
+; THUMBTWO-NEXT:dmb ish
+; THUMBTWO-NEXT:bx lr
+;
+; THUMBONE-LABEL: atomic_vec1_ptr:
+; THUMBONE:   @ %bb.0:
+; THUMBONE-NEXT:push {r7, lr}
+; THUMBONE-NEXT:movs r1, #0
+; THUMBONE-NEXT:mov r2, r1
+; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4
+; THUMBONE-NEXT:pop {r7, pc}
+;
+; ARMV4-LABEL: atomic_vec1_ptr:
+; ARMV4:   @ %bb.0:
+; ARMV4-NEXT:push {r11, lr}
+; ARMV4-NEXT:mov r1, #2
+; ARMV4-NEXT:bl __atomic_load_4
+; ARMV4-NEXT:mov r0, #0
+; ARMV4-NEXT:pop {r11, lr}
+; ARMV4-NEXT:mov pc, lr
+;
+; ARMV6-LABEL: atomic_vec1_ptr:
+; ARMV6:   @ %bb.0:
+; ARMV6-NEXT:mov r1, #0
+; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5
+; ARMV6-NEXT:ldr r0, [r0]
+; ARMV6-NEXT:bx lr
+;
+; THUMBM-LABEL: atomic_vec1_ptr:
+; THUMBM:   @ %bb.0:
+; THUMBM-NEXT:ldr r0, [r0]
+; THUMBM-NEXT:dmb sy
+; THUMBM-NEXT:bx lr
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 2d8cb14cf5fcde..3ce05f75fdcfe5 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -362,6 +362,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
   ret <2 x i32> %ret
 }
 
+define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec2_ptr_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEX

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)

2025-01-15 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120598

>From e07e225cd8454b4421d852ff7d0e98e4a14800b0 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 11:19:39 -0500
Subject: [PATCH] [SelectionDAG][X86] Widen <2 x T> vector types for atomic
 load

Vector types of 2 elements must be widened. This change does this
for vector types of atomic load in SelectionDAG
so that it can translate aligned vectors of >1 size. Also,
it also adds Pats to remove an extra MOV.

commit-id:2894ccd1
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |  5 +-
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 38 +-
 llvm/lib/Target/X86/X86InstrCompiler.td   |  7 ++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 72 +++
 llvm/test/CodeGen/X86/atomic-unordered.ll |  3 +-
 5 files changed, 118 insertions(+), 7 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 4117da0f31c10b..e45df5c40953c0 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -1046,6 +1046,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N);
   SDValue WidenVecRes_INSERT_SUBVECTOR(SDNode *N);
   SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N);
+  SDValue WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue WidenVecRes_LOAD(SDNode* N);
   SDValue WidenVecRes_VP_LOAD(VPLoadSDNode *N);
   SDValue WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N);
@@ -1129,8 +1130,8 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   /// resulting wider type. It takes:
   ///   LdChain: list of chains for the load to be generated.
   ///   Ld:  load to widen
-  SDValue GenWidenVectorLoads(SmallVectorImpl &LdChain,
-  LoadSDNode *LD);
+  SDValue GenWidenVectorLoads(SmallVectorImpl &LdChain, MemSDNode *LD,
+  bool IsAtomic = false);
 
   /// Helper function to generate a set of extension loads to load a vector 
with
   /// a resulting wider type. It takes:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 980c157fe6f3f1..d412d8eafda4e5 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -4521,6 +4521,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, 
unsigned ResNo) {
 break;
   case ISD::EXTRACT_SUBVECTOR: Res = WidenVecRes_EXTRACT_SUBVECTOR(N); break;
   case ISD::INSERT_VECTOR_ELT: Res = WidenVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+Res = WidenVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:  Res = WidenVecRes_LOAD(N); break;
   case ISD::STEP_VECTOR:
   case ISD::SPLAT_VECTOR:
@@ -5907,6 +5910,30 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) {
  N->getOperand(1), N->getOperand(2));
 }
 
+SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SmallVector LdChain; // Chain for the series of load
+  SDValue Result = GenWidenVectorLoads(LdChain, N, /*IsAtomic=*/true);
+
+  if (Result) {
+// If we generate a single load, we can use that for the chain.  Otherwise,
+// build a factor node to remember the multiple loads are independent and
+// chain to that.
+SDValue NewChain;
+if (LdChain.size() == 1)
+  NewChain = LdChain[0];
+else
+  NewChain = DAG.getNode(ISD::TokenFactor, SDLoc(N), MVT::Other, LdChain);
+
+// Modified the chain - switch anything that used the old chain to use
+// the new one.
+ReplaceValueWith(SDValue(N, 1), NewChain);
+
+return Result;
+  }
+
+  report_fatal_error("Unable to widen atomic vector load");
+}
+
 SDValue DAGTypeLegalizer::WidenVecRes_LOAD(SDNode *N) {
   LoadSDNode *LD = cast(N);
   ISD::LoadExtType ExtType = LD->getExtensionType();
@@ -7706,7 +7733,7 @@ static SDValue BuildVectorFromScalar(SelectionDAG& DAG, 
EVT VecTy,
 }
 
 SDValue DAGTypeLegalizer::GenWidenVectorLoads(SmallVectorImpl 
&LdChain,
-  LoadSDNode *LD) {
+  MemSDNode *LD, bool IsAtomic) {
   // The strategy assumes that we can efficiently load power-of-two widths.
   // The routine chops the vector into the largest vector loads with the same
   // element type or scalar loads and then recombines it to the widen vector
@@ -7763,8 +7790,13 @@ SDValue 
DAGTypeLegalizer::GenWidenVectorLoads(SmallVectorImpl &LdChain,
 } while (TypeSize::isKnownGT(RemainingWidth, NewVTWidth));
   }
 
-  SDValue LdOp = DAG.getLoad(*FirstVT, dl, Chain, BasePtr, 
LD->getPointerInfo(),
- LD->getOriginalAlign(), MMOFlags, AAInfo);
+  SDValue LdOp;
+  if (IsAtomic)
+LdOp = DAG.getAtomic(ISD::ATOMIC_LOAD, dl, *FirstVT, *FirstVT, Chain,
+  

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)

2025-01-15 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120598

>From e07e225cd8454b4421d852ff7d0e98e4a14800b0 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 11:19:39 -0500
Subject: [PATCH] [SelectionDAG][X86] Widen <2 x T> vector types for atomic
 load

Vector types of 2 elements must be widened. This change does this
for vector types of atomic load in SelectionDAG
so that it can translate aligned vectors of >1 size. Also,
it also adds Pats to remove an extra MOV.

commit-id:2894ccd1
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |  5 +-
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 38 +-
 llvm/lib/Target/X86/X86InstrCompiler.td   |  7 ++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 72 +++
 llvm/test/CodeGen/X86/atomic-unordered.ll |  3 +-
 5 files changed, 118 insertions(+), 7 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 4117da0f31c10b..e45df5c40953c0 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -1046,6 +1046,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N);
   SDValue WidenVecRes_INSERT_SUBVECTOR(SDNode *N);
   SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N);
+  SDValue WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue WidenVecRes_LOAD(SDNode* N);
   SDValue WidenVecRes_VP_LOAD(VPLoadSDNode *N);
   SDValue WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N);
@@ -1129,8 +1130,8 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   /// resulting wider type. It takes:
   ///   LdChain: list of chains for the load to be generated.
   ///   Ld:  load to widen
-  SDValue GenWidenVectorLoads(SmallVectorImpl &LdChain,
-  LoadSDNode *LD);
+  SDValue GenWidenVectorLoads(SmallVectorImpl &LdChain, MemSDNode *LD,
+  bool IsAtomic = false);
 
   /// Helper function to generate a set of extension loads to load a vector 
with
   /// a resulting wider type. It takes:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 980c157fe6f3f1..d412d8eafda4e5 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -4521,6 +4521,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, 
unsigned ResNo) {
 break;
   case ISD::EXTRACT_SUBVECTOR: Res = WidenVecRes_EXTRACT_SUBVECTOR(N); break;
   case ISD::INSERT_VECTOR_ELT: Res = WidenVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+Res = WidenVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:  Res = WidenVecRes_LOAD(N); break;
   case ISD::STEP_VECTOR:
   case ISD::SPLAT_VECTOR:
@@ -5907,6 +5910,30 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) {
  N->getOperand(1), N->getOperand(2));
 }
 
+SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SmallVector LdChain; // Chain for the series of load
+  SDValue Result = GenWidenVectorLoads(LdChain, N, /*IsAtomic=*/true);
+
+  if (Result) {
+// If we generate a single load, we can use that for the chain.  Otherwise,
+// build a factor node to remember the multiple loads are independent and
+// chain to that.
+SDValue NewChain;
+if (LdChain.size() == 1)
+  NewChain = LdChain[0];
+else
+  NewChain = DAG.getNode(ISD::TokenFactor, SDLoc(N), MVT::Other, LdChain);
+
+// Modified the chain - switch anything that used the old chain to use
+// the new one.
+ReplaceValueWith(SDValue(N, 1), NewChain);
+
+return Result;
+  }
+
+  report_fatal_error("Unable to widen atomic vector load");
+}
+
 SDValue DAGTypeLegalizer::WidenVecRes_LOAD(SDNode *N) {
   LoadSDNode *LD = cast(N);
   ISD::LoadExtType ExtType = LD->getExtensionType();
@@ -7706,7 +7733,7 @@ static SDValue BuildVectorFromScalar(SelectionDAG& DAG, 
EVT VecTy,
 }
 
 SDValue DAGTypeLegalizer::GenWidenVectorLoads(SmallVectorImpl 
&LdChain,
-  LoadSDNode *LD) {
+  MemSDNode *LD, bool IsAtomic) {
   // The strategy assumes that we can efficiently load power-of-two widths.
   // The routine chops the vector into the largest vector loads with the same
   // element type or scalar loads and then recombines it to the widen vector
@@ -7763,8 +7790,13 @@ SDValue 
DAGTypeLegalizer::GenWidenVectorLoads(SmallVectorImpl &LdChain,
 } while (TypeSize::isKnownGT(RemainingWidth, NewVTWidth));
   }
 
-  SDValue LdOp = DAG.getLoad(*FirstVT, dl, Chain, BasePtr, 
LD->getPointerInfo(),
- LD->getOriginalAlign(), MMOFlags, AAInfo);
+  SDValue LdOp;
+  if (IsAtomic)
+LdOp = DAG.getAtomic(ISD::ATOMIC_LOAD, dl, *FirstVT, *FirstVT, Chain,
+  

[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)

2025-01-15 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120716

>From 5d4bb0bfa7ab38a6d0f792e26192be1c857095ff Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Fri, 20 Dec 2024 06:14:28 -0500
Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector

AtomicExpand fails for an aligned vector `load atomic` because it
does not find a compatible library call. This change adds appropriate
bitcasts so that the call can be lowered.
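
For context, a hedged sketch of the libcall shape involved: __atomic_load is
libatomic's generic (unsized) entry point; the surrounding C++ and the
Vec2Ptr type are invented for illustration and are not part of the patch.

#include <stddef.h>

extern "C" void __atomic_load(size_t size, void *ptr, void *ret, int memorder);

struct Vec2Ptr { void *p[2]; }; // stands in for <2 x ptr>

Vec2Ptr load_atomic_vec2_ptr(void *src) {
  Vec2Ptr tmp;
  __atomic_load(sizeof tmp, src, &tmp, /*__ATOMIC_ACQUIRE=*/2);
  return tmp; // reinterpreting the returned bytes plays the role of the bitcasts
}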

commit-id:f430c1af
---
 llvm/lib/CodeGen/AtomicExpandPass.cpp | 25 +++--
 llvm/test/CodeGen/ARM/atomic-load-store.ll| 52 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 15 ++
 .../X86/expand-atomic-non-integer.ll  | 14 +
 4 files changed, 103 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp 
b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index a75fa688d87a8d..490cc9aa4c3c80 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -2060,9 +2060,28 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall(
 I->replaceAllUsesWith(V);
   } else if (HasResult) {
 Value *V;
-if (UseSizedLibcall)
-  V = Builder.CreateBitOrPointerCast(Result, I->getType());
-else {
+if (UseSizedLibcall) {
+  // Add bitcasts from Result's T scalar type to I's <2 x T/2> vector type
+  if (I->getType()->getScalarType()->isIntOrPtrTy() &&
+  I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) {
+TypeSize Size = Result->getType()->getPrimitiveSizeInBits();
+assert((unsigned)Size % 2 == 0);
+unsigned HalfSize = (unsigned)Size / 2;
+Value *Lo =
+Builder.CreateTrunc(Result, IntegerType::get(Ctx, HalfSize));
+Value *RS = Builder.CreateLShr(
+Result, ConstantInt::get(IntegerType::get(Ctx, Size), HalfSize));
+Value *Hi = Builder.CreateTrunc(RS, IntegerType::get(Ctx, HalfSize));
+Value *Vec = Builder.CreateInsertElement(
+VectorType::get(IntegerType::get(Ctx, HalfSize),
+cast(I->getType())->getElementCount()),
+Lo, ConstantInt::get(IntegerType::get(Ctx, 32), 0));
+Vec = Builder.CreateInsertElement(
+Vec, Hi, ConstantInt::get(IntegerType::get(Ctx, 32), 1));
+V = Builder.CreateBitOrPointerCast(Vec, I->getType());
+  } else
+V = Builder.CreateBitOrPointerCast(Result, I->getType());
+} else {
   V = Builder.CreateAlignedLoad(I->getType(), AllocaResult,
 AllocaAlignment);
   Builder.CreateLifetimeEnd(AllocaResult, SizeVal64);
diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll 
b/llvm/test/CodeGen/ARM/atomic-load-store.ll
index 560dfde356c29d..8736c5bd9a9451 100644
--- a/llvm/test/CodeGen/ARM/atomic-load-store.ll
+++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll
@@ -983,3 +983,55 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double 
%val1) {
   store atomic double %val1, ptr %ptr seq_cst, align 8
   ret void
 }
+
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 {
+; ARM-LABEL: atomic_vec1_ptr:
+; ARM:   @ %bb.0:
+; ARM-NEXT:ldr r0, [r0]
+; ARM-NEXT:dmb ish
+; ARM-NEXT:bx lr
+;
+; ARMOPTNONE-LABEL: atomic_vec1_ptr:
+; ARMOPTNONE:   @ %bb.0:
+; ARMOPTNONE-NEXT:ldr r0, [r0]
+; ARMOPTNONE-NEXT:dmb ish
+; ARMOPTNONE-NEXT:bx lr
+;
+; THUMBTWO-LABEL: atomic_vec1_ptr:
+; THUMBTWO:   @ %bb.0:
+; THUMBTWO-NEXT:ldr r0, [r0]
+; THUMBTWO-NEXT:dmb ish
+; THUMBTWO-NEXT:bx lr
+;
+; THUMBONE-LABEL: atomic_vec1_ptr:
+; THUMBONE:   @ %bb.0:
+; THUMBONE-NEXT:push {r7, lr}
+; THUMBONE-NEXT:movs r1, #0
+; THUMBONE-NEXT:mov r2, r1
+; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4
+; THUMBONE-NEXT:pop {r7, pc}
+;
+; ARMV4-LABEL: atomic_vec1_ptr:
+; ARMV4:   @ %bb.0:
+; ARMV4-NEXT:push {r11, lr}
+; ARMV4-NEXT:mov r1, #2
+; ARMV4-NEXT:bl __atomic_load_4
+; ARMV4-NEXT:mov r0, #0
+; ARMV4-NEXT:pop {r11, lr}
+; ARMV4-NEXT:mov pc, lr
+;
+; ARMV6-LABEL: atomic_vec1_ptr:
+; ARMV6:   @ %bb.0:
+; ARMV6-NEXT:mov r1, #0
+; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5
+; ARMV6-NEXT:ldr r0, [r0]
+; ARMV6-NEXT:bx lr
+;
+; THUMBM-LABEL: atomic_vec1_ptr:
+; THUMBM:   @ %bb.0:
+; THUMBM-NEXT:ldr r0, [r0]
+; THUMBM-NEXT:dmb sy
+; THUMBM-NEXT:bx lr
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 2d8cb14cf5fcde..3ce05f75fdcfe5 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -362,6 +362,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
   ret <2 x i32> %ret
 }
 
+define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec2_ptr_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEX

[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)

2025-01-15 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120385

>From 11174c52a0543aa16bb3da0dcd4c65aea44bcb2c Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:37:17 -0500
Subject: [PATCH] [SelectionDAG] Legalize <1 x T> vector types for atomic load

`load atomic <1 x T>` is not valid. This change legalizes
vector types of atomic load via scalarization in SelectionDAG
so that it can, for example, translate from `v1i32` to `i32`.
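
In user-level terms the legalization is just this (a C++ sketch; the function
name is invented):

#include <atomic>
#include <cstdint>

// A <1 x i32> atomic load is simply an atomic load of the lone element
// (v1i32 -> i32); the legalizer rebuilds the one-element "vector" around it.
uint32_t load_atomic_vec1_i32(const std::atomic<uint32_t> *p) {
  return p->load(std::memory_order_acquire);
}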

commit-id:5c36cc8c
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  |  15 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 121 +-
 3 files changed, 135 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index caaa40a64c7e1d..4117da0f31c10b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -861,6 +861,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue ScalarizeVecRes_UnaryOpWithExtraInput(SDNode *N);
   SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N);
   SDValue ScalarizeVecRes_LOAD(LoadSDNode *N);
+  SDValue ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue ScalarizeVecRes_SCALAR_TO_VECTOR(SDNode *N);
   SDValue ScalarizeVecRes_VSELECT(SDNode *N);
   SDValue ScalarizeVecRes_SELECT(SDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index f39d9ca15496a9..980c157fe6f3f1 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -64,6 +64,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, 
unsigned ResNo) {
 R = ScalarizeVecRes_UnaryOpWithExtraInput(N);
 break;
   case ISD::INSERT_VECTOR_ELT: R = ScalarizeVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+R = ScalarizeVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:   R = 
ScalarizeVecRes_LOAD(cast(N));break;
   case ISD::SCALAR_TO_VECTOR:  R = ScalarizeVecRes_SCALAR_TO_VECTOR(N); break;
   case ISD::SIGN_EXTEND_INREG: R = ScalarizeVecRes_InregOp(N); break;
@@ -457,6 +460,18 @@ SDValue 
DAGTypeLegalizer::ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N) {
   return Op;
 }
 
+SDValue DAGTypeLegalizer::ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SDValue Result = DAG.getAtomic(
+  ISD::ATOMIC_LOAD, SDLoc(N), N->getMemoryVT().getVectorElementType(),
+  N->getValueType(0).getVectorElementType(), N->getChain(), 
N->getBasePtr(),
+  N->getMemOperand());
+
+  // Legalize the chain result - switch anything that used the old chain to
+  // use the new one.
+  ReplaceValueWith(SDValue(N, 1), Result.getValue(1));
+  return Result;
+}
+
 SDValue DAGTypeLegalizer::ScalarizeVecRes_LOAD(LoadSDNode *N) {
   assert(N->isUnindexed() && "Indexed vector load?");
 
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 5bce4401f7bdb0..d23cfb89f9fc87 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | 
FileCheck %s
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | 
FileCheck %s
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | 
FileCheck %s --check-prefixes=CHECK,CHECK3
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | 
FileCheck %s --check-prefixes=CHECK,CHECK0
 
 define void @test1(ptr %ptr, i32 %val1) {
 ; CHECK-LABEL: test1:
@@ -28,3 +28,120 @@ define i32 @test3(ptr %ptr) {
   %val = load atomic i32, ptr %ptr seq_cst, align 4
   ret i32 %val
 }
+
+define <1 x i32> @atomic_vec1_i32(ptr %x) {
+; CHECK-LABEL: atomic_vec1_i32:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movl (%rdi), %eax
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x i32>, ptr %x acquire, align 4
+  ret <1 x i32> %ret
+}
+
+define <1 x i8> @atomic_vec1_i8(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i8:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzbl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i8:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movb (%rdi), %al
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i8>, ptr %x acquire, align 1
+  ret <1 x i8> %ret
+}
+
+define <1 x i16> @atomic_vec1_i16(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i16:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i16:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %ax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i16>, ptr %x acquire, align 2
+  ret <1 x i16> %ret
+}
+
+define <1 x i32> @atomic_vec1_i8_zext(ptr %x) {
+; CHECK3-LABEL: atomic_

[llvm-branch-commits] [llvm] [X86] Manage atomic load of fp -> int promotion in DAG (PR #120386)

2025-01-15 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120386

>From 9a7ee2183272386923385a9e857a33972385cf13 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:38:23 -0500
Subject: [PATCH] [X86] Manage atomic load of fp -> int promotion in DAG

When lowering atomic <1 x T> vector types with floats, selection can fail since
this pattern is unsupported. To support this, floats can be cast to
an integer type of the same size.
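
What the promotion amounts to, sketched in ordinary C++ (C++20 for
std::bit_cast; the function name is invented): the load is performed on the
same-width integer, and the bits are reinterpreted as the float type
afterwards.

#include <atomic>
#include <bit>
#include <cstdint>

float load_atomic_f32_promoted(const std::atomic<uint32_t> *p) {
  uint32_t bits = p->load(std::memory_order_acquire); // f32 load done as i32
  return std::bit_cast<float>(bits);                  // cast back to float
}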

commit-id:f9d761c5
---
 llvm/lib/Target/X86/X86ISelLowering.cpp|  4 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll | 37 ++
 2 files changed, 41 insertions(+)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp 
b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 90e3e15b1fb46c..124134db0f55ac 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -2639,6 +2639,10 @@ X86TargetLowering::X86TargetLowering(const 
X86TargetMachine &TM,
 setOperationAction(Op, MVT::f32, Promote);
   }
 
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f16, MVT::i16);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f32, MVT::i32);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f64, MVT::i64);
+
   // We have target-specific dag combine patterns for the following nodes:
   setTargetDAGCombine({ISD::VECTOR_SHUFFLE,
ISD::SCALAR_TO_VECTOR,
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index d23cfb89f9fc87..6efcbb80c0ce6d 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -145,3 +145,40 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   %ret = load atomic <1 x i64>, ptr %x acquire, align 8
   ret <1 x i64> %ret
 }
+
+define <1 x half> @atomic_vec1_half(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_half:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_half:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %cx
+; CHECK0-NEXT:## implicit-def: $eax
+; CHECK0-NEXT:movw %cx, %ax
+; CHECK0-NEXT:## implicit-def: $xmm0
+; CHECK0-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x half>, ptr %x acquire, align 2
+  ret <1 x half> %ret
+}
+
+define <1 x float> @atomic_vec1_float(ptr %x) {
+; CHECK-LABEL: atomic_vec1_float:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x float>, ptr %x acquire, align 4
+  ret <1 x float> %ret
+}
+
+define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec1_double_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 8
+  ret <1 x double> %ret
+}

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [SelectionDAG][X86] Split <2 x T> vector types for atomic load (PR #120640)

2025-01-15 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120640

>From f129a3c92a4be2c7263f6838317330ca76a17117 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 16:25:55 -0500
Subject: [PATCH] [SelectionDAG][X86] Split <2 x T> vector types for atomic
 load

Vector types of 2 elements that aren't widened are split
so that they can be vectorized within SelectionDAG. This change
utilizes the load vectorization infrastructure in order to
regroup the split elements. This enables SelectionDAG to
translate vectors with element types bfloat and half.
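
A sketch of the idea in plain C++ (names invented; little-endian element
layout assumed): both half elements are fetched with one 32-bit atomic load
and then pulled apart, mirroring the single ATOMIC_LOAD plus per-element
extraction described above.

#include <atomic>
#include <cstdint>
#include <cstring>

struct Vec2f16 { uint16_t h[2]; }; // raw half bits; stands in for <2 x half>

Vec2f16 load_atomic_vec2_half(const std::atomic<uint32_t> *p) {
  uint32_t bits = p->load(std::memory_order_acquire); // one 4-byte atomic load
  Vec2f16 r;
  std::memcpy(r.h, &bits, sizeof r.h); // extract the two 16-bit elements
  return r;
}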

commit-id:3a045357
---
 llvm/include/llvm/CodeGen/SelectionDAG.h  |  4 +-
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |  1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 31 
 .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 24 -
 .../SelectionDAGAddressAnalysis.cpp   | 30 ++-
 .../SelectionDAG/SelectionDAGBuilder.cpp  |  6 ++-
 llvm/lib/Target/X86/X86ISelLowering.cpp   | 50 ---
 llvm/test/CodeGen/X86/atomic-load-store.ll| 36 +
 8 files changed, 147 insertions(+), 35 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h 
b/llvm/include/llvm/CodeGen/SelectionDAG.h
index ba0538f7084eec..648ad14a67cb6c 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1837,7 +1837,7 @@ class SelectionDAG {
   /// chain to the token factor. This ensures that the new memory node will 
have
   /// the same relative memory dependency position as the old load. Returns the
   /// new merged load chain.
-  SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp);
+  SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp);
 
   /// Topological-sort the AllNodes list and a
   /// assign a unique node id for each node in the DAG based on their
@@ -2263,6 +2263,8 @@ class SelectionDAG {
   /// location that the 'Base' load is loading from.
   bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base,
   unsigned Bytes, int Dist) const;
+  bool areNonVolatileConsecutiveLoads(AtomicSDNode *LD, AtomicSDNode *Base,
+  unsigned Bytes, int Dist) const;
 
   /// Infer alignment of a load / store address. Return std::nullopt if it
   /// cannot be inferred.
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index e45df5c40953c0..569777d3dfaab7 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -946,6 +946,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   void SplitVecRes_FPOp_MultiType(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_IS_FPCLASS(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi);
+  void SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD);
   void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_LOAD(VPLoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *SLD, SDValue &Lo,
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index d412d8eafda4e5..0fa318fcdca2b0 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -1152,6 +1152,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, 
unsigned ResNo) {
 SplitVecRes_STEP_VECTOR(N, Lo, Hi);
 break;
   case ISD::SIGN_EXTEND_INREG: SplitVecRes_InregOp(N, Lo, Hi); break;
+  case ISD::ATOMIC_LOAD:
+SplitVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:
 SplitVecRes_LOAD(cast(N), Lo, Hi);
 break;
@@ -1395,6 +1398,34 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, 
unsigned ResNo) {
 SetSplitVector(SDValue(N, ResNo), Lo, Hi);
 }
 
+void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD) {
+  SDLoc dl(LD);
+
+  EVT MemoryVT = LD->getMemoryVT();
+  unsigned NumElts = MemoryVT.getVectorMinNumElements();
+
+  EVT IntMemoryVT = EVT::getVectorVT(*DAG.getContext(), MVT::i16, NumElts);
+  EVT ElemVT = EVT::getVectorVT(*DAG.getContext(),
+MemoryVT.getVectorElementType(), 1);
+
+  // Create a single atomic to load all the elements at once.
+  SDValue Atomic = DAG.getAtomic(ISD::ATOMIC_LOAD, dl, IntMemoryVT, 
IntMemoryVT,
+ LD->getChain(), LD->getBasePtr(),
+ LD->getMemOperand());
+
+  // Instead of splitting, put all the elements back into a vector.
+  SmallVector Ops;
+  for (unsigned i = 0; i < NumElts; ++i) {
+SDValue Elt = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::i16, Atomic,
+  DAG.getVectorIdxConstant(i, dl));
+Elt = DAG.getBitcast(ElemVT, Elt);
+Ops.push_back(Elt);
+ 

[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)

2025-01-15 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120716

>From 36964209fb53025cf81abe170913bc60f64ba66c Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Fri, 20 Dec 2024 06:14:28 -0500
Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector

AtomicExpand fails for an aligned vector `load atomic` because it
does not find a compatible library call. This change adds appropriate
bitcasts so that the call can be lowered.

commit-id:f430c1af
---
 llvm/lib/CodeGen/AtomicExpandPass.cpp | 25 +++--
 llvm/test/CodeGen/ARM/atomic-load-store.ll| 52 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 15 ++
 .../X86/expand-atomic-non-integer.ll  | 14 +
 4 files changed, 103 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp 
b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index a75fa688d87a8d..490cc9aa4c3c80 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -2060,9 +2060,28 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall(
 I->replaceAllUsesWith(V);
   } else if (HasResult) {
 Value *V;
-if (UseSizedLibcall)
-  V = Builder.CreateBitOrPointerCast(Result, I->getType());
-else {
+if (UseSizedLibcall) {
+  // Add bitcasts from Result's T scalar type to I's <2 x T/2> vector type
+  if (I->getType()->getScalarType()->isIntOrPtrTy() &&
+  I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) {
+TypeSize Size = Result->getType()->getPrimitiveSizeInBits();
+assert((unsigned)Size % 2 == 0);
+unsigned HalfSize = (unsigned)Size / 2;
+Value *Lo =
+Builder.CreateTrunc(Result, IntegerType::get(Ctx, HalfSize));
+Value *RS = Builder.CreateLShr(
+Result, ConstantInt::get(IntegerType::get(Ctx, Size), HalfSize));
+Value *Hi = Builder.CreateTrunc(RS, IntegerType::get(Ctx, HalfSize));
+Value *Vec = Builder.CreateInsertElement(
+VectorType::get(IntegerType::get(Ctx, HalfSize),
+cast(I->getType())->getElementCount()),
+Lo, ConstantInt::get(IntegerType::get(Ctx, 32), 0));
+Vec = Builder.CreateInsertElement(
+Vec, Hi, ConstantInt::get(IntegerType::get(Ctx, 32), 1));
+V = Builder.CreateBitOrPointerCast(Vec, I->getType());
+  } else
+V = Builder.CreateBitOrPointerCast(Result, I->getType());
+} else {
   V = Builder.CreateAlignedLoad(I->getType(), AllocaResult,
 AllocaAlignment);
   Builder.CreateLifetimeEnd(AllocaResult, SizeVal64);
diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll 
b/llvm/test/CodeGen/ARM/atomic-load-store.ll
index 560dfde356c29d..8736c5bd9a9451 100644
--- a/llvm/test/CodeGen/ARM/atomic-load-store.ll
+++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll
@@ -983,3 +983,55 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double 
%val1) {
   store atomic double %val1, ptr %ptr seq_cst, align 8
   ret void
 }
+
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 {
+; ARM-LABEL: atomic_vec1_ptr:
+; ARM:   @ %bb.0:
+; ARM-NEXT:ldr r0, [r0]
+; ARM-NEXT:dmb ish
+; ARM-NEXT:bx lr
+;
+; ARMOPTNONE-LABEL: atomic_vec1_ptr:
+; ARMOPTNONE:   @ %bb.0:
+; ARMOPTNONE-NEXT:ldr r0, [r0]
+; ARMOPTNONE-NEXT:dmb ish
+; ARMOPTNONE-NEXT:bx lr
+;
+; THUMBTWO-LABEL: atomic_vec1_ptr:
+; THUMBTWO:   @ %bb.0:
+; THUMBTWO-NEXT:ldr r0, [r0]
+; THUMBTWO-NEXT:dmb ish
+; THUMBTWO-NEXT:bx lr
+;
+; THUMBONE-LABEL: atomic_vec1_ptr:
+; THUMBONE:   @ %bb.0:
+; THUMBONE-NEXT:push {r7, lr}
+; THUMBONE-NEXT:movs r1, #0
+; THUMBONE-NEXT:mov r2, r1
+; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4
+; THUMBONE-NEXT:pop {r7, pc}
+;
+; ARMV4-LABEL: atomic_vec1_ptr:
+; ARMV4:   @ %bb.0:
+; ARMV4-NEXT:push {r11, lr}
+; ARMV4-NEXT:mov r1, #2
+; ARMV4-NEXT:bl __atomic_load_4
+; ARMV4-NEXT:mov r0, #0
+; ARMV4-NEXT:pop {r11, lr}
+; ARMV4-NEXT:mov pc, lr
+;
+; ARMV6-LABEL: atomic_vec1_ptr:
+; ARMV6:   @ %bb.0:
+; ARMV6-NEXT:mov r1, #0
+; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5
+; ARMV6-NEXT:ldr r0, [r0]
+; ARMV6-NEXT:bx lr
+;
+; THUMBM-LABEL: atomic_vec1_ptr:
+; THUMBM:   @ %bb.0:
+; THUMBM-NEXT:ldr r0, [r0]
+; THUMBM-NEXT:dmb sy
+; THUMBM-NEXT:bx lr
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 2d8cb14cf5fcde..3ce05f75fdcfe5 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -362,6 +362,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
   ret <2 x i32> %ret
 }
 
+define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec2_ptr_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEX

[llvm-branch-commits] [llvm] [AMDGPU][NewPM] Port SILowerControlFlow pass into NPM. (PR #123045)

2025-01-15 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Christudasan Devadasan (cdevadas)


Changes



---
Full diff: https://github.com/llvm/llvm-project/pull/123045.diff


8 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPU.h (+2-2) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def (+1) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+4-3) 
- (modified) llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp (+48-15) 
- (added) llvm/lib/Target/AMDGPU/SILowerControlFlow.h (+22) 
- (modified) llvm/test/CodeGen/AMDGPU/collapse-endcf.mir (+1) 
- (modified) llvm/test/CodeGen/AMDGPU/lower-control-flow-live-intervals.mir 
(+1) 
- (modified) llvm/test/CodeGen/AMDGPU/lower-control-flow-other-terminators.mir 
(+1) 


```diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 400c5f219cc702..89356df39724a4 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -207,8 +207,8 @@ extern char &SILoadStoreOptimizerLegacyID;
 void initializeSIWholeQuadModePass(PassRegistry &);
 extern char &SIWholeQuadModeID;
 
-void initializeSILowerControlFlowPass(PassRegistry &);
-extern char &SILowerControlFlowID;
+void initializeSILowerControlFlowLegacyPass(PassRegistry &);
+extern char &SILowerControlFlowLegacyID;
 
 void initializeSIPreEmitPeepholePass(PassRegistry &);
 extern char &SIPreEmitPeepholeID;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def 
b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
index 6f322074ba74cc..fbcf83e2fdd60b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
@@ -102,6 +102,7 @@ MACHINE_FUNCTION_PASS("si-i1-copies", SILowerI1CopiesPass())
 MACHINE_FUNCTION_PASS("si-fold-operands", SIFoldOperandsPass());
 MACHINE_FUNCTION_PASS("gcn-dpp-combine", GCNDPPCombinePass())
 MACHINE_FUNCTION_PASS("si-load-store-opt", SILoadStoreOptimizerPass())
+MACHINE_FUNCTION_PASS("si-lower-control-flow", SILowerControlFlowPass())
 MACHINE_FUNCTION_PASS("si-lower-sgpr-spills", SILowerSGPRSpillsPass())
 MACHINE_FUNCTION_PASS("si-opt-vgpr-liverange", SIOptimizeVGPRLiveRangePass())
 MACHINE_FUNCTION_PASS("si-peephole-sdwa", SIPeepholeSDWAPass())
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 6d4547dbc82c3b..98268b848f5ce6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -38,6 +38,7 @@
 #include "SIFixSGPRCopies.h"
 #include "SIFoldOperands.h"
 #include "SILoadStoreOptimizer.h"
+#include "SILowerControlFlow.h"
 #include "SILowerSGPRSpills.h"
 #include "SIMachineFunctionInfo.h"
 #include "SIMachineScheduler.h"
@@ -523,7 +524,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void 
LLVMInitializeAMDGPUTarget() {
   initializeSIInsertWaitcntsPass(*PR);
   initializeSIModeRegisterPass(*PR);
   initializeSIWholeQuadModePass(*PR);
-  initializeSILowerControlFlowPass(*PR);
+  initializeSILowerControlFlowLegacyPass(*PR);
   initializeSIPreEmitPeepholePass(*PR);
   initializeSILateBranchLoweringPass(*PR);
   initializeSIMemoryLegalizerPass(*PR);
@@ -1459,7 +1460,7 @@ void GCNPassConfig::addFastRegAlloc() {
   // This must be run immediately after phi elimination and before
   // TwoAddressInstructions, otherwise the processing of the tied operand of
   // SI_ELSE will introduce a copy of the tied operand source after the else.
-  insertPass(&PHIEliminationID, &SILowerControlFlowID);
+  insertPass(&PHIEliminationID, &SILowerControlFlowLegacyID);
 
   insertPass(&TwoAddressInstructionPassID, &SIWholeQuadModeID);
 
@@ -1480,7 +1481,7 @@ void GCNPassConfig::addOptimizedRegAlloc() {
   // This must be run immediately after phi elimination and before
   // TwoAddressInstructions, otherwise the processing of the tied operand of
   // SI_ELSE will introduce a copy of the tied operand source after the else.
-  insertPass(&PHIEliminationID, &SILowerControlFlowID);
+  insertPass(&PHIEliminationID, &SILowerControlFlowLegacyID);
 
   if (EnableRewritePartialRegUses)
 insertPass(&RenameIndependentSubregsID, &GCNRewritePartialRegUsesID);
diff --git a/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp 
b/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
index ed5390b96ed4b5..f8878f32f829d1 100644
--- a/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
+++ b/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
@@ -48,6 +48,7 @@
 /// %exec = S_OR_B64 %exec, %sgpr0 // Re-enable saved exec mask bits
 
//===--===//
 
+#include "SILowerControlFlow.h"
 #include "AMDGPU.h"
 #include "GCNSubtarget.h"
 #include "MCTargetDesc/AMDGPUMCTargetDesc.h"
@@ -68,7 +69,7 @@ RemoveRedundantEndcf("amdgpu-remove-redundant-endcf",
 
 namespace {
 
-class SILowerControlFlow : public MachineFunctionPass {
+class SILowerControlFlow {
 private:
   const SIRegisterInfo *TRI = nullptr;
   const SIInstrInfo *TII = nullptr;
@@ -135,10 +

[llvm-branch-commits] [llvm] [AMDGPU][NewPM] Port SILowerControlFlow pass into NPM. (PR #123045)

2025-01-15 Thread Christudasan Devadasan via llvm-branch-commits

cdevadas wrote:

I observed something while porting this pass.
The LiveIntervals (LIS) analysis uses SlotIndexes (SI). There is no 
explicit use of SI in this pass, yet preserving LIS requires us to 
preserve SI as well. When I initially failed to preserve SI, the following lit 
test went into an infinite loop (the cleanup phase hung): 
llvm/test/CodeGen/AMDGPU/lower-control-flow-live-intervals.mir.
Then I realized that the legacy pass flow preserved SI in getAnalysisUsage(). 
Is it expected that we preserve all dependent analyses alongside a parent 
analysis?
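
To make the failure mode concrete, a minimal sketch of what the NPM run()
would return under this port (the exact analysis set the real patch keeps may
differ; getMachineFunctionPassPreservedAnalyses() is the usual NPM starting
point):

#include "SILowerControlFlow.h"
#include "llvm/CodeGen/LiveIntervals.h"
#include "llvm/CodeGen/MachinePassManager.h"
#include "llvm/CodeGen/SlotIndexes.h"

using namespace llvm;

PreservedAnalyses
SILowerControlFlowPass::run(MachineFunction &MF,
                            MachineFunctionAnalysisManager &MFAM) {
  // ... lower the control-flow pseudos ...
  auto PA = getMachineFunctionPassPreservedAnalyses();
  PA.preserve<LiveIntervalsAnalysis>();
  PA.preserve<SlotIndexesAnalysis>(); // omitting this is what caused the hang
  return PA;
}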

https://github.com/llvm/llvm-project/pull/123045
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][NewPM] Port SILowerControlFlow pass into NPM. (PR #123045)

2025-01-15 Thread Christudasan Devadasan via llvm-branch-commits

cdevadas wrote:

> I don't know what the rules are for the new pass manager.

Thanks @jayfoad. The assertion would definitely hint to users what's amiss 
here. But the hang was quite baffling and difficult to track down. If the 
PassManager can't automatically determine the dependent analyses and get them 
all preserved, we should at least have this assertion in place to avoid such 
strange behavior.


https://github.com/llvm/llvm-project/pull/123045
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][NewPM] Port SILowerControlFlow pass into NPM. (PR #123045)

2025-01-15 Thread via llvm-branch-commits

paperchalice wrote:

In the new pass manager it depends on `SomeAnalysis::Result::invalidate`. See 
https://llvm.org/docs/NewPassManager.html#implementing-analysis-invalidation.
An analysis result can always check related analyses by using 
[`PreservedAnalysisChecker`](https://llvm.org/doxygen/classllvm_1_1PreservedAnalyses_1_1PreservedAnalysisChecker.html)
in `invalidate`.
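
A hedged sketch of that hook for a machine-function analysis result
(MyAnalysis/MyResult are placeholders invented for the example, not real LLVM
classes; a real pair would be e.g. LiveIntervalsAnalysis/LiveIntervals):

#include "llvm/CodeGen/MachinePassManager.h"
#include "llvm/CodeGen/SlotIndexes.h"
#include "llvm/IR/Analysis.h"

using namespace llvm;

struct MyResult;
struct MyAnalysis : AnalysisInfoMixin<MyAnalysis> {
  static AnalysisKey Key;
  using Result = MyResult;
};
AnalysisKey MyAnalysis::Key;

struct MyResult {
  bool invalidate(MachineFunction &MF, const PreservedAnalyses &PA,
                  MachineFunctionAnalysisManager::Invalidator &Inv) {
    // Stay valid if this analysis (or the whole machine-function set) was
    // explicitly preserved by the pass that just ran.
    auto PAC = PA.getChecker<MyAnalysis>();
    bool Invalidated = !PAC.preserved() &&
                       !PAC.preservedSet<AllAnalysesOn<MachineFunction>>();
    // A result may also cascade invalidation from a dependency, e.g.
    // SlotIndexes here.
    Invalidated |= Inv.invalidate<SlotIndexesAnalysis>(MF, PA);
    return Invalidated;
  }
};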

https://github.com/llvm/llvm-project/pull/123045
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [SelectionDAG][X86] Split via Concat vector types for atomic load (PR #120640)

2025-01-15 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120640

>From bec6d57db1c13a0840d2a60f5d4ea860a7710853 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 16:25:55 -0500
Subject: [PATCH] [SelectionDAG][X86] Split via Concat  vector types for
 atomic load

Vector types that aren't widened are 'split' via CONCAT_VECTORS
so that a single ATOMIC_LOAD is issued for the entire vector at once.
This change utilizes the load vectorization infrastructure in
SelectionDAG in order to group the vectors. This enables SelectionDAG
to translate vectors with element types bfloat and half.

commit-id:3a045357
---
 llvm/include/llvm/CodeGen/SelectionDAG.h  |  4 +-
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |  1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 31 
 .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 24 -
 .../SelectionDAGAddressAnalysis.cpp   | 30 ++-
 .../SelectionDAG/SelectionDAGBuilder.cpp  |  6 ++-
 llvm/lib/Target/X86/X86ISelLowering.cpp   | 50 ---
 llvm/test/CodeGen/X86/atomic-load-store.ll| 36 +
 8 files changed, 147 insertions(+), 35 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h 
b/llvm/include/llvm/CodeGen/SelectionDAG.h
index ba0538f7084eec..648ad14a67cb6c 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1837,7 +1837,7 @@ class SelectionDAG {
   /// chain to the token factor. This ensures that the new memory node will 
have
   /// the same relative memory dependency position as the old load. Returns the
   /// new merged load chain.
-  SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp);
+  SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp);
 
   /// Topological-sort the AllNodes list and a
   /// assign a unique node id for each node in the DAG based on their
@@ -2263,6 +2263,8 @@ class SelectionDAG {
   /// location that the 'Base' load is loading from.
   bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base,
   unsigned Bytes, int Dist) const;
+  bool areNonVolatileConsecutiveLoads(AtomicSDNode *LD, AtomicSDNode *Base,
+  unsigned Bytes, int Dist) const;
 
   /// Infer alignment of a load / store address. Return std::nullopt if it
   /// cannot be inferred.
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index e45df5c40953c0..569777d3dfaab7 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -946,6 +946,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   void SplitVecRes_FPOp_MultiType(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_IS_FPCLASS(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi);
+  void SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD);
   void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_LOAD(VPLoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *SLD, SDValue &Lo,
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index d412d8eafda4e5..0fa318fcdca2b0 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -1152,6 +1152,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, 
unsigned ResNo) {
 SplitVecRes_STEP_VECTOR(N, Lo, Hi);
 break;
   case ISD::SIGN_EXTEND_INREG: SplitVecRes_InregOp(N, Lo, Hi); break;
+  case ISD::ATOMIC_LOAD:
+SplitVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:
 SplitVecRes_LOAD(cast(N), Lo, Hi);
 break;
@@ -1395,6 +1398,34 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, 
unsigned ResNo) {
 SetSplitVector(SDValue(N, ResNo), Lo, Hi);
 }
 
+void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD) {
+  SDLoc dl(LD);
+
+  EVT MemoryVT = LD->getMemoryVT();
+  unsigned NumElts = MemoryVT.getVectorMinNumElements();
+
+  EVT IntMemoryVT = EVT::getVectorVT(*DAG.getContext(), MVT::i16, NumElts);
+  EVT ElemVT = EVT::getVectorVT(*DAG.getContext(),
+MemoryVT.getVectorElementType(), 1);
+
+  // Create a single atomic to load all the elements at once.
+  SDValue Atomic = DAG.getAtomic(ISD::ATOMIC_LOAD, dl, IntMemoryVT, 
IntMemoryVT,
+ LD->getChain(), LD->getBasePtr(),
+ LD->getMemOperand());
+
+  // Instead of splitting, put all the elements back into a vector.
+  SmallVector Ops;
+  for (unsigned i = 0; i < NumElts; ++i) {
+SDValue Elt = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::i16, Atomic,
+  DAG.getVectorIdxConstant(i, dl));
+Elt = DAG.getBitcast(ElemV

[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)

2025-01-15 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120385

>From cdbe7273a1cd8e2efbbcbcebd3dd0cecbd5347cf Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:37:17 -0500
Subject: [PATCH] [SelectionDAG] Legalize <1 x T> vector types for atomic load

`load atomic <1 x T>` is not valid. This change legalizes
vector types of atomic load via scalarization in SelectionDAG
so that it can, for example, translate from `v1i32` to `i32`.

commit-id:5c36cc8c
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  |  15 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 121 +-
 3 files changed, 135 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index caaa40a64c7e1d..4117da0f31c10b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -861,6 +861,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue ScalarizeVecRes_UnaryOpWithExtraInput(SDNode *N);
   SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N);
   SDValue ScalarizeVecRes_LOAD(LoadSDNode *N);
+  SDValue ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue ScalarizeVecRes_SCALAR_TO_VECTOR(SDNode *N);
   SDValue ScalarizeVecRes_VSELECT(SDNode *N);
   SDValue ScalarizeVecRes_SELECT(SDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index f39d9ca15496a9..980c157fe6f3f1 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -64,6 +64,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, 
unsigned ResNo) {
 R = ScalarizeVecRes_UnaryOpWithExtraInput(N);
 break;
   case ISD::INSERT_VECTOR_ELT: R = ScalarizeVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+R = ScalarizeVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:   R = 
ScalarizeVecRes_LOAD(cast(N));break;
   case ISD::SCALAR_TO_VECTOR:  R = ScalarizeVecRes_SCALAR_TO_VECTOR(N); break;
   case ISD::SIGN_EXTEND_INREG: R = ScalarizeVecRes_InregOp(N); break;
@@ -457,6 +460,18 @@ SDValue 
DAGTypeLegalizer::ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N) {
   return Op;
 }
 
+SDValue DAGTypeLegalizer::ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SDValue Result = DAG.getAtomic(
+  ISD::ATOMIC_LOAD, SDLoc(N), N->getMemoryVT().getVectorElementType(),
+  N->getValueType(0).getVectorElementType(), N->getChain(), 
N->getBasePtr(),
+  N->getMemOperand());
+
+  // Legalize the chain result - switch anything that used the old chain to
+  // use the new one.
+  ReplaceValueWith(SDValue(N, 1), Result.getValue(1));
+  return Result;
+}
+
 SDValue DAGTypeLegalizer::ScalarizeVecRes_LOAD(LoadSDNode *N) {
   assert(N->isUnindexed() && "Indexed vector load?");
 
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 5bce4401f7bdb0..d23cfb89f9fc87 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | 
FileCheck %s
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | 
FileCheck %s
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | 
FileCheck %s --check-prefixes=CHECK,CHECK3
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | 
FileCheck %s --check-prefixes=CHECK,CHECK0
 
 define void @test1(ptr %ptr, i32 %val1) {
 ; CHECK-LABEL: test1:
@@ -28,3 +28,120 @@ define i32 @test3(ptr %ptr) {
   %val = load atomic i32, ptr %ptr seq_cst, align 4
   ret i32 %val
 }
+
+define <1 x i32> @atomic_vec1_i32(ptr %x) {
+; CHECK-LABEL: atomic_vec1_i32:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movl (%rdi), %eax
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x i32>, ptr %x acquire, align 4
+  ret <1 x i32> %ret
+}
+
+define <1 x i8> @atomic_vec1_i8(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i8:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzbl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i8:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movb (%rdi), %al
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i8>, ptr %x acquire, align 1
+  ret <1 x i8> %ret
+}
+
+define <1 x i16> @atomic_vec1_i16(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i16:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i16:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %ax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i16>, ptr %x acquire, align 2
+  ret <1 x i16> %ret
+}
+
+define <1 x i32> @atomic_vec1_i8_zext(ptr %x) {
+; CHECK3-LABEL: atomic_

[llvm-branch-commits] [llvm] [X86] Manage atomic load of fp -> int promotion in DAG (PR #120386)

2025-01-15 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120386

>From b2541fd83c6712aa2ea3eb6ae4fa5b0acf59b3c7 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:38:23 -0500
Subject: [PATCH] [X86] Manage atomic load of fp -> int promotion in DAG

When lowering atomic <1 x T> vector types with floats, selection can fail since
this pattern is unsupported. To support this, floats can be cast to
an integer type of the same size.

commit-id:f9d761c5
---
 llvm/lib/Target/X86/X86ISelLowering.cpp|  4 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll | 37 ++
 2 files changed, 41 insertions(+)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp 
b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 90e3e15b1fb46c..124134db0f55ac 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -2639,6 +2639,10 @@ X86TargetLowering::X86TargetLowering(const 
X86TargetMachine &TM,
 setOperationAction(Op, MVT::f32, Promote);
   }
 
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f16, MVT::i16);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f32, MVT::i32);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f64, MVT::i64);
+
   // We have target-specific dag combine patterns for the following nodes:
   setTargetDAGCombine({ISD::VECTOR_SHUFFLE,
ISD::SCALAR_TO_VECTOR,
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index d23cfb89f9fc87..6efcbb80c0ce6d 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -145,3 +145,40 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   %ret = load atomic <1 x i64>, ptr %x acquire, align 8
   ret <1 x i64> %ret
 }
+
+define <1 x half> @atomic_vec1_half(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_half:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_half:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %cx
+; CHECK0-NEXT:## implicit-def: $eax
+; CHECK0-NEXT:movw %cx, %ax
+; CHECK0-NEXT:## implicit-def: $xmm0
+; CHECK0-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x half>, ptr %x acquire, align 2
+  ret <1 x half> %ret
+}
+
+define <1 x float> @atomic_vec1_float(ptr %x) {
+; CHECK-LABEL: atomic_vec1_float:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x float>, ptr %x acquire, align 4
+  ret <1 x float> %ret
+}
+
+define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec1_double_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 8
+  ret <1 x double> %ret
+}

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)

2025-01-15 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120598

>From 04f0fd9229a40ce8b3ee556bcce0486fa6bc9fa5 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 11:19:39 -0500
Subject: [PATCH] [SelectionDAG][X86] Widen <2 x T> vector types for atomic
 load

Vector types of 2 elements must be widened. This change does this
for vector types of atomic load in SelectionDAG
so that it can translate aligned vectors of size >1. It also
adds Pats to remove an extra MOV.

commit-id:2894ccd1
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |  5 +-
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 38 +-
 llvm/lib/Target/X86/X86InstrCompiler.td   |  7 ++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 72 +++
 llvm/test/CodeGen/X86/atomic-unordered.ll |  3 +-
 5 files changed, 118 insertions(+), 7 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 4117da0f31c10b..e45df5c40953c0 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -1046,6 +1046,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N);
   SDValue WidenVecRes_INSERT_SUBVECTOR(SDNode *N);
   SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N);
+  SDValue WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue WidenVecRes_LOAD(SDNode* N);
   SDValue WidenVecRes_VP_LOAD(VPLoadSDNode *N);
   SDValue WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N);
@@ -1129,8 +1130,8 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   /// resulting wider type. It takes:
   ///   LdChain: list of chains for the load to be generated.
   ///   Ld:  load to widen
-  SDValue GenWidenVectorLoads(SmallVectorImpl &LdChain,
-  LoadSDNode *LD);
+  SDValue GenWidenVectorLoads(SmallVectorImpl &LdChain, MemSDNode *LD,
+  bool IsAtomic = false);
 
   /// Helper function to generate a set of extension loads to load a vector 
with
   /// a resulting wider type. It takes:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 980c157fe6f3f1..d412d8eafda4e5 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -4521,6 +4521,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, 
unsigned ResNo) {
 break;
   case ISD::EXTRACT_SUBVECTOR: Res = WidenVecRes_EXTRACT_SUBVECTOR(N); break;
   case ISD::INSERT_VECTOR_ELT: Res = WidenVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+Res = WidenVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:  Res = WidenVecRes_LOAD(N); break;
   case ISD::STEP_VECTOR:
   case ISD::SPLAT_VECTOR:
@@ -5907,6 +5910,30 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) {
  N->getOperand(1), N->getOperand(2));
 }
 
+SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SmallVector LdChain; // Chain for the series of load
+  SDValue Result = GenWidenVectorLoads(LdChain, N, /*IsAtomic=*/true);
+
+  if (Result) {
+// If we generate a single load, we can use that for the chain.  Otherwise,
+// build a factor node to remember the multiple loads are independent and
+// chain to that.
+SDValue NewChain;
+if (LdChain.size() == 1)
+  NewChain = LdChain[0];
+else
+  NewChain = DAG.getNode(ISD::TokenFactor, SDLoc(N), MVT::Other, LdChain);
+
+// Modified the chain - switch anything that used the old chain to use
+// the new one.
+ReplaceValueWith(SDValue(N, 1), NewChain);
+
+return Result;
+  }
+
+  report_fatal_error("Unable to widen atomic vector load");
+}
+
 SDValue DAGTypeLegalizer::WidenVecRes_LOAD(SDNode *N) {
   LoadSDNode *LD = cast(N);
   ISD::LoadExtType ExtType = LD->getExtensionType();
@@ -7706,7 +7733,7 @@ static SDValue BuildVectorFromScalar(SelectionDAG& DAG, 
EVT VecTy,
 }
 
 SDValue DAGTypeLegalizer::GenWidenVectorLoads(SmallVectorImpl 
&LdChain,
-  LoadSDNode *LD) {
+  MemSDNode *LD, bool IsAtomic) {
   // The strategy assumes that we can efficiently load power-of-two widths.
   // The routine chops the vector into the largest vector loads with the same
   // element type or scalar loads and then recombines it to the widen vector
@@ -7763,8 +7790,13 @@ SDValue 
DAGTypeLegalizer::GenWidenVectorLoads(SmallVectorImpl &LdChain,
 } while (TypeSize::isKnownGT(RemainingWidth, NewVTWidth));
   }
 
-  SDValue LdOp = DAG.getLoad(*FirstVT, dl, Chain, BasePtr, 
LD->getPointerInfo(),
- LD->getOriginalAlign(), MMOFlags, AAInfo);
+  SDValue LdOp;
+  if (IsAtomic)
+LdOp = DAG.getAtomic(ISD::ATOMIC_LOAD, dl, *FirstVT, *FirstVT, Chain,
+  

[llvm-branch-commits] [llvm] [X86] Add atomic vector tests for unaligned >1 sizes. (PR #120387)

2025-01-15 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120387

>From 6d84ebec14a4fcf88e34ff2eb201a2cd213ed15c Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:40:32 -0500
Subject: [PATCH] [X86] Add atomic vector tests for unaligned >1 sizes.

Unaligned atomic vectors with size >1 are lowered to calls.
Adding their tests separately here.
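
For reference, a minimal C++ sketch (illustrative, not part of the patch) of the
kind of source that produces such a load: an 8-byte value with only 4-byte
alignment, which GCC/Clang lower through the generic `__atomic_load` builtin.
This matches the `___atomic_load` calls with size 8 and memorder 2 (acquire) in
the CHECK lines below.

```cpp
// Stand-in for a <2 x i32> vector: 8 bytes of data, but only 4-byte
// alignment, so the acquire load cannot be a single naturally aligned
// instruction and is lowered to a runtime __atomic_load call.
struct alignas(4) Vec2i32 {
  int x, y;
};

Vec2i32 load_vec2(const Vec2i32 *p) {
  Vec2i32 out;
  // GCC/Clang generic atomic builtin: (ptr, ret, memorder).
  __atomic_load(p, &out, __ATOMIC_ACQUIRE);
  return out;
}
```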

commit-id:a06a5cc6
---
 llvm/test/CodeGen/X86/atomic-load-store.ll | 253 +
 1 file changed, 253 insertions(+)

diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 6efcbb80c0ce6d..39e9fdfa5e62b0 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -146,6 +146,34 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   ret <1 x i64> %ret
 }
 
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_ptr:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movq (%rsp), %rax
+; CHECK3-NEXT:popq %rcx
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_ptr:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq (%rsp), %rax
+; CHECK0-NEXT:popq %rcx
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
+
 define <1 x half> @atomic_vec1_half(ptr %x) {
 ; CHECK3-LABEL: atomic_vec1_half:
 ; CHECK3:   ## %bb.0:
@@ -182,3 +210,228 @@ define <1 x double> @atomic_vec1_double_align(ptr %x) 
nounwind {
   %ret = load atomic <1 x double>, ptr %x acquire, align 8
   ret <1 x double> %ret
 }
+
+define <1 x i64> @atomic_vec1_i64(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_i64:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movq (%rsp), %rax
+; CHECK3-NEXT:popq %rcx
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i64:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq (%rsp), %rax
+; CHECK0-NEXT:popq %rcx
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i64>, ptr %x acquire, align 4
+  ret <1 x i64> %ret
+}
+
+define <1 x double> @atomic_vec1_double(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_double:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK3-NEXT:popq %rax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_double:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK0-NEXT:popq %rax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 4
+  ret <1 x double> %ret
+}
+
+define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec2_i32:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK3-NEXT:popq %rax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec2_i32:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq {{.*#+}} xmm0 = mem[0],zero
+; CHECK0-NEXT:popq %rax
+; CHECK0-NEXT:retq
+  %ret = load atomic <2 x i32>, ptr %x acquire, align 4
+  ret <2 x i32> %ret
+}
+
+define <4 x float> @atomic_vec4_float_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec4_float_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEXT:movl $2, %esi
+; CHECK-NEXT:callq ___atomic_load_16
+; CHECK-NEXT:movq %rdx, %xmm1
+; CHECK-NEXT:movq %rax, %xmm0
+; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Split via Concat vector types for atomic load (PR #120640)

2025-01-15 Thread via llvm-branch-commits

https://github.com/jofrn edited https://github.com/llvm/llvm-project/pull/120640
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [ctxprof] add `toYAML` conversion to llvm-ctxprof-utils (PR #123131)

2025-01-15 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin edited 
https://github.com/llvm/llvm-project/pull/123131
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [ctxprof] add conversion to yaml to llvm-ctxprof-utils (PR #123131)

2025-01-15 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin edited 
https://github.com/llvm/llvm-project/pull/123131
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [ctxprof] add "toYaml" conversion to llvm-ctxprof-utils (PR #123131)

2025-01-15 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin edited 
https://github.com/llvm/llvm-project/pull/123131
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [ctxprof] add `toYAML` conversion to llvm-ctxprof-utils (PR #123131)

2025-01-15 Thread Mircea Trofin via llvm-branch-commits

mtrofin wrote:

### Merge activity

* **Jan 15, 7:47 PM EST**: A user started a stack merge that includes this pull 
request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/123131).


https://github.com/llvm/llvm-project/pull/123131
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [ctxprof] add `toYAML` conversion to llvm-ctxprof-utils (PR #123131)

2025-01-15 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin ready_for_review 
https://github.com/llvm/llvm-project/pull/123131
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [ctxprof] add conversion to yaml to llvm-ctxprof-utils (PR #123131)

2025-01-15 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin created 
https://github.com/llvm/llvm-project/pull/123131

None

>From c8e600b9969239c2491fd97f011172f4032524bb Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Wed, 15 Jan 2025 14:10:19 -0800
Subject: [PATCH] [ctxprof] add conversion to yaml to llvm-ctxprof-utils

---
 .../tools/llvm-ctxprof-util/Inputs/valid.yaml | 26 ++--
 .../llvm-ctxprof-util/llvm-ctxprof-util.test  |  4 +-
 .../llvm-ctxprof-util/llvm-ctxprof-util.cpp   | 41 ---
 3 files changed, 51 insertions(+), 20 deletions(-)

diff --git a/llvm/test/tools/llvm-ctxprof-util/Inputs/valid.yaml 
b/llvm/test/tools/llvm-ctxprof-util/Inputs/valid.yaml
index 6080c2414d64a0..9bbf82d59c9134 100644
--- a/llvm/test/tools/llvm-ctxprof-util/Inputs/valid.yaml
+++ b/llvm/test/tools/llvm-ctxprof-util/Inputs/valid.yaml
@@ -1,13 +1,13 @@
-- Guid: 1000
-  Counters: [1, 2, 3]
-  Callsites:  - []
-  - 
-- Guid: 2000
-  Counters: [4, 5]
-- Guid: 18446744073709551613
-  Counters: [6, 7, 8]
-  - 
-- Guid: 3000
-  Counters: [40, 50]
-- Guid: 18446744073709551612
-  Counters: [5, 9, 10]
+
+- Guid:1000
+  Counters:[ 1, 2, 3 ]
+  Callsites:
+- [  ]
+- - Guid:2000
+Counters:[ 4, 5 ]
+  - Guid:18446744073709551613
+Counters:[ 6, 7, 8 ]
+- - Guid:3000
+Counters:[ 40, 50 ]
+- Guid:18446744073709551612
+  Counters:[ 5, 9, 10 ]
diff --git a/llvm/test/tools/llvm-ctxprof-util/llvm-ctxprof-util.test 
b/llvm/test/tools/llvm-ctxprof-util/llvm-ctxprof-util.test
index 91ebd1de59bb54..30bc8bce054101 100644
--- a/llvm/test/tools/llvm-ctxprof-util/llvm-ctxprof-util.test
+++ b/llvm/test/tools/llvm-ctxprof-util/llvm-ctxprof-util.test
@@ -4,7 +4,9 @@
 ; RUN: llvm-ctxprof-util fromYAML --input %S/Inputs/empty.yaml -output 
%t/empty.bitstream
 ; RUN: llvm-bcanalyzer --dump %t/empty.bitstream | FileCheck %s 
--check-prefix=EMPTY
 
-; RUN: llvm-ctxprof-util fromYAML --input %S/Inputs/valid.yaml -output 
%t/valid.bitstream
+; RUN: llvm-ctxprof-util fromYAML -input %S/Inputs/valid.yaml -output 
%t/valid.bitstream
+; RUN: llvm-ctxprof-util toYAML -input %t/valid.bitstream -output 
%t/valid2.yaml
+; RUN: diff %t/valid2.yaml %S/Inputs/valid.yaml
 
 ; For the valid case, check against a reference output.
 ; Note that uint64_t are printed as signed values by llvm-bcanalyzer:
diff --git a/llvm/tools/llvm-ctxprof-util/llvm-ctxprof-util.cpp 
b/llvm/tools/llvm-ctxprof-util/llvm-ctxprof-util.cpp
index cfa14b22c14698..314144ac6624c9 100644
--- a/llvm/tools/llvm-ctxprof-util/llvm-ctxprof-util.cpp
+++ b/llvm/tools/llvm-ctxprof-util/llvm-ctxprof-util.cpp
@@ -13,6 +13,7 @@
 
//===--===//
 
 #include "llvm/IR/GlobalValue.h"
+#include "llvm/ProfileData/PGOCtxProfReader.h"
 #include "llvm/ProfileData/PGOCtxProfWriter.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Error.h"
@@ -23,6 +24,7 @@
 using namespace llvm;
 
 static cl::SubCommand FromYAML("fromYAML", "Convert from yaml");
+static cl::SubCommand ToYAML("toYAML", "Convert to yaml");
 
static cl::opt<std::string> InputFilename(
 "input", cl::value_desc("input"), cl::init("-"),
@@ -35,15 +37,16 @@ static cl::opt InputFilename(
 "'Contexts', optional. An array containing arrays of contexts. The "
 "context array at a position 'i' is the set of callees at that "
 "callsite index. Use an empty array to indicate no callees."),
-cl::sub(FromYAML));
+cl::sub(FromYAML), cl::sub(ToYAML));
 
static cl::opt<std::string> OutputFilename("output", cl::value_desc("output"),
cl::init("-"),
cl::desc("Output file"),
-   cl::sub(FromYAML));
+   cl::sub(FromYAML), cl::sub(ToYAML));
 
+namespace {
 // Save the bitstream profile from the JSON representation.
-Error convertFromYAML() {
+Error convertFromYaml() {
   auto BufOrError =
   MemoryBuffer::getFileOrSTDIN(InputFilename, /*IsText=*/true);
   if (!BufOrError)
@@ -61,11 +64,30 @@ Error convertFromYAML() {
   return llvm::createCtxProfFromYAML(BufOrError.get()->getBuffer(), Out);
 }
 
+Error convertToYaml() {
+  auto BufOrError = MemoryBuffer::getFileOrSTDIN(InputFilename);
+  if (!BufOrError)
+return createFileError(InputFilename, BufOrError.getError());
+
+  std::error_code EC;
+  raw_fd_ostream Out(OutputFilename, EC);
+  if (EC)
+return createStringError(EC, "failed to open output");
+  PGOCtxProfileReader Reader(BufOrError.get()->getBuffer());
+  auto Prof = Reader.loadContexts();
+  if (!Prof)
+return Prof.takeError();
+  llvm::convertCtxProfToYaml(Out, *Prof);
+  Out << "\n";
+  return Error::success();
+}
+} // namespace
+
 int main(int 

[llvm-branch-commits] [llvm] [ctxprof] add conversion to yaml to llvm-ctxprof-utils (PR #123131)

2025-01-15 Thread Mircea Trofin via llvm-branch-commits

mtrofin wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/123131
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#123131** šŸ‘ˆ (view in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/123131)
* **#123108** (https://app.graphite.dev/github/pr/llvm/llvm-project/123108)
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn
more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/123131
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [ctxprof] add `toYAML` conversion to llvm-ctxprof-utils (PR #123131)

2025-01-15 Thread Kazu Hirata via llvm-branch-commits

https://github.com/kazutakahirata edited 
https://github.com/llvm/llvm-project/pull/123131
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [ctxprof] add `toYAML` conversion to llvm-ctxprof-utils (PR #123131)

2025-01-15 Thread Kazu Hirata via llvm-branch-commits

https://github.com/kazutakahirata approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/123131
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [llvm] [mlir] [Flang][OpenMP][MLIR] Initial declare target to for variables implementation (PR #119589)

2025-01-15 Thread via llvm-branch-commits

https://github.com/agozillon updated 
https://github.com/llvm/llvm-project/pull/119589

>From 39cc07bf3522646531ed458c2a11f4804b24f05f Mon Sep 17 00:00:00 2001
From: agozillon 
Date: Wed, 11 Dec 2024 04:52:00 -0600
Subject: [PATCH] [Flang][OpenMP][MLIR] Initial declare target to for variables
 implementation

While the infrastructure for declare target to/enter and link for variables 
exists in the MLIR dialect and at the Flang level, the current lowering from 
MLIR -> LLVM IR is only in place for variables that have the link clause
applied.

This PR aims to extend that lowering to an initial implementation that 
incorporates declare target to as well, which primarily requires changes in the 
OpenMPToLLVMIRTranslation phase. However, a minor addition to the OpenMP 
dialect was required to extend the declare target enumerator to include a 
default None field as well.

This also requires a minor change to the Flang lowering's 
MapInfoFinalization.cpp pass to alter the map type for descriptors, to deal with 
cases where a variable is marked declare target to. Currently, when a descriptor 
variable is mapped declare target to, the descriptor component can become 
attached and cannot be updated; this results in issues when an unusual 
allocation range is specified (effectively an off-by-X error). The current 
solution is to always map the descriptor, as we always require an up-to-date 
version of this data. However, this also requires an interlinked PR that adds a 
more intricate kind of structure/record type mapping, which clang currently 
implements, to circumvent overwriting the pointer in the descriptor.

This is PR 3 of 3 required to enable declare target to mapping; it should pass 
all tests and provide an all-green CI.
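
As a rough illustration of the new dispatch (a sketch with stand-in names, not
the patch's actual API), the MLIR -> LLVM IR translation now has to distinguish
the capture clauses along these lines:

```cpp
#include <cstdio>

// All names below are illustrative stand-ins; the real lowering lives in
// OpenMPToLLVMIRTranslation.cpp and works on the dialect's attributes.
enum class DeclareTargetCaptureClause { None, To, Enter, Link };

static void emitLinkRefPointer() { std::puts("emit _decl_tgt_ref_ptr"); }
static void registerOffloadEntryForGlobal(bool IsDevice) {
  std::printf("register offload entry (device=%d)\n", IsDevice);
}

void lowerDeclareTargetGlobal(DeclareTargetCaptureClause Clause,
                              bool IsDeviceCompilation) {
  switch (Clause) {
  case DeclareTargetCaptureClause::Link:
    emitLinkRefPointer(); // previously supported path, unchanged
    break;
  case DeclareTargetCaptureClause::To:
  case DeclareTargetCaptureClause::Enter:
    // New path: the global exists on both host and device; an offload
    // entry associates the two copies so 'target update' can sync them.
    registerOffloadEntryForGlobal(IsDeviceCompilation);
    break;
  case DeclareTargetCaptureClause::None:
    break; // the new default: not a declare target symbol
  }
}
```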

Co-authored-by: Raghu Maddhipatla raghu.maddhipa...@amd.com
---
 .../Optimizer/OpenMP/MapInfoFinalization.cpp  |   6 +-
 .../Lower/OpenMP/allocatable-array-bounds.f90 |   7 +-
 flang/test/Lower/OpenMP/allocatable-map.f90   |   2 +-
 flang/test/Lower/OpenMP/array-bounds.f90  |   2 +-
 .../OpenMP/declare-target-link-tarop-cap.f90  |   4 +-
 .../OpenMP/derived-type-allocatable-map.f90   |  12 +-
 flang/test/Lower/OpenMP/target.f90|   4 +-
 .../Transforms/omp-map-info-finalization.fir  |  20 +--
 .../mlir/Dialect/OpenMP/OpenMPEnums.td|   8 +-
 .../OpenMP/OpenMPToLLVMIRTranslation.cpp  | 127 ++
 .../omptarget-declare-target-to-device.mlir   |  38 ++
 .../omptarget-declare-target-to-host.mlir |  35 +
 ...allocatable-vars-in-target-with-update.f90 |  52 +++
 ...arget-to-vars-target-region-and-update.f90 |  36 +
 ...t-to-zero-index-allocatable-target-map.f90 |  30 +
 15 files changed, 326 insertions(+), 57 deletions(-)
 create mode 100644 
mlir/test/Target/LLVMIR/omptarget-declare-target-to-device.mlir
 create mode 100644 
mlir/test/Target/LLVMIR/omptarget-declare-target-to-host.mlir
 create mode 100644 
offload/test/offloading/fortran/declare-target-to-allocatable-vars-in-target-with-update.f90
 create mode 100644 
offload/test/offloading/fortran/declare-target-to-vars-target-region-and-update.f90
 create mode 100644 
offload/test/offloading/fortran/declare-target-to-zero-index-allocatable-target-map.f90

diff --git a/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp 
b/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
index c63d2f4531a6f1..cb21769bfb8dda 100644
--- a/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
+++ b/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
@@ -254,8 +254,10 @@ class MapInfoFinalizationPass
 return llvm::to_underlying(
 hasImplicitMap
 ? llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO |
-  llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_IMPLICIT
-: llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO);
+  llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_IMPLICIT |
+  llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_ALWAYS
+: llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO |
+  llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_ALWAYS);
   }
 
   mlir::omp::MapInfoOp genDescriptorMemberMaps(mlir::omp::MapInfoOp op,
diff --git a/flang/test/Lower/OpenMP/allocatable-array-bounds.f90 
b/flang/test/Lower/OpenMP/allocatable-array-bounds.f90
index e66b6f17d8858a..ff54464dc328fd 100644
--- a/flang/test/Lower/OpenMP/allocatable-array-bounds.f90
+++ b/flang/test/Lower/OpenMP/allocatable-array-bounds.f90
@@ -24,7 +24,7 @@
 !HOST: %[[BOUNDS_1:.*]] = omp.map.bounds lower_bound(%[[LB_1]] : index) 
upper_bound(%[[UB_1]] : index) extent(%[[BOX_3]]#1 : index) stride(%[[BOX_2]]#2 
: index) start_idx(%[[BOX_1]]#0 : index) {stride_in_bytes = true}
 !HOST: %[[VAR_PTR_PTR:.*]] = fir.box_offset %[[DECLARE_1]]#1 base_addr : 
(!fir.ref>>>) -> 
!fir.llvm_ptr>>
 !HOST: %[[MAP_INFO_MEMBER:.*]] = omp.map.info var_ptr(%[[DECLARE_1]]#1 : 
!fir.ref>>>, i32) 
var_ptr_ptr(%[[VAR_PTR_PTR]] : !fir.llvm_ptr>>) 
map_c

[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)

2025-01-15 Thread via llvm-branch-commits

https://github.com/agozillon updated 
https://github.com/llvm/llvm-project/pull/119588

>From 86761377d8fddd81ee77dc97bda16b2e70601546 Mon Sep 17 00:00:00 2001
From: agozillon 
Date: Wed, 11 Dec 2024 09:11:58 -0600
Subject: [PATCH] [MLIR][OpenMP] Introduce overlapped record type map support

This PR introduces an additional type of map lowering for record types that 
Clang currently supports, in which a user can map a top-level record type and 
then individual members with different map types, effectively creating a sort of 
"overlapping" mapping that we attempt to cut around.

This is currently used most prominently in Fortran when mapping descriptors and 
their data: we map the descriptor and its data with separate map modifiers and 
"cut around" the pointer data, so that we do not overwrite it unless the runtime 
deems it necessary based on its reference-counting mechanism. However, the 
mechanism will also come into play when a user explicitly maps a record type 
(derived type or structure) and then explicitly maps a member with a different 
map type.

These additions were predominantly in the OpenMPToLLVMIRTranslation.cpp file 
and phase; however, one Flang test that checks end-to-end IR compilation (as 
far as we care for now, at least) was altered.

This is PR 2 of 3 required to enable declare target to mapping; see PR 3/3 for 
fully green passes (this one will fail a number of tests due to dependencies).
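
A minimal sketch of the "cut around" computation described above (illustrative
only: offsets and sizes in bytes, members assumed sorted by offset; the real
implementation works on map-info operands rather than plain ranges):

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include <algorithm>
#include <cstdint>

struct Range {
  uint64_t Off, Size;
};

// Return the pieces of a parent mapping of ParentSize bytes that remain
// once the separately mapped member ranges are cut out.
static llvm::SmallVector<Range> cutAround(uint64_t ParentSize,
                                          llvm::ArrayRef<Range> Members) {
  llvm::SmallVector<Range> Chunks;
  uint64_t Cur = 0;
  for (const Range &M : Members) {
    if (M.Off > Cur)
      Chunks.push_back({Cur, M.Off - Cur}); // gap before this member
    Cur = std::max(Cur, M.Off + M.Size);    // skip the member itself
  }
  if (Cur < ParentSize)
    Chunks.push_back({Cur, ParentSize - Cur}); // tail after the last member
  return Chunks;
}
```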

Co-authored-by: Raghu Maddhipatla raghu.maddhipa...@amd.com
---
 .../OpenMP/map-types-and-sizes.f90| 130 +
 .../OpenMP/OpenMPToLLVMIRTranslation.cpp  | 260 ++
 .../omptarget-data-use-dev-ordering.mlir  |  20 +-
 ...ptarget-overlapping-record-member-map.mlir |  65 +
 ...rget-record-type-with-ptr-member-host.mlir |  95 ---
 .../fortran/dtype-member-overlap-map.f90  |  47 
 6 files changed, 463 insertions(+), 154 deletions(-)
 create mode 100644 
mlir/test/Target/LLVMIR/omptarget-overlapping-record-member-map.mlir
 create mode 100644 offload/test/offloading/fortran/dtype-member-overlap-map.f90

diff --git a/flang/test/Integration/OpenMP/map-types-and-sizes.f90 
b/flang/test/Integration/OpenMP/map-types-and-sizes.f90
index dac4938157a60d..efa73245361895 100644
--- a/flang/test/Integration/OpenMP/map-types-and-sizes.f90
+++ b/flang/test/Integration/OpenMP/map-types-and-sizes.f90
@@ -30,8 +30,8 @@ subroutine mapType_array
   !$omp end target
 end subroutine mapType_array
 
-!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [4 x i64] [i64 
0, i64 24, i64 8, i64 4]
-!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] 
[i64 32, i64 281474976711169, i64 281474976711171, i64 281474976711187]
+!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [5 x i64] [i64 
0, i64 0, i64 0, i64 8, i64 4]
+!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [5 x i64] 
[i64 32, i64 281474976711173, i64 281474976711173, i64 281474976711171, i64 
281474976711187]
 subroutine mapType_ptr
   integer, pointer :: a
   !$omp target
@@ -39,8 +39,8 @@ subroutine mapType_ptr
   !$omp end target
 end subroutine mapType_ptr
 
-!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [4 x i64] [i64 
0, i64 24, i64 8, i64 4]
-!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] 
[i64 32, i64 281474976711169, i64 281474976711171, i64 281474976711187]
+!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [5 x i64] [i64 
0, i64 0, i64 0, i64 8, i64 4]
+!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [5 x i64] 
[i64 32, i64 281474976711173, i64 281474976711173, i64 281474976711171, i64 
281474976711187]
 subroutine mapType_allocatable
   integer, allocatable :: a
   allocate(a)
@@ -50,8 +50,8 @@ subroutine mapType_allocatable
   deallocate(a)
 end subroutine mapType_allocatable
 
-!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [4 x i64] [i64 
0, i64 24, i64 8, i64 4]
-!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] 
[i64 32, i64 281474976710657, i64 281474976710659, i64 281474976710675]
+!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [5 x i64] [i64 
0, i64 0, i64 0, i64 8, i64 4]
+!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [5 x i64] 
[i64 32, i64 281474976710661, i64 281474976710661, i64 281474976710659, i64 
281474976710675]
 subroutine mapType_ptr_explicit
   integer, pointer :: a
   !$omp target map(tofrom: a)
@@ -59,8 +59,8 @@ subroutine mapType_ptr_explicit
   !$omp end target
 end subroutine mapType_ptr_explicit
 
-!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [4 x i64] [i64 
0, i64 24, i64 8, i64 4]
-!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] 
[i64 32, i64 281474976710657, i64 281474976710659, i64 281474976710675]
+!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [5 x i64] [i64 
0, i64 0, i64 0, i64 8, 

[llvm-branch-commits] [flang] [llvm] [Flang-RT] Build libflang_rt.so (PR #121782)

2025-01-15 Thread Michael Kruse via llvm-branch-commits

https://github.com/Meinersbur updated 
https://github.com/llvm/llvm-project/pull/121782

>From a3037ab5557dcc4a4deb5bb40f801ca9770e3854 Mon Sep 17 00:00:00 2001
From: Michael Kruse 
Date: Mon, 6 Jan 2025 16:44:08 +0100
Subject: [PATCH 1/5] Add FLANG_RT_ENABLE_STATIC and FLANG_RT_ENABLE_SHARED

---
 flang-rt/CMakeLists.txt   |  30 ++
 flang-rt/cmake/modules/AddFlangRT.cmake   | 291 --
 .../cmake/modules/AddFlangRTOffload.cmake |   8 +-
 flang-rt/cmake/modules/GetToolchainDirs.cmake | 254 +++
 flang-rt/lib/flang_rt/CMakeLists.txt  |  20 +-
 flang-rt/test/CMakeLists.txt  |   2 +-
 flang-rt/test/lit.cfg.py  |   2 +-
 7 files changed, 366 insertions(+), 241 deletions(-)

diff --git a/flang-rt/CMakeLists.txt b/flang-rt/CMakeLists.txt
index 7b3d22e454a108..7effa6012a078f 100644
--- a/flang-rt/CMakeLists.txt
+++ b/flang-rt/CMakeLists.txt
@@ -113,6 +113,15 @@ cmake_path(NORMAL_PATH FLANG_RT_OUTPUT_RESOURCE_DIR)
 cmake_path(NORMAL_PATH FLANG_RT_INSTALL_RESOURCE_PATH)
 
 # Determine subdirectories for build output and install destinations.
+# FIXME: For the libflang_rt.so, the toolchain resource lib dir is not a good
+#destination because it is not a ld.so default search path.
+#The machine where the executable is eventually executed may not be the
+#machine where the Flang compiler and its resource dir is installed, so
+#setting RPath by the driver is not an solution. It should belong into
+#/usr/lib//libflang_rt.so, like e.g. libgcc_s.so.
+#But the linker as invoked by the Flang driver also requires
+#libflang_rt.so to be found when linking and the resource lib dir is
+#the only reliable location.
 get_toolchain_library_subdir(toolchain_lib_subdir)
 extend_path(FLANG_RT_OUTPUT_RESOURCE_LIB_DIR "${FLANG_RT_OUTPUT_RESOURCE_DIR}" 
"${toolchain_lib_subdir}")
 extend_path(FLANG_RT_INSTALL_RESOURCE_LIB_PATH 
"${FLANG_RT_INSTALL_RESOURCE_PATH}" "${toolchain_lib_subdir}")
@@ -130,6 +139,27 @@ cmake_path(NORMAL_PATH FLANG_RT_INSTALL_RESOURCE_LIB_PATH)
 option(FLANG_RT_INCLUDE_TESTS "Generate build targets for the flang-rt unit 
and regression-tests." "${LLVM_INCLUDE_TESTS}")
 
 
+option(FLANG_RT_ENABLE_STATIC "Build Flang-RT as a static library." ON)
+if (WIN32)
+  # Windows DLL currently not implemented.
+  set(FLANG_RT_ENABLE_SHARED OFF)
+else ()
+  # TODO: Enable by default to increase test coverage, and which version of the
+  #   library should be the user's choice anyway.
+  #   Currently, the Flang driver adds `-L"libdir" -lflang_rt` as linker
+  #   argument, which leaves the choice which library to use to the linker.
+  #   Since most linkers prefer the shared library, this would constitute a
+  #   breaking change unless the driver is changed.
+  option(FLANG_RT_ENABLE_SHARED "Build Flang-RT as a shared library." OFF)
+endif ()
+if (NOT FLANG_RT_ENABLE_STATIC AND NOT FLANG_RT_ENABLE_SHARED)
+  message(FATAL_ERROR "
+  Must build at least one type of library
+  (FLANG_RT_ENABLE_STATIC=ON, FLANG_RT_ENABLE_SHARED=ON, or both)
+")
+endif ()
+
+
 set(FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT "" CACHE STRING "Compile Flang-RT 
with GPU support (CUDA or OpenMP)")
 set_property(CACHE FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT PROPERTY STRINGS
 ""
diff --git a/flang-rt/cmake/modules/AddFlangRT.cmake 
b/flang-rt/cmake/modules/AddFlangRT.cmake
index 1f8b5111433825..5f493a80c35f20 100644
--- a/flang-rt/cmake/modules/AddFlangRT.cmake
+++ b/flang-rt/cmake/modules/AddFlangRT.cmake
@@ -16,7 +16,8 @@
 #   STATIC
 # Build a static (.a/.lib) library
 #   OBJECT
-# Create only object files without static/dynamic library
+# Always create an object library.
+# Without SHARED/STATIC, build only the object library.
 #   INSTALL_WITH_TOOLCHAIN
 # Install library into Clang's resource directory so it can be found by the
 # Flang driver during compilation, including tests
@@ -44,17 +45,73 @@ function (add_flangrt_library name)
   ")
   endif ()
 
-  # Forward libtype to add_library
-  set(extra_args "")
-  if (ARG_SHARED)
-list(APPEND extra_args SHARED)
+  # Internal names of libraries. If called with just single type option, use
+  # the default name for it. Name of targets must only depend on function
+  # arguments to be predictable for callers.
+  set(name_static "${name}.static")
+  set(name_shared "${name}.shared")
+  set(name_object "obj.${name}")
+  if (ARG_STATIC AND NOT ARG_SHARED)
+set(name_static "${name}")
+  elseif (NOT ARG_STATIC AND ARG_SHARED)
+set(name_shared "${name}")
+  elseif (NOT ARG_STATIC AND NOT ARG_SHARED AND ARG_OBJECT)
+set(name_object "${name}")
+  elseif (NOT ARG_STATIC AND NOT ARG_SHARED AND NOT ARG_OBJECT)
+# Only one of them will actually be built.
+set(name_static "${name}")
+set(name_shared "${name}")
+  endif ()
+
+  # Determine what to build. If not explicitly

[llvm-branch-commits] [llvm] [AsmPrinter][ELF] Support profile-guided section prefix for jump tables sections (PR #122215)

2025-01-15 Thread Mingming Liu via llvm-branch-commits

https://github.com/mingmingl-llvm updated 
https://github.com/llvm/llvm-project/pull/122215

>From dd748277dff2b30ed02bfa466eeca7102aa93eb4 Mon Sep 17 00:00:00 2001
From: mingmingl 
Date: Fri, 10 Jan 2025 13:53:08 -0800
Subject: [PATCH 1/4] rely to upstream

---
 llvm/include/llvm/CodeGen/MachineFunction.h   |   2 +-
 .../llvm/CodeGen/MachineJumpTableInfo.h   |   9 +-
 llvm/include/llvm/CodeGen/Passes.h|   2 +-
 llvm/lib/CodeGen/MachineFunction.cpp  |  12 +-
 llvm/lib/CodeGen/StaticDataSplitter.cpp   |  87 +++---
 llvm/test/CodeGen/X86/jump-table-partition.ll | 251 +++---
 6 files changed, 223 insertions(+), 140 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/MachineFunction.h 
b/llvm/include/llvm/CodeGen/MachineFunction.h
index c0f983d1c6787b..dcdbcaec168d22 100644
--- a/llvm/include/llvm/CodeGen/MachineFunction.h
+++ b/llvm/include/llvm/CodeGen/MachineFunction.h
@@ -91,7 +91,7 @@ template <> struct ilist_callback_traits<MachineBasicBlock> {
 // The hotness of static data tracked by a MachineFunction and not represented
 // as a global object in the module IR / MIR. Typical examples are
 // MachineJumpTableInfo and MachineConstantPool.
-enum class DataHotness {
+enum class MachineFunctionDataHotness {
   Unknown,
   Cold,
   Hot,
diff --git a/llvm/include/llvm/CodeGen/MachineJumpTableInfo.h 
b/llvm/include/llvm/CodeGen/MachineJumpTableInfo.h
index cc1f54a81b9bb4..e3675d6489b350 100644
--- a/llvm/include/llvm/CodeGen/MachineJumpTableInfo.h
+++ b/llvm/include/llvm/CodeGen/MachineJumpTableInfo.h
@@ -28,7 +28,7 @@ namespace llvm {
 class MachineBasicBlock;
 class DataLayout;
 class raw_ostream;
-enum class DataHotness;
+enum class MachineFunctionDataHotness;
 
 /// MachineJumpTableEntry - One jump table in the jump table info.
 ///
@@ -36,7 +36,7 @@ struct MachineJumpTableEntry {
   /// MBBs - The vector of basic blocks from which to create the jump table.
   std::vector<MachineBasicBlock *> MBBs;
 
-  DataHotness Hotness;
+  MachineFunctionDataHotness Hotness;
 
   explicit MachineJumpTableEntry(const std::vector<MachineBasicBlock *> &M);
 };
@@ -109,7 +109,10 @@ class MachineJumpTableInfo {
 return JumpTables;
   }
 
-  void updateJumpTableHotness(size_t JTI, DataHotness Hotness);
+  // Update machine jump table entry's hotness. Return true if the hotness is
+  // updated.
+  bool updateJumpTableEntryHotness(size_t JTI,
+   MachineFunctionDataHotness Hotness);
 
   /// RemoveJumpTable - Mark the specific index as being dead.  This will
   /// prevent it from being emitted.
diff --git a/llvm/include/llvm/CodeGen/Passes.h 
b/llvm/include/llvm/CodeGen/Passes.h
index 16423d03ff7018..b5d2a7e6bf035b 100644
--- a/llvm/include/llvm/CodeGen/Passes.h
+++ b/llvm/include/llvm/CodeGen/Passes.h
@@ -71,7 +71,7 @@ namespace llvm {
   /// using profile information.
   MachineFunctionPass *createMachineFunctionSplitterPass();
 
-  /// createStaticDataSplitterPass - This pass partions static data sections
+  /// createStaticDataSplitterPass - This pass partitions a static data section
   /// into a hot and cold section using profile information.
   MachineFunctionPass *createStaticDataSplitterPass();
 
diff --git a/llvm/lib/CodeGen/MachineFunction.cpp 
b/llvm/lib/CodeGen/MachineFunction.cpp
index b5a89f3bcf42f1..d09e93d79aae6c 100644
--- a/llvm/lib/CodeGen/MachineFunction.cpp
+++ b/llvm/lib/CodeGen/MachineFunction.cpp
@@ -1293,7 +1293,7 @@ const unsigned MachineFunction::DebugOperandMemNumber = 
100;
 
 MachineJumpTableEntry::MachineJumpTableEntry(
    const std::vector<MachineBasicBlock *> &MBBs)
-: MBBs(MBBs), Hotness(DataHotness::Unknown) {}
+: MBBs(MBBs), Hotness(MachineFunctionDataHotness::Unknown) {}
 
 /// Return the size of each entry in the jump table.
 unsigned MachineJumpTableInfo::getEntrySize(const DataLayout &TD) const {
@@ -1344,13 +1344,17 @@ unsigned MachineJumpTableInfo::createJumpTableIndex(
   return JumpTables.size()-1;
 }
 
-void MachineJumpTableInfo::updateJumpTableHotness(size_t JTI,
-  DataHotness Hotness) {
+bool MachineJumpTableInfo::updateJumpTableEntryHotness(
+size_t JTI, MachineFunctionDataHotness Hotness) {
   assert(JTI < JumpTables.size() && "Invalid JTI!");
   // Note record the largest hotness is important for mergable data (constant
   // pools). Even if jump table instances are not merged, record the largest
   // value seen fwiw.
-  JumpTables[JTI].Hotness = std::max(JumpTables[JTI].Hotness, Hotness);
+  if (Hotness <= JumpTables[JTI].Hotness)
+return false;
+
+  JumpTables[JTI].Hotness = Hotness;
+  return true;
 }
 
 /// If Old is the target of any jump tables, update the jump tables to branch
diff --git a/llvm/lib/CodeGen/StaticDataSplitter.cpp 
b/llvm/lib/CodeGen/StaticDataSplitter.cpp
index 482a61027cf985..9e2cfe18256e35 100644
--- a/llvm/lib/CodeGen/StaticDataSplitter.cpp
+++ b/llvm/lib/CodeGen/StaticDataSplitter.cpp
@@ -6,13 +6,16 @@
 //
 
//===-

[llvm-branch-commits] [llvm] [AsmPrinter][ELF] Support profile-guided section prefix for jump tables sections (PR #122215)

2025-01-15 Thread Mingming Liu via llvm-branch-commits

https://github.com/mingmingl-llvm edited 
https://github.com/llvm/llvm-project/pull/122215
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AsmPrinter][ELF] Support profile-guided section prefix for jump tables' (read-only) data sections (PR #122215)

2025-01-15 Thread Mingming Liu via llvm-branch-commits

https://github.com/mingmingl-llvm edited 
https://github.com/llvm/llvm-project/pull/122215
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AsmPrinter][ELF] Support profile-guided section prefix for jump tables' (read-only) data sections (PR #122215)

2025-01-15 Thread Mingming Liu via llvm-branch-commits

https://github.com/mingmingl-llvm edited 
https://github.com/llvm/llvm-project/pull/122215
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)

2025-01-15 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm edited 
https://github.com/llvm/llvm-project/pull/120598
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)

2025-01-15 Thread Matt Arsenault via llvm-branch-commits


@@ -5907,6 +5910,30 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) {
  N->getOperand(1), N->getOperand(2));
 }
 
+SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SmallVector LdChain; // Chain for the series of load
+  SDValue Result = GenWidenVectorLoads(LdChain, N, /*IsAtomic=*/true);

arsenm wrote:

I do not think you should call GenWidenVectorLoads. That has paths where the 
memory operation will be decomposed. I think you should try just replacing the 
load with an equivalent-width integer, and extracting the parts to conform to 
the expectations of the type legalization.
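
A rough sketch of that suggestion (assumptions: the widened size is a legal
atomic integer width, and reusing the original memory operand is acceptable;
this is not the PR's code):

```cpp
SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
  SDLoc dl(N);
  EVT WidenVT =
      TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));
  // Load the whole widened width as a single integer of equivalent size...
  EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), WidenVT.getSizeInBits());
  SDValue IntLd = DAG.getAtomic(ISD::ATOMIC_LOAD, dl, IntVT, IntVT,
                                N->getChain(), N->getBasePtr(),
                                N->getMemOperand());
  // ...then reinterpret it as the widened vector type and let the usual
  // extraction machinery pull out the original elements.
  SDValue Res = DAG.getNode(ISD::BITCAST, dl, WidenVT, IntLd);
  ReplaceValueWith(SDValue(N, 1), IntLd.getValue(1));
  return Res;
}
```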

https://github.com/llvm/llvm-project/pull/120598
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)

2025-01-15 Thread Matt Arsenault via llvm-branch-commits


@@ -146,6 +146,47 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   ret <1 x i64> %ret
 }
 
+define <2 x i8> @atomic_vec2_i8(ptr %x) {
+; CHECK3-LABEL: atomic_vec2_i8:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec2_i8:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %ax
+; CHECK0-NEXT:retq
+  %ret = load atomic <2 x i8>, ptr %x acquire, align 4
+  ret <2 x i8> %ret
+}
+
+define <2 x i16> @atomic_vec2_i16(ptr %x) {
+; CHECK-LABEL: atomic_vec2_i16:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movl (%rdi), %eax
+; CHECK-NEXT:retq
+  %ret = load atomic <2 x i16>, ptr %x acquire, align 4
+  ret <2 x i16> %ret
+}

arsenm wrote:

These tests are still missing.

https://github.com/llvm/llvm-project/pull/120598
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] DAG: Avoid forming shufflevector from a single extract_vector_elt (PR #122672)

2025-01-15 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> This is exactly the distinction I'm trying to get at. Avoiding the creation
> of a 1-2 element shuffle seems quite reasonable. Avoiding the creation of a
> 100 element splat shuffle does not. I think you need to add an explicit
> condition in terms of the number of elements in the result, not the number
> of _unique_ elements in the result.

This is enough to solve my immediate issue by restricting it to a single extract 
use in the build_vector. However, it does seem to be much more conservative than 
necessary, and this new heuristic gives up the x86 improvements. I see many lit 
improvements with various more aggressive variants, but regressions in others. 
I think this would require more time working on x86; some of the regressions 
appear to be due to missing optimizations there.
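
A sketch of that restriction (illustrative helper, not the actual patch):

```cpp
// Only skip forming a shuffle when exactly one build_vector operand is an
// extract_vector_elt: the conservative condition described above.
static bool hasSingleExtractUse(const SDNode *BuildVec) {
  unsigned NumExtracts = 0;
  for (const SDValue &Op : BuildVec->op_values())
    if (Op.getOpcode() == ISD::EXTRACT_VECTOR_ELT)
      ++NumExtracts;
  return NumExtracts == 1;
}
```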

https://github.com/llvm/llvm-project/pull/122672
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] DAG: Avoid forming shufflevector from a single extract_vector_elt (PR #122672)

2025-01-15 Thread via llvm-branch-commits

github-actions[bot] wrote:




:warning: undef deprecator found issues in your code. :warning:



You can test this locally with the following command:


```bash
git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 7facaf480944ff20279c029639cb611e7d7d7cff 013b1aa25dc8b33c6916de532019efd91e69ee13 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll
```




The following files introduce new uses of undef:
 - llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[Undef](https://llvm.org/docs/LangRef.html#undefined-values) is now deprecated 
and should only be used in the rare cases where no replacement is possible. For 
example, a load of uninitialized memory yields `undef`. You should use `poison` 
values for placeholders instead.

In tests, avoid using `undef` and having tests that trigger undefined behavior. 
If you need an operand with some unimportant value, you can add a new argument 
to the function and use that instead.

For example, this is considered a bad practice:
```llvm
define void @fn() {
  ...
  br i1 undef, ...
}
```

Please use the following instead:
```llvm
define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}
```

Please refer to the [Undefined Behavior 
Manual](https://llvm.org/docs/UndefinedBehavior.html) for more information.



https://github.com/llvm/llvm-project/pull/122672
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Implement isExtractVecEltCheap (PR #122460)

2025-01-15 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/122460

>From 924abbb3903a00f55cdee361985c74982c492963 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Fri, 10 Jan 2025 14:57:24 +0700
Subject: [PATCH 1/2] AMDGPU: Implement isExtractVecEltCheap

Once again we have excessive TLI hooks with bad defaults. Permit this
for 32-bit element vectors, which are just use-different-register.
We should permit 16-bit vectors as cheap with legal packed instructions,
but I see some mixed improvements and regressions that need investigation.
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |  7 +
 llvm/lib/Target/AMDGPU/SIISelLowering.h   |  1 +
 llvm/test/CodeGen/AMDGPU/mad-mix.ll   | 12 -
 llvm/test/CodeGen/AMDGPU/packed-fp32.ll   | 32 +++
 llvm/test/CodeGen/AMDGPU/trunc-combine.ll |  9 ---
 5 files changed, 45 insertions(+), 16 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 21f1f20e5e69a82..a63e18b33034f30 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1952,6 +1952,13 @@ bool SITargetLowering::isExtractSubvectorCheap(EVT 
ResVT, EVT SrcVT,
   return Index == 0;
 }
 
+bool SITargetLowering::isExtractVecEltCheap(EVT VT, unsigned Index) const {
+  // TODO: This should be more aggressive, particular for 16-bit element
+  // vectors. However there are some mixed improvements and regressions.
+  EVT EltTy = VT.getVectorElementType();
+  return EltTy.getSizeInBits() % 32 == 0;
+}
+
 bool SITargetLowering::isTypeDesirableForOp(unsigned Op, EVT VT) const {
   if (Subtarget->has16BitInsts() && VT == MVT::i16) {
 switch (Op) {
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.h 
b/llvm/lib/Target/AMDGPU/SIISelLowering.h
index 5c215f76552d9cf..bbb96d9115a0a93 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.h
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.h
@@ -365,6 +365,7 @@ class SITargetLowering final : public AMDGPUTargetLowering {
 
   bool isExtractSubvectorCheap(EVT ResVT, EVT SrcVT,
unsigned Index) const override;
+  bool isExtractVecEltCheap(EVT VT, unsigned Index) const override;
 
   bool isTypeDesirableForOp(unsigned Op, EVT VT) const override;
 
diff --git a/llvm/test/CodeGen/AMDGPU/mad-mix.ll 
b/llvm/test/CodeGen/AMDGPU/mad-mix.ll
index b520dd1060ec8cf..30e3bc3ba5da85e 100644
--- a/llvm/test/CodeGen/AMDGPU/mad-mix.ll
+++ b/llvm/test/CodeGen/AMDGPU/mad-mix.ll
@@ -385,17 +385,15 @@ define <2 x float> @v_mad_mix_v2f32_shuffle(<2 x half> 
%src0, <2 x half> %src1,
 ; SDAG-CI:   ; %bb.0:
 ; SDAG-CI-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; SDAG-CI-NEXT:v_cvt_f16_f32_e32 v3, v3
-; SDAG-CI-NEXT:v_cvt_f16_f32_e32 v4, v5
 ; SDAG-CI-NEXT:v_cvt_f16_f32_e32 v2, v2
-; SDAG-CI-NEXT:v_cvt_f16_f32_e32 v5, v1
+; SDAG-CI-NEXT:v_cvt_f16_f32_e32 v1, v1
 ; SDAG-CI-NEXT:v_cvt_f16_f32_e32 v0, v0
 ; SDAG-CI-NEXT:v_cvt_f32_f16_e32 v3, v3
-; SDAG-CI-NEXT:v_cvt_f32_f16_e32 v1, v4
 ; SDAG-CI-NEXT:v_cvt_f32_f16_e32 v2, v2
-; SDAG-CI-NEXT:v_cvt_f32_f16_e32 v4, v5
-; SDAG-CI-NEXT:v_cvt_f32_f16_e32 v5, v0
-; SDAG-CI-NEXT:v_mad_f32 v0, v4, v2, v1
-; SDAG-CI-NEXT:v_mac_f32_e32 v1, v5, v3
+; SDAG-CI-NEXT:v_cvt_f32_f16_e32 v1, v1
+; SDAG-CI-NEXT:v_cvt_f32_f16_e32 v4, v0
+; SDAG-CI-NEXT:v_mad_f32 v0, v1, v2, v5
+; SDAG-CI-NEXT:v_mad_f32 v1, v4, v3, v5
 ; SDAG-CI-NEXT:s_setpc_b64 s[30:31]
 ;
 ; GISEL-CI-LABEL: v_mad_mix_v2f32_shuffle:
diff --git a/llvm/test/CodeGen/AMDGPU/packed-fp32.ll 
b/llvm/test/CodeGen/AMDGPU/packed-fp32.ll
index 6b7eff316fe95bc..0833dada43e4d5d 100644
--- a/llvm/test/CodeGen/AMDGPU/packed-fp32.ll
+++ b/llvm/test/CodeGen/AMDGPU/packed-fp32.ll
@@ -549,17 +549,19 @@ bb:
   ret void
 }
 
-; GCN-LABEL: {{^}}fadd_fadd_fsub:
+; GCN-LABEL: {{^}}fadd_fadd_fsub_0:
 ; GFX900:   v_add_f32_e64 v{{[0-9]+}}, s{{[0-9]+}}, 0
 ; GFX900:   v_add_f32_e32 v{{[0-9]+}}, 0, v{{[0-9]+}}
-; PACKED-SDAG:  v_pk_add_f32 v[{{[0-9:]+}}], s[{{[0-9:]+}}], 0 
op_sel_hi:[1,0]{{$}}
-; PACKED-SDAG:  v_pk_add_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], 0 
op_sel_hi:[1,0]{{$}}
+
+; PACKED-SDAG: v_add_f32_e64 v{{[0-9]+}}, s{{[0-9]+}}, 0
+; PACKED-SDAG: v_add_f32_e32 v{{[0-9]+}}, 0, v{{[0-9]+}}
+
 ; PACKED-GISEL: v_pk_add_f32 v[{{[0-9:]+}}], s[{{[0-9:]+}}], 
v[{{[0-9:]+}}]{{$}}
 ; PACKED-GISEL: v_pk_add_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], 
s[{{[0-9:]+}}]{{$}}
-define amdgpu_kernel void @fadd_fadd_fsub(<2 x float> %arg) {
+define amdgpu_kernel void @fadd_fadd_fsub_0(<2 x float> %arg) {
 bb:
   %i12 = fadd <2 x float> zeroinitializer, %arg
-  %shift8 = shufflevector <2 x float> %i12, <2 x float> undef, <2 x i32> 
+  %shift8 = shufflevector <2 x float> %i12, <2 x float> poison, <2 x i32> 
   %i13 = fadd <2 x float> zeroinitializer, %shift8
   %i14 = shufflevector <2 x float> %arg, <2 x float> %i13, <2 x i32> 
   %i15 = fsub <2 x float> %i14, zeroinitialize

[llvm-branch-commits] [flang] [llvm] [flang][OpenMP] Parse WHEN, OTHERWISE, MATCH clauses plus METADIRECTIVE (PR #121817)

2025-01-15 Thread Krzysztof Parzyszek via llvm-branch-commits


@@ -214,6 +214,11 @@ class AssociatedLoopChecker {
 };
 
 bool OmpStructureChecker::CheckAllowedClause(llvmOmpClause clause) {
+  // Do not do clause checks while processing METADIRECTIVE.

kparzysz wrote:

Yes, I'll add a comment.  The reason was that when you specify a clause inside 
of a context selector, the visitor will think it was applied to METADIRECTIVE 
itself...

https://github.com/llvm/llvm-project/pull/121817
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] c2e8334 - Revert "[Flang][Driver] Add a flag to control zero initialization of global v…"

2025-01-15 Thread via llvm-branch-commits

Author: Kiran Chandramohan
Date: 2025-01-15T15:21:55Z
New Revision: c2e833476d5a2ceb0c9dabe7bd6e206c01f5ce1f

URL: 
https://github.com/llvm/llvm-project/commit/c2e833476d5a2ceb0c9dabe7bd6e206c01f5ce1f
DIFF: 
https://github.com/llvm/llvm-project/commit/c2e833476d5a2ceb0c9dabe7bd6e206c01f5ce1f.diff

LOG: Revert "[Flang][Driver] Add a flag to control zero initialization of 
global v…"

This reverts commit c593e3d0f77509ce65a6f5bd744f2d1ea9935c47.

Added: 


Modified: 
clang/include/clang/Driver/Options.td
clang/lib/Driver/ToolChains/Flang.cpp
flang/include/flang/Lower/LoweringOptions.def
flang/lib/Frontend/CompilerInvocation.cpp
flang/lib/Lower/ConvertVariable.cpp

Removed: 
flang/test/Driver/fno-zero-init.f90
flang/test/Lower/zero_init.f90
flang/test/Lower/zero_init_default_init.f90



diff  --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index dacfca910acc40..2721c1b5d8dc55 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -3505,11 +3505,6 @@ def fno_struct_path_tbaa : Flag<["-"], "fno-struct-path-tbaa">, Group<f_Group>;
 def fno_strict_enums : Flag<["-"], "fno-strict-enums">, Group<f_Group>;
 def fno_strict_overflow : Flag<["-"], "fno-strict-overflow">, Group<f_Group>,
   Visibility<[ClangOption, FlangOption]>;
-defm init_global_zero : BoolOptionWithoutMarshalling<"f", "init-global-zero",
-  PosFlag,
-  NegFlag>;
 def fno_pointer_tbaa : Flag<["-"], "fno-pointer-tbaa">, Group<f_Group>;
 def fno_temp_file : Flag<["-"], "fno-temp-file">, Group<f_Group>,
   Visibility<[ClangOption, CC1Option, CLOption, DXCOption]>, HelpText<

diff  --git a/clang/lib/Driver/ToolChains/Flang.cpp 
b/clang/lib/Driver/ToolChains/Flang.cpp
index c46e8222a9631a..a7d0cc99f27d2d 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -155,9 +155,7 @@ void Flang::addCodegenOptions(const ArgList &Args,
options::OPT_flang_deprecated_no_hlfir,
options::OPT_fno_ppc_native_vec_elem_order,
options::OPT_fppc_native_vec_elem_order,
-   options::OPT_finit_global_zero,
-   options::OPT_fno_init_global_zero, 
options::OPT_ftime_report,
-   options::OPT_ftime_report_EQ});
+   options::OPT_ftime_report, options::OPT_ftime_report_EQ});
 }
 
 void Flang::addPicOptions(const ArgList &Args, ArgStringList &CmdArgs) const {

diff  --git a/flang/include/flang/Lower/LoweringOptions.def 
b/flang/include/flang/Lower/LoweringOptions.def
index 396c91948be36b..5a6debfdffe030 100644
--- a/flang/include/flang/Lower/LoweringOptions.def
+++ b/flang/include/flang/Lower/LoweringOptions.def
@@ -44,8 +44,5 @@ ENUM_LOWERINGOPT(IntegerWrapAround, unsigned, 1, 0)
 /// If false, assume that the shapes/types/allocation-status match.
 ENUM_LOWERINGOPT(ReallocateLHS, unsigned, 1, 1)
 
-/// If true, initialize globals without initialization to zero.
-/// On by default.
-ENUM_LOWERINGOPT(InitGlobalZero, unsigned, 1, 1)
 #undef LOWERINGOPT
 #undef ENUM_LOWERINGOPT

diff  --git a/flang/lib/Frontend/CompilerInvocation.cpp 
b/flang/lib/Frontend/CompilerInvocation.cpp
index 78d1199c19749b..5e7127313c1335 100644
--- a/flang/lib/Frontend/CompilerInvocation.cpp
+++ b/flang/lib/Frontend/CompilerInvocation.cpp
@@ -1373,14 +1373,6 @@ bool CompilerInvocation::createFromArgs(
 invoc.loweringOpts.setNoPPCNativeVecElemOrder(true);
   }
 
-  // -f[no-]init-global-zero
-  if (args.hasFlag(clang::driver::options::OPT_finit_global_zero,
-   clang::driver::options::OPT_fno_init_global_zero,
-   /*default=*/true))
-invoc.loweringOpts.setInitGlobalZero(true);
-  else
-invoc.loweringOpts.setInitGlobalZero(false);
-
   // Preserve all the remark options requested, i.e. -Rpass, -Rpass-missed or
   // -Rpass-analysis. This will be used later when processing and outputting 
the
   // remarks generated by LLVM in ExecuteCompilerInvocation.cpp.

diff  --git a/flang/lib/Lower/ConvertVariable.cpp 
b/flang/lib/Lower/ConvertVariable.cpp
index 87236dc293ebbc..9ee42d5cd88002 100644
--- a/flang/lib/Lower/ConvertVariable.cpp
+++ b/flang/lib/Lower/ConvertVariable.cpp
@@ -635,11 +635,7 @@ static fir::GlobalOp 
defineGlobal(Fortran::lower::AbstractConverter &converter,
   global.setLinkName(builder.createCommonLinkage());
 Fortran::lower::createGlobalInitialization(
 builder, global, [&](fir::FirOpBuilder &builder) {
-  mlir::Value initValue;
-  if (converter.getLoweringOptions().getInitGlobalZero())
-initValue = builder.create(loc, symTy);
-  else
-initValue = builder.create(loc, symTy);
+  mlir::Value initValue = builder.create(loc, symTy);
   builder.create(loc, initValue);
 });
   }

diff  --git a/flang/test/Driver/fno-zero-init.f90 
b/flang/test/Driver/fno-zero-in

[llvm-branch-commits] [llvm] [AMDGPU][NewPM] Port SILowerControlFlow pass into NPM. (PR #123045)

2025-01-15 Thread Christudasan Devadasan via llvm-branch-commits

cdevadas wrote:

> Currently `LiveIntervals` and `SlotIndexes` are using the default 
> `invalidate` implementation.

Ok, we should be able to fix it by overriding this instance in the LIS and SI. 
However, the initial question is still valid. Can we have meaningful error 
reporting when one fails to capture this dependency?
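
A sketch of what that override could look like in the new pass manager (names
assumed from the NPM analysis wrappers; not tested code):

```cpp
// Stay valid only if explicitly preserved, and transitively invalidate
// when the SlotIndexes this result captured is invalidated.
bool LiveIntervals::invalidate(
    MachineFunction &MF, const PreservedAnalyses &PA,
    MachineFunctionAnalysisManager::Invalidator &Inv) {
  auto PAC = PA.getChecker<LiveIntervalsAnalysis>();
  if (!PAC.preserved() && !PAC.preservedSet<AllAnalysesOn<MachineFunction>>())
    return true; // this result itself was not preserved
  // Also invalidate if the SlotIndexes we depend on goes away.
  return Inv.invalidate<SlotIndexesAnalysis>(MF, PA);
}
```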



https://github.com/llvm/llvm-project/pull/123045
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][NewPM] Port SILowerControlFlow pass into NPM. (PR #123045)

2025-01-15 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

> I observed something while porting this pass. The analysis LiveIntervals 
> (LIS) uses the SlotIndexes (SI). There is no explicit use of SI in this pass. 
> If we have to preserve LIS, it required us to preserve SI as well. When I 
> initially failed to preserve SI, the following lit test went into an infinite 
> loop (the cleanup phase hung). 
> llvm/test/CodeGen/AMDGPU/lower-control-flow-live-intervals.mir. Then I 
> realized that the Legacy pass flow preserved SI in getAnalysisUsage(). Is it 
> expected to preserve all the dependent analyses alongside a parent analysis?

In the old pass manager this is kind of an unwritten rule, and if you break the 
rule you tend to get assertion failures in the pass manager itself. See
https://discourse.llvm.org/t/legacypm-preserving-passes-and-addrequiredtransitive/57043

I don't know what the rules are for the new pass manager.

https://github.com/llvm/llvm-project/pull/123045
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits