[llvm-branch-commits] [llvm] AMDGPU: Make vector_shuffle legal for v2i32 with v_pk_mov_b32 (PR #123684)

2025-01-21 Thread Joe Nash via llvm-branch-commits


@@ -489,6 +489,90 @@ void AMDGPUDAGToDAGISel::SelectBuildVector(SDNode *N, unsigned RegClassID) {
   CurDAG->SelectNodeTo(N, AMDGPU::REG_SEQUENCE, N->getVTList(), RegSeqArgs);
 }
 
+void AMDGPUDAGToDAGISel::SelectVectorShuffle(SDNode *N) {
+  EVT VT = N->getValueType(0);
+  EVT EltVT = VT.getVectorElementType();
+
+  // TODO: Handle 16-bit element vectors with even aligned masks.
+  if (!Subtarget->hasPkMovB32() || !EltVT.bitsEq(MVT::i32) ||
+      VT.getVectorNumElements() != 2) {
+    SelectCode(N);
+    return;
+  }
+
+  auto *SVN = cast<ShuffleVectorSDNode>(N);
+
+  SDValue Src0 = SVN->getOperand(0);
+  SDValue Src1 = SVN->getOperand(1);
+  ArrayRef<int> Mask = SVN->getMask();
+  SDLoc DL(N);
+
+  assert(Src0.getValueType().getVectorNumElements() == 2 && Mask.size() == 2 &&
+         Mask[0] < 4 && Mask[1] < 4);
+
+  SDValue VSrc0 = Mask[0] < 2 ? Src0 : Src1;
+  SDValue VSrc1 = Mask[1] < 2 ? Src0 : Src1;
+  unsigned Src0SubReg = Mask[0] & 1 ? AMDGPU::sub1 : AMDGPU::sub0;
+  unsigned Src1SubReg = Mask[1] & 1 ? AMDGPU::sub1 : AMDGPU::sub0;
+
+  if (Mask[0] < 0) {
+    Src0SubReg = Src1SubReg;
+    MachineSDNode *ImpDef =
+        CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, VT);
+    VSrc0 = SDValue(ImpDef, 0);
+  }
+
+  if (Mask[1] < 0) {
+    Src1SubReg = Src0SubReg;
+    MachineSDNode *ImpDef =
+        CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, VT);
+    VSrc1 = SDValue(ImpDef, 0);
+  }
+
+  // SGPR case needs to lower to copies.
+  //
+  // Also use subregister extract when we can directly blend the registers with
+  // a simple subregister copy.
+  //
+  // TODO: Maybe we should fold this out earlier
+  if (N->isDivergent() && Src0SubReg == AMDGPU::sub1 &&
+      Src1SubReg == AMDGPU::sub0) {
+    // The low element of the result always comes from src0.
+    // The high element of the result always comes from src1.
+    // op_sel selects the high half of src0.
+    // op_sel_hi selects the high half of src1.
+
+    unsigned Src0OpSel =
+        Src0SubReg == AMDGPU::sub1 ? SISrcMods::OP_SEL_0 : SISrcMods::NONE;
+    unsigned Src1OpSel =
+        Src1SubReg == AMDGPU::sub1 ? SISrcMods::OP_SEL_0 : SISrcMods::NONE;

Sisyph wrote:

It is written in a very confusing way in the docs, but I think you have it 
correct in the code. Out of the 6 bits (op_sel[0-2] and op_sel_hi[0-2]) only 
op_sel[0] and op_sel[1] do anything iiuc. 
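
A minimal standalone sketch of that mapping, for readers skimming the thread
(illustrative only, not the patch's code; OP_SEL_0 stands in for
SISrcMods::OP_SEL_0). For v_pk_mov_b32 the low result element always comes from
src0 and the high element from src1, so only the first op_sel bit and the first
op_sel_hi bit do anything:

```cpp
#include <cassert>

enum SrcMod { NONE = 0, OP_SEL_0 = 1 };

// A mask element with its low bit set reads the high half (sub1) of whichever
// source it selects; that is exactly what the corresponding op_sel bit encodes.
SrcMod opSelForMaskElt(int MaskElt) {
  assert(MaskElt >= 0 && MaskElt < 4 && "shuffle of two v2i32 sources");
  return (MaskElt & 1) ? OP_SEL_0 : NONE;
}
```

For example, mask <3,0> takes the high half of one source and the low half of
the other, which is the `op_sel:[1,0] op_sel_hi:[0,0]` form that appears in the
test diffs later in this digest.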

https://github.com/llvm/llvm-project/pull/123684
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][NewPM] Port SILowerWWMCopies to NPM (PR #123695)

2025-01-21 Thread Akshat Oke via llvm-branch-commits

https://github.com/optimisan edited 
https://github.com/llvm/llvm-project/pull/123695
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Custom lower 32-bit element shuffles (PR #123711)

2025-01-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/123711

This is so we can try to make use of v_pk_mov_b32 when available.
Note this currently has little observable effect. The combiner
will undo the common extract of shuffle pattern. The lack
of test changes should demonstrate this change is minimally
correct.

We should probably try to make better use of wider extracts in
even aligned cases, but I'm trying to avoid some really ugly
regalloc regressions in some MFMA tests. The DAG scheduler ends
up doing a worse job if we use vector extracts, resulting
in failure to do 3 address conversion of MFMAs.
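
To make the pairwise decomposition concrete, here is a small self-contained
example (helper bodies copied from the patch below; the driver is illustrative):

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Same logic as the patch's static helpers, with ArrayRef<int> replaced by
// std::vector<int> so the example compiles standalone.
static bool elementPairIsContiguous(const std::vector<int> &Mask, int Elt) {
  return Mask[Elt + 1] == Mask[Elt] + 1 && (Mask[Elt] % 2 == 0);
}

static bool elementPairIsOddToEven(const std::vector<int> &Mask, int Elt) {
  assert(Elt % 2 == 0);
  return Mask[Elt] >= 0 && Mask[Elt + 1] >= 0 && (Mask[Elt] & 1) &&
         !(Mask[Elt + 1] & 1);
}

int main() {
  // The <3,2,7,6> example from the comments in the patch.
  std::vector<int> Mask = {3, 2, 7, 6};
  for (int I = 0, N = (int)Mask.size(); I != N; I += 2)
    std::printf("pair at %d: contiguous=%d odd-to-even=%d\n", I,
                elementPairIsContiguous(Mask, I),
                elementPairIsOddToEven(Mask, I));
  // Both pairs report odd-to-even, so each becomes a <1,0> sub-shuffle of a
  // 2-element subvector, which can then select to v_pk_mov_b32 with op_sel
  // instead of being scalarized.
  return 0;
}
```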

>From 1c0f60a72d6b06541ce611613a83c429ad205713 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sat, 18 Jan 2025 08:19:10 +0700
Subject: [PATCH] AMDGPU: Custom lower 32-bit element shuffles

This is so we can try to make use of v_pk_mov_b32 when available.
Note this currently has little observable effect. The combiner
will undo the common extract of shuffle pattern. The lack
of test changes should demonstrate this change is minimally
correct.

We should probably try to make better use of wider extracts in
even aligned cases, but I'm trying to avoid some really ugly
regalloc regressions in some MFMA tests. The DAG scheduler ends
up doing a worse job if we use vector extracts, resulting
in failure to do 3 address conversion of MFMAs.
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 85 +--
 1 file changed, 80 insertions(+), 5 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 1aeca7f370aa1b..b632c50dae0e35 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -419,8 +419,9 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM,
   }
 
   setOperationAction(ISD::VECTOR_SHUFFLE,
-                     {MVT::v8i32, MVT::v8f32, MVT::v16i32, MVT::v16f32},
-                     Expand);
+                     {MVT::v4i32, MVT::v4f32, MVT::v8i32, MVT::v8f32,
+                      MVT::v16i32, MVT::v16f32, MVT::v32i32, MVT::v32f32},
+                     Custom);
 
   if (Subtarget->hasPkMovB32()) {
     // TODO: 16-bit element vectors should be legal with even aligned elements.
@@ -7589,15 +7590,38 @@ static bool elementPairIsContiguous(ArrayRef<int> Mask, int Elt) {
   return Mask[Elt + 1] == Mask[Elt] + 1 && (Mask[Elt] % 2 == 0);
 }
 
+static bool elementPairIsOddToEven(ArrayRef<int> Mask, int Elt) {
+  assert(Elt % 2 == 0);
+  return Mask[Elt] >= 0 && Mask[Elt + 1] >= 0 && (Mask[Elt] & 1) &&
+         !(Mask[Elt + 1] & 1);
+}
+
 SDValue SITargetLowering::lowerVECTOR_SHUFFLE(SDValue Op,
                                               SelectionDAG &DAG) const {
   SDLoc SL(Op);
   EVT ResultVT = Op.getValueType();
   ShuffleVectorSDNode *SVN = cast<ShuffleVectorSDNode>(Op);
   MVT EltVT = ResultVT.getVectorElementType().getSimpleVT();
-  MVT PackVT = MVT::getVectorVT(EltVT, 2);
+  const int NewSrcNumElts = 2;
+  MVT PackVT = MVT::getVectorVT(EltVT, NewSrcNumElts);
   int SrcNumElts = Op.getOperand(0).getValueType().getVectorNumElements();
 
+  // Break up the shuffle into register-sized pieces.
+  //
+  // We're trying to form sub-shuffles that the register allocation pipeline
+  // won't be able to figure out, like how to use v_pk_mov_b32 to do a register
+  // blend or 16-bit op_sel. It should be able to figure out how to reassemble
+  // a pair of copies into a consecutive register copy, so use the ordinary
+  // extract_vector_elt lowering unless we can use the shuffle.
+  //
+  // TODO: This is a bit of a hack, and we should probably always use
+  // extract_subvector for the largest possible subvector we can (or at least
+  // use it for PackVT aligned pieces). However we have worse support for
+  // combines on them and don't directly treat extract_subvector /
+  // insert_subvector as legal. The DAG scheduler also ends up doing a worse
+  // job with the extract_subvectors.
+  const bool ShouldUseConsecutiveExtract = EltVT.getSizeInBits() == 16;
+
   // vector_shuffle <0,1,6,7> lhs, rhs
   // -> concat_vectors (extract_subvector lhs, 0), (extract_subvector rhs, 2)
   //
@@ -7608,9 +7632,18 @@ SDValue SITargetLowering::lowerVECTOR_SHUFFLE(SDValue Op,
   // -> concat_vectors (extract_subvector rhs, 2), (extract_subvector lhs, 0)
 
   // Avoid scalarizing when both halves are reading from consecutive elements.
-  SmallVector<SDValue, 16> Pieces;
+
+  // If we're treating 2-element shuffles as legal, also create odd-to-even
+  // shuffles of neighboring pairs.
+  //
+  // vector_shuffle <3,2,7,6> lhs, rhs
+  //  -> concat_vectors vector_shuffle <1, 0> (extract_subvector lhs, 0)
+  //                    vector_shuffle <1, 0> (extract_subvector rhs, 2)
+
+  SmallVector<SDValue, 16> Pieces;
   for (int I = 0, N = ResultVT.getVectorNumElements(); I != N; I += 2) {
-    if (elementPairIsContiguous(SVN->getMask(), I)) {
+    if (ShouldUseConsecutiveExtract &&
+        elementPairIsContiguous(SVN->getMask(), I)) {

[llvm-branch-commits] [llvm] AMDGPU: Custom lower 32-bit element shuffles (PR #123711)

2025-01-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

This is so we can try to make use of v_pk_mov_b32 when available.
Note this currently has little observable effect. The combiner
will undo the common extract of shuffle pattern. The lack
of test changes should demonstrate this change is minimally
correct.

We should probably try to make better use of wider extracts in
even aligned cases, but I'm trying to avoid some really ugly
regalloc regressions in some MFMA tests. The DAG scheduler ends
up doing a worse job if we use vector extracts, resulting
in failure to do 3 address conversion of MFMAs.

---
Full diff: https://github.com/llvm/llvm-project/pull/123711.diff


1 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+80-5) 


```diff
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 1aeca7f370aa1b..b632c50dae0e35 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -419,8 +419,9 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM,
   }
 
   setOperationAction(ISD::VECTOR_SHUFFLE,
-                     {MVT::v8i32, MVT::v8f32, MVT::v16i32, MVT::v16f32},
-                     Expand);
+                     {MVT::v4i32, MVT::v4f32, MVT::v8i32, MVT::v8f32,
+                      MVT::v16i32, MVT::v16f32, MVT::v32i32, MVT::v32f32},
+                     Custom);
 
   if (Subtarget->hasPkMovB32()) {
     // TODO: 16-bit element vectors should be legal with even aligned elements.
@@ -7589,15 +7590,38 @@ static bool elementPairIsContiguous(ArrayRef<int> Mask, int Elt) {
   return Mask[Elt + 1] == Mask[Elt] + 1 && (Mask[Elt] % 2 == 0);
 }
 
+static bool elementPairIsOddToEven(ArrayRef<int> Mask, int Elt) {
+  assert(Elt % 2 == 0);
+  return Mask[Elt] >= 0 && Mask[Elt + 1] >= 0 && (Mask[Elt] & 1) &&
+         !(Mask[Elt + 1] & 1);
+}
+
 SDValue SITargetLowering::lowerVECTOR_SHUFFLE(SDValue Op,
                                               SelectionDAG &DAG) const {
   SDLoc SL(Op);
   EVT ResultVT = Op.getValueType();
   ShuffleVectorSDNode *SVN = cast<ShuffleVectorSDNode>(Op);
   MVT EltVT = ResultVT.getVectorElementType().getSimpleVT();
-  MVT PackVT = MVT::getVectorVT(EltVT, 2);
+  const int NewSrcNumElts = 2;
+  MVT PackVT = MVT::getVectorVT(EltVT, NewSrcNumElts);
   int SrcNumElts = Op.getOperand(0).getValueType().getVectorNumElements();
 
+  // Break up the shuffle into register-sized pieces.
+  //
+  // We're trying to form sub-shuffles that the register allocation pipeline
+  // won't be able to figure out, like how to use v_pk_mov_b32 to do a register
+  // blend or 16-bit op_sel. It should be able to figure out how to reassemble
+  // a pair of copies into a consecutive register copy, so use the ordinary
+  // extract_vector_elt lowering unless we can use the shuffle.
+  //
+  // TODO: This is a bit of a hack, and we should probably always use
+  // extract_subvector for the largest possible subvector we can (or at least
+  // use it for PackVT aligned pieces). However we have worse support for
+  // combines on them and don't directly treat extract_subvector /
+  // insert_subvector as legal. The DAG scheduler also ends up doing a worse
+  // job with the extract_subvectors.
+  const bool ShouldUseConsecutiveExtract = EltVT.getSizeInBits() == 16;
+
   // vector_shuffle <0,1,6,7> lhs, rhs
   // -> concat_vectors (extract_subvector lhs, 0), (extract_subvector rhs, 2)
   //
@@ -7608,9 +7632,18 @@ SDValue SITargetLowering::lowerVECTOR_SHUFFLE(SDValue Op,
   // -> concat_vectors (extract_subvector rhs, 2), (extract_subvector lhs, 0)
 
   // Avoid scalarizing when both halves are reading from consecutive elements.
-  SmallVector<SDValue, 16> Pieces;
+
+  // If we're treating 2-element shuffles as legal, also create odd-to-even
+  // shuffles of neighboring pairs.
+  //
+  // vector_shuffle <3,2,7,6> lhs, rhs
+  //  -> concat_vectors vector_shuffle <1, 0> (extract_subvector lhs, 0)
+  //                    vector_shuffle <1, 0> (extract_subvector rhs, 2)
+
+  SmallVector<SDValue, 16> Pieces;
   for (int I = 0, N = ResultVT.getVectorNumElements(); I != N; I += 2) {
-    if (elementPairIsContiguous(SVN->getMask(), I)) {
+    if (ShouldUseConsecutiveExtract &&
+        elementPairIsContiguous(SVN->getMask(), I)) {
       const int Idx = SVN->getMaskElt(I);
       int VecIdx = Idx < SrcNumElts ? 0 : 1;
       int EltIdx = Idx < SrcNumElts ? Idx : Idx - SrcNumElts;
@@ -7618,6 +7651,48 @@ SDValue SITargetLowering::lowerVECTOR_SHUFFLE(SDValue Op,
                                 SVN->getOperand(VecIdx),
                                 DAG.getConstant(EltIdx, SL, MVT::i32));
       Pieces.push_back(SubVec);
+    } else if (elementPairIsOddToEven(SVN->getMask(), I) &&
+               isOperationLegal(ISD::VECTOR_SHUFFLE, PackVT)) {
+      int Idx0 = SVN->getMaskElt(I);
+      int Idx1 = SVN->getMaskElt(I + 1);
+
+      SDValue SrcOp0 = SVN->getOperand(0);
+      SDValue SrcOp1 

[llvm-branch-commits] [llvm] AMDGPU: Custom lower 32-bit element shuffles (PR #123711)

2025-01-21 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/123711
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#123712**
* **#123711** 👈 (this PR)
* **#123684**
* **#123596**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev).
Learn more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/123711
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Custom lower 32-bit element shuffles (PR #123711)

2025-01-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/123711
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] DAG: Avoid breaking legal vector_shuffle with multiple uses (PR #123712)

2025-01-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/123712
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] DAG: Avoid breaking legal vector_shuffle with multiple uses (PR #123712)

2025-01-21 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/123712
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#123712** 👈 (this PR)
* **#123711**
* **#123684**
* **#123596**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev).
Learn more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/123712
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] DAG: Avoid breaking legal vector_shuffle with multiple uses (PR #123712)

2025-01-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

Previously this combine would undo AMDGPU's new custom legalization of
wide vector shuffles into 2-element pieces. The comment also
states that this combine is only done before legalization,
but the case with a build_vector source ran unconditionally.

We probably don't want to do this if the multiple uses amount to a full
scalarization of the vector, but this seems to work well enough.
Scalarizing extracts should have been folded out before legalization.

---

Patch is 393.55 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/123712.diff


10 Files Affected:

- (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+5) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v4f32.v2f32.ll (+114-166) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v4f32.v3f32.ll (+129-143) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v4f32.v4f32.ll (+343-466) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v4i32.v2i32.ll (+114-166) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v4i32.v3i32.ll (+129-143) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v4i32.v4i32.ll (+343-466) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v4p3.v2p3.ll (+114-166) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v4p3.v3p3.ll (+129-143) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v4p3.v4p3.ll (+343-466) 


```diff
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 671a14e6250fc6..eaa08b0ef32af2 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -23172,6 +23172,11 @@ SDValue DAGCombiner::visitEXTRACT_VECTOR_ELT(SDNode *N) {
     }
 
     if (SVInVec.getOpcode() == ISD::BUILD_VECTOR) {
+      // TODO: Check if shuffle mask is legal?
+      if (LegalOperations && TLI.isOperationLegal(ISD::VECTOR_SHUFFLE, VecVT) &&
+          !VecOp.hasOneUse())
+        return SDValue();
+
       SDValue InOp = SVInVec.getOperand(OrigElt);
       if (InOp.getValueType() != ScalarVT) {
         assert(InOp.getValueType().isInteger() && ScalarVT.isInteger());
diff --git a/llvm/test/CodeGen/AMDGPU/shufflevector.v4f32.v2f32.ll b/llvm/test/CodeGen/AMDGPU/shufflevector.v4f32.v2f32.ll
index fafdac7ffac62f..54c5390ca76c62 100644
--- a/llvm/test/CodeGen/AMDGPU/shufflevector.v4f32.v2f32.ll
+++ b/llvm/test/CodeGen/AMDGPU/shufflevector.v4f32.v2f32.ll
@@ -176,8 +176,7 @@ define void @v_shuffle_v4f32_v2f32__3_0_u_u(ptr addrspace(1) inreg %ptr) {
 ; GFX90A-NEXT:    ;;#ASMSTART
 ; GFX90A-NEXT:    ; def v[2:3]
 ; GFX90A-NEXT:    ;;#ASMEND
-; GFX90A-NEXT:    v_mov_b32_e32 v0, v1
-; GFX90A-NEXT:    v_mov_b32_e32 v1, v2
+; GFX90A-NEXT:    v_pk_mov_b32 v[0:1], v[2:3], v[0:1] op_sel:[1,0] op_sel_hi:[0,0]
 ; GFX90A-NEXT:    global_store_dwordx4 v4, v[0:3], s[16:17]
 ; GFX90A-NEXT:    s_waitcnt vmcnt(0)
 ; GFX90A-NEXT:    s_setpc_b64 s[30:31]
@@ -192,8 +191,8 @@ define void @v_shuffle_v4f32_v2f32__3_0_u_u(ptr addrspace(1) inreg %ptr) {
 ; GFX940-NEXT:    ;;#ASMSTART
 ; GFX940-NEXT:    ; def v[2:3]
 ; GFX940-NEXT:    ;;#ASMEND
-; GFX940-NEXT:    v_mov_b32_e32 v0, v1
-; GFX940-NEXT:    v_mov_b32_e32 v1, v2
+; GFX940-NEXT:    s_nop 0
+; GFX940-NEXT:    v_pk_mov_b32 v[0:1], v[2:3], v[0:1] op_sel:[1,0] op_sel_hi:[0,0]
 ; GFX940-NEXT:    global_store_dwordx4 v4, v[0:3], s[0:1] sc0 sc1
 ; GFX940-NEXT:    s_waitcnt vmcnt(0)
 ; GFX940-NEXT:    s_setpc_b64 s[30:31]
@@ -566,16 +565,15 @@ define void @v_shuffle_v4f32_v2f32__3_3_3_0(ptr addrspace(1) inreg %ptr) {
 ; GFX90A:       ; %bb.0:
 ; GFX90A-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX90A-NEXT:    ;;#ASMSTART
-; GFX90A-NEXT:    ; def v[0:1]
+; GFX90A-NEXT:    ; def v[2:3]
 ; GFX90A-NEXT:    ;;#ASMEND
-; GFX90A-NEXT:    v_mov_b32_e32 v6, 0
 ; GFX90A-NEXT:    ;;#ASMSTART
-; GFX90A-NEXT:    ; def v[4:5]
+; GFX90A-NEXT:    ; def v[0:1]
 ; GFX90A-NEXT:    ;;#ASMEND
+; GFX90A-NEXT:    v_mov_b32_e32 v4, 0
+; GFX90A-NEXT:    v_pk_mov_b32 v[2:3], v[0:1], v[2:3] op_sel:[1,0] op_sel_hi:[0,0]
 ; GFX90A-NEXT:    v_mov_b32_e32 v0, v1
-; GFX90A-NEXT:    v_mov_b32_e32 v2, v1
-; GFX90A-NEXT:    v_mov_b32_e32 v3, v4
-; GFX90A-NEXT:    global_store_dwordx4 v6, v[0:3], s[16:17]
+; GFX90A-NEXT:    global_store_dwordx4 v4, v[0:3], s[16:17]
 ; GFX90A-NEXT:    s_waitcnt vmcnt(0)
 ; GFX90A-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -583,16 +581,15 @@ define void @v_shuffle_v4f32_v2f32__3_3_3_0(ptr addrspace(1) inreg %ptr) {
 ; GFX940:       ; %bb.0:
 ; GFX940-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX940-NEXT:    ;;#ASMSTART
-; GFX940-NEXT:    ; def v[0:1]
+; GFX940-NEXT:    ; def v[2:3]
 ; GFX940-NEXT:    ;;#ASMEND
-; GFX940-NEXT:    v_mov_b32_e32 v6, 0
 ; GFX940-NEXT:    ;;#ASMSTART
-; GFX940-NEXT:    ; def v[4:5]
+; GFX940-NEXT:    ; def v[0:1]
 ; GFX940-NEXT:    ;;#ASMEND
+; GFX940-NEXT:    v_mov_b32_e32 v4, 0
+; GFX940-NEXT:    v_pk_mov_b32

[llvm-branch-commits] [llvm] [AMDGPU][NewPM] Port SILowerWWMCopies to NPM (PR #123695)

2025-01-21 Thread Akshat Oke via llvm-branch-commits

https://github.com/optimisan updated 
https://github.com/llvm/llvm-project/pull/123695

>From b85cc524ef390d2680359d2ccc7085af11eb2eaf Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Tue, 21 Jan 2025 06:30:07 +
Subject: [PATCH] [AMDGPU][NewPM] Port SILowerWWMCopies to NPM

---
 llvm/lib/Target/AMDGPU/AMDGPU.h   |  4 +-
 llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def |  1 +
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp |  7 +-
 llvm/lib/Target/AMDGPU/SILowerWWMCopies.cpp   | 81 +--
 llvm/lib/Target/AMDGPU/SILowerWWMCopies.h | 22 +
 .../CodeGen/AMDGPU/si-lower-wwm-copies.mir| 43 ++
 6 files changed, 127 insertions(+), 31 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/SILowerWWMCopies.h
 create mode 100644 llvm/test/CodeGen/AMDGPU/si-lower-wwm-copies.mir

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 5d9a830f041a74..5f3db283c9b447 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -180,8 +180,8 @@ extern char &SIFixSGPRCopiesLegacyID;
 void initializeSIFixVGPRCopiesPass(PassRegistry &);
 extern char &SIFixVGPRCopiesID;
 
-void initializeSILowerWWMCopiesPass(PassRegistry &);
-extern char &SILowerWWMCopiesID;
+void initializeSILowerWWMCopiesLegacyPass(PassRegistry &);
+extern char &SILowerWWMCopiesLegacyID;
 
 void initializeSILowerI1CopiesLegacyPass(PassRegistry &);
 extern char &SILowerI1CopiesLegacyID;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
index 09a39d23d801b9..79c19646984e8e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
@@ -104,6 +104,7 @@ MACHINE_FUNCTION_PASS("gcn-dpp-combine", GCNDPPCombinePass())
 MACHINE_FUNCTION_PASS("si-load-store-opt", SILoadStoreOptimizerPass())
 MACHINE_FUNCTION_PASS("si-lower-control-flow", SILowerControlFlowPass())
 MACHINE_FUNCTION_PASS("si-lower-sgpr-spills", SILowerSGPRSpillsPass())
+MACHINE_FUNCTION_PASS("si-lower-wwm-copies", SILowerWWMCopiesPass())
 MACHINE_FUNCTION_PASS("si-opt-vgpr-liverange", SIOptimizeVGPRLiveRangePass())
 MACHINE_FUNCTION_PASS("si-optimize-exec-masking", SIOptimizeExecMaskingPass())
 MACHINE_FUNCTION_PASS("si-peephole-sdwa", SIPeepholeSDWAPass())
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 53ec80b8f72049..5dd296152cb9eb 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -40,6 +40,7 @@
 #include "SILoadStoreOptimizer.h"
 #include "SILowerControlFlow.h"
 #include "SILowerSGPRSpills.h"
+#include "SILowerWWMCopies.h"
 #include "SIMachineFunctionInfo.h"
 #include "SIMachineScheduler.h"
 #include "SIOptimizeExecMasking.h"
@@ -482,7 +483,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
   initializeAMDGPUGlobalISelDivergenceLoweringPass(*PR);
   initializeAMDGPURegBankSelectPass(*PR);
   initializeAMDGPURegBankLegalizePass(*PR);
-  initializeSILowerWWMCopiesPass(*PR);
+  initializeSILowerWWMCopiesLegacyPass(*PR);
   initializeAMDGPUMarkLastScratchLoadPass(*PR);
   initializeSILowerSGPRSpillsLegacyPass(*PR);
   initializeSIFixSGPRCopiesLegacyPass(*PR);
@@ -1581,7 +1582,7 @@ bool GCNPassConfig::addRegAssignAndRewriteFast() {
   // For allocating other wwm register operands.
   addPass(createWWMRegAllocPass(false));
 
-  addPass(&SILowerWWMCopiesID);
+  addPass(&SILowerWWMCopiesLegacyID);
   addPass(&AMDGPUReserveWWMRegsID);
 
   // For allocating per-thread VGPRs.
@@ -1617,7 +1618,7 @@ bool GCNPassConfig::addRegAssignAndRewriteOptimized() {
 
   // For allocating other whole wave mode registers.
   addPass(createWWMRegAllocPass(true));
-  addPass(&SILowerWWMCopiesID);
+  addPass(&SILowerWWMCopiesLegacyID);
   addPass(createVirtRegRewriter(false));
   addPass(&AMDGPUReserveWWMRegsID);
 
diff --git a/llvm/lib/Target/AMDGPU/SILowerWWMCopies.cpp b/llvm/lib/Target/AMDGPU/SILowerWWMCopies.cpp
index d80a5f958273a9..ef384c2a1a2150 100644
--- a/llvm/lib/Target/AMDGPU/SILowerWWMCopies.cpp
+++ b/llvm/lib/Target/AMDGPU/SILowerWWMCopies.cpp
@@ -15,6 +15,7 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include "SILowerWWMCopies.h"
 #include "AMDGPU.h"
 #include "GCNSubtarget.h"
 #include "MCTargetDesc/AMDGPUMCTargetDesc.h"
@@ -30,12 +31,30 @@ using namespace llvm;
 
 namespace {
 
-class SILowerWWMCopies : public MachineFunctionPass {
+class SILowerWWMCopies {
+public:
+  SILowerWWMCopies(LiveIntervals *LIS, SlotIndexes *SI, VirtRegMap *VRM)
+      : LIS(LIS), Indexes(SI), VRM(VRM) {}
+  bool run(MachineFunction &MF);
+
+private:
+  bool isSCCLiveAtMI(const MachineInstr &MI);
+  void addToWWMSpills(MachineFunction &MF, Register Reg);
+
+  LiveIntervals *LIS;
+  SlotIndexes *Indexes;
+  VirtRegMap *VRM;
+  const SIRegisterInfo *TRI;
+  const MachineRegisterInfo *MRI;
+  SIMachineFunctionInfo *MFI;
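
The archived message cuts off above. For orientation, here is a hedged sketch
of the new-pass-manager wrapper such a port typically adds; the class name
matches the pass-registry entry in the diff, but the analysis plumbing is
approximate rather than the patch's exact code:

```cpp
// Sketch only: an NPM machine-function pass that wraps the same
// SILowerWWMCopies worker class shown in the diff above.
class SILowerWWMCopiesPass : public PassInfoMixin<SILowerWWMCopiesPass> {
public:
  PreservedAnalyses run(MachineFunction &MF,
                        MachineFunctionAnalysisManager &MFAM) {
    // The legacy pass fetched these via getAnalysis<>(); under the new pass
    // manager they come from the analysis manager instead.
    auto *LIS = MFAM.getCachedResult<LiveIntervalsAnalysis>(MF);
    auto *SI = MFAM.getCachedResult<SlotIndexesAnalysis>(MF);
    auto *VRM = MFAM.getCachedResult<VirtRegMapAnalysis>(MF);
    if (!SILowerWWMCopies(LIS, SI, VRM).run(MF))
      return PreservedAnalyses::all();
    return getMachineFunctionPassPreservedAnalyses();
  }
};
```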

[llvm-branch-commits] [llvm] [AMDGPU][NewPM] Port SILowerWWMCopies to NPM (PR #123695)

2025-01-21 Thread Akshat Oke via llvm-branch-commits

https://github.com/optimisan ready_for_review 
https://github.com/llvm/llvm-project/pull/123695
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][NewPM] Port SILowerWWMCopies to NPM (PR #123695)

2025-01-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Akshat Oke (optimisan)


Changes



---
Full diff: https://github.com/llvm/llvm-project/pull/123695.diff


6 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPU.h (+2-2) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def (+1) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+4-3) 
- (modified) llvm/lib/Target/AMDGPU/SILowerWWMCopies.cpp (+55-26) 
- (added) llvm/lib/Target/AMDGPU/SILowerWWMCopies.h (+22) 
- (added) llvm/test/CodeGen/AMDGPU/si-lower-wwm-copies.mir (+43) 


```diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 5d9a830f041a74..5f3db283c9b447 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -180,8 +180,8 @@ extern char &SIFixSGPRCopiesLegacyID;
 void initializeSIFixVGPRCopiesPass(PassRegistry &);
 extern char &SIFixVGPRCopiesID;
 
-void initializeSILowerWWMCopiesPass(PassRegistry &);
-extern char &SILowerWWMCopiesID;
+void initializeSILowerWWMCopiesLegacyPass(PassRegistry &);
+extern char &SILowerWWMCopiesLegacyID;
 
 void initializeSILowerI1CopiesLegacyPass(PassRegistry &);
 extern char &SILowerI1CopiesLegacyID;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
index 09a39d23d801b9..79c19646984e8e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
@@ -104,6 +104,7 @@ MACHINE_FUNCTION_PASS("gcn-dpp-combine", GCNDPPCombinePass())
 MACHINE_FUNCTION_PASS("si-load-store-opt", SILoadStoreOptimizerPass())
 MACHINE_FUNCTION_PASS("si-lower-control-flow", SILowerControlFlowPass())
 MACHINE_FUNCTION_PASS("si-lower-sgpr-spills", SILowerSGPRSpillsPass())
+MACHINE_FUNCTION_PASS("si-lower-wwm-copies", SILowerWWMCopiesPass())
 MACHINE_FUNCTION_PASS("si-opt-vgpr-liverange", SIOptimizeVGPRLiveRangePass())
 MACHINE_FUNCTION_PASS("si-optimize-exec-masking", SIOptimizeExecMaskingPass())
 MACHINE_FUNCTION_PASS("si-peephole-sdwa", SIPeepholeSDWAPass())
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 53ec80b8f72049..5dd296152cb9eb 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -40,6 +40,7 @@
 #include "SILoadStoreOptimizer.h"
 #include "SILowerControlFlow.h"
 #include "SILowerSGPRSpills.h"
+#include "SILowerWWMCopies.h"
 #include "SIMachineFunctionInfo.h"
 #include "SIMachineScheduler.h"
 #include "SIOptimizeExecMasking.h"
@@ -482,7 +483,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
   initializeAMDGPUGlobalISelDivergenceLoweringPass(*PR);
   initializeAMDGPURegBankSelectPass(*PR);
   initializeAMDGPURegBankLegalizePass(*PR);
-  initializeSILowerWWMCopiesPass(*PR);
+  initializeSILowerWWMCopiesLegacyPass(*PR);
   initializeAMDGPUMarkLastScratchLoadPass(*PR);
   initializeSILowerSGPRSpillsLegacyPass(*PR);
   initializeSIFixSGPRCopiesLegacyPass(*PR);
@@ -1581,7 +1582,7 @@ bool GCNPassConfig::addRegAssignAndRewriteFast() {
   // For allocating other wwm register operands.
   addPass(createWWMRegAllocPass(false));
 
-  addPass(&SILowerWWMCopiesID);
+  addPass(&SILowerWWMCopiesLegacyID);
   addPass(&AMDGPUReserveWWMRegsID);
 
   // For allocating per-thread VGPRs.
@@ -1617,7 +1618,7 @@ bool GCNPassConfig::addRegAssignAndRewriteOptimized() {
 
   // For allocating other whole wave mode registers.
   addPass(createWWMRegAllocPass(true));
-  addPass(&SILowerWWMCopiesID);
+  addPass(&SILowerWWMCopiesLegacyID);
   addPass(createVirtRegRewriter(false));
   addPass(&AMDGPUReserveWWMRegsID);
 
diff --git a/llvm/lib/Target/AMDGPU/SILowerWWMCopies.cpp b/llvm/lib/Target/AMDGPU/SILowerWWMCopies.cpp
index d80a5f958273a9..ef384c2a1a2150 100644
--- a/llvm/lib/Target/AMDGPU/SILowerWWMCopies.cpp
+++ b/llvm/lib/Target/AMDGPU/SILowerWWMCopies.cpp
@@ -15,6 +15,7 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include "SILowerWWMCopies.h"
 #include "AMDGPU.h"
 #include "GCNSubtarget.h"
 #include "MCTargetDesc/AMDGPUMCTargetDesc.h"
@@ -30,12 +31,30 @@ using namespace llvm;
 
 namespace {
 
-class SILowerWWMCopies : public MachineFunctionPass {
+class SILowerWWMCopies {
+public:
+  SILowerWWMCopies(LiveIntervals *LIS, SlotIndexes *SI, VirtRegMap *VRM)
+      : LIS(LIS), Indexes(SI), VRM(VRM) {}
+  bool run(MachineFunction &MF);
+
+private:
+  bool isSCCLiveAtMI(const MachineInstr &MI);
+  void addToWWMSpills(MachineFunction &MF, Register Reg);
+
+  LiveIntervals *LIS;
+  SlotIndexes *Indexes;
+  VirtRegMap *VRM;
+  const SIRegisterInfo *TRI;
+  const MachineRegisterInfo *MRI;
+  SIMachineFunctionInfo *MFI;
+};
+
+class SILowerWWMCopiesLegacy : public MachineFunctionPass {
 public:
   static char ID;
 
-  SILowerWWMCopies() : MachineFunctionPass(ID) {
-    initializeSILowerWWMCopiesPass(*PassRegistry::getPassRegistry());
+  SILowerWWMCopiesLe

[llvm-branch-commits] [lld] [lld][LoongArch] Support TLSDESC GD/LD to IE/LE. (PR #123715)

2025-01-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-lld

Author: Zhaoxin Yang (ylzsx)


Changes

Support relaxing TLSDESC to initial-exec or local-exec. Introduce a new hook,
RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC, and use the existing
R_RELAX_TLS_GD_TO_IE_ABS to support TLSDESC => IE, while using the existing
R_RELAX_TLS_GD_TO_LE to support TLSDESC => LE.

In normal or medium code model, there are two forms of code sequences:
* pcalau12i  $a0, %desc_pc_hi20(sym_desc)
* addi.d $a0, $a0, %desc_pc_lo12(sym_desc)
* ld.d   $ra, $a0, %desc_ld(sym_desc)
* jirl   $ra, $ra, %desc_call(sym_desc)
--
* pcaddi $a0, %desc_pcrel_20(sym_desc)
* ld.d   $ra, $a0, %desc_ld(sym_desc)
* jirl   $ra, $ra, %desc_call(sym_desc)

Convert to IE:
* pcalau12i $a0, %ie_pc_hi20(sym_ie)
* ld.[wd]   $a0, $a0, %ie_pc_lo12(sym_ie)

Convert to LE:
* lu12i.w $a0, %le_hi20(sym_le)  # le_hi20 != 0, otherwise nop
* ori $a0 $a0, %le_lo12(sym_le)

For simplicity, whether in tlsdescToIe or tlsdescToLe, we always convert the
preceding instructions to NOPs, because both forms of the code sequence
(corresponding to the relocation combinations
R_LARCH_TLS_DESC_PC_HI20+R_LARCH_TLS_DESC_PC_LO12 and
R_LARCH_TLS_DESC_PCREL20_S2) follow the same process.

FIXME: When relaxation is enabled, the redundant NOPs can be removed. This will
be implemented in a future patch.

Note: the different forms of TLSDESC code sequences should not appear
interleaved in the normal, medium, or extreme code model; compilers do not
generate such code, and lld does not support it. This is thanks to the guard in
PostRASchedulerList.cpp in llvm.
```
Calls are not scheduling boundaries before register allocation,
but post-ra we don't gain anything by scheduling across calls
since we don't need to worry about register pressure.
```
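
A rough sketch of the IE conversion described above (illustrative, not the
patch: `insn` and `write32le` follow lld's existing LoongArch.cpp helpers, the
LoongArch nop encoding is `andi $zero, $zero, 0` = 0x03400000, and relocation
re-typing plus immediate fixups are elided):

```cpp
// For the 4-insn normal/medium sequence, the first two slots become NOPs; for
// the 3-insn pcaddi form, only the first slot does. The last two slots are
// rewritten to pcalau12i $a0,%ie_pc_hi20 and ld.[wd] $a0,$a0,%ie_pc_lo12.
void tlsdescToIeSketch(Ctx &ctx, uint8_t *loc, const Relocation &rel) {
  switch (rel.type) {
  case R_LARCH_TLS_DESC_PC_HI20:
  case R_LARCH_TLS_DESC_PC_LO12:
  case R_LARCH_TLS_DESC_PCREL20_S2:
    write32le(loc, 0x03400000); // nop
    break;
  case R_LARCH_TLS_DESC_LD: // ld.d $ra,... -> pcalau12i $a0,%ie_pc_hi20
    write32le(loc, insn(PCALAU12I, R_A0, 0, 0));
    break;
  case R_LARCH_TLS_DESC_CALL: // jirl $ra,... -> ld.[wd] $a0,$a0,%ie_pc_lo12
    write32le(loc, insn(ctx.arg.is64 ? LD_D : LD_W, R_A0, R_A0, 0));
    break;
  }
}
```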

---

Patch is 33.17 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/123715.diff


7 Files Affected:

- (modified) lld/ELF/Arch/LoongArch.cpp (+145-1) 
- (modified) lld/ELF/InputSection.cpp (+1) 
- (modified) lld/ELF/Relocations.cpp (+22-16) 
- (modified) lld/ELF/Relocations.h (+1) 
- (modified) lld/test/ELF/loongarch-relax-tlsdesc.s (+141-115) 
- (modified) lld/test/ELF/loongarch-tlsdesc-pcrel20-s2.s (+40-19) 
- (modified) lld/test/ELF/loongarch-tlsdesc.s (+46-19) 


```diff
diff --git a/lld/ELF/Arch/LoongArch.cpp b/lld/ELF/Arch/LoongArch.cpp
--- a/lld/ELF/Arch/LoongArch.cpp
+++ b/lld/ELF/Arch/LoongArch.cpp
@@ -39,11 +39,14 @@ class LoongArch final : public TargetInfo {
   void relocate(uint8_t *loc, const Relocation &rel,
                 uint64_t val) const override;
   bool relaxOnce(int pass) const override;
+  RelExpr adjustTlsExpr(RelType type, RelExpr expr) const override;
   void relocateAlloc(InputSectionBase &sec, uint8_t *buf) const override;
   void finalizeRelax(int passes) const override;
 
 private:
   void tlsIeToLe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
+  void tlsdescToIe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
+  void tlsdescToLe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
 };
 } // end anonymous namespace
 
@@ -61,6 +64,7 @@ enum Op {
   LU12I_W = 0x1400,
   PCADDI = 0x1800,
   PCADDU12I = 0x1c00,
+  PCALAU12I = 0x1a00,
   LD_W = 0x2880,
   LD_D = 0x28c0,
   JIRL = 0x4c00,
@@ -72,6 +76,7 @@ enum Reg {
   R_ZERO = 0,
   R_RA = 1,
   R_TP = 2,
+  R_A0 = 4,
   R_T0 = 12,
   R_T1 = 13,
   R_T2 = 14,
@@ -962,7 +967,8 @@ static bool relax(Ctx &ctx, InputSection &sec) {
     case R_LARCH_TLS_LD_PC_HI20:
     case R_LARCH_TLS_DESC_PC_HI20:
       // The overflow check for i+2 will be carried out in isPairRelaxable.
-      if (isPairRelaxable(relocs, i))
+      if (r.expr != RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC &&
+          r.expr != R_RELAX_TLS_GD_TO_LE && isPairRelaxable(relocs, i))
         relaxPCHi20Lo12(ctx, sec, i, loc, r, relocs[i + 2], remove);
       break;
     case R_LARCH_CALL36:
@@ -1047,6 +1053,103 @@ void LoongArch::tlsIeToLe(uint8_t *loc, const Relocation &rel,
   }
 }
 
+// Convert TLSDESC GD/LD to IE.
+// In normal or medium code model, there are two forms of code sequences:
+//  * pcalau12i  $a0, %desc_pc_hi20(sym_desc)
+//  * addi.d $a0, $a0, %desc_pc_lo12(sym_desc)
+//  * ld.d   $ra, $a0, %desc_ld(sym_desc)
+//  * jirl   $ra, $ra, %desc_call(sym_desc)
+//  --
+//  * pcaddi $a0, %desc_pcrel_20(a)
+//  * load $ra, $a0, %desc_ld(a)
+//  * jirl $ra, $ra, %desc_call(a)
+//
+// The code sequence obtained is as follows:
+//  * pcalau12i $a0, %ie_pc_hi20(sym_ie)
+//  * ld.[wd]   $a0, $a0, %ie_pc_lo12(sym_ie)
+//
+// For simplicity, whether in tlsdescToIe or tlsdescToLe, we always convert
+// the preceding instructions to NOPs, because both forms of the code sequence
+// (corresponding to the relocation combinations
+// R_LARCH_TLS_DESC_PC_HI20+R_LARCH_TLS_DESC_PC_LO12 and
+// R_LARCH_TLS_DESC_PCREL20_S2) follow the same process.
+//
+// When relaxation is enabled, the redundant NOPs can be removed.

[llvm-branch-commits] [lld] [lld][LoongArch] Support TLSDESC GD/LD to IE/LE. (PR #123715)

2025-01-21 Thread Zhaoxin Yang via llvm-branch-commits

https://github.com/ylzsx created 
https://github.com/llvm/llvm-project/pull/123715

Support relaxing TLSDESC to initial-exec or local-exec. Introduce a new hook,
RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC, and use the existing
R_RELAX_TLS_GD_TO_IE_ABS to support TLSDESC => IE, while using the existing
R_RELAX_TLS_GD_TO_LE to support TLSDESC => LE.

In normal or medium code model, there are two forms of code sequences:
* pcalau12i  $a0, %desc_pc_hi20(sym_desc)
* addi.d $a0, $a0, %desc_pc_lo12(sym_desc)
* ld.d   $ra, $a0, %desc_ld(sym_desc)
* jirl   $ra, $ra, %desc_call(sym_desc)
--
* pcaddi $a0, %desc_pcrel_20(sym_desc)
* ld.d   $ra, $a0, %desc_ld(sym_desc)
* jirl   $ra, $ra, %desc_call(sym_desc)

Convert to IE:
* pcalau12i $a0, %ie_pc_hi20(sym_ie)
* ld.[wd]   $a0, $a0, %ie_pc_lo12(sym_ie)

Convert to LE:
* lu12i.w $a0, %le_hi20(sym_le)  # le_hi20 != 0, otherwise nop
* ori $a0 $a0, %le_lo12(sym_le)

For simplicity, whether in tlsdescToIe or tlsdescToLe, we always convert the
preceding instructions to NOPs, because both forms of the code sequence
(corresponding to the relocation combinations
R_LARCH_TLS_DESC_PC_HI20+R_LARCH_TLS_DESC_PC_LO12 and
R_LARCH_TLS_DESC_PCREL20_S2) follow the same process.

FIXME: When relaxation is enabled, the redundant NOPs can be removed. This will
be implemented in a future patch.

Note: the different forms of TLSDESC code sequences should not appear
interleaved in the normal, medium, or extreme code model; compilers do not
generate such code, and lld does not support it. This is thanks to the guard in
PostRASchedulerList.cpp in llvm.
```
Calls are not scheduling boundaries before register allocation,
but post-ra we don't gain anything by scheduling across calls
since we don't need to worry about register pressure.
```

>From dff3031fdb2ca3755b73e3b81e56f8008a409470 Mon Sep 17 00:00:00 2001
From: yangzhaoxin 
Date: Fri, 3 Jan 2025 14:29:17 +0800
Subject: [PATCH 1/4] [lld][LoongArch] Implement TLSDESC GD/LD to IE/LE.

Support relaxing TLSDESC to initial-exec or local-exec. Introduce a new hook,
RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC, and use the existing
R_RELAX_TLS_GD_TO_IE_ABS to support TLSDESC => IE, while using the existing
R_RELAX_TLS_GD_TO_LE to support TLSDESC => LE.

In normal or medium code model, there are two forms of code sequences:
 * pcalau12i  $a0, %desc_pc_hi20(sym_desc)
 * addi.d $a0, $a0, %desc_pc_lo12(sym_desc)
 * ld.d   $ra, $a0, %desc_ld(sym_desc)
 * jirl   $ra, $ra, %desc_call(sym_desc)
 --
 * pcaddi $a0, %desc_pcrel_20(sym_desc)
 * ld.d   $ra, $a0, %desc_ld(sym_desc)
 * jirl   $ra, $ra, %desc_call(sym_desc)

The code sequence obtained is as follows:
 * pcalau12i $a0, %ie_pc_hi20(sym_ie)
 * ld.[wd]   $a0, $a0, %ie_pc_lo12(sym_ie)

For simplicity, whether in tlsdescToIe or tlsdescToLe, we always convert the
preceding instructions to NOPs, because both forms of the code sequence
(corresponding to the relocation combinations
R_LARCH_TLS_DESC_PC_HI20+R_LARCH_TLS_DESC_PC_LO12 and
R_LARCH_TLS_DESC_PCREL20_S2) follow the same process.

FIXME: When relaxation is enabled, the redundant NOPs can be removed. This will
be implemented in a future patch.

Note: the different forms of TLSDESC code sequences should not appear
interleaved in the normal, medium, or extreme code model; compilers do not
generate such code, and lld does not support it. This is thanks to the guard in
PostRASchedulerList.cpp in llvm.
```
Calls are not scheduling boundaries before register allocation, but
post-ra we don't gain anything by scheduling across calls since we
don't need to worry about register pressure.
```
---
 lld/ELF/Arch/LoongArch.cpp | 146 -
 lld/ELF/InputSection.cpp   |   1 +
 lld/ELF/Relocations.cpp|  38 ++
 lld/ELF/Relocations.h  |   1 +
 4 files changed, 169 insertions(+), 17 deletions(-)

diff --git a/lld/ELF/Arch/LoongArch.cpp b/lld/ELF/Arch/LoongArch.cpp
--- a/lld/ELF/Arch/LoongArch.cpp
+++ b/lld/ELF/Arch/LoongArch.cpp
@@ -39,11 +39,14 @@ class LoongArch final : public TargetInfo {
   void relocate(uint8_t *loc, const Relocation &rel,
                 uint64_t val) const override;
   bool relaxOnce(int pass) const override;
+  RelExpr adjustTlsExpr(RelType type, RelExpr expr) const override;
   void relocateAlloc(InputSectionBase &sec, uint8_t *buf) const override;
   void finalizeRelax(int passes) const override;
 
 private:
   void tlsIeToLe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
+  void tlsdescToIe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
+  void tlsdescToLe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
 };
 } // end anonymous namespace
 
@@ -61,6 +64,7 @@ enum Op {
   LU12I_W = 0x1400,
   PCADDI = 0x1800,
   PCADDU12I = 0x1c00,
+  PCALAU12I = 0x1a00,
   LD_W = 0x2880,
   LD_D = 0x28c0,
   JIRL = 0x4c00,
@@ -72,6 +76,7 @@ enum Reg {
   R_ZERO = 0,
   R_RA = 1,
   R_TP = 2,
+  R_A0 = 4,
   R_T0 = 12,
   R_T1 = 13,

[llvm-branch-commits] [llvm] DAG: Avoid breaking legal vector_shuffle with multiple uses (PR #123712)

2025-01-21 Thread Simon Pilgrim via llvm-branch-commits


@@ -23172,6 +23172,11 @@ SDValue DAGCombiner::visitEXTRACT_VECTOR_ELT(SDNode *N) {
     }
 
     if (SVInVec.getOpcode() == ISD::BUILD_VECTOR) {
+      // TODO: Check if shuffle mask is legal?
+      if (LegalOperations && TLI.isOperationLegal(ISD::VECTOR_SHUFFLE, VecVT) &&

RKSimon wrote:

Have you investigated using isShuffleMaskLegal instead?

https://github.com/llvm/llvm-project/pull/123712
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] DAG: Avoid breaking legal vector_shuffle with multiple uses (PR #123712)

2025-01-21 Thread Matt Arsenault via llvm-branch-commits


@@ -23172,6 +23172,11 @@ SDValue DAGCombiner::visitEXTRACT_VECTOR_ELT(SDNode *N) {
     }
 
     if (SVInVec.getOpcode() == ISD::BUILD_VECTOR) {
+      // TODO: Check if shuffle mask is legal?
+      if (LegalOperations && TLI.isOperationLegal(ISD::VECTOR_SHUFFLE, VecVT) &&

arsenm wrote:

I started to, but the usage in the combiner seems inconsistent. Implementing it
accurately results in regressions independently of this.
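
For concreteness, the mask-level variant under discussion would look roughly
like this (a sketch only; isShuffleMaskLegal is the TargetLowering hook, and
the shuffle node here is the extract's vector operand):

```cpp
// Hypothetical alternative to the isOperationLegal() check in the patch.
if (LegalOperations && !VecOp.hasOneUse() &&
    TLI.isShuffleMaskLegal(cast<ShuffleVectorSDNode>(VecOp)->getMask(), VecVT))
  return SDValue();
```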

https://github.com/llvm/llvm-project/pull/123712
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [lld] [lld][LoongArch] Support TLSDESC GD/LD to IE/LE. (PR #123715)

2025-01-21 Thread Zhaoxin Yang via llvm-branch-commits

https://github.com/ylzsx updated 
https://github.com/llvm/llvm-project/pull/123715

>From dff3031fdb2ca3755b73e3b81e56f8008a409470 Mon Sep 17 00:00:00 2001
From: yangzhaoxin 
Date: Fri, 3 Jan 2025 14:29:17 +0800
Subject: [PATCH 1/6] [lld][LoongArch] Implement TLSDESC GD/LD to IE/LE.

Support relaxing TLSDESC to initial-exec or local-exec. Introduce a new hook,
RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC, and use the existing
R_RELAX_TLS_GD_TO_IE_ABS to support TLSDESC => IE, while using the existing
R_RELAX_TLS_GD_TO_LE to support TLSDESC => LE.

In normal or medium code model, there are two forms of code sequences:
 * pcalau12i  $a0, %desc_pc_hi20(sym_desc)
 * addi.d $a0, $a0, %desc_pc_lo12(sym_desc)
 * ld.d   $ra, $a0, %desc_ld(sym_desc)
 * jirl   $ra, $ra, %desc_call(sym_desc)
 --
 * pcaddi $a0, %desc_pcrel_20(sym_desc)
 * ld.d   $ra, $a0, %desc_ld(sym_desc)
 * jirl   $ra, $ra, %desc_call(sym_desc)

The code sequence obtained is as follows:
 * pcalau12i $a0, %ie_pc_hi20(sym_ie)
 * ld.[wd]   $a0, $a0, %ie_pc_lo12(sym_ie)

For simplicity, whether in tlsdescToIe or tlsdescToLe, we always convert the
preceding instructions to NOPs, because both forms of the code sequence
(corresponding to the relocation combinations
R_LARCH_TLS_DESC_PC_HI20+R_LARCH_TLS_DESC_PC_LO12 and
R_LARCH_TLS_DESC_PCREL20_S2) follow the same process.

FIXME: When relaxation is enabled, the redundant NOPs can be removed. This will
be implemented in a future patch.

Note: the different forms of TLSDESC code sequences should not appear
interleaved in the normal, medium, or extreme code model; compilers do not
generate such code, and lld does not support it. This is thanks to the guard in
PostRASchedulerList.cpp in llvm.
```
Calls are not scheduling boundaries before register allocation, but
post-ra we don't gain anything by scheduling across calls since we
don't need to worry about register pressure.
```
---
 lld/ELF/Arch/LoongArch.cpp | 146 -
 lld/ELF/InputSection.cpp   |   1 +
 lld/ELF/Relocations.cpp|  38 ++
 lld/ELF/Relocations.h  |   1 +
 4 files changed, 169 insertions(+), 17 deletions(-)

diff --git a/lld/ELF/Arch/LoongArch.cpp b/lld/ELF/Arch/LoongArch.cpp
--- a/lld/ELF/Arch/LoongArch.cpp
+++ b/lld/ELF/Arch/LoongArch.cpp
@@ -39,11 +39,14 @@ class LoongArch final : public TargetInfo {
   void relocate(uint8_t *loc, const Relocation &rel,
                 uint64_t val) const override;
   bool relaxOnce(int pass) const override;
+  RelExpr adjustTlsExpr(RelType type, RelExpr expr) const override;
   void relocateAlloc(InputSectionBase &sec, uint8_t *buf) const override;
   void finalizeRelax(int passes) const override;
 
 private:
   void tlsIeToLe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
+  void tlsdescToIe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
+  void tlsdescToLe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
 };
 } // end anonymous namespace
 
@@ -61,6 +64,7 @@ enum Op {
   LU12I_W = 0x1400,
   PCADDI = 0x1800,
   PCADDU12I = 0x1c00,
+  PCALAU12I = 0x1a00,
   LD_W = 0x2880,
   LD_D = 0x28c0,
   JIRL = 0x4c00,
@@ -72,6 +76,7 @@ enum Reg {
   R_ZERO = 0,
   R_RA = 1,
   R_TP = 2,
+  R_A0 = 4,
   R_T0 = 12,
   R_T1 = 13,
   R_T2 = 14,
@@ -962,7 +967,8 @@ static bool relax(Ctx &ctx, InputSection &sec) {
     case R_LARCH_TLS_LD_PC_HI20:
     case R_LARCH_TLS_DESC_PC_HI20:
       // The overflow check for i+2 will be carried out in isPairRelaxable.
-      if (isPairRelaxable(relocs, i))
+      if (r.expr != RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC &&
+          r.expr != R_RELAX_TLS_GD_TO_LE && isPairRelaxable(relocs, i))
         relaxPCHi20Lo12(ctx, sec, i, loc, r, relocs[i + 2], remove);
       break;
     case R_LARCH_CALL36:
@@ -1047,6 +1053,103 @@ void LoongArch::tlsIeToLe(uint8_t *loc, const Relocation &rel,
   }
 }
 
+// Convert TLSDESC GD/LD to IE.
+// In normal or medium code model, there are two forms of code sequences:
+//  * pcalau12i  $a0, %desc_pc_hi20(sym_desc)
+//  * addi.d $a0, $a0, %desc_pc_lo12(sym_desc)
+//  * ld.d   $ra, $a0, %desc_ld(sym_desc)
+//  * jirl   $ra, $ra, %desc_call(sym_desc)
+//  --
+//  * pcaddi $a0, %desc_pcrel_20(a)
+//  * load $ra, $a0, %desc_ld(a)
+//  * jirl $ra, $ra, %desc_call(a)
+//
+// The code sequence obtained is as follows:
+//  * pcalau12i $a0, %ie_pc_hi20(sym_ie)
+//  * ld.[wd]   $a0, $a0, %ie_pc_lo12(sym_ie)
+//
+// For simplicity, whether in tlsdescToIe or tlsdescToLe, we always convert
+// the preceding instructions to NOPs, because both forms of the code sequence
+// (corresponding to the relocation combinations
+// R_LARCH_TLS_DESC_PC_HI20+R_LARCH_TLS_DESC_PC_LO12 and
+// R_LARCH_TLS_DESC_PCREL20_S2) follow the same process.
+//
+// When relaxation is enabled, the redundant NOPs can be removed.
+void LoongArch::tlsdescToIe(uint8_t *loc, const Relocation &rel,
+                            uint64_t val) const {
+  switch (rel.type) {
+

[llvm-branch-commits] [lld] [lld][LoongArch] GOT indirection to PC relative optimization. (PR #123743)

2025-01-21 Thread Zhaoxin Yang via llvm-branch-commits

ylzsx wrote:

I have submitted all the patches related to relaxation in lld for LoongArch. 
Below is a list for peer review:
  * users/ylzsx/r-got-to-pcrel
  │ https://github.com/llvm/llvm-project/pull/123743
  │
  * users/ylzsx/r-tlsdesc-to-iele-relax
  │ https://github.com/llvm/llvm-project/pull/123730
  │
  * users/ylzsx/r-tlsdesc-to-iele-norelax
  │ https://github.com/llvm/llvm-project/pull/123715
  │
  * users/ylzsx/r-tls-ie-to-le-relax
  │ https://github.com/llvm/llvm-project/pull/123702
  │
  * users/ylzsx/r-tls-ie-to-le-norelax
  │ https://github.com/llvm/llvm-project/pull/123680
  │
  * users/ylzsx/r-tlsdesc-noconversion
  │ https://github.com/llvm/llvm-project/pull/123677
  │
  * users/ylzsx/r-tls-noie
  │ https://github.com/llvm/llvm-project/pull/123600
  │
  * users/ylzsx/r-call36
  │ https://github.com/llvm/llvm-project/pull/123576
  │
  * users/ylzsx/r-pchi20lo12
  │ https://github.com/llvm/llvm-project/pull/123566
  │
  * main

https://github.com/llvm/llvm-project/pull/123743
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Custom lower 32-bit element shuffles (PR #123711)

2025-01-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/123711

>From 6434af5c57940d376b1acc281e090346a6a5bc22 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sat, 18 Jan 2025 08:19:10 +0700
Subject: [PATCH] AMDGPU: Custom lower 32-bit element shuffles

This is so we can try to make use of v_pk_mov_b32 when available.
Note this currently has little observable effect. The combiner
will undo the common extract of shuffle pattern. The lack
of test changes should demonstrate this change is minimally
correct.

We should probably try to make better use of wider extracts in
even aligned cases, but I'm trying to avoid some really ugly
regalloc regressions in some MFMA tests. The DAG scheduler ends
up doing a worse job if we use vector extracts, resulting
in failure to do 3 address conversion of MFMAs.
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 85 +--
 1 file changed, 80 insertions(+), 5 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 1aeca7f370aa1b..b632c50dae0e35 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -419,8 +419,9 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM,
   }
 
   setOperationAction(ISD::VECTOR_SHUFFLE,
-                     {MVT::v8i32, MVT::v8f32, MVT::v16i32, MVT::v16f32},
-                     Expand);
+                     {MVT::v4i32, MVT::v4f32, MVT::v8i32, MVT::v8f32,
+                      MVT::v16i32, MVT::v16f32, MVT::v32i32, MVT::v32f32},
+                     Custom);
 
   if (Subtarget->hasPkMovB32()) {
     // TODO: 16-bit element vectors should be legal with even aligned elements.
@@ -7589,15 +7590,38 @@ static bool elementPairIsContiguous(ArrayRef<int> Mask, int Elt) {
   return Mask[Elt + 1] == Mask[Elt] + 1 && (Mask[Elt] % 2 == 0);
 }
 
+static bool elementPairIsOddToEven(ArrayRef<int> Mask, int Elt) {
+  assert(Elt % 2 == 0);
+  return Mask[Elt] >= 0 && Mask[Elt + 1] >= 0 && (Mask[Elt] & 1) &&
+         !(Mask[Elt + 1] & 1);
+}
+
 SDValue SITargetLowering::lowerVECTOR_SHUFFLE(SDValue Op,
                                               SelectionDAG &DAG) const {
   SDLoc SL(Op);
   EVT ResultVT = Op.getValueType();
   ShuffleVectorSDNode *SVN = cast<ShuffleVectorSDNode>(Op);
   MVT EltVT = ResultVT.getVectorElementType().getSimpleVT();
-  MVT PackVT = MVT::getVectorVT(EltVT, 2);
+  const int NewSrcNumElts = 2;
+  MVT PackVT = MVT::getVectorVT(EltVT, NewSrcNumElts);
   int SrcNumElts = Op.getOperand(0).getValueType().getVectorNumElements();
 
+  // Break up the shuffle into register-sized pieces.
+  //
+  // We're trying to form sub-shuffles that the register allocation pipeline
+  // won't be able to figure out, like how to use v_pk_mov_b32 to do a register
+  // blend or 16-bit op_sel. It should be able to figure out how to reassemble
+  // a pair of copies into a consecutive register copy, so use the ordinary
+  // extract_vector_elt lowering unless we can use the shuffle.
+  //
+  // TODO: This is a bit of a hack, and we should probably always use
+  // extract_subvector for the largest possible subvector we can (or at least
+  // use it for PackVT aligned pieces). However we have worse support for
+  // combines on them and don't directly treat extract_subvector /
+  // insert_subvector as legal. The DAG scheduler also ends up doing a worse
+  // job with the extract_subvectors.
+  const bool ShouldUseConsecutiveExtract = EltVT.getSizeInBits() == 16;
+
   // vector_shuffle <0,1,6,7> lhs, rhs
   // -> concat_vectors (extract_subvector lhs, 0), (extract_subvector rhs, 2)
   //
@@ -7608,9 +7632,18 @@ SDValue SITargetLowering::lowerVECTOR_SHUFFLE(SDValue Op,
   // -> concat_vectors (extract_subvector rhs, 2), (extract_subvector lhs, 0)
 
   // Avoid scalarizing when both halves are reading from consecutive elements.
-  SmallVector<SDValue, 16> Pieces;
+
+  // If we're treating 2-element shuffles as legal, also create odd-to-even
+  // shuffles of neighboring pairs.
+  //
+  // vector_shuffle <3,2,7,6> lhs, rhs
+  //  -> concat_vectors vector_shuffle <1, 0> (extract_subvector lhs, 0)
+  //                    vector_shuffle <1, 0> (extract_subvector rhs, 2)
+
+  SmallVector<SDValue, 16> Pieces;
   for (int I = 0, N = ResultVT.getVectorNumElements(); I != N; I += 2) {
-    if (elementPairIsContiguous(SVN->getMask(), I)) {
+    if (ShouldUseConsecutiveExtract &&
+        elementPairIsContiguous(SVN->getMask(), I)) {
       const int Idx = SVN->getMaskElt(I);
       int VecIdx = Idx < SrcNumElts ? 0 : 1;
       int EltIdx = Idx < SrcNumElts ? Idx : Idx - SrcNumElts;
@@ -7618,6 +7651,48 @@ SDValue SITargetLowering::lowerVECTOR_SHUFFLE(SDValue Op,
                                 SVN->getOperand(VecIdx),
                                 DAG.getConstant(EltIdx, SL, MVT::i32));
       Pieces.push_back(SubVec);
+    } else if (elementPairIsOddToEven(SVN->getMask(), I) &&
+               isOperationLegal(ISD::VECTOR_SHUFFLE, PackVT)) {
+      int Idx0 = SVN->getMaskElt(I);

[llvm-branch-commits] [llvm] [mlir] [mlir][LLVM] add argument and result attributes to llvm.call (PR #123177)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jeanPerier updated 
https://github.com/llvm/llvm-project/pull/123177

>From 137705661c184ea1530982c19163341933ab421e Mon Sep 17 00:00:00 2001
From: Jean Perier 
Date: Wed, 15 Jan 2025 09:09:53 -0800
Subject: [PATCH 1/4] [mlir][LLVM] add argument and result attributes to
 llvm.call

---
 llvm/include/llvm/IR/InstrTypes.h | 11 +++
 mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td   |  4 +-
 .../include/mlir/Target/LLVMIR/ModuleImport.h |  8 ++-
 .../mlir/Target/LLVMIR/ModuleTranslation.h|  9 ++-
 mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp| 67 +--
 .../LLVMIR/LLVMToLLVMIRTranslation.cpp| 21 ++
 mlir/lib/Target/LLVMIR/ModuleImport.cpp   | 35 ++
 mlir/lib/Target/LLVMIR/ModuleTranslation.cpp  | 52 ++
 mlir/test/Dialect/LLVMIR/invalid.mlir |  2 +
 mlir/test/Dialect/LLVMIR/roundtrip.mlir   | 20 ++
 .../LLVMIR/Import/call-argument-attributes.ll | 25 +++
 .../LLVMIR/call-argument-attributes.mlir  | 17 +
 12 files changed, 230 insertions(+), 41 deletions(-)
 create mode 100644 mlir/test/Target/LLVMIR/Import/call-argument-attributes.ll
 create mode 100644 mlir/test/Target/LLVMIR/call-argument-attributes.mlir

diff --git a/llvm/include/llvm/IR/InstrTypes.h b/llvm/include/llvm/IR/InstrTypes.h
index b8d9cc10292f4a..0e391325eebdce 100644
--- a/llvm/include/llvm/IR/InstrTypes.h
+++ b/llvm/include/llvm/IR/InstrTypes.h
@@ -1490,6 +1490,11 @@ class CallBase : public Instruction {
 Attrs = Attrs.addRetAttribute(getContext(), Attr);
   }
 
+  /// Adds attributes to the return value.
+  void addRetAttrs(const AttrBuilder &B) {
+Attrs = Attrs.addRetAttributes(getContext(), B);
+  }
+
   /// Adds the attribute to the indicated argument
   void addParamAttr(unsigned ArgNo, Attribute::AttrKind Kind) {
 assert(ArgNo < arg_size() && "Out of bounds");
@@ -1502,6 +1507,12 @@ class CallBase : public Instruction {
 Attrs = Attrs.addParamAttribute(getContext(), ArgNo, Attr);
   }
 
+  /// Adds attributes to the indicated argument
+  void addParamAttrs(unsigned ArgNo, const AttrBuilder &B) {
+assert(ArgNo < arg_size() && "Out of bounds");
+Attrs = Attrs.addParamAttributes(getContext(), ArgNo, B);
+  }
+
   /// removes the attribute from the list of attributes.
   void removeAttributeAtIndex(unsigned i, Attribute::AttrKind Kind) {
 Attrs = Attrs.removeAttributeAtIndex(getContext(), i, Kind);
diff --git a/mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td 
b/mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
index b2281536aa40b6..85f5c6cc8cca07 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
@@ -755,7 +755,9 @@ def LLVM_CallOp : LLVM_MemAccessOpBase<"call",
   VariadicOfVariadic<LLVM_Type, "op_bundle_sizes">:$op_bundle_operands,
   DenseI32ArrayAttr:$op_bundle_sizes,
-  OptionalAttr<ArrayAttr>:$op_bundle_tags);
+  OptionalAttr<ArrayAttr>:$op_bundle_tags,
+  OptionalAttr<DictArrayAttr>:$arg_attrs,
+  OptionalAttr<DictArrayAttr>:$res_attrs);
   // Append the aliasing related attributes defined in LLVM_MemAccessOpBase.
   let arguments = !con(args, aliasAttrs);
  let results = (outs Optional<LLVM_Type>:$result);
diff --git a/mlir/include/mlir/Target/LLVMIR/ModuleImport.h 
b/mlir/include/mlir/Target/LLVMIR/ModuleImport.h
index 33c9af7c6335a4..86e1d6a04cd096 100644
--- a/mlir/include/mlir/Target/LLVMIR/ModuleImport.h
+++ b/mlir/include/mlir/Target/LLVMIR/ModuleImport.h
@@ -326,14 +326,18 @@ class ModuleImport {
SmallVectorImpl &types,
SmallVectorImpl &operands,
bool allowInlineAsm = false);
-  /// Converts the parameter attributes attached to `func` and adds them to the
-  /// `funcOp`.
+  /// Converts the parameter and result attributes attached to `func` and adds
+  /// them to the `funcOp`.
   void convertParameterAttributes(llvm::Function *func, LLVMFuncOp funcOp,
   OpBuilder &builder);
   /// Converts the AttributeSet of one parameter in LLVM IR to a corresponding
   /// DictionaryAttr for the LLVM dialect.
   DictionaryAttr convertParameterAttribute(llvm::AttributeSet llvmParamAttrs,
OpBuilder &builder);
+  /// Converts the parameter and result attributes attached to `call` and adds
+  /// them to the `callOp`.
+  void convertParameterAttributes(llvm::CallBase *call, CallOpInterface callOp,
+  OpBuilder &builder);
   /// Returns the builtin type equivalent to the given LLVM dialect type or
   /// nullptr if there is no equivalent. The returned type can be used to 
create
   /// an attribute for a GlobalOp or a ConstantOp.
diff --git a/mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h 
b/mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h
index 1b62437761ed9d..88fc17ca4fda24 100644
--- a/mlir/include/mlir/Target/LL

[llvm-branch-commits] [llvm] [AMDGPU][NewPM] Port SILowerWWMCopies to NPM (PR #123695)

2025-01-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/123695
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [tablegen] Implement multiple outputs for patterns (PR #123775)

2025-01-21 Thread via llvm-branch-commits

github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. 
:warning:



You can test this locally with the following command:


```bash
git-clang-format --diff 3fa2cf6b1a6a4509635c2c90bed2876beb748110 
7e7ba17845cd66e3fcbbe19097f4a60407e13ef4 --extensions cpp,h -- 
llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp 
llvm/utils/TableGen/Common/CodeGenDAGPatterns.h 
llvm/utils/TableGen/DAGISelEmitter.cpp 
llvm/utils/TableGen/DAGISelMatcherGen.cpp 
llvm/utils/TableGen/FastISelEmitter.cpp 
llvm/utils/TableGen/GlobalISelEmitter.cpp
```





View the diff from clang-format here.


```diff
diff --git a/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp 
b/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp
index fcff67b030..858910ab39 100644
--- a/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp
+++ b/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp
@@ -2810,7 +2810,7 @@ TreePattern::TreePattern(const Record *TheRec,
 
 TreePattern TreePattern::clone() const {
   TreePattern New = *this;
-  for(TreePatternNodePtr& Tree : New.Trees)
+  for (TreePatternNodePtr &Tree : New.Trees)
 Tree = Tree->clone();
   return New;
 }
@@ -2828,7 +2828,6 @@ bool TreePattern::hasProperTypeByHwMode() const {
   [](auto &T) { return T->hasProperTypeByHwMode(); });
 }
 
-
 void TreePattern::ComputeNamedNodes() {
   for (TreePatternNodePtr &Tree : Trees)
 ComputeNamedNodes(*Tree);
@@ -4392,10 +4391,10 @@ void CodeGenDAGPatterns::ParseOnePattern(
   // tree which has possible types.
   for (const auto &InTree : Pattern.getTrees()) {
 if (InTree->hasPossibleType())
-  AddPatternToMatch(&Pattern,
-PatternToMatch(TheDef, Preds, InTree,
-   ExpandedOutPattern, InstImpResults,
-   Complexity, TheDef->getID(), 
ShouldIgnore));
+  AddPatternToMatch(
+  &Pattern, PatternToMatch(TheDef, Preds, InTree, ExpandedOutPattern,
+   InstImpResults, Complexity, TheDef->getID(),
+   ShouldIgnore));
   }
 }
 
@@ -4464,9 +4463,9 @@ void CodeGenDAGPatterns::ExpandHwModeBasedTypes() {
 return;
 
 PatternsToMatch.emplace_back(P.getSrcRecord(), P.getPredicates(),
- std::move(NewSrc), NewDst,
- P.getDstRegs(), P.getAddedComplexity(),
- getNewUID(), P.getGISelShouldIgnore(), Check);
+ std::move(NewSrc), NewDst, P.getDstRegs(),
+ P.getAddedComplexity(), getNewUID(),
+ P.getGISelShouldIgnore(), Check);
   };
 
   for (PatternToMatch &P : Copy) {
diff --git a/llvm/utils/TableGen/FastISelEmitter.cpp 
b/llvm/utils/TableGen/FastISelEmitter.cpp
index d5f850be44..a452640078 100644
--- a/llvm/utils/TableGen/FastISelEmitter.cpp
+++ b/llvm/utils/TableGen/FastISelEmitter.cpp
@@ -591,8 +591,7 @@ void FastISelMap::collectPatterns(const CodeGenDAGPatterns 
&CGP) {
 // Ok, we found a pattern that we can handle. Remember it.
 InstructionMemo Memo(
 Pattern.getDstPattern().getOnlyTree()->getOperator()->getName(), DstRC,
-std::move(SubRegNo), std::move(PhysRegInputs),
- PredicateCheck);
+std::move(SubRegNo), std::move(PhysRegInputs), PredicateCheck);
 
 int complexity = Pattern.getPatternComplexity(CGP);
 

```




https://github.com/llvm/llvm-project/pull/123775
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [clang][CodeComplete] Use HeuristicResolver to resolve DependentNameTypes (PR #123818)

2025-01-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Nathan Ridge (HighCommander4)


Changes

Fixes https://github.com/clangd/clangd/issues/1249

---
Full diff: https://github.com/llvm/llvm-project/pull/123818.diff


2 Files Affected:

- (modified) clang/lib/Sema/SemaCodeComplete.cpp (+18-7) 
- (modified) clang/test/CodeCompletion/member-access.cpp (+16) 


```diff
diff --git a/clang/lib/Sema/SemaCodeComplete.cpp 
b/clang/lib/Sema/SemaCodeComplete.cpp
index 69cda6e68bd36b..d349928e1a171b 100644
--- a/clang/lib/Sema/SemaCodeComplete.cpp
+++ b/clang/lib/Sema/SemaCodeComplete.cpp
@@ -5736,11 +5736,20 @@ class ConceptInfo {
 // In particular, when E->getType() is DependentTy, try to guess a likely type.
 // We accept some lossiness (like dropping parameters).
 // We only try to handle common expressions on the LHS of MemberExpr.
-QualType getApproximateType(const Expr *E) {
+QualType getApproximateType(const Expr *E, HeuristicResolver &Resolver) {
   if (E->getType().isNull())
 return QualType();
   E = E->IgnoreParenImpCasts();
   QualType Unresolved = E->getType();
+  // Resolve DependentNameType
+  if (const auto *DNT = Unresolved->getAs<DependentNameType>()) {
+auto Decls = Resolver.resolveDependentNameType(DNT);
+if (Decls.size() == 1) {
+  if (const auto *TD = dyn_cast<TypeDecl>(Decls[0])) {
+return QualType(TD->getTypeForDecl(), 0);
+  }
+}
+  }
   // We only resolve DependentTy, or undeduced autos (including auto* etc).
   if (!Unresolved->isSpecificBuiltinType(BuiltinType::Dependent)) {
 AutoType *Auto = Unresolved->getContainedAutoType();
@@ -5749,7 +5758,7 @@ QualType getApproximateType(const Expr *E) {
   }
   // A call: approximate-resolve callee to a function type, get its return type
  if (const CallExpr *CE = llvm::dyn_cast<CallExpr>(E)) {
-QualType Callee = getApproximateType(CE->getCallee());
+QualType Callee = getApproximateType(CE->getCallee(), Resolver);
 if (Callee.isNull() ||
 Callee->isSpecificPlaceholderType(BuiltinType::BoundMember))
   Callee = Expr::findBoundMemberType(CE->getCallee());
@@ -5792,7 +5801,7 @@ QualType getApproximateType(const Expr *E) {
  if (const auto *CDSME = llvm::dyn_cast<CXXDependentScopeMemberExpr>(E)) {
 QualType Base = CDSME->isImplicitAccess()
 ? CDSME->getBaseType()
-: getApproximateType(CDSME->getBase());
+: getApproximateType(CDSME->getBase(), Resolver);
 if (CDSME->isArrow() && !Base.isNull())
   Base = Base->getPointeeType(); // could handle unique_ptr etc here?
 auto *RD =
@@ -5813,14 +5822,15 @@ QualType getApproximateType(const Expr *E) {
  if (const auto *DRE = llvm::dyn_cast<DeclRefExpr>(E)) {
    if (const auto *VD = llvm::dyn_cast<VarDecl>(DRE->getDecl())) {
   if (VD->hasInit())
-return getApproximateType(VD->getInit());
+return getApproximateType(VD->getInit(), Resolver);
 }
   }
  if (const auto *UO = llvm::dyn_cast<UnaryOperator>(E)) {
 if (UO->getOpcode() == UnaryOperatorKind::UO_Deref) {
   // We recurse into the subexpression because it could be of dependent
   // type.
-  if (auto Pointee = 
getApproximateType(UO->getSubExpr())->getPointeeType();
+  if (auto Pointee =
+  getApproximateType(UO->getSubExpr(), Resolver)->getPointeeType();
   !Pointee.isNull())
 return Pointee;
   // Our caller expects a non-null result, even though the SubType is
@@ -5857,7 +5867,8 @@ void SemaCodeCompletion::CodeCompleteMemberReferenceExpr(
   SemaRef.PerformMemberExprBaseConversion(Base, IsArrow);
   if (ConvertedBase.isInvalid())
 return;
-  QualType ConvertedBaseType = getApproximateType(ConvertedBase.get());
+  QualType ConvertedBaseType =
+  getApproximateType(ConvertedBase.get(), Resolver);
 
   enum CodeCompletionContext::Kind contextKind;
 
@@ -5896,7 +5907,7 @@ void SemaCodeCompletion::CodeCompleteMemberReferenceExpr(
   return false;
 Base = ConvertedBase.get();
 
-QualType BaseType = getApproximateType(Base);
+QualType BaseType = getApproximateType(Base, Resolver);
 if (BaseType.isNull())
   return false;
 ExprValueKind BaseKind = Base->getValueKind();
diff --git a/clang/test/CodeCompletion/member-access.cpp 
b/clang/test/CodeCompletion/member-access.cpp
index ab6dc69bf2923d..bf35f7ad021f71 100644
--- a/clang/test/CodeCompletion/member-access.cpp
+++ b/clang/test/CodeCompletion/member-access.cpp
@@ -401,3 +401,19 @@ struct node {
   }
 };
 }
+
+namespace dependent_nested_class {
+template <typename T>
+struct Foo {
+  struct Bar {
+int field;
+  };
+};
+template <typename T>
+void f() {
+  typename Foo<T>::Bar bar;
+  bar.field;
+  // RUN: %clang_cc1 -fsyntax-only -code-completion-at=%s:415:7 %s -o - | 
FileCheck -check-prefix=CHECK-DEPENDENT-NESTEDCLASS %s
+  // CHECK-DEPENDENT-NESTEDCLASS: [#int#]field
+}
+}

```




https://github.com/llvm/llvm-project/pull/123818
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://l

[llvm-branch-commits] [clang] [clang][CodeComplete] Use HeuristicResolver to resolve DependentNameTypes (PR #123818)

2025-01-21 Thread Nathan Ridge via llvm-branch-commits

https://github.com/HighCommander4 created 
https://github.com/llvm/llvm-project/pull/123818

Fixes https://github.com/clangd/clangd/issues/1249
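
The motivating case, condensed from the test added below: inside a
template, member completion after `bar.` used to find nothing because
the variable's DependentNameType was never resolved.

```cpp
// Minimal sketch of the fixed scenario: 'bar' has type
// 'typename Foo<T>::Bar' (a DependentNameType). With HeuristicResolver,
// code completion can now resolve it to Foo<T>::Bar and offer 'field'.
template <typename T>
struct Foo {
  struct Bar {
    int field;
  };
};

template <typename T>
void f() {
  typename Foo<T>::Bar bar;
  bar.field; // completion at 'bar.' now offers 'field'
}
```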

>From 58029449d63af7f452820dd02aba0d10134f588c Mon Sep 17 00:00:00 2001
From: Nathan Ridge 
Date: Tue, 21 Jan 2025 15:56:27 -0500
Subject: [PATCH] [clang][CodeComplete] Use HeuristicResolver to resolve
 DependentNameTypes

Fixes https://github.com/clangd/clangd/issues/1249
---
 clang/lib/Sema/SemaCodeComplete.cpp | 25 +++--
 clang/test/CodeCompletion/member-access.cpp | 16 +
 2 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/clang/lib/Sema/SemaCodeComplete.cpp 
b/clang/lib/Sema/SemaCodeComplete.cpp
index 69cda6e68bd36b..d349928e1a171b 100644
--- a/clang/lib/Sema/SemaCodeComplete.cpp
+++ b/clang/lib/Sema/SemaCodeComplete.cpp
@@ -5736,11 +5736,20 @@ class ConceptInfo {
 // In particular, when E->getType() is DependentTy, try to guess a likely type.
 // We accept some lossiness (like dropping parameters).
 // We only try to handle common expressions on the LHS of MemberExpr.
-QualType getApproximateType(const Expr *E) {
+QualType getApproximateType(const Expr *E, HeuristicResolver &Resolver) {
   if (E->getType().isNull())
 return QualType();
   E = E->IgnoreParenImpCasts();
   QualType Unresolved = E->getType();
+  // Resolve DependentNameType
+  if (const auto *DNT = Unresolved->getAs<DependentNameType>()) {
+auto Decls = Resolver.resolveDependentNameType(DNT);
+if (Decls.size() == 1) {
+  if (const auto *TD = dyn_cast<TypeDecl>(Decls[0])) {
+return QualType(TD->getTypeForDecl(), 0);
+  }
+}
+  }
   // We only resolve DependentTy, or undeduced autos (including auto* etc).
   if (!Unresolved->isSpecificBuiltinType(BuiltinType::Dependent)) {
 AutoType *Auto = Unresolved->getContainedAutoType();
@@ -5749,7 +5758,7 @@ QualType getApproximateType(const Expr *E) {
   }
   // A call: approximate-resolve callee to a function type, get its return type
  if (const CallExpr *CE = llvm::dyn_cast<CallExpr>(E)) {
-QualType Callee = getApproximateType(CE->getCallee());
+QualType Callee = getApproximateType(CE->getCallee(), Resolver);
 if (Callee.isNull() ||
 Callee->isSpecificPlaceholderType(BuiltinType::BoundMember))
   Callee = Expr::findBoundMemberType(CE->getCallee());
@@ -5792,7 +5801,7 @@ QualType getApproximateType(const Expr *E) {
  if (const auto *CDSME = llvm::dyn_cast<CXXDependentScopeMemberExpr>(E)) {
 QualType Base = CDSME->isImplicitAccess()
 ? CDSME->getBaseType()
-: getApproximateType(CDSME->getBase());
+: getApproximateType(CDSME->getBase(), Resolver);
 if (CDSME->isArrow() && !Base.isNull())
   Base = Base->getPointeeType(); // could handle unique_ptr etc here?
 auto *RD =
@@ -5813,14 +5822,15 @@ QualType getApproximateType(const Expr *E) {
  if (const auto *DRE = llvm::dyn_cast<DeclRefExpr>(E)) {
    if (const auto *VD = llvm::dyn_cast<VarDecl>(DRE->getDecl())) {
   if (VD->hasInit())
-return getApproximateType(VD->getInit());
+return getApproximateType(VD->getInit(), Resolver);
 }
   }
  if (const auto *UO = llvm::dyn_cast<UnaryOperator>(E)) {
 if (UO->getOpcode() == UnaryOperatorKind::UO_Deref) {
   // We recurse into the subexpression because it could be of dependent
   // type.
-  if (auto Pointee = 
getApproximateType(UO->getSubExpr())->getPointeeType();
+  if (auto Pointee =
+  getApproximateType(UO->getSubExpr(), Resolver)->getPointeeType();
   !Pointee.isNull())
 return Pointee;
   // Our caller expects a non-null result, even though the SubType is
@@ -5857,7 +5867,8 @@ void SemaCodeCompletion::CodeCompleteMemberReferenceExpr(
   SemaRef.PerformMemberExprBaseConversion(Base, IsArrow);
   if (ConvertedBase.isInvalid())
 return;
-  QualType ConvertedBaseType = getApproximateType(ConvertedBase.get());
+  QualType ConvertedBaseType =
+  getApproximateType(ConvertedBase.get(), Resolver);
 
   enum CodeCompletionContext::Kind contextKind;
 
@@ -5896,7 +5907,7 @@ void SemaCodeCompletion::CodeCompleteMemberReferenceExpr(
   return false;
 Base = ConvertedBase.get();
 
-QualType BaseType = getApproximateType(Base);
+QualType BaseType = getApproximateType(Base, Resolver);
 if (BaseType.isNull())
   return false;
 ExprValueKind BaseKind = Base->getValueKind();
diff --git a/clang/test/CodeCompletion/member-access.cpp 
b/clang/test/CodeCompletion/member-access.cpp
index ab6dc69bf2923d..bf35f7ad021f71 100644
--- a/clang/test/CodeCompletion/member-access.cpp
+++ b/clang/test/CodeCompletion/member-access.cpp
@@ -401,3 +401,19 @@ struct node {
   }
 };
 }
+
+namespace dependent_nested_class {
+template <typename T>
+struct Foo {
+  struct Bar {
+int field;
+  };
+};
+template <typename T>
+void f() {
+  typename Foo<T>::Bar bar;
+  bar.field;
+  // RUN: %clang_cc1 -fsyntax-only -code-completion-at=%s:415:7 %s -o - | 
FileCheck -check-prefix=CHECK-DEPENDENT-NESTEDCLASS %s
+  // CHEC

[llvm-branch-commits] [clang] [HLSL] Introduce address space `hlsl_constant(2)` for constant buffer declarations (PR #123411)

2025-01-21 Thread Steven Perron via llvm-branch-commits

https://github.com/s-perron approved this pull request.

LGTM. The SPIR-V code seems correct. It would be useful to have a SPIR-V 
specific test.

https://github.com/llvm/llvm-project/pull/123411
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [HLSL] Introduce address space `hlsl_constant(2)` for constant buffer declarations (PR #123411)

2025-01-21 Thread Steven Perron via llvm-branch-commits

https://github.com/s-perron edited 
https://github.com/llvm/llvm-project/pull/123411
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [HLSL] Introduce address space `hlsl_constant(2)` for constant buffer declarations (PR #123411)

2025-01-21 Thread Steven Perron via llvm-branch-commits




s-perron wrote:

Could we add a test for SPIR-V to make sure it gets the correct address space 
as well? They have different target info, so they are different code paths.

https://github.com/llvm/llvm-project/pull/123411
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [tablegen] Implement multiple outputs for patterns (PR #123775)

2025-01-21 Thread Tomas Matheson via llvm-branch-commits

https://github.com/tmatheson-arm created 
https://github.com/llvm/llvm-project/pull/123775

None

>From 85c7ec7fee92ba634f3d2bec502eab94d4fbf18e Mon Sep 17 00:00:00 2001
From: Tomas Matheson 
Date: Fri, 9 Feb 2024 15:56:02 +
Subject: [PATCH 1/4] Add TreePattern constructor that takes multiple patterns

---
 llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp | 8 
 llvm/utils/TableGen/Common/CodeGenDAGPatterns.h   | 4 
 2 files changed, 12 insertions(+)

diff --git a/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp 
b/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp
index b95fe4eedda83e..5a9aec2ed7e5e5 100644
--- a/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp
+++ b/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp
@@ -2800,6 +2800,14 @@ TreePattern::TreePattern(const Record *TheRec, 
TreePatternNodePtr Pat,
   Trees.push_back(Pat);
 }
 
+TreePattern::TreePattern(Record *TheRec,
+ const std::vector<TreePatternNodePtr> &Pats,
+ bool isInput, CodeGenDAGPatterns &cdp)
+: TheRecord(TheRec), CDP(cdp), isInputPattern(isInput), HasError(false),
+  Infer(*this) {
+  Trees = Pats;
+}
+
 void TreePattern::error(const Twine &Msg) {
   if (HasError)
 return;
diff --git a/llvm/utils/TableGen/Common/CodeGenDAGPatterns.h 
b/llvm/utils/TableGen/Common/CodeGenDAGPatterns.h
index f8c39172938256..d06623c743fe15 100644
--- a/llvm/utils/TableGen/Common/CodeGenDAGPatterns.h
+++ b/llvm/utils/TableGen/Common/CodeGenDAGPatterns.h
@@ -925,8 +925,12 @@ class TreePattern {
   CodeGenDAGPatterns &ise);
   TreePattern(const Record *TheRec, const DagInit *Pat, bool isInput,
   CodeGenDAGPatterns &ise);
+  /// Construct from a single pattern
   TreePattern(const Record *TheRec, TreePatternNodePtr Pat, bool isInput,
   CodeGenDAGPatterns &ise);
+  /// Construct from multiple patterns
+  TreePattern(Record *TheRec, const std::vector<TreePatternNodePtr> &Pat,
+  bool isInput, CodeGenDAGPatterns &ise);
 
   /// getTrees - Return the tree patterns which corresponds to this pattern.
   ///

>From daf6bcf1af8600eff9876e366c4c9334f30df100 Mon Sep 17 00:00:00 2001
From: Tomas Matheson 
Date: Fri, 9 Feb 2024 15:57:38 +
Subject: [PATCH 2/4] Add TreePattern::hasProperTypeByHwMode

Change-Id: I3ac2e64b86233dd4c8b4f7f6effa50ed29759f25
---
 llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp | 6 ++
 llvm/utils/TableGen/Common/CodeGenDAGPatterns.h   | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp 
b/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp
index 5a9aec2ed7e5e5..96a98c2bac0250 100644
--- a/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp
+++ b/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp
@@ -2816,6 +2816,12 @@ void TreePattern::error(const Twine &Msg) {
   HasError = true;
 }
 
+bool TreePattern::hasProperTypeByHwMode() const {
+  return llvm::any_of(getTrees(),
+  [](auto &T) { return T->hasProperTypeByHwMode(); });
+}
+
+
 void TreePattern::ComputeNamedNodes() {
   for (TreePatternNodePtr &Tree : Trees)
 ComputeNamedNodes(*Tree);
diff --git a/llvm/utils/TableGen/Common/CodeGenDAGPatterns.h 
b/llvm/utils/TableGen/Common/CodeGenDAGPatterns.h
index d06623c743fe15..3fdb70a91544e1 100644
--- a/llvm/utils/TableGen/Common/CodeGenDAGPatterns.h
+++ b/llvm/utils/TableGen/Common/CodeGenDAGPatterns.h
@@ -988,6 +988,8 @@ class TreePattern {
 
   TypeInfer &getInfer() { return Infer; }
 
+  bool hasProperTypeByHwMode() const;
+
   void print(raw_ostream &OS) const;
   void dump() const;
 

>From 0d2594fac0c3eb8b3bd3e43d62cd58ad4afb7c8c Mon Sep 17 00:00:00 2001
From: Tomas Matheson 
Date: Sat, 10 Feb 2024 11:21:09 +
Subject: [PATCH 3/4] Implement multiple output patterns

See also D154945. It took me a while to track down the source of the
problems with the existing patterns.

Tablegen patterns for instruction selection can generate multiple
instructions, but only if there is a data dependency between them so
that they can be expressed as a DAG. For example, something like:

(store (load %src), %dst)

For some patterns, we might want to generate a sequence of instructions
which do not have a data dependency. For example on AArch64 some atomic
instructions are implemented like this:

LDP %ptr
DMB ISH

Currently, sequences like this cannot be selected with tablegen
patterns. To work around this we need to do custom selection, which has
several disadvantages compared to using patterns, such as needing
separate implementations for SelectionDAG and GlobalISel.

This patch adds basic support for tablegen Patterns which have a list
of output instructions. Pattern already has the ability to express this
but it looks like it was never implemented.

Multiple result instructions in an output pattern will be chained
together.

The GlobalISel pattern importer will skip these patterns for now. I
intend to implement this soon.
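
As a rough sketch of what this enables (the node and instruction names
below are illustrative placeholders, not taken from this patch), an
output pattern can now list two instructions with no data dependency:

```tablegen
// Hypothetical AArch64-style pattern: an acquire load implemented as a
// plain load followed by a barrier. The two output instructions have no
// data dependency, so they are emitted in order and chained together.
def : Pattern<(i64 (my_load_acquire GPR64:$addr)),
              [(LDRXui GPR64:$addr, 0), (DMB (i32 0xb))]>;
```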
---
 .../dag-isel-multiple-instructions.td | 204 ++

[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120716

>From 4205f3f94a8a991919569a02e549cb7d649c876b Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Fri, 20 Dec 2024 06:14:28 -0500
Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector

AtomicExpand fails for aligned `load atomic <n x T>` because it
does not find a compatible library call. This change adds appropriate
bitcasts so that the call can be lowered.
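
For a sized libcall whose i64 result feeds a <2 x i32> atomic load, the
expansion amounts to the following IR (a hand-written sketch of what the
builder emits, not actual pass output):

```llvm
; Split the scalar libcall result into two halves, rebuild the vector,
; then bitcast-or-pointer-cast it to the load's original type.
define <2 x i32> @rebuild(i64 %result) {
  %lo  = trunc i64 %result to i32
  %sh  = lshr i64 %result, 32
  %hi  = trunc i64 %sh to i32
  %v0  = insertelement <2 x i32> poison, i32 %lo, i32 0
  %vec = insertelement <2 x i32> %v0, i32 %hi, i32 1
  ret <2 x i32> %vec
}
```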

commit-id:f430c1af
---
 llvm/lib/CodeGen/AtomicExpandPass.cpp | 25 +++--
 llvm/test/CodeGen/ARM/atomic-load-store.ll| 52 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 15 ++
 .../X86/expand-atomic-non-integer.ll  | 14 +
 4 files changed, 103 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp 
b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index a75fa688d87a8d..490cc9aa4c3c80 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -2060,9 +2060,28 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall(
 I->replaceAllUsesWith(V);
   } else if (HasResult) {
 Value *V;
-if (UseSizedLibcall)
-  V = Builder.CreateBitOrPointerCast(Result, I->getType());
-else {
+if (UseSizedLibcall) {
+  // Add bitcasts from Result's T scalar type to I's <2 x T/2> vector type
+  if (I->getType()->getScalarType()->isIntOrPtrTy() &&
+  I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) {
+TypeSize Size = Result->getType()->getPrimitiveSizeInBits();
+assert((unsigned)Size % 2 == 0);
+unsigned HalfSize = (unsigned)Size / 2;
+Value *Lo =
+Builder.CreateTrunc(Result, IntegerType::get(Ctx, HalfSize));
+Value *RS = Builder.CreateLShr(
+Result, ConstantInt::get(IntegerType::get(Ctx, Size), HalfSize));
+Value *Hi = Builder.CreateTrunc(RS, IntegerType::get(Ctx, HalfSize));
+Value *Vec = Builder.CreateInsertElement(
+VectorType::get(IntegerType::get(Ctx, HalfSize),
+cast<VectorType>(I->getType())->getElementCount()),
+Lo, ConstantInt::get(IntegerType::get(Ctx, 32), 0));
+Vec = Builder.CreateInsertElement(
+Vec, Hi, ConstantInt::get(IntegerType::get(Ctx, 32), 1));
+V = Builder.CreateBitOrPointerCast(Vec, I->getType());
+  } else
+V = Builder.CreateBitOrPointerCast(Result, I->getType());
+} else {
   V = Builder.CreateAlignedLoad(I->getType(), AllocaResult,
 AllocaAlignment);
   Builder.CreateLifetimeEnd(AllocaResult, SizeVal64);
diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll 
b/llvm/test/CodeGen/ARM/atomic-load-store.ll
index 560dfde356c29d..8736c5bd9a9451 100644
--- a/llvm/test/CodeGen/ARM/atomic-load-store.ll
+++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll
@@ -983,3 +983,55 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double 
%val1) {
   store atomic double %val1, ptr %ptr seq_cst, align 8
   ret void
 }
+
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 {
+; ARM-LABEL: atomic_vec1_ptr:
+; ARM:   @ %bb.0:
+; ARM-NEXT:ldr r0, [r0]
+; ARM-NEXT:dmb ish
+; ARM-NEXT:bx lr
+;
+; ARMOPTNONE-LABEL: atomic_vec1_ptr:
+; ARMOPTNONE:   @ %bb.0:
+; ARMOPTNONE-NEXT:ldr r0, [r0]
+; ARMOPTNONE-NEXT:dmb ish
+; ARMOPTNONE-NEXT:bx lr
+;
+; THUMBTWO-LABEL: atomic_vec1_ptr:
+; THUMBTWO:   @ %bb.0:
+; THUMBTWO-NEXT:ldr r0, [r0]
+; THUMBTWO-NEXT:dmb ish
+; THUMBTWO-NEXT:bx lr
+;
+; THUMBONE-LABEL: atomic_vec1_ptr:
+; THUMBONE:   @ %bb.0:
+; THUMBONE-NEXT:push {r7, lr}
+; THUMBONE-NEXT:movs r1, #0
+; THUMBONE-NEXT:mov r2, r1
+; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4
+; THUMBONE-NEXT:pop {r7, pc}
+;
+; ARMV4-LABEL: atomic_vec1_ptr:
+; ARMV4:   @ %bb.0:
+; ARMV4-NEXT:push {r11, lr}
+; ARMV4-NEXT:mov r1, #2
+; ARMV4-NEXT:bl __atomic_load_4
+; ARMV4-NEXT:mov r0, #0
+; ARMV4-NEXT:pop {r11, lr}
+; ARMV4-NEXT:mov pc, lr
+;
+; ARMV6-LABEL: atomic_vec1_ptr:
+; ARMV6:   @ %bb.0:
+; ARMV6-NEXT:mov r1, #0
+; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5
+; ARMV6-NEXT:ldr r0, [r0]
+; ARMV6-NEXT:bx lr
+;
+; THUMBM-LABEL: atomic_vec1_ptr:
+; THUMBM:   @ %bb.0:
+; THUMBM-NEXT:ldr r0, [r0]
+; THUMBM-NEXT:dmb sy
+; THUMBM-NEXT:bx lr
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 2d8cb14cf5fcde..3ce05f75fdcfe5 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -362,6 +362,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
   ret <2 x i32> %ret
 }
 
+define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec2_ptr_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEX

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120598

>From a7fb95e250eac58c806161e7ffc666779606d217 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 11:19:39 -0500
Subject: [PATCH] [SelectionDAG][X86] Widen <2 x T> vector types for atomic
 load

Vector types of 2 elements must be widened. This change does this
for vector types of atomic load in SelectionDAG
so that it can translate aligned vectors of size >1. It also
adds Pats to remove an extra MOV.
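
For example (mirroring the updated tests below), a <2 x i32> atomic load
should now select as a single 8-byte load instead of being scalarized:

```llvm
; Widened to a legal vector type; expected to lower to one movq on x86-64.
define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
  %ret = load atomic <2 x i32>, ptr %x acquire, align 8
  ret <2 x i32> %ret
}
```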

commit-id:2894ccd1
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |  5 +-
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 33 -
 llvm/lib/Target/X86/X86InstrCompiler.td   |  7 ++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 72 +++
 llvm/test/CodeGen/X86/atomic-unordered.ll |  3 +-
 5 files changed, 113 insertions(+), 7 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 893fca233d191b..8923d5596e7ac4 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -1048,6 +1048,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N);
   SDValue WidenVecRes_INSERT_SUBVECTOR(SDNode *N);
   SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N);
+  SDValue WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue WidenVecRes_LOAD(SDNode* N);
   SDValue WidenVecRes_VP_LOAD(VPLoadSDNode *N);
   SDValue WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N);
@@ -1131,8 +1132,8 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   /// resulting wider type. It takes:
   ///   LdChain: list of chains for the load to be generated.
   ///   Ld:  load to widen
-  SDValue GenWidenVectorLoads(SmallVectorImpl<SDValue> &LdChain,
-  LoadSDNode *LD);
+  SDValue GenWidenVectorLoads(SmallVectorImpl<SDValue> &LdChain, MemSDNode *LD,
+  bool IsAtomic = false);
 
   /// Helper function to generate a set of extension loads to load a vector 
with
   /// a resulting wider type. It takes:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 980c157fe6f3f1..9ef4e0fc99620d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -4521,6 +4521,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, 
unsigned ResNo) {
 break;
   case ISD::EXTRACT_SUBVECTOR: Res = WidenVecRes_EXTRACT_SUBVECTOR(N); break;
   case ISD::INSERT_VECTOR_ELT: Res = WidenVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+Res = WidenVecRes_ATOMIC_LOAD(cast<AtomicSDNode>(N));
+break;
   case ISD::LOAD:  Res = WidenVecRes_LOAD(N); break;
   case ISD::STEP_VECTOR:
   case ISD::SPLAT_VECTOR:
@@ -5907,6 +5910,25 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) {
  N->getOperand(1), N->getOperand(2));
 }
 
+SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SmallVector<SDValue, 16> LdChain; // Chain for the series of load
+  SDValue Result = GenWidenVectorLoads(LdChain, N, /*IsAtomic=*/true);
+
+  if (Result) {
+// Build a factor node to remember the multiple loads are independent and
+// chain to that.
+SDValue NewChain = DAG.getNode(ISD::TokenFactor, SDLoc(N), MVT::Other, 
LdChain);
+
+// Modified the chain - switch anything that used the old chain to use
+// the new one.
+ReplaceValueWith(SDValue(N, 1), NewChain);
+
+return Result;
+  }
+
+  report_fatal_error("Unable to widen atomic vector load");
+}
+
 SDValue DAGTypeLegalizer::WidenVecRes_LOAD(SDNode *N) {
  LoadSDNode *LD = cast<LoadSDNode>(N);
   ISD::LoadExtType ExtType = LD->getExtensionType();
@@ -7706,7 +7728,7 @@ static SDValue BuildVectorFromScalar(SelectionDAG& DAG, 
EVT VecTy,
 }
 
SDValue DAGTypeLegalizer::GenWidenVectorLoads(SmallVectorImpl<SDValue>
&LdChain,
-  LoadSDNode *LD) {
+  MemSDNode *LD, bool IsAtomic) {
   // The strategy assumes that we can efficiently load power-of-two widths.
   // The routine chops the vector into the largest vector loads with the same
   // element type or scalar loads and then recombines it to the widen vector
@@ -7763,8 +7785,13 @@ SDValue 
DAGTypeLegalizer::GenWidenVectorLoads(SmallVectorImpl<SDValue> &LdChain,
 } while (TypeSize::isKnownGT(RemainingWidth, NewVTWidth));
   }
 
-  SDValue LdOp = DAG.getLoad(*FirstVT, dl, Chain, BasePtr, 
LD->getPointerInfo(),
- LD->getOriginalAlign(), MMOFlags, AAInfo);
+  SDValue LdOp;
+  if (IsAtomic)
+LdOp = DAG.getAtomic(ISD::ATOMIC_LOAD, dl, *FirstVT, *FirstVT, Chain,
+ BasePtr, LD->getMemOperand());
+  else
+LdOp = DAG.getLoad(*FirstVT, dl, Chain, BasePtr, LD->getPointerInfo(),
+   LD->getOriginalAlign()

[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120385

>From ff979b4758b6d7b3120bd656726d273414c44170 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:37:17 -0500
Subject: [PATCH] [SelectionDAG] Legalize <1 x T> vector types for atomic load

`load atomic <1 x T>` is not valid. This change legalizes
vector types of atomic load via scalarization in SelectionDAG
so that it can, for example, translate from `v1i32` to `i32`.
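
A minimal example of the scalarization, mirroring the added tests:

```llvm
; The <1 x i32> load is scalarized to an ordinary i32 atomic load
; (a single movl on x86-64).
define <1 x i32> @atomic_vec1_i32(ptr %x) {
  %ret = load atomic <1 x i32>, ptr %x acquire, align 4
  ret <1 x i32> %ret
}
```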

commit-id:5c36cc8c
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  |  15 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 121 +-
 3 files changed, 135 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index f13f70e66cfaa6..893fca233d191b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -863,6 +863,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue ScalarizeVecRes_UnaryOpWithExtraInput(SDNode *N);
   SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N);
   SDValue ScalarizeVecRes_LOAD(LoadSDNode *N);
+  SDValue ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue ScalarizeVecRes_SCALAR_TO_VECTOR(SDNode *N);
   SDValue ScalarizeVecRes_VSELECT(SDNode *N);
   SDValue ScalarizeVecRes_SELECT(SDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index f39d9ca15496a9..980c157fe6f3f1 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -64,6 +64,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, 
unsigned ResNo) {
 R = ScalarizeVecRes_UnaryOpWithExtraInput(N);
 break;
   case ISD::INSERT_VECTOR_ELT: R = ScalarizeVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+R = ScalarizeVecRes_ATOMIC_LOAD(cast<AtomicSDNode>(N));
+break;
   case ISD::LOAD:   R = 
ScalarizeVecRes_LOAD(cast<LoadSDNode>(N));break;
   case ISD::SCALAR_TO_VECTOR:  R = ScalarizeVecRes_SCALAR_TO_VECTOR(N); break;
   case ISD::SIGN_EXTEND_INREG: R = ScalarizeVecRes_InregOp(N); break;
@@ -457,6 +460,18 @@ SDValue 
DAGTypeLegalizer::ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N) {
   return Op;
 }
 
+SDValue DAGTypeLegalizer::ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SDValue Result = DAG.getAtomic(
+  ISD::ATOMIC_LOAD, SDLoc(N), N->getMemoryVT().getVectorElementType(),
+  N->getValueType(0).getVectorElementType(), N->getChain(), 
N->getBasePtr(),
+  N->getMemOperand());
+
+  // Legalize the chain result - switch anything that used the old chain to
+  // use the new one.
+  ReplaceValueWith(SDValue(N, 1), Result.getValue(1));
+  return Result;
+}
+
 SDValue DAGTypeLegalizer::ScalarizeVecRes_LOAD(LoadSDNode *N) {
   assert(N->isUnindexed() && "Indexed vector load?");
 
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 5bce4401f7bdb0..d23cfb89f9fc87 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | 
FileCheck %s
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | 
FileCheck %s
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | 
FileCheck %s --check-prefixes=CHECK,CHECK3
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | 
FileCheck %s --check-prefixes=CHECK,CHECK0
 
 define void @test1(ptr %ptr, i32 %val1) {
 ; CHECK-LABEL: test1:
@@ -28,3 +28,120 @@ define i32 @test3(ptr %ptr) {
   %val = load atomic i32, ptr %ptr seq_cst, align 4
   ret i32 %val
 }
+
+define <1 x i32> @atomic_vec1_i32(ptr %x) {
+; CHECK-LABEL: atomic_vec1_i32:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movl (%rdi), %eax
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x i32>, ptr %x acquire, align 4
+  ret <1 x i32> %ret
+}
+
+define <1 x i8> @atomic_vec1_i8(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i8:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzbl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i8:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movb (%rdi), %al
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i8>, ptr %x acquire, align 1
+  ret <1 x i8> %ret
+}
+
+define <1 x i16> @atomic_vec1_i16(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i16:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i16:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %ax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i16>, ptr %x acquire, align 2
+  ret <1 x i16> %ret
+}
+
+define <1 x i32> @atomic_vec1_i8_zext(ptr %x) {
+; CHECK3-LABEL: atomic_

[llvm-branch-commits] [llvm] [X86] Manage atomic load of fp -> int promotion in DAG (PR #120386)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120386

>From 4698589155b60abda2caefd8664599850d084d6e Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:38:23 -0500
Subject: [PATCH] [X86] Manage atomic load of fp -> int promotion in DAG

When lowering atomic <1 x T> vector types with floats, selection can fail since
this pattern is unsupported. To support this, floats can be cast to
an integer type of the same size.
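
For instance, as in the added tests, a <1 x float> atomic load is
promoted to an i32 atomic load and the value bitcast back to float:

```llvm
; Promoted to i32 in the DAG; expected to select a single movss on x86-64.
define <1 x float> @atomic_vec1_float(ptr %x) {
  %ret = load atomic <1 x float>, ptr %x acquire, align 4
  ret <1 x float> %ret
}
```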

commit-id:f9d761c5
---
 llvm/lib/Target/X86/X86ISelLowering.cpp|  4 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll | 37 ++
 2 files changed, 41 insertions(+)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp 
b/llvm/lib/Target/X86/X86ISelLowering.cpp
index a956074e50d86f..b681b80858de30 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -2639,6 +2639,10 @@ X86TargetLowering::X86TargetLowering(const 
X86TargetMachine &TM,
 setOperationAction(Op, MVT::f32, Promote);
   }
 
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f16, MVT::i16);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f32, MVT::i32);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f64, MVT::i64);
+
   // We have target-specific dag combine patterns for the following nodes:
   setTargetDAGCombine({ISD::VECTOR_SHUFFLE,
ISD::SCALAR_TO_VECTOR,
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index d23cfb89f9fc87..6efcbb80c0ce6d 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -145,3 +145,40 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   %ret = load atomic <1 x i64>, ptr %x acquire, align 8
   ret <1 x i64> %ret
 }
+
+define <1 x half> @atomic_vec1_half(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_half:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_half:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %cx
+; CHECK0-NEXT:## implicit-def: $eax
+; CHECK0-NEXT:movw %cx, %ax
+; CHECK0-NEXT:## implicit-def: $xmm0
+; CHECK0-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x half>, ptr %x acquire, align 2
+  ret <1 x half> %ret
+}
+
+define <1 x float> @atomic_vec1_float(ptr %x) {
+; CHECK-LABEL: atomic_vec1_float:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x float>, ptr %x acquire, align 4
+  ret <1 x float> %ret
+}
+
+define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec1_double_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 8
+  ret <1 x double> %ret
+}

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [SelectionDAG][X86] Split via Concat <n x T> vector types for atomic load (PR #120640)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120640

>From 1ac2df790916fb43a60f38ffbaa4c9b8c2f63771 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 16:25:55 -0500
Subject: [PATCH] [SelectionDAG][X86] Split via Concat <n x T> vector types for
 atomic load

Vector types that aren't widened are 'split' via CONCAT_VECTORS
so that a single ATOMIC_LOAD is issued for the entire vector at once.
This change utilizes the load vectorization infrastructure in
SelectionDAG in order to group the vectors. This enables SelectionDAG
to translate vectors with types bfloat and half.
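
A sketch of the kind of input this enables (the exact tests in the patch
may differ):

```llvm
; A single ATOMIC_LOAD is issued for the whole vector; the elements are
; recombined via CONCAT_VECTORS rather than loaded one at a time.
define <2 x half> @atomic_vec2_half(ptr %x) {
  %ret = load atomic <2 x half>, ptr %x acquire, align 4
  ret <2 x half> %ret
}
```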

commit-id:3a045357
---
 llvm/include/llvm/CodeGen/SelectionDAG.h  |  4 +-
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |  1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 31 
 .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 24 -
 .../SelectionDAGAddressAnalysis.cpp   | 30 ++-
 .../SelectionDAG/SelectionDAGBuilder.cpp  |  6 ++-
 llvm/lib/Target/X86/X86ISelLowering.cpp   | 50 ---
 llvm/test/CodeGen/X86/atomic-load-store.ll| 36 +
 8 files changed, 147 insertions(+), 35 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h 
b/llvm/include/llvm/CodeGen/SelectionDAG.h
index ba0538f7084eec..648ad14a67cb6c 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1837,7 +1837,7 @@ class SelectionDAG {
   /// chain to the token factor. This ensures that the new memory node will 
have
   /// the same relative memory dependency position as the old load. Returns the
   /// new merged load chain.
-  SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp);
+  SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp);
 
   /// Topological-sort the AllNodes list and a
   /// assign a unique node id for each node in the DAG based on their
@@ -2263,6 +2263,8 @@ class SelectionDAG {
   /// location that the 'Base' load is loading from.
   bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base,
   unsigned Bytes, int Dist) const;
+  bool areNonVolatileConsecutiveLoads(AtomicSDNode *LD, AtomicSDNode *Base,
+  unsigned Bytes, int Dist) const;
 
   /// Infer alignment of a load / store address. Return std::nullopt if it
   /// cannot be inferred.
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 8923d5596e7ac4..59ba987b1b8e92 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -948,6 +948,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   void SplitVecRes_FPOp_MultiType(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_IS_FPCLASS(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi);
+  void SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD);
   void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_LOAD(VPLoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *SLD, SDValue &Lo,
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 9ef4e0fc99620d..b786c3a26902eb 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -1152,6 +1152,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, 
unsigned ResNo) {
 SplitVecRes_STEP_VECTOR(N, Lo, Hi);
 break;
   case ISD::SIGN_EXTEND_INREG: SplitVecRes_InregOp(N, Lo, Hi); break;
+  case ISD::ATOMIC_LOAD:
+SplitVecRes_ATOMIC_LOAD(cast<AtomicSDNode>(N));
+break;
   case ISD::LOAD:
 SplitVecRes_LOAD(cast(N), Lo, Hi);
 break;
@@ -1395,6 +1398,34 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, 
unsigned ResNo) {
 SetSplitVector(SDValue(N, ResNo), Lo, Hi);
 }
 
+void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD) {
+  SDLoc dl(LD);
+
+  EVT MemoryVT = LD->getMemoryVT();
+  unsigned NumElts = MemoryVT.getVectorMinNumElements();
+
+  EVT IntMemoryVT = EVT::getVectorVT(*DAG.getContext(), MVT::i16, NumElts);
+  EVT ElemVT = EVT::getVectorVT(*DAG.getContext(),
+MemoryVT.getVectorElementType(), 1);
+
+  // Create a single atomic to load all the elements at once.
+  SDValue Atomic = DAG.getAtomic(ISD::ATOMIC_LOAD, dl, IntMemoryVT, 
IntMemoryVT,
+ LD->getChain(), LD->getBasePtr(),
+ LD->getMemOperand());
+
+  // Instead of splitting, put all the elements back into a vector.
+  SmallVector<SDValue> Ops;
+  for (unsigned i = 0; i < NumElts; ++i) {
+SDValue Elt = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::i16, Atomic,
+  DAG.getVectorIdxConstant(i, dl));
+Elt = DAG.getBitcast(ElemV

[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120385

>From ff979b4758b6d7b3120bd656726d273414c44170 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:37:17 -0500
Subject: [PATCH] [SelectionDAG] Legalize <1 x T> vector types for atomic load

`load atomic <1 x T>` is not valid. This change legalizes
vector types of atomic load via scalarization in SelectionDAG
so that it can, for example, translate from `v1i32` to `i32`.

commit-id:5c36cc8c
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  |  15 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 121 +-
 3 files changed, 135 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index f13f70e66cfaa6..893fca233d191b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -863,6 +863,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue ScalarizeVecRes_UnaryOpWithExtraInput(SDNode *N);
   SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N);
   SDValue ScalarizeVecRes_LOAD(LoadSDNode *N);
+  SDValue ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue ScalarizeVecRes_SCALAR_TO_VECTOR(SDNode *N);
   SDValue ScalarizeVecRes_VSELECT(SDNode *N);
   SDValue ScalarizeVecRes_SELECT(SDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index f39d9ca15496a9..980c157fe6f3f1 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -64,6 +64,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, 
unsigned ResNo) {
 R = ScalarizeVecRes_UnaryOpWithExtraInput(N);
 break;
   case ISD::INSERT_VECTOR_ELT: R = ScalarizeVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+    R = ScalarizeVecRes_ATOMIC_LOAD(cast<AtomicSDNode>(N));
+break;
  case ISD::LOAD:   R = ScalarizeVecRes_LOAD(cast<LoadSDNode>(N));break;
   case ISD::SCALAR_TO_VECTOR:  R = ScalarizeVecRes_SCALAR_TO_VECTOR(N); break;
   case ISD::SIGN_EXTEND_INREG: R = ScalarizeVecRes_InregOp(N); break;
@@ -457,6 +460,18 @@ SDValue DAGTypeLegalizer::ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N) {
   return Op;
 }
 
+SDValue DAGTypeLegalizer::ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SDValue Result = DAG.getAtomic(
+  ISD::ATOMIC_LOAD, SDLoc(N), N->getMemoryVT().getVectorElementType(),
+      N->getValueType(0).getVectorElementType(), N->getChain(), N->getBasePtr(),
+  N->getMemOperand());
+
+  // Legalize the chain result - switch anything that used the old chain to
+  // use the new one.
+  ReplaceValueWith(SDValue(N, 1), Result.getValue(1));
+  return Result;
+}
+
 SDValue DAGTypeLegalizer::ScalarizeVecRes_LOAD(LoadSDNode *N) {
   assert(N->isUnindexed() && "Indexed vector load?");
 
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 5bce4401f7bdb0..d23cfb89f9fc87 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | FileCheck %s
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | FileCheck %s
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | FileCheck %s --check-prefixes=CHECK,CHECK3
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | FileCheck %s --check-prefixes=CHECK,CHECK0
 
 define void @test1(ptr %ptr, i32 %val1) {
 ; CHECK-LABEL: test1:
@@ -28,3 +28,120 @@ define i32 @test3(ptr %ptr) {
   %val = load atomic i32, ptr %ptr seq_cst, align 4
   ret i32 %val
 }
+
+define <1 x i32> @atomic_vec1_i32(ptr %x) {
+; CHECK-LABEL: atomic_vec1_i32:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movl (%rdi), %eax
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x i32>, ptr %x acquire, align 4
+  ret <1 x i32> %ret
+}
+
+define <1 x i8> @atomic_vec1_i8(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i8:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzbl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i8:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movb (%rdi), %al
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i8>, ptr %x acquire, align 1
+  ret <1 x i8> %ret
+}
+
+define <1 x i16> @atomic_vec1_i16(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i16:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i16:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %ax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i16>, ptr %x acquire, align 2
+  ret <1 x i16> %ret
+}
+
+define <1 x i32> @atomic_vec1_i8_zext(ptr %x) {
+; CHECK3-LABEL: atomic_

[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120716

>From 4205f3f94a8a991919569a02e549cb7d649c876b Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Fri, 20 Dec 2024 06:14:28 -0500
Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector

AtomicExpand fails for an aligned vector `load atomic` because it
does not find a compatible library call. This change adds appropriate
bitcasts so that the call can be lowered.

commit-id:f430c1af
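
As an illustration only (the value names are hypothetical and the exact
libcall is target-dependent), a <2 x i32> atomic load that takes the
sized-libcall path expands to roughly this little-endian sequence:

  %r  = call i64 @__atomic_load_8(ptr %x, i32 2)
  %lo = trunc i64 %r to i32
  %s  = lshr i64 %r, 32
  %hi = trunc i64 %s to i32
  %v0 = insertelement <2 x i32> poison, i32 %lo, i32 0
  %v1 = insertelement <2 x i32> %v0, i32 %hi, i32 1

mirroring the CreateTrunc/CreateLShr/CreateInsertElement calls in the
patch; pointer elements additionally pass through CreateBitOrPointerCast.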
---
 llvm/lib/CodeGen/AtomicExpandPass.cpp | 25 +++--
 llvm/test/CodeGen/ARM/atomic-load-store.ll| 52 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 15 ++
 .../X86/expand-atomic-non-integer.ll  | 14 +
 4 files changed, 103 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp 
b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index a75fa688d87a8d..490cc9aa4c3c80 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -2060,9 +2060,28 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall(
 I->replaceAllUsesWith(V);
   } else if (HasResult) {
 Value *V;
-if (UseSizedLibcall)
-  V = Builder.CreateBitOrPointerCast(Result, I->getType());
-else {
+if (UseSizedLibcall) {
+  // Add bitcasts from Result's T scalar type to I's <2 x T/2> vector type
+  if (I->getType()->getScalarType()->isIntOrPtrTy() &&
+  I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) {
+TypeSize Size = Result->getType()->getPrimitiveSizeInBits();
+assert((unsigned)Size % 2 == 0);
+unsigned HalfSize = (unsigned)Size / 2;
+Value *Lo =
+Builder.CreateTrunc(Result, IntegerType::get(Ctx, HalfSize));
+Value *RS = Builder.CreateLShr(
+Result, ConstantInt::get(IntegerType::get(Ctx, Size), HalfSize));
+Value *Hi = Builder.CreateTrunc(RS, IntegerType::get(Ctx, HalfSize));
+Value *Vec = Builder.CreateInsertElement(
+VectorType::get(IntegerType::get(Ctx, HalfSize),
+                            cast<VectorType>(I->getType())->getElementCount()),
+Lo, ConstantInt::get(IntegerType::get(Ctx, 32), 0));
+Vec = Builder.CreateInsertElement(
+Vec, Hi, ConstantInt::get(IntegerType::get(Ctx, 32), 1));
+V = Builder.CreateBitOrPointerCast(Vec, I->getType());
+  } else
+V = Builder.CreateBitOrPointerCast(Result, I->getType());
+} else {
   V = Builder.CreateAlignedLoad(I->getType(), AllocaResult,
 AllocaAlignment);
   Builder.CreateLifetimeEnd(AllocaResult, SizeVal64);
diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll 
b/llvm/test/CodeGen/ARM/atomic-load-store.ll
index 560dfde356c29d..8736c5bd9a9451 100644
--- a/llvm/test/CodeGen/ARM/atomic-load-store.ll
+++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll
@@ -983,3 +983,55 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double %val1) {
   store atomic double %val1, ptr %ptr seq_cst, align 8
   ret void
 }
+
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 {
+; ARM-LABEL: atomic_vec1_ptr:
+; ARM:   @ %bb.0:
+; ARM-NEXT:ldr r0, [r0]
+; ARM-NEXT:dmb ish
+; ARM-NEXT:bx lr
+;
+; ARMOPTNONE-LABEL: atomic_vec1_ptr:
+; ARMOPTNONE:   @ %bb.0:
+; ARMOPTNONE-NEXT:ldr r0, [r0]
+; ARMOPTNONE-NEXT:dmb ish
+; ARMOPTNONE-NEXT:bx lr
+;
+; THUMBTWO-LABEL: atomic_vec1_ptr:
+; THUMBTWO:   @ %bb.0:
+; THUMBTWO-NEXT:ldr r0, [r0]
+; THUMBTWO-NEXT:dmb ish
+; THUMBTWO-NEXT:bx lr
+;
+; THUMBONE-LABEL: atomic_vec1_ptr:
+; THUMBONE:   @ %bb.0:
+; THUMBONE-NEXT:push {r7, lr}
+; THUMBONE-NEXT:movs r1, #0
+; THUMBONE-NEXT:mov r2, r1
+; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4
+; THUMBONE-NEXT:pop {r7, pc}
+;
+; ARMV4-LABEL: atomic_vec1_ptr:
+; ARMV4:   @ %bb.0:
+; ARMV4-NEXT:push {r11, lr}
+; ARMV4-NEXT:mov r1, #2
+; ARMV4-NEXT:bl __atomic_load_4
+; ARMV4-NEXT:mov r0, #0
+; ARMV4-NEXT:pop {r11, lr}
+; ARMV4-NEXT:mov pc, lr
+;
+; ARMV6-LABEL: atomic_vec1_ptr:
+; ARMV6:   @ %bb.0:
+; ARMV6-NEXT:mov r1, #0
+; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5
+; ARMV6-NEXT:ldr r0, [r0]
+; ARMV6-NEXT:bx lr
+;
+; THUMBM-LABEL: atomic_vec1_ptr:
+; THUMBM:   @ %bb.0:
+; THUMBM-NEXT:ldr r0, [r0]
+; THUMBM-NEXT:dmb sy
+; THUMBM-NEXT:bx lr
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 2d8cb14cf5fcde..3ce05f75fdcfe5 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -362,6 +362,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
   ret <2 x i32> %ret
 }
 
+define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec2_ptr_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEX

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Split via Concat vector types for atomic load (PR #120640)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120640

>From 1ac2df790916fb43a60f38ffbaa4c9b8c2f63771 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 16:25:55 -0500
Subject: [PATCH] [SelectionDAG][X86] Split via Concat vector types for atomic load

Vector types that aren't widened are 'split' via CONCAT_VECTORS
so that a single ATOMIC_LOAD is issued for the entire vector at once.
This change utilizes the load vectorization infrastructure in
SelectionDAG in order to group the vectors. This enables SelectionDAG
to translate vectors with the types bfloat and half.

commit-id:3a045357
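
A sketch of the kind of input this targets (the function name is
illustrative, and whether a given type is split rather than widened
depends on the target):

  define <4 x bfloat> @example(ptr %p) {
    %v = load atomic <4 x bfloat>, ptr %p acquire, align 8
    ret <4 x bfloat> %v
  }

Instead of splitting this into several scalar atomics, the new
SplitVecRes_ATOMIC_LOAD issues one ATOMIC_LOAD with i16 elements and
bitcasts each extracted element back to the 16-bit float type.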
---
 llvm/include/llvm/CodeGen/SelectionDAG.h  |  4 +-
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |  1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 31 
 .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 24 -
 .../SelectionDAGAddressAnalysis.cpp   | 30 ++-
 .../SelectionDAG/SelectionDAGBuilder.cpp  |  6 ++-
 llvm/lib/Target/X86/X86ISelLowering.cpp   | 50 ---
 llvm/test/CodeGen/X86/atomic-load-store.ll| 36 +
 8 files changed, 147 insertions(+), 35 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h 
b/llvm/include/llvm/CodeGen/SelectionDAG.h
index ba0538f7084eec..648ad14a67cb6c 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1837,7 +1837,7 @@ class SelectionDAG {
   /// chain to the token factor. This ensures that the new memory node will 
have
   /// the same relative memory dependency position as the old load. Returns the
   /// new merged load chain.
-  SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp);
+  SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp);
 
   /// Topological-sort the AllNodes list and a
   /// assign a unique node id for each node in the DAG based on their
@@ -2263,6 +2263,8 @@ class SelectionDAG {
   /// location that the 'Base' load is loading from.
   bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base,
   unsigned Bytes, int Dist) const;
+  bool areNonVolatileConsecutiveLoads(AtomicSDNode *LD, AtomicSDNode *Base,
+  unsigned Bytes, int Dist) const;
 
   /// Infer alignment of a load / store address. Return std::nullopt if it
   /// cannot be inferred.
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 8923d5596e7ac4..59ba987b1b8e92 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -948,6 +948,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   void SplitVecRes_FPOp_MultiType(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_IS_FPCLASS(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi);
+  void SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD);
   void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_LOAD(VPLoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *SLD, SDValue &Lo,
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 9ef4e0fc99620d..b786c3a26902eb 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -1152,6 +1152,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
 SplitVecRes_STEP_VECTOR(N, Lo, Hi);
 break;
   case ISD::SIGN_EXTEND_INREG: SplitVecRes_InregOp(N, Lo, Hi); break;
+  case ISD::ATOMIC_LOAD:
+    SplitVecRes_ATOMIC_LOAD(cast<AtomicSDNode>(N));
+break;
   case ISD::LOAD:
    SplitVecRes_LOAD(cast<LoadSDNode>(N), Lo, Hi);
 break;
@@ -1395,6 +1398,34 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
 SetSplitVector(SDValue(N, ResNo), Lo, Hi);
 }
 
+void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD) {
+  SDLoc dl(LD);
+
+  EVT MemoryVT = LD->getMemoryVT();
+  unsigned NumElts = MemoryVT.getVectorMinNumElements();
+
+  EVT IntMemoryVT = EVT::getVectorVT(*DAG.getContext(), MVT::i16, NumElts);
+  EVT ElemVT = EVT::getVectorVT(*DAG.getContext(),
+MemoryVT.getVectorElementType(), 1);
+
+  // Create a single atomic to load all the elements at once.
+  SDValue Atomic = DAG.getAtomic(ISD::ATOMIC_LOAD, dl, IntMemoryVT, IntMemoryVT,
+ LD->getChain(), LD->getBasePtr(),
+ LD->getMemOperand());
+
+  // Instead of splitting, put all the elements back into a vector.
+  SmallVector<SDValue> Ops;
+  for (unsigned i = 0; i < NumElts; ++i) {
+SDValue Elt = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::i16, Atomic,
+  DAG.getVectorIdxConstant(i, dl));
+Elt = DAG.getBitcast(ElemV

[llvm-branch-commits] [llvm] [X86] Manage atomic load of fp -> int promotion in DAG (PR #120386)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120386

>From 4698589155b60abda2caefd8664599850d084d6e Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:38:23 -0500
Subject: [PATCH] [X86] Manage atomic load of fp -> int promotion in DAG

When lowering atomic <1 x T> vector types with floats, selection can fail since
this pattern is unsupported. To support this, floats can be cast to
an integer type of the same size.

commit-id:f9d761c5
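
A sketch of the effect (illustrative only): with the promotions in place,
the scalarized <1 x float> case now selects:

  define <1 x float> @example(ptr %p) {
    %v = load atomic <1 x float>, ptr %p acquire, align 4
    ret <1 x float> %v
  }

The (ATOMIC_LOAD f32) node is rewritten as a bitcast of (ATOMIC_LOAD i32),
i.e. an integer atomic load whose bits are then moved into an FP register,
as the movss in the tests below shows.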
---
 llvm/lib/Target/X86/X86ISelLowering.cpp|  4 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll | 37 ++
 2 files changed, 41 insertions(+)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp 
b/llvm/lib/Target/X86/X86ISelLowering.cpp
index a956074e50d86f..b681b80858de30 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -2639,6 +2639,10 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
 setOperationAction(Op, MVT::f32, Promote);
   }
 
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f16, MVT::i16);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f32, MVT::i32);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f64, MVT::i64);
+
   // We have target-specific dag combine patterns for the following nodes:
   setTargetDAGCombine({ISD::VECTOR_SHUFFLE,
ISD::SCALAR_TO_VECTOR,
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index d23cfb89f9fc87..6efcbb80c0ce6d 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -145,3 +145,40 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   %ret = load atomic <1 x i64>, ptr %x acquire, align 8
   ret <1 x i64> %ret
 }
+
+define <1 x half> @atomic_vec1_half(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_half:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_half:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %cx
+; CHECK0-NEXT:## implicit-def: $eax
+; CHECK0-NEXT:movw %cx, %ax
+; CHECK0-NEXT:## implicit-def: $xmm0
+; CHECK0-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x half>, ptr %x acquire, align 2
+  ret <1 x half> %ret
+}
+
+define <1 x float> @atomic_vec1_float(ptr %x) {
+; CHECK-LABEL: atomic_vec1_float:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x float>, ptr %x acquire, align 4
+  ret <1 x float> %ret
+}
+
+define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec1_double_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 8
+  ret <1 x double> %ret
+}

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)

2025-01-21 Thread via llvm-branch-commits


@@ -5901,6 +5904,30 @@ SDValue DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) {
  N->getOperand(1), N->getOperand(2));
 }
 
+SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SmallVector<SDValue, 16> LdChain; // Chain for the series of load
+  SDValue Result = GenWidenVectorLoads(LdChain, N, true /*IsAtomic*/);
+
+  if (Result) {
+// If we generate a single load, we can use that for the chain.  Otherwise,
+// build a factor node to remember the multiple loads are independent and
+// chain to that.
+SDValue NewChain;
+if (LdChain.size() == 1)
+  NewChain = LdChain[0];
+else
+  NewChain = DAG.getNode(ISD::TokenFactor, SDLoc(N), MVT::Other, LdChain);

jofrn wrote:

Neat. It looks like it works. Thank you.

https://github.com/llvm/llvm-project/pull/120598
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [clang][CodeComplete] Use HeuristicResolver to resolve DependentNameTypes (PR #123818)

2025-01-21 Thread Nathan Ridge via llvm-branch-commits

https://github.com/HighCommander4 edited 
https://github.com/llvm/llvm-project/pull/123818
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120598

>From 5254725d2f49042d87194074e23c29d2d2dc3525 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 11:19:39 -0500
Subject: [PATCH] [SelectionDAG][X86] Widen <2 x T> vector types for atomic
 load

Vector types of 2 elements must be widened. This change widens
vector types of atomic load in SelectionDAG
so that it can translate aligned vectors of size >1. It also
adds Pats to remove an extra MOV.

commit-id:2894ccd1
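
A sketch of the widening path (the function name is illustrative):

  define <2 x i32> @example(ptr %p) {
    %v = load atomic <2 x i32>, ptr %p acquire, align 8
    ret <2 x i32> %v
  }

The v2i32 result is widened to a legal vector type, GenWidenVectorLoads
emits a single i64 ATOMIC_LOAD covering both elements, and the new Pats
let that select to a single movq rather than a load plus an extra MOV.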
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |  5 +-
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 38 +-
 llvm/lib/Target/X86/X86InstrCompiler.td   |  7 ++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 72 +++
 llvm/test/CodeGen/X86/atomic-unordered.ll |  3 +-
 5 files changed, 118 insertions(+), 7 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 893fca233d191b..8923d5596e7ac4 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -1048,6 +1048,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N);
   SDValue WidenVecRes_INSERT_SUBVECTOR(SDNode *N);
   SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N);
+  SDValue WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue WidenVecRes_LOAD(SDNode* N);
   SDValue WidenVecRes_VP_LOAD(VPLoadSDNode *N);
   SDValue WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N);
@@ -1131,8 +1132,8 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   /// resulting wider type. It takes:
   ///   LdChain: list of chains for the load to be generated.
   ///   Ld:  load to widen
-  SDValue GenWidenVectorLoads(SmallVectorImpl<SDValue> &LdChain,
-  LoadSDNode *LD);
+  SDValue GenWidenVectorLoads(SmallVectorImpl<SDValue> &LdChain, MemSDNode *LD,
+  bool IsAtomic = false);
 
   /// Helper function to generate a set of extension loads to load a vector 
with
   /// a resulting wider type. It takes:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 980c157fe6f3f1..d412d8eafda4e5 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -4521,6 +4521,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, unsigned ResNo) {
 break;
   case ISD::EXTRACT_SUBVECTOR: Res = WidenVecRes_EXTRACT_SUBVECTOR(N); break;
   case ISD::INSERT_VECTOR_ELT: Res = WidenVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+    Res = WidenVecRes_ATOMIC_LOAD(cast<AtomicSDNode>(N));
+break;
   case ISD::LOAD:  Res = WidenVecRes_LOAD(N); break;
   case ISD::STEP_VECTOR:
   case ISD::SPLAT_VECTOR:
@@ -5907,6 +5910,30 @@ SDValue DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) {
  N->getOperand(1), N->getOperand(2));
 }
 
+SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SmallVector<SDValue, 16> LdChain; // Chain for the series of load
+  SDValue Result = GenWidenVectorLoads(LdChain, N, /*IsAtomic=*/true);
+
+  if (Result) {
+// If we generate a single load, we can use that for the chain.  Otherwise,
+// build a factor node to remember the multiple loads are independent and
+// chain to that.
+SDValue NewChain;
+if (LdChain.size() == 1)
+  NewChain = LdChain[0];
+else
+  NewChain = DAG.getNode(ISD::TokenFactor, SDLoc(N), MVT::Other, LdChain);
+
+// Modified the chain - switch anything that used the old chain to use
+// the new one.
+ReplaceValueWith(SDValue(N, 1), NewChain);
+
+return Result;
+  }
+
+  report_fatal_error("Unable to widen atomic vector load");
+}
+
 SDValue DAGTypeLegalizer::WidenVecRes_LOAD(SDNode *N) {
  LoadSDNode *LD = cast<LoadSDNode>(N);
   ISD::LoadExtType ExtType = LD->getExtensionType();
@@ -7706,7 +7733,7 @@ static SDValue BuildVectorFromScalar(SelectionDAG& DAG, EVT VecTy,
 }
 
SDValue DAGTypeLegalizer::GenWidenVectorLoads(SmallVectorImpl<SDValue> &LdChain,
-  LoadSDNode *LD) {
+  MemSDNode *LD, bool IsAtomic) {
   // The strategy assumes that we can efficiently load power-of-two widths.
   // The routine chops the vector into the largest vector loads with the same
   // element type or scalar loads and then recombines it to the widen vector
@@ -7763,8 +7790,13 @@ SDValue DAGTypeLegalizer::GenWidenVectorLoads(SmallVectorImpl<SDValue> &LdChain,
 } while (TypeSize::isKnownGT(RemainingWidth, NewVTWidth));
   }
 
-  SDValue LdOp = DAG.getLoad(*FirstVT, dl, Chain, BasePtr, LD->getPointerInfo(),
- LD->getOriginalAlign(), MMOFlags, AAInfo);
+  SDValue LdOp;
+  if (IsAtomic)
+    LdOp = DAG.getAtomic(ISD::ATOMIC_LOAD, dl, *FirstVT, *FirstVT, Chain,
+  

[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120716

>From 9fde3b59ab60cf55ad324b4348aa60bde6ae5706 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Fri, 20 Dec 2024 06:14:28 -0500
Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector

AtomicExpand fails for an aligned vector `load atomic` because it
does not find a compatible library call. This change adds appropriate
bitcasts so that the call can be lowered.

commit-id:f430c1af
---
 llvm/lib/CodeGen/AtomicExpandPass.cpp | 25 +++--
 llvm/test/CodeGen/ARM/atomic-load-store.ll| 52 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 15 ++
 .../X86/expand-atomic-non-integer.ll  | 14 +
 4 files changed, 103 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp 
b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index a75fa688d87a8d..490cc9aa4c3c80 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -2060,9 +2060,28 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall(
 I->replaceAllUsesWith(V);
   } else if (HasResult) {
 Value *V;
-if (UseSizedLibcall)
-  V = Builder.CreateBitOrPointerCast(Result, I->getType());
-else {
+if (UseSizedLibcall) {
+  // Add bitcasts from Result's T scalar type to I's <2 x T/2> vector type
+  if (I->getType()->getScalarType()->isIntOrPtrTy() &&
+  I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) {
+TypeSize Size = Result->getType()->getPrimitiveSizeInBits();
+assert((unsigned)Size % 2 == 0);
+unsigned HalfSize = (unsigned)Size / 2;
+Value *Lo =
+Builder.CreateTrunc(Result, IntegerType::get(Ctx, HalfSize));
+Value *RS = Builder.CreateLShr(
+Result, ConstantInt::get(IntegerType::get(Ctx, Size), HalfSize));
+Value *Hi = Builder.CreateTrunc(RS, IntegerType::get(Ctx, HalfSize));
+Value *Vec = Builder.CreateInsertElement(
+VectorType::get(IntegerType::get(Ctx, HalfSize),
+                            cast<VectorType>(I->getType())->getElementCount()),
+Lo, ConstantInt::get(IntegerType::get(Ctx, 32), 0));
+Vec = Builder.CreateInsertElement(
+Vec, Hi, ConstantInt::get(IntegerType::get(Ctx, 32), 1));
+V = Builder.CreateBitOrPointerCast(Vec, I->getType());
+  } else
+V = Builder.CreateBitOrPointerCast(Result, I->getType());
+} else {
   V = Builder.CreateAlignedLoad(I->getType(), AllocaResult,
 AllocaAlignment);
   Builder.CreateLifetimeEnd(AllocaResult, SizeVal64);
diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll 
b/llvm/test/CodeGen/ARM/atomic-load-store.ll
index 560dfde356c29d..8736c5bd9a9451 100644
--- a/llvm/test/CodeGen/ARM/atomic-load-store.ll
+++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll
@@ -983,3 +983,55 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double %val1) {
   store atomic double %val1, ptr %ptr seq_cst, align 8
   ret void
 }
+
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 {
+; ARM-LABEL: atomic_vec1_ptr:
+; ARM:   @ %bb.0:
+; ARM-NEXT:ldr r0, [r0]
+; ARM-NEXT:dmb ish
+; ARM-NEXT:bx lr
+;
+; ARMOPTNONE-LABEL: atomic_vec1_ptr:
+; ARMOPTNONE:   @ %bb.0:
+; ARMOPTNONE-NEXT:ldr r0, [r0]
+; ARMOPTNONE-NEXT:dmb ish
+; ARMOPTNONE-NEXT:bx lr
+;
+; THUMBTWO-LABEL: atomic_vec1_ptr:
+; THUMBTWO:   @ %bb.0:
+; THUMBTWO-NEXT:ldr r0, [r0]
+; THUMBTWO-NEXT:dmb ish
+; THUMBTWO-NEXT:bx lr
+;
+; THUMBONE-LABEL: atomic_vec1_ptr:
+; THUMBONE:   @ %bb.0:
+; THUMBONE-NEXT:push {r7, lr}
+; THUMBONE-NEXT:movs r1, #0
+; THUMBONE-NEXT:mov r2, r1
+; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4
+; THUMBONE-NEXT:pop {r7, pc}
+;
+; ARMV4-LABEL: atomic_vec1_ptr:
+; ARMV4:   @ %bb.0:
+; ARMV4-NEXT:push {r11, lr}
+; ARMV4-NEXT:mov r1, #2
+; ARMV4-NEXT:bl __atomic_load_4
+; ARMV4-NEXT:mov r0, #0
+; ARMV4-NEXT:pop {r11, lr}
+; ARMV4-NEXT:mov pc, lr
+;
+; ARMV6-LABEL: atomic_vec1_ptr:
+; ARMV6:   @ %bb.0:
+; ARMV6-NEXT:mov r1, #0
+; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5
+; ARMV6-NEXT:ldr r0, [r0]
+; ARMV6-NEXT:bx lr
+;
+; THUMBM-LABEL: atomic_vec1_ptr:
+; THUMBM:   @ %bb.0:
+; THUMBM-NEXT:ldr r0, [r0]
+; THUMBM-NEXT:dmb sy
+; THUMBM-NEXT:bx lr
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 2d8cb14cf5fcde..3ce05f75fdcfe5 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -362,6 +362,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
   ret <2 x i32> %ret
 }
 
+define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec2_ptr_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEX

[llvm-branch-commits] [llvm] [X86] Manage atomic load of fp -> int promotion in DAG (PR #120386)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120386

>From 846ab2e6caad370afa04fe217d18d55d524f3799 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:38:23 -0500
Subject: [PATCH] [X86] Manage atomic load of fp -> int promotion in DAG

When lowering atomic <1 x T> vector types with floats, selection can fail since
this pattern is unsupported. To support this, floats can be cast to
an integer type of the same size.

commit-id:f9d761c5
---
 llvm/lib/Target/X86/X86ISelLowering.cpp|  4 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll | 37 ++
 2 files changed, 41 insertions(+)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp 
b/llvm/lib/Target/X86/X86ISelLowering.cpp
index a956074e50d86f..b681b80858de30 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -2639,6 +2639,10 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
 setOperationAction(Op, MVT::f32, Promote);
   }
 
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f16, MVT::i16);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f32, MVT::i32);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f64, MVT::i64);
+
   // We have target-specific dag combine patterns for the following nodes:
   setTargetDAGCombine({ISD::VECTOR_SHUFFLE,
ISD::SCALAR_TO_VECTOR,
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index d23cfb89f9fc87..6efcbb80c0ce6d 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -145,3 +145,40 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   %ret = load atomic <1 x i64>, ptr %x acquire, align 8
   ret <1 x i64> %ret
 }
+
+define <1 x half> @atomic_vec1_half(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_half:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_half:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %cx
+; CHECK0-NEXT:## implicit-def: $eax
+; CHECK0-NEXT:movw %cx, %ax
+; CHECK0-NEXT:## implicit-def: $xmm0
+; CHECK0-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x half>, ptr %x acquire, align 2
+  ret <1 x half> %ret
+}
+
+define <1 x float> @atomic_vec1_float(ptr %x) {
+; CHECK-LABEL: atomic_vec1_float:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x float>, ptr %x acquire, align 4
+  ret <1 x float> %ret
+}
+
+define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec1_double_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 8
+  ret <1 x double> %ret
+}

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [X86] Add atomic vector tests for unaligned >1 sizes. (PR #120387)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120387

>From 647a59b75785b5b6b930687f26397cd06c44f93b Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:40:32 -0500
Subject: [PATCH] [X86] Add atomic vector tests for unaligned >1 sizes.

Unaligned atomic vectors with size >1 are lowered to calls.
Their tests are added separately here.

commit-id:a06a5cc6
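
In sketch form (function names are illustrative), the distinction these
tests pin down is alignment alone:

  ; align 8 covers the whole 8-byte value: lowered inline.
  define <2 x i32> @aligned(ptr %p) {
    %v = load atomic <2 x i32>, ptr %p acquire, align 8
    ret <2 x i32> %v
  }

  ; align 4 is under-aligned for an 8-byte atomic: lowered to an
  ; __atomic_load libcall, as the CHECK lines below expect.
  define <2 x i32> @underaligned(ptr %p) {
    %v = load atomic <2 x i32>, ptr %p acquire, align 4
    ret <2 x i32> %v
  }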
---
 llvm/test/CodeGen/X86/atomic-load-store.ll | 253 +
 1 file changed, 253 insertions(+)

diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 6efcbb80c0ce6d..39e9fdfa5e62b0 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -146,6 +146,34 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   ret <1 x i64> %ret
 }
 
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_ptr:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movq (%rsp), %rax
+; CHECK3-NEXT:popq %rcx
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_ptr:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq (%rsp), %rax
+; CHECK0-NEXT:popq %rcx
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
+
 define <1 x half> @atomic_vec1_half(ptr %x) {
 ; CHECK3-LABEL: atomic_vec1_half:
 ; CHECK3:   ## %bb.0:
@@ -182,3 +210,228 @@ define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind {
   %ret = load atomic <1 x double>, ptr %x acquire, align 8
   ret <1 x double> %ret
 }
+
+define <1 x i64> @atomic_vec1_i64(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_i64:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movq (%rsp), %rax
+; CHECK3-NEXT:popq %rcx
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i64:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq (%rsp), %rax
+; CHECK0-NEXT:popq %rcx
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i64>, ptr %x acquire, align 4
+  ret <1 x i64> %ret
+}
+
+define <1 x double> @atomic_vec1_double(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_double:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK3-NEXT:popq %rax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_double:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK0-NEXT:popq %rax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 4
+  ret <1 x double> %ret
+}
+
+define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec2_i32:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK3-NEXT:popq %rax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec2_i32:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq {{.*#+}} xmm0 = mem[0],zero
+; CHECK0-NEXT:popq %rax
+; CHECK0-NEXT:retq
+  %ret = load atomic <2 x i32>, ptr %x acquire, align 4
+  ret <2 x i32> %ret
+}
+
+define <4 x float> @atomic_vec4_float_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec4_float_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEXT:movl $2, %esi
+; CHECK-NEXT:callq ___atomic_load_16
+; CHECK-NEXT:movq %rdx, %xmm1
+; CHECK-NEXT:movq %rax, %xmm0
+; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Split via Concat vector types for atomic load (PR #120640)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120640

>From b36e103034ad5cbb6cea31ce1148518d7ea136ae Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 16:25:55 -0500
Subject: [PATCH] [SelectionDAG][X86] Split via Concat vector types for atomic load

Vector types that aren't widened are 'split' via CONCAT_VECTORS
so that a single ATOMIC_LOAD is issued for the entire vector at once.
This change utilizes the load vectorization infrastructure in
SelectionDAG in order to group the vectors. This enables SelectionDAG
to translate vectors with the types bfloat and half.

commit-id:3a045357
---
 llvm/include/llvm/CodeGen/SelectionDAG.h  |  4 +-
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |  1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 31 
 .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 24 -
 .../SelectionDAGAddressAnalysis.cpp   | 30 ++-
 .../SelectionDAG/SelectionDAGBuilder.cpp  |  6 ++-
 llvm/lib/Target/X86/X86ISelLowering.cpp   | 50 ---
 llvm/test/CodeGen/X86/atomic-load-store.ll| 36 +
 8 files changed, 147 insertions(+), 35 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h 
b/llvm/include/llvm/CodeGen/SelectionDAG.h
index ba0538f7084eec..648ad14a67cb6c 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1837,7 +1837,7 @@ class SelectionDAG {
   /// chain to the token factor. This ensures that the new memory node will 
have
   /// the same relative memory dependency position as the old load. Returns the
   /// new merged load chain.
-  SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp);
+  SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp);
 
   /// Topological-sort the AllNodes list and a
   /// assign a unique node id for each node in the DAG based on their
@@ -2263,6 +2263,8 @@ class SelectionDAG {
   /// location that the 'Base' load is loading from.
   bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base,
   unsigned Bytes, int Dist) const;
+  bool areNonVolatileConsecutiveLoads(AtomicSDNode *LD, AtomicSDNode *Base,
+  unsigned Bytes, int Dist) const;
 
   /// Infer alignment of a load / store address. Return std::nullopt if it
   /// cannot be inferred.
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 8923d5596e7ac4..59ba987b1b8e92 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -948,6 +948,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   void SplitVecRes_FPOp_MultiType(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_IS_FPCLASS(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi);
+  void SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD);
   void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_LOAD(VPLoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *SLD, SDValue &Lo,
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index d412d8eafda4e5..0fa318fcdca2b0 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -1152,6 +1152,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
 SplitVecRes_STEP_VECTOR(N, Lo, Hi);
 break;
   case ISD::SIGN_EXTEND_INREG: SplitVecRes_InregOp(N, Lo, Hi); break;
+  case ISD::ATOMIC_LOAD:
+    SplitVecRes_ATOMIC_LOAD(cast<AtomicSDNode>(N));
+break;
   case ISD::LOAD:
    SplitVecRes_LOAD(cast<LoadSDNode>(N), Lo, Hi);
 break;
@@ -1395,6 +1398,34 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
 SetSplitVector(SDValue(N, ResNo), Lo, Hi);
 }
 
+void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD) {
+  SDLoc dl(LD);
+
+  EVT MemoryVT = LD->getMemoryVT();
+  unsigned NumElts = MemoryVT.getVectorMinNumElements();
+
+  EVT IntMemoryVT = EVT::getVectorVT(*DAG.getContext(), MVT::i16, NumElts);
+  EVT ElemVT = EVT::getVectorVT(*DAG.getContext(),
+MemoryVT.getVectorElementType(), 1);
+
+  // Create a single atomic to load all the elements at once.
+  SDValue Atomic = DAG.getAtomic(ISD::ATOMIC_LOAD, dl, IntMemoryVT, IntMemoryVT,
+ LD->getChain(), LD->getBasePtr(),
+ LD->getMemOperand());
+
+  // Instead of splitting, put all the elements back into a vector.
+  SmallVector<SDValue> Ops;
+  for (unsigned i = 0; i < NumElts; ++i) {
+SDValue Elt = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::i16, Atomic,
+  DAG.getVectorIdxConstant(i, dl));
+Elt = DAG.getBitcast(ElemV

[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120385

>From 11b950d871756b65d4c58e59b22a6af595f0cd0c Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:37:17 -0500
Subject: [PATCH] [SelectionDAG] Legalize <1 x T> vector types for atomic load

`load atomic <1 x T>` is not valid. This change legalizes
vector types of atomic load via scalarization in SelectionDAG
so that it can, for example, translate from `v1i32` to `i32`.

commit-id:5c36cc8c
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  |  15 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 121 +-
 3 files changed, 135 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index f13f70e66cfaa6..893fca233d191b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -863,6 +863,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue ScalarizeVecRes_UnaryOpWithExtraInput(SDNode *N);
   SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N);
   SDValue ScalarizeVecRes_LOAD(LoadSDNode *N);
+  SDValue ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue ScalarizeVecRes_SCALAR_TO_VECTOR(SDNode *N);
   SDValue ScalarizeVecRes_VSELECT(SDNode *N);
   SDValue ScalarizeVecRes_SELECT(SDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index f39d9ca15496a9..980c157fe6f3f1 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -64,6 +64,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, unsigned ResNo) {
 R = ScalarizeVecRes_UnaryOpWithExtraInput(N);
 break;
   case ISD::INSERT_VECTOR_ELT: R = ScalarizeVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+    R = ScalarizeVecRes_ATOMIC_LOAD(cast<AtomicSDNode>(N));
+break;
  case ISD::LOAD:   R = ScalarizeVecRes_LOAD(cast<LoadSDNode>(N));break;
   case ISD::SCALAR_TO_VECTOR:  R = ScalarizeVecRes_SCALAR_TO_VECTOR(N); break;
   case ISD::SIGN_EXTEND_INREG: R = ScalarizeVecRes_InregOp(N); break;
@@ -457,6 +460,18 @@ SDValue DAGTypeLegalizer::ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N) {
   return Op;
 }
 
+SDValue DAGTypeLegalizer::ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SDValue Result = DAG.getAtomic(
+  ISD::ATOMIC_LOAD, SDLoc(N), N->getMemoryVT().getVectorElementType(),
+      N->getValueType(0).getVectorElementType(), N->getChain(), N->getBasePtr(),
+  N->getMemOperand());
+
+  // Legalize the chain result - switch anything that used the old chain to
+  // use the new one.
+  ReplaceValueWith(SDValue(N, 1), Result.getValue(1));
+  return Result;
+}
+
 SDValue DAGTypeLegalizer::ScalarizeVecRes_LOAD(LoadSDNode *N) {
   assert(N->isUnindexed() && "Indexed vector load?");
 
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 5bce4401f7bdb0..d23cfb89f9fc87 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | FileCheck %s
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | FileCheck %s
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | FileCheck %s --check-prefixes=CHECK,CHECK3
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | FileCheck %s --check-prefixes=CHECK,CHECK0
 
 define void @test1(ptr %ptr, i32 %val1) {
 ; CHECK-LABEL: test1:
@@ -28,3 +28,120 @@ define i32 @test3(ptr %ptr) {
   %val = load atomic i32, ptr %ptr seq_cst, align 4
   ret i32 %val
 }
+
+define <1 x i32> @atomic_vec1_i32(ptr %x) {
+; CHECK-LABEL: atomic_vec1_i32:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movl (%rdi), %eax
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x i32>, ptr %x acquire, align 4
+  ret <1 x i32> %ret
+}
+
+define <1 x i8> @atomic_vec1_i8(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i8:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzbl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i8:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movb (%rdi), %al
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i8>, ptr %x acquire, align 1
+  ret <1 x i8> %ret
+}
+
+define <1 x i16> @atomic_vec1_i16(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i16:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i16:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %ax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i16>, ptr %x acquire, align 2
+  ret <1 x i16> %ret
+}
+
+define <1 x i32> @atomic_vec1_i8_zext(ptr %x) {
+; CHECK3-LABEL: atomic_

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Split via Concat vector types for atomic load (PR #120640)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120640

>From b36e103034ad5cbb6cea31ce1148518d7ea136ae Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 16:25:55 -0500
Subject: [PATCH] [SelectionDAG][X86] Split via Concat vector types for
 atomic load

Vector types that aren't widened are 'split' via CONCAT_VECTORS
so that a single ATOMIC_LOAD is issued for the entire vector at once.
This change utilizes the load vectorization infrastructure in
SelectionDAG in order to group the vectors. This enables SelectionDAG
to translate vectors with type bfloat and half.
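
The shape of that transform, as a hedged sketch (the helper name and the exact
integer/element types are assumptions drawn from the surrounding code):

#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Issue one integer atomic load covering the whole vector, then extract each
// lane, bitcast it back to a one-element vector of the original element type,
// and reassemble the result with CONCAT_VECTORS, instead of splitting the
// atomic access itself.
static SDValue loadVectorAtomically(SelectionDAG &DAG, AtomicSDNode *LD,
                                    EVT IntVT, EVT OneEltVT, EVT ResVT) {
  SDLoc dl(LD);
  SDValue Atomic = DAG.getAtomic(ISD::ATOMIC_LOAD, dl, IntVT, IntVT,
                                 LD->getChain(), LD->getBasePtr(),
                                 LD->getMemOperand());
  SmallVector<SDValue, 4> Ops;
  for (unsigned I = 0, E = IntVT.getVectorNumElements(); I != E; ++I) {
    SDValue Lane = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl,
                               IntVT.getVectorElementType(), Atomic,
                               DAG.getVectorIdxConstant(I, dl));
    Ops.push_back(DAG.getBitcast(OneEltVT, Lane)); // e.g. i16 -> <1 x half>
  }
  return DAG.getNode(ISD::CONCAT_VECTORS, dl, ResVT, Ops);
}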

commit-id:3a045357
---
 llvm/include/llvm/CodeGen/SelectionDAG.h  |  4 +-
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |  1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 31 
 .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 24 -
 .../SelectionDAGAddressAnalysis.cpp   | 30 ++-
 .../SelectionDAG/SelectionDAGBuilder.cpp  |  6 ++-
 llvm/lib/Target/X86/X86ISelLowering.cpp   | 50 ---
 llvm/test/CodeGen/X86/atomic-load-store.ll| 36 +
 8 files changed, 147 insertions(+), 35 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h 
b/llvm/include/llvm/CodeGen/SelectionDAG.h
index ba0538f7084eec..648ad14a67cb6c 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1837,7 +1837,7 @@ class SelectionDAG {
   /// chain to the token factor. This ensures that the new memory node will 
have
   /// the same relative memory dependency position as the old load. Returns the
   /// new merged load chain.
-  SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp);
+  SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp);
 
   /// Topological-sort the AllNodes list and a
   /// assign a unique node id for each node in the DAG based on their
@@ -2263,6 +2263,8 @@ class SelectionDAG {
   /// location that the 'Base' load is loading from.
   bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base,
   unsigned Bytes, int Dist) const;
+  bool areNonVolatileConsecutiveLoads(AtomicSDNode *LD, AtomicSDNode *Base,
+  unsigned Bytes, int Dist) const;
 
   /// Infer alignment of a load / store address. Return std::nullopt if it
   /// cannot be inferred.
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 8923d5596e7ac4..59ba987b1b8e92 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -948,6 +948,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   void SplitVecRes_FPOp_MultiType(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_IS_FPCLASS(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi);
+  void SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD);
   void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_LOAD(VPLoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *SLD, SDValue &Lo,
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index d412d8eafda4e5..0fa318fcdca2b0 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -1152,6 +1152,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, 
unsigned ResNo) {
 SplitVecRes_STEP_VECTOR(N, Lo, Hi);
 break;
   case ISD::SIGN_EXTEND_INREG: SplitVecRes_InregOp(N, Lo, Hi); break;
+  case ISD::ATOMIC_LOAD:
+SplitVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:
 SplitVecRes_LOAD(cast(N), Lo, Hi);
 break;
@@ -1395,6 +1398,34 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, 
unsigned ResNo) {
 SetSplitVector(SDValue(N, ResNo), Lo, Hi);
 }
 
+void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD) {
+  SDLoc dl(LD);
+
+  EVT MemoryVT = LD->getMemoryVT();
+  unsigned NumElts = MemoryVT.getVectorMinNumElements();
+
+  EVT IntMemoryVT = EVT::getVectorVT(*DAG.getContext(), MVT::i16, NumElts);
+  EVT ElemVT = EVT::getVectorVT(*DAG.getContext(),
+MemoryVT.getVectorElementType(), 1);
+
+  // Create a single atomic to load all the elements at once.
+  SDValue Atomic = DAG.getAtomic(ISD::ATOMIC_LOAD, dl, IntMemoryVT, 
IntMemoryVT,
+ LD->getChain(), LD->getBasePtr(),
+ LD->getMemOperand());
+
+  // Instead of splitting, put all the elements back into a vector.
+  SmallVector Ops;
+  for (unsigned i = 0; i < NumElts; ++i) {
+SDValue Elt = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::i16, Atomic,
+  DAG.getVectorIdxConstant(i, dl));
+Elt = DAG.getBitcast(ElemV

[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120716

>From 9fde3b59ab60cf55ad324b4348aa60bde6ae5706 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Fri, 20 Dec 2024 06:14:28 -0500
Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector

AtomicExpand fails for aligned `load atomic ` because it
does not find a compatible library call. This change adds appropriate
bitcasts so that the call can be lowered.

commit-id:f430c1af
---
 llvm/lib/CodeGen/AtomicExpandPass.cpp | 25 +++--
 llvm/test/CodeGen/ARM/atomic-load-store.ll| 52 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 15 ++
 .../X86/expand-atomic-non-integer.ll  | 14 +
 4 files changed, 103 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp 
b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index a75fa688d87a8d..490cc9aa4c3c80 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -2060,9 +2060,28 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall(
 I->replaceAllUsesWith(V);
   } else if (HasResult) {
 Value *V;
-if (UseSizedLibcall)
-  V = Builder.CreateBitOrPointerCast(Result, I->getType());
-else {
+if (UseSizedLibcall) {
+  // Add bitcasts from Result's T scalar type to I's <2 x T/2> vector type
+  if (I->getType()->getScalarType()->isIntOrPtrTy() &&
+  I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) {
+TypeSize Size = Result->getType()->getPrimitiveSizeInBits();
+assert((unsigned)Size % 2 == 0);
+unsigned HalfSize = (unsigned)Size / 2;
+Value *Lo =
+Builder.CreateTrunc(Result, IntegerType::get(Ctx, HalfSize));
+Value *RS = Builder.CreateLShr(
+Result, ConstantInt::get(IntegerType::get(Ctx, Size), HalfSize));
+Value *Hi = Builder.CreateTrunc(RS, IntegerType::get(Ctx, HalfSize));
+Value *Vec = Builder.CreateInsertElement(
+VectorType::get(IntegerType::get(Ctx, HalfSize),
+cast(I->getType())->getElementCount()),
+Lo, ConstantInt::get(IntegerType::get(Ctx, 32), 0));
+Vec = Builder.CreateInsertElement(
+Vec, Hi, ConstantInt::get(IntegerType::get(Ctx, 32), 1));
+V = Builder.CreateBitOrPointerCast(Vec, I->getType());
+  } else
+V = Builder.CreateBitOrPointerCast(Result, I->getType());
+} else {
   V = Builder.CreateAlignedLoad(I->getType(), AllocaResult,
 AllocaAlignment);
   Builder.CreateLifetimeEnd(AllocaResult, SizeVal64);
diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll 
b/llvm/test/CodeGen/ARM/atomic-load-store.ll
index 560dfde356c29d..8736c5bd9a9451 100644
--- a/llvm/test/CodeGen/ARM/atomic-load-store.ll
+++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll
@@ -983,3 +983,55 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double 
%val1) {
   store atomic double %val1, ptr %ptr seq_cst, align 8
   ret void
 }
+
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 {
+; ARM-LABEL: atomic_vec1_ptr:
+; ARM:   @ %bb.0:
+; ARM-NEXT:ldr r0, [r0]
+; ARM-NEXT:dmb ish
+; ARM-NEXT:bx lr
+;
+; ARMOPTNONE-LABEL: atomic_vec1_ptr:
+; ARMOPTNONE:   @ %bb.0:
+; ARMOPTNONE-NEXT:ldr r0, [r0]
+; ARMOPTNONE-NEXT:dmb ish
+; ARMOPTNONE-NEXT:bx lr
+;
+; THUMBTWO-LABEL: atomic_vec1_ptr:
+; THUMBTWO:   @ %bb.0:
+; THUMBTWO-NEXT:ldr r0, [r0]
+; THUMBTWO-NEXT:dmb ish
+; THUMBTWO-NEXT:bx lr
+;
+; THUMBONE-LABEL: atomic_vec1_ptr:
+; THUMBONE:   @ %bb.0:
+; THUMBONE-NEXT:push {r7, lr}
+; THUMBONE-NEXT:movs r1, #0
+; THUMBONE-NEXT:mov r2, r1
+; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4
+; THUMBONE-NEXT:pop {r7, pc}
+;
+; ARMV4-LABEL: atomic_vec1_ptr:
+; ARMV4:   @ %bb.0:
+; ARMV4-NEXT:push {r11, lr}
+; ARMV4-NEXT:mov r1, #2
+; ARMV4-NEXT:bl __atomic_load_4
+; ARMV4-NEXT:mov r0, #0
+; ARMV4-NEXT:pop {r11, lr}
+; ARMV4-NEXT:mov pc, lr
+;
+; ARMV6-LABEL: atomic_vec1_ptr:
+; ARMV6:   @ %bb.0:
+; ARMV6-NEXT:mov r1, #0
+; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5
+; ARMV6-NEXT:ldr r0, [r0]
+; ARMV6-NEXT:bx lr
+;
+; THUMBM-LABEL: atomic_vec1_ptr:
+; THUMBM:   @ %bb.0:
+; THUMBM-NEXT:ldr r0, [r0]
+; THUMBM-NEXT:dmb sy
+; THUMBM-NEXT:bx lr
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 2d8cb14cf5fcde..3ce05f75fdcfe5 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -362,6 +362,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
   ret <2 x i32> %ret
 }
 
+define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec2_ptr_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEX

[llvm-branch-commits] [llvm] [X86] Manage atomic load of fp -> int promotion in DAG (PR #120386)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120386

>From 846ab2e6caad370afa04fe217d18d55d524f3799 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:38:23 -0500
Subject: [PATCH] [X86] Manage atomic load of fp -> int promotion in DAG

When lowering atomic <1 x T> vector types with floats, selection can fail since
this pattern is unsupported. To support this, floats can be cast to
an integer type of the same size.

commit-id:f9d761c5
---
 llvm/lib/Target/X86/X86ISelLowering.cpp|  4 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll | 37 ++
 2 files changed, 41 insertions(+)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp 
b/llvm/lib/Target/X86/X86ISelLowering.cpp
index a956074e50d86f..b681b80858de30 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -2639,6 +2639,10 @@ X86TargetLowering::X86TargetLowering(const 
X86TargetMachine &TM,
 setOperationAction(Op, MVT::f32, Promote);
   }
 
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f16, MVT::i16);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f32, MVT::i32);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f64, MVT::i64);
+
   // We have target-specific dag combine patterns for the following nodes:
   setTargetDAGCombine({ISD::VECTOR_SHUFFLE,
ISD::SCALAR_TO_VECTOR,
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index d23cfb89f9fc87..6efcbb80c0ce6d 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -145,3 +145,40 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   %ret = load atomic <1 x i64>, ptr %x acquire, align 8
   ret <1 x i64> %ret
 }
+
+define <1 x half> @atomic_vec1_half(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_half:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_half:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %cx
+; CHECK0-NEXT:## implicit-def: $eax
+; CHECK0-NEXT:movw %cx, %ax
+; CHECK0-NEXT:## implicit-def: $xmm0
+; CHECK0-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x half>, ptr %x acquire, align 2
+  ret <1 x half> %ret
+}
+
+define <1 x float> @atomic_vec1_float(ptr %x) {
+; CHECK-LABEL: atomic_vec1_float:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x float>, ptr %x acquire, align 4
+  ret <1 x float> %ret
+}
+
+define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec1_double_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 8
+  ret <1 x double> %ret
+}

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [X86] Add atomic vector tests for unaligned >1 sizes. (PR #120387)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120387

>From 647a59b75785b5b6b930687f26397cd06c44f93b Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:40:32 -0500
Subject: [PATCH] [X86] Add atomic vector tests for unaligned >1 sizes.

Unaligned atomic vectors with size >1 are lowered to calls.
Adding their tests separately here.

commit-id:a06a5cc6
---
 llvm/test/CodeGen/X86/atomic-load-store.ll | 253 +
 1 file changed, 253 insertions(+)

diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 6efcbb80c0ce6d..39e9fdfa5e62b0 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -146,6 +146,34 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   ret <1 x i64> %ret
 }
 
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_ptr:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movq (%rsp), %rax
+; CHECK3-NEXT:popq %rcx
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_ptr:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq (%rsp), %rax
+; CHECK0-NEXT:popq %rcx
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
+
 define <1 x half> @atomic_vec1_half(ptr %x) {
 ; CHECK3-LABEL: atomic_vec1_half:
 ; CHECK3:   ## %bb.0:
@@ -182,3 +210,228 @@ define <1 x double> @atomic_vec1_double_align(ptr %x) 
nounwind {
   %ret = load atomic <1 x double>, ptr %x acquire, align 8
   ret <1 x double> %ret
 }
+
+define <1 x i64> @atomic_vec1_i64(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_i64:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movq (%rsp), %rax
+; CHECK3-NEXT:popq %rcx
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i64:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq (%rsp), %rax
+; CHECK0-NEXT:popq %rcx
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i64>, ptr %x acquire, align 4
+  ret <1 x i64> %ret
+}
+
+define <1 x double> @atomic_vec1_double(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_double:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK3-NEXT:popq %rax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_double:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK0-NEXT:popq %rax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 4
+  ret <1 x double> %ret
+}
+
+define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec2_i32:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK3-NEXT:popq %rax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec2_i32:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq {{.*#+}} xmm0 = mem[0],zero
+; CHECK0-NEXT:popq %rax
+; CHECK0-NEXT:retq
+  %ret = load atomic <2 x i32>, ptr %x acquire, align 4
+  ret <2 x i32> %ret
+}
+
+define <4 x float> @atomic_vec4_float_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec4_float_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEXT:movl $2, %esi
+; CHECK-NEXT:callq ___atomic_load_16
+; CHECK-NEXT:movq %rdx, %xmm1
+; CHECK-NEXT:movq %rax, %xmm0
+; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm


[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)

2025-01-21 Thread via llvm-branch-commits


@@ -146,6 +146,47 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   ret <1 x i64> %ret
 }
 
+define <2 x i8> @atomic_vec2_i8(ptr %x) {
+; CHECK3-LABEL: atomic_vec2_i8:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec2_i8:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %ax
+; CHECK0-NEXT:retq
+  %ret = load atomic <2 x i8>, ptr %x acquire, align 4
+  ret <2 x i8> %ret
+}
+
+define <2 x i16> @atomic_vec2_i16(ptr %x) {
+; CHECK-LABEL: atomic_vec2_i16:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movl (%rdi), %eax
+; CHECK-NEXT:retq
+  %ret = load atomic <2 x i16>, ptr %x acquire, align 4
+  ret <2 x i16> %ret
+}

jofrn wrote:

https://github.com/llvm/llvm-project/pull/120640/files contains <2,4 x half> 
and <2,4 x bfloat>

https://github.com/llvm/llvm-project/pull/120598
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [HLSL] Introduce address space `hlsl_constant(2)` for constant buffer declarations (PR #123411)

2025-01-21 Thread Helena Kotas via llvm-branch-commits

https://github.com/hekota updated 
https://github.com/llvm/llvm-project/pull/123411

>From 6aba475e4af789fc03594560ad9937e3502cce51 Mon Sep 17 00:00:00 2001
From: Helena Kotas 
Date: Fri, 17 Jan 2025 13:31:01 -0800
Subject: [PATCH 1/2] [HLSL] Add address space `hlsl_constant(2)` for constant
 buffer declarations

---
 clang/include/clang/Basic/AddressSpaces.h |  1 +
 clang/lib/AST/TypePrinter.cpp |  2 +
 clang/lib/Basic/Targets/AArch64.h |  1 +
 clang/lib/Basic/Targets/AMDGPU.cpp|  1 +
 clang/lib/Basic/Targets/DirectX.h |  1 +
 clang/lib/Basic/Targets/NVPTX.h   |  1 +
 clang/lib/Basic/Targets/SPIR.h|  2 +
 clang/lib/Basic/Targets/SystemZ.h |  1 +
 clang/lib/Basic/Targets/TCE.h |  1 +
 clang/lib/Basic/Targets/WebAssembly.h |  1 +
 clang/lib/Basic/Targets/X86.h |  1 +
 clang/lib/CodeGen/CGHLSLRuntime.cpp   | 16 
 clang/lib/Sema/SemaHLSL.cpp   | 15 ++--
 .../ast-dump-comment-cbuffer-tbuffer.hlsl | 16 
 clang/test/AST/HLSL/cbuffer.hlsl  | 24 ++--
 clang/test/AST/HLSL/packoffset.hlsl   | 38 +--
 clang/test/AST/HLSL/pch_hlsl_buffer.hlsl  | 12 +++---
 .../test/AST/HLSL/resource_binding_attr.hlsl  |  8 ++--
 clang/test/CodeGenHLSL/cbuf.hlsl  | 13 +--
 clang/test/CodeGenHLSL/cbuf_in_namespace.hlsl |  8 +++-
 .../static_global_and_function_in_cb.hlsl | 14 ---
 21 files changed, 97 insertions(+), 80 deletions(-)

diff --git a/clang/include/clang/Basic/AddressSpaces.h 
b/clang/include/clang/Basic/AddressSpaces.h
index 7b723d508fff17..d18bfe54931f93 100644
--- a/clang/include/clang/Basic/AddressSpaces.h
+++ b/clang/include/clang/Basic/AddressSpaces.h
@@ -58,6 +58,7 @@ enum class LangAS : unsigned {
 
   // HLSL specific address spaces.
   hlsl_groupshared,
+  hlsl_constant,
 
   // Wasm specific address spaces.
   wasm_funcref,
diff --git a/clang/lib/AST/TypePrinter.cpp b/clang/lib/AST/TypePrinter.cpp
index a850410ffc8468..6cad74fef3fe33 100644
--- a/clang/lib/AST/TypePrinter.cpp
+++ b/clang/lib/AST/TypePrinter.cpp
@@ -2556,6 +2556,8 @@ std::string Qualifiers::getAddrSpaceAsString(LangAS AS) {
 return "__funcref";
   case LangAS::hlsl_groupshared:
 return "groupshared";
+  case LangAS::hlsl_constant:
+return "hlsl_constant";
   default:
 return std::to_string(toTargetAddressSpace(AS));
   }
diff --git a/clang/lib/Basic/Targets/AArch64.h 
b/clang/lib/Basic/Targets/AArch64.h
index ecf80b23a508c9..600940f5e4e23c 100644
--- a/clang/lib/Basic/Targets/AArch64.h
+++ b/clang/lib/Basic/Targets/AArch64.h
@@ -44,6 +44,7 @@ static const unsigned ARM64AddrSpaceMap[] = {
 static_cast(AArch64AddrSpace::ptr32_uptr),
 static_cast(AArch64AddrSpace::ptr64),
 0, // hlsl_groupshared
+0, // hlsl_constant
 // Wasm address space values for this target are dummy values,
 // as it is only enabled for Wasm targets.
 20, // wasm_funcref
diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp 
b/clang/lib/Basic/Targets/AMDGPU.cpp
index 99f8f2944e2796..824134d52ec139 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -83,6 +83,7 @@ const LangASMap AMDGPUTargetInfo::AMDGPUDefIsPrivMap = {
 llvm::AMDGPUAS::FLAT_ADDRESS, // ptr32_uptr
 llvm::AMDGPUAS::FLAT_ADDRESS, // ptr64
 llvm::AMDGPUAS::FLAT_ADDRESS, // hlsl_groupshared
+llvm::AMDGPUAS::FLAT_ADDRESS, // hlsl_constant
 
 };
 } // namespace targets
diff --git a/clang/lib/Basic/Targets/DirectX.h 
b/clang/lib/Basic/Targets/DirectX.h
index ab22d1281a4df7..4e6bc0e040398b 100644
--- a/clang/lib/Basic/Targets/DirectX.h
+++ b/clang/lib/Basic/Targets/DirectX.h
@@ -42,6 +42,7 @@ static const unsigned DirectXAddrSpaceMap[] = {
 0, // ptr32_uptr
 0, // ptr64
 3, // hlsl_groupshared
+2, // hlsl_constant
 // Wasm address space values for this target are dummy values,
 // as it is only enabled for Wasm targets.
 20, // wasm_funcref
diff --git a/clang/lib/Basic/Targets/NVPTX.h b/clang/lib/Basic/Targets/NVPTX.h
index d81b89a7f24ac0..c6531148fe30ce 100644
--- a/clang/lib/Basic/Targets/NVPTX.h
+++ b/clang/lib/Basic/Targets/NVPTX.h
@@ -46,6 +46,7 @@ static const unsigned NVPTXAddrSpaceMap[] = {
 0, // ptr32_uptr
 0, // ptr64
 0, // hlsl_groupshared
+0, // hlsl_constant
 // Wasm address space values for this target are dummy values,
 // as it is only enabled for Wasm targets.
 20, // wasm_funcref
diff --git a/clang/lib/Basic/Targets/SPIR.h b/clang/lib/Basic/Targets/SPIR.h
index 5a328b9ceeb1d1..c0849b69dcdb32 100644
--- a/clang/lib/Basic/Targets/SPIR.h
+++ b/clang/lib/Basic/Targets/SPIR.h
@@ -47,6 +47,7 @@ static const unsigned SPIRDefIsPrivMap[] = {
 0, // ptr32_uptr
 0, // ptr64
 0, // hlsl_groupshared
+2, // hlsl_constant
 // Wasm address space values for this target are dummy values,
 // as it 

[llvm-branch-commits] [clang] [HLSL] Introduce address space `hlsl_constant(2)` for constant buffer declarations (PR #123411)

2025-01-21 Thread Helena Kotas via llvm-branch-commits




hekota wrote:

I have added SPIR-V variants of the 3 codegen tests.

https://github.com/llvm/llvm-project/pull/123411
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Custom lower 32-bit element shuffles (PR #123711)

2025-01-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/123711

>From 4e1bdb492246adacd90c42b0590613e040268e1a Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sat, 18 Jan 2025 08:19:10 +0700
Subject: [PATCH] AMDGPU: Custom lower 32-bit element shuffles

This is so we can try to make use of v_pk_mov_b32 when available.
Note this currently has little observable effect. The combiner
will undo the common extract of shuffle pattern. The lack
of test changes should demonstrate this change is minimally
correct.

We should probably try to make better use of wider extracts in
even aligned cases, but I'm trying to avoid some really ugly
regalloc regressions in some MFMA tests. The DAG scheduler ends
up doing a worse job if we use vector extracts, resulting
in failure to do 3 address conversion of MFMAs.
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 85 +--
 1 file changed, 80 insertions(+), 5 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 1aeca7f370aa1b..b632c50dae0e35 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -419,8 +419,9 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM,
   }
 
   setOperationAction(ISD::VECTOR_SHUFFLE,
- {MVT::v8i32, MVT::v8f32, MVT::v16i32, MVT::v16f32},
- Expand);
+ {MVT::v4i32, MVT::v4f32, MVT::v8i32, MVT::v8f32,
+  MVT::v16i32, MVT::v16f32, MVT::v32i32, MVT::v32f32},
+ Custom);
 
   if (Subtarget->hasPkMovB32()) {
 // TODO: 16-bit element vectors should be legal with even aligned elements.
@@ -7589,15 +7590,38 @@ static bool elementPairIsContiguous(ArrayRef Mask, 
int Elt) {
   return Mask[Elt + 1] == Mask[Elt] + 1 && (Mask[Elt] % 2 == 0);
 }
 
+static bool elementPairIsOddToEven(ArrayRef Mask, int Elt) {
+  assert(Elt % 2 == 0);
+  return Mask[Elt] >= 0 && Mask[Elt + 1] >= 0 && (Mask[Elt] & 1) &&
+ !(Mask[Elt + 1] & 1);
+}
+
 SDValue SITargetLowering::lowerVECTOR_SHUFFLE(SDValue Op,
   SelectionDAG &DAG) const {
   SDLoc SL(Op);
   EVT ResultVT = Op.getValueType();
   ShuffleVectorSDNode *SVN = cast(Op);
   MVT EltVT = ResultVT.getVectorElementType().getSimpleVT();
-  MVT PackVT = MVT::getVectorVT(EltVT, 2);
+  const int NewSrcNumElts = 2;
+  MVT PackVT = MVT::getVectorVT(EltVT, NewSrcNumElts);
   int SrcNumElts = Op.getOperand(0).getValueType().getVectorNumElements();
 
+  // Break up the shuffle into registers sized pieces.
+  //
+  // We're trying to form sub-shuffles that the register allocation pipeline
+  // won't be able to figure out, like how to use v_pk_mov_b32 to do a register
+  // blend or 16-bit op_sel. It should be able to figure out how to reassemble 
a
+  // pair of copies into a consecutive register copy, so use the ordinary
+  // extract_vector_elt lowering unless we can use the shuffle.
+  //
+  // TODO: This is a bit of hack, and we should probably always use
+  // extract_subvector for the largest possible subvector we can (or at least
+  // use it for PackVT aligned pieces). However we have worse support for
+  // combines on them don't directly treat extract_subvector / insert_subvector
+  // as legal. The DAG scheduler also ends up doing a worse job with the
+  // extract_subvectors.
+  const bool ShouldUseConsecutiveExtract = EltVT.getSizeInBits() == 16;
+
   // vector_shuffle <0,1,6,7> lhs, rhs
   // -> concat_vectors (extract_subvector lhs, 0), (extract_subvector rhs, 2)
   //
@@ -7608,9 +7632,18 @@ SDValue SITargetLowering::lowerVECTOR_SHUFFLE(SDValue Op,
   // -> concat_vectors (extract_subvector rhs, 2), (extract_subvector lhs, 0)
 
   // Avoid scalarizing when both halves are reading from consecutive elements.
-  SmallVector Pieces;
+
+  // If we're treating 2 element shuffles as legal, also create odd-to-even
+  // shuffles of neighboring pairs.
+  //
+  // vector_shuffle <3,2,7,6> lhs, rhs
+  //  -> concat_vectors vector_shuffle <1, 0> (extract_subvector lhs, 0)
+  //vector_shuffle <1, 0> (extract_subvector rhs, 2)
+
+  SmallVector Pieces;
   for (int I = 0, N = ResultVT.getVectorNumElements(); I != N; I += 2) {
-if (elementPairIsContiguous(SVN->getMask(), I)) {
+if (ShouldUseConsecutiveExtract &&
+elementPairIsContiguous(SVN->getMask(), I)) {
   const int Idx = SVN->getMaskElt(I);
   int VecIdx = Idx < SrcNumElts ? 0 : 1;
   int EltIdx = Idx < SrcNumElts ? Idx : Idx - SrcNumElts;
@@ -7618,6 +7651,48 @@ SDValue SITargetLowering::lowerVECTOR_SHUFFLE(SDValue Op,
SVN->getOperand(VecIdx),
DAG.getConstant(EltIdx, SL, MVT::i32));
   Pieces.push_back(SubVec);
+} else if (elementPairIsOddToEven(SVN->getMask(), I) &&
+   isOperationLegal(ISD::VECTOR_SHUFFLE, PackVT)) {
+  int I

[llvm-branch-commits] [llvm] [X86] Add atomic vector tests for unaligned >1 sizes. (PR #120387)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120387

>From b0364eefcd22e368bab088222d052f71ef2b835e Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:40:32 -0500
Subject: [PATCH] [X86] Add atomic vector tests for unaligned >1 sizes.

Unaligned atomic vectors with size >1 are lowered to calls.
Adding their tests separately here.

commit-id:a06a5cc6
---
 llvm/test/CodeGen/X86/atomic-load-store.ll | 253 +
 1 file changed, 253 insertions(+)

diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 6efcbb80c0ce6d..39e9fdfa5e62b0 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -146,6 +146,34 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   ret <1 x i64> %ret
 }
 
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_ptr:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movq (%rsp), %rax
+; CHECK3-NEXT:popq %rcx
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_ptr:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq (%rsp), %rax
+; CHECK0-NEXT:popq %rcx
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
+
 define <1 x half> @atomic_vec1_half(ptr %x) {
 ; CHECK3-LABEL: atomic_vec1_half:
 ; CHECK3:   ## %bb.0:
@@ -182,3 +210,228 @@ define <1 x double> @atomic_vec1_double_align(ptr %x) 
nounwind {
   %ret = load atomic <1 x double>, ptr %x acquire, align 8
   ret <1 x double> %ret
 }
+
+define <1 x i64> @atomic_vec1_i64(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_i64:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movq (%rsp), %rax
+; CHECK3-NEXT:popq %rcx
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i64:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq (%rsp), %rax
+; CHECK0-NEXT:popq %rcx
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i64>, ptr %x acquire, align 4
+  ret <1 x i64> %ret
+}
+
+define <1 x double> @atomic_vec1_double(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_double:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK3-NEXT:popq %rax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_double:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK0-NEXT:popq %rax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 4
+  ret <1 x double> %ret
+}
+
+define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec2_i32:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK3-NEXT:popq %rax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec2_i32:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq {{.*#+}} xmm0 = mem[0],zero
+; CHECK0-NEXT:popq %rax
+; CHECK0-NEXT:retq
+  %ret = load atomic <2 x i32>, ptr %x acquire, align 4
+  ret <2 x i32> %ret
+}
+
+define <4 x float> @atomic_vec4_float_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec4_float_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEXT:movl $2, %esi
+; CHECK-NEXT:callq ___atomic_load_16
+; CHECK-NEXT:movq %rdx, %xmm1
+; CHECK-NEXT:movq %rax, %xmm0
+; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm

[llvm-branch-commits] [clang] [HLSL] Introduce address space `hlsl_constant(2)` for constant buffer declarations (PR #123411)

2025-01-21 Thread Helena Kotas via llvm-branch-commits


@@ -455,14 +456,22 @@ void createHostLayoutStructForBuffer(Sema &S, 
HLSLBufferDecl *BufDecl) {
   LS->setImplicit(true);
   LS->startDefinition();
 
-  for (const Decl *D : BufDecl->decls()) {
-const VarDecl *VD = dyn_cast(D);
+  for (Decl *D : BufDecl->decls()) {
+VarDecl *VD = dyn_cast(D);
 if (!VD || VD->getStorageClass() == SC_Static)
   continue;
 const Type *Ty = VD->getType()->getUnqualifiedDesugaredType();
 if (FieldDecl *FD = createFieldForHostLayoutStruct(
-S, Ty, VD->getIdentifier(), LS, BufDecl))
+S, Ty, VD->getIdentifier(), LS, BufDecl)) {
+  // add the field decl to the layout struct
   LS->addDecl(FD);
+  // update address space of the original decl to hlsl_constant
+  // and disable initialization
+  QualType NewTy =
+  AST.getAddrSpaceQualType(VD->getType(), LangAS::hlsl_constant);
+  VD->setType(NewTy);
+  VD->setInit(nullptr);

hekota wrote:

If we need to preserve the initializers in the AST even though they are not 
used, we could mark the generated global variable as `externally_initialized`. 
The global gets removed in the final DXIL after all.

https://github.com/llvm/llvm-project/pull/123411
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] DAG: Avoid breaking legal vector_shuffle with multiple uses (PR #123712)

2025-01-21 Thread Philip Reames via llvm-branch-commits

https://github.com/preames approved this pull request.

LGTM

I suspect we'll want to refine the profitability here over the time, but this 
seems reasonable as a stepping stone.

https://github.com/llvm/llvm-project/pull/123712
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [HLSL] Introduce address space `hlsl_constant(2)` for constant buffer declarations (PR #123411)

2025-01-21 Thread Helena Kotas via llvm-branch-commits


@@ -83,6 +83,7 @@ const LangASMap AMDGPUTargetInfo::AMDGPUDefIsPrivMap = {
 llvm::AMDGPUAS::FLAT_ADDRESS, // ptr32_uptr
 llvm::AMDGPUAS::FLAT_ADDRESS, // ptr64
 llvm::AMDGPUAS::FLAT_ADDRESS, // hlsl_groupshared
+llvm::AMDGPUAS::FLAT_ADDRESS, // hlsl_constant

hekota wrote:

Interesting. Sounds like it is useful to use `static const unsigned[]` then 
instead of `LangASMap` since it gives you a compile error when the dimension 
does not match.
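
A toy illustration of that difference (hypothetical names, not from the
patch):

// A fixed-extent array rejects excess initializers at compile time, so a new
// LangAS entry forces every target map to be updated; an unsized array just
// deduces its length silently.
static const unsigned SizedMap[3] = {0, 1, 2};      // OK
// static const unsigned Bad[3] = {0, 1, 2, 3};     // error: excess elements
static const unsigned UnsizedMap[] = {0, 1, 2, 3};  // size deduced as 4
static_assert(sizeof(UnsizedMap) / sizeof(unsigned) == 4, "keep in sync");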

https://github.com/llvm/llvm-project/pull/123411
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Custom lower 32-bit element shuffles (PR #123711)

2025-01-21 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

Is there any way at all to test it?

https://github.com/llvm/llvm-project/pull/123711
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120598

>From a847ecfa366a17853e425a5fd94edb668396b446 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 11:19:39 -0500
Subject: [PATCH] [SelectionDAG][X86] Widen <2 x T> vector types for atomic
 load

Vector types of 2 elements must be widened. This change does this
for vector types of atomic load in SelectionDAG
so that it can translate aligned vectors of size >1. It also
adds Pats to remove an extra MOV.

commit-id:2894ccd1
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |  5 +-
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 36 -
 llvm/lib/Target/X86/X86InstrCompiler.td   |  7 ++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 81 +++
 llvm/test/CodeGen/X86/atomic-unordered.ll |  3 +-
 5 files changed, 124 insertions(+), 8 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 893fca233d191b..8923d5596e7ac4 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -1048,6 +1048,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N);
   SDValue WidenVecRes_INSERT_SUBVECTOR(SDNode *N);
   SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N);
+  SDValue WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue WidenVecRes_LOAD(SDNode* N);
   SDValue WidenVecRes_VP_LOAD(VPLoadSDNode *N);
   SDValue WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N);
@@ -1131,8 +1132,8 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   /// resulting wider type. It takes:
   ///   LdChain: list of chains for the load to be generated.
   ///   Ld:  load to widen
-  SDValue GenWidenVectorLoads(SmallVectorImpl &LdChain,
-  LoadSDNode *LD);
+  SDValue GenWidenVectorLoads(SmallVectorImpl &LdChain, MemSDNode *LD,
+  bool IsAtomic = false);
 
   /// Helper function to generate a set of extension loads to load a vector 
with
   /// a resulting wider type. It takes:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 980c157fe6f3f1..6f324e7ba1aa58 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -4521,6 +4521,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, 
unsigned ResNo) {
 break;
   case ISD::EXTRACT_SUBVECTOR: Res = WidenVecRes_EXTRACT_SUBVECTOR(N); break;
   case ISD::INSERT_VECTOR_ELT: Res = WidenVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+Res = WidenVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:  Res = WidenVecRes_LOAD(N); break;
   case ISD::STEP_VECTOR:
   case ISD::SPLAT_VECTOR:
@@ -5907,6 +5910,26 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) {
  N->getOperand(1), N->getOperand(2));
 }
 
+SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SmallVector LdChain; // Chain for the series of load
+  SDValue Result = GenWidenVectorLoads(LdChain, N, /*IsAtomic=*/true);
+
+  if (Result) {
+// Build a factor node to remember the multiple loads are independent and
+// chain to that.
+SDValue NewChain =
+DAG.getNode(ISD::TokenFactor, SDLoc(N), MVT::Other, LdChain);
+
+// Modified the chain - switch anything that used the old chain to use
+// the new one.
+ReplaceValueWith(SDValue(N, 1), NewChain);
+
+return Result;
+  }
+
+  report_fatal_error("Unable to widen atomic vector load");
+}
+
 SDValue DAGTypeLegalizer::WidenVecRes_LOAD(SDNode *N) {
   LoadSDNode *LD = cast(N);
   ISD::LoadExtType ExtType = LD->getExtensionType();
@@ -7706,7 +7729,7 @@ static SDValue BuildVectorFromScalar(SelectionDAG& DAG, 
EVT VecTy,
 }
 
 SDValue DAGTypeLegalizer::GenWidenVectorLoads(SmallVectorImpl 
&LdChain,
-  LoadSDNode *LD) {
+  MemSDNode *LD, bool IsAtomic) {
   // The strategy assumes that we can efficiently load power-of-two widths.
   // The routine chops the vector into the largest vector loads with the same
   // element type or scalar loads and then recombines it to the widen vector
@@ -7763,12 +7786,17 @@ SDValue 
DAGTypeLegalizer::GenWidenVectorLoads(SmallVectorImpl &LdChain,
 } while (TypeSize::isKnownGT(RemainingWidth, NewVTWidth));
   }
 
-  SDValue LdOp = DAG.getLoad(*FirstVT, dl, Chain, BasePtr, 
LD->getPointerInfo(),
- LD->getOriginalAlign(), MMOFlags, AAInfo);
+  SDValue LdOp;
+  if (IsAtomic)
+LdOp = DAG.getAtomic(ISD::ATOMIC_LOAD, dl, *FirstVT, *FirstVT, Chain,
+ BasePtr, LD->getMemOperand());
+  else
+LdOp = DAG.getLoad(*FirstVT, dl, Chain, BasePtr, LD->getPointerInfo(),
+   LD->getOrigin

[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120716

>From f4d372c660d9c5abe4e60aca91d455c5c376a25e Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Fri, 20 Dec 2024 06:14:28 -0500
Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector

AtomicExpand fails for aligned `load atomic ` because it
does not find a compatible library call. This change adds appropriate
bitcasts so that the call can be lowered.

commit-id:f430c1af
---
 llvm/lib/CodeGen/AtomicExpandPass.cpp | 25 +++--
 llvm/test/CodeGen/ARM/atomic-load-store.ll| 52 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 45 
 .../X86/expand-atomic-non-integer.ll  | 14 +
 4 files changed, 133 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp 
b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index a75fa688d87a8d..490cc9aa4c3c80 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -2060,9 +2060,28 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall(
 I->replaceAllUsesWith(V);
   } else if (HasResult) {
 Value *V;
-if (UseSizedLibcall)
-  V = Builder.CreateBitOrPointerCast(Result, I->getType());
-else {
+if (UseSizedLibcall) {
+  // Add bitcasts from Result's T scalar type to I's <2 x T/2> vector type
+  if (I->getType()->getScalarType()->isIntOrPtrTy() &&
+  I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) {
+TypeSize Size = Result->getType()->getPrimitiveSizeInBits();
+assert((unsigned)Size % 2 == 0);
+unsigned HalfSize = (unsigned)Size / 2;
+Value *Lo =
+Builder.CreateTrunc(Result, IntegerType::get(Ctx, HalfSize));
+Value *RS = Builder.CreateLShr(
+Result, ConstantInt::get(IntegerType::get(Ctx, Size), HalfSize));
+Value *Hi = Builder.CreateTrunc(RS, IntegerType::get(Ctx, HalfSize));
+Value *Vec = Builder.CreateInsertElement(
+VectorType::get(IntegerType::get(Ctx, HalfSize),
+                cast<VectorType>(I->getType())->getElementCount()),
+Lo, ConstantInt::get(IntegerType::get(Ctx, 32), 0));
+Vec = Builder.CreateInsertElement(
+Vec, Hi, ConstantInt::get(IntegerType::get(Ctx, 32), 1));
+V = Builder.CreateBitOrPointerCast(Vec, I->getType());
+  } else
+V = Builder.CreateBitOrPointerCast(Result, I->getType());
+} else {
   V = Builder.CreateAlignedLoad(I->getType(), AllocaResult,
 AllocaAlignment);
   Builder.CreateLifetimeEnd(AllocaResult, SizeVal64);
diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll b/llvm/test/CodeGen/ARM/atomic-load-store.ll
index 560dfde356c29d..8736c5bd9a9451 100644
--- a/llvm/test/CodeGen/ARM/atomic-load-store.ll
+++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll
@@ -983,3 +983,55 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double %val1) {
   store atomic double %val1, ptr %ptr seq_cst, align 8
   ret void
 }
+
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 {
+; ARM-LABEL: atomic_vec1_ptr:
+; ARM:   @ %bb.0:
+; ARM-NEXT:ldr r0, [r0]
+; ARM-NEXT:dmb ish
+; ARM-NEXT:bx lr
+;
+; ARMOPTNONE-LABEL: atomic_vec1_ptr:
+; ARMOPTNONE:   @ %bb.0:
+; ARMOPTNONE-NEXT:ldr r0, [r0]
+; ARMOPTNONE-NEXT:dmb ish
+; ARMOPTNONE-NEXT:bx lr
+;
+; THUMBTWO-LABEL: atomic_vec1_ptr:
+; THUMBTWO:   @ %bb.0:
+; THUMBTWO-NEXT:ldr r0, [r0]
+; THUMBTWO-NEXT:dmb ish
+; THUMBTWO-NEXT:bx lr
+;
+; THUMBONE-LABEL: atomic_vec1_ptr:
+; THUMBONE:   @ %bb.0:
+; THUMBONE-NEXT:push {r7, lr}
+; THUMBONE-NEXT:movs r1, #0
+; THUMBONE-NEXT:mov r2, r1
+; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4
+; THUMBONE-NEXT:pop {r7, pc}
+;
+; ARMV4-LABEL: atomic_vec1_ptr:
+; ARMV4:   @ %bb.0:
+; ARMV4-NEXT:push {r11, lr}
+; ARMV4-NEXT:mov r1, #2
+; ARMV4-NEXT:bl __atomic_load_4
+; ARMV4-NEXT:mov r0, #0
+; ARMV4-NEXT:pop {r11, lr}
+; ARMV4-NEXT:mov pc, lr
+;
+; ARMV6-LABEL: atomic_vec1_ptr:
+; ARMV6:   @ %bb.0:
+; ARMV6-NEXT:mov r1, #0
+; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5
+; ARMV6-NEXT:ldr r0, [r0]
+; ARMV6-NEXT:bx lr
+;
+; THUMBM-LABEL: atomic_vec1_ptr:
+; THUMBM:   @ %bb.0:
+; THUMBM-NEXT:ldr r0, [r0]
+; THUMBM-NEXT:dmb sy
+; THUMBM-NEXT:bx lr
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 08d0405345f573..8384987ae3334b 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -371,6 +371,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
   ret <2 x i32> %ret
 }
 
+define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec2_ptr_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+;

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Split via Concat vector types for atomic load (PR #120640)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120640

>From a14a025133cc1e18bc1d554d3b0d9961f3696131 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 16:25:55 -0500
Subject: [PATCH] [SelectionDAG][X86] Split via Concat vector types for atomic load

Vector types that aren't widened are 'split' via CONCAT_VECTORS
so that a single ATOMIC_LOAD is issued for the entire vector at once.
This change utilizes the load vectorization infrastructure in
SelectionDAG in order to group the vectors. This enables SelectionDAG
to translate vectors with types bfloat and half.

commit-id:3a045357
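
As a rough picture of the net effect (a sketch in IR terms; the change itself
operates on DAG nodes, and the lane layout shown assumes little-endian):

  ; %ret = load atomic <4 x half>, ptr %x acquire, align 8
  ; becomes, conceptually, one atomic load of the full width:
  %i = load atomic i64, ptr %x acquire, align 8
  %ret = bitcast i64 %i to <4 x half>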
---
 llvm/include/llvm/CodeGen/SelectionDAG.h  |  4 +-
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |  1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 31 
 .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 24 -
 .../SelectionDAGAddressAnalysis.cpp   | 30 ++-
 .../SelectionDAG/SelectionDAGBuilder.cpp  |  6 ++-
 llvm/lib/Target/X86/X86ISelLowering.cpp   | 50 ---
 llvm/test/CodeGen/X86/atomic-load-store.ll| 36 +
 8 files changed, 147 insertions(+), 35 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h
index ba0538f7084eec..648ad14a67cb6c 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1837,7 +1837,7 @@ class SelectionDAG {
  /// chain to the token factor. This ensures that the new memory node will have
   /// the same relative memory dependency position as the old load. Returns the
   /// new merged load chain.
-  SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp);
+  SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp);
 
   /// Topological-sort the AllNodes list and a
   /// assign a unique node id for each node in the DAG based on their
@@ -2263,6 +2263,8 @@ class SelectionDAG {
   /// location that the 'Base' load is loading from.
   bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base,
   unsigned Bytes, int Dist) const;
+  bool areNonVolatileConsecutiveLoads(AtomicSDNode *LD, AtomicSDNode *Base,
+  unsigned Bytes, int Dist) const;
 
   /// Infer alignment of a load / store address. Return std::nullopt if it
   /// cannot be inferred.
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 8923d5596e7ac4..59ba987b1b8e92 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -948,6 +948,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   void SplitVecRes_FPOp_MultiType(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_IS_FPCLASS(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi);
+  void SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD);
   void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_LOAD(VPLoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *SLD, SDValue &Lo,
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 6f324e7ba1aa58..f6d816bb733ee0 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -1152,6 +1152,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
 SplitVecRes_STEP_VECTOR(N, Lo, Hi);
 break;
   case ISD::SIGN_EXTEND_INREG: SplitVecRes_InregOp(N, Lo, Hi); break;
+  case ISD::ATOMIC_LOAD:
+    SplitVecRes_ATOMIC_LOAD(cast<AtomicSDNode>(N));
+break;
   case ISD::LOAD:
    SplitVecRes_LOAD(cast<LoadSDNode>(N), Lo, Hi);
 break;
@@ -1395,6 +1398,34 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
 SetSplitVector(SDValue(N, ResNo), Lo, Hi);
 }
 
+void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD) {
+  SDLoc dl(LD);
+
+  EVT MemoryVT = LD->getMemoryVT();
+  unsigned NumElts = MemoryVT.getVectorMinNumElements();
+
+  EVT IntMemoryVT = EVT::getVectorVT(*DAG.getContext(), MVT::i16, NumElts);
+  EVT ElemVT = EVT::getVectorVT(*DAG.getContext(),
+MemoryVT.getVectorElementType(), 1);
+
+  // Create a single atomic to load all the elements at once.
+  SDValue Atomic = DAG.getAtomic(ISD::ATOMIC_LOAD, dl, IntMemoryVT, IntMemoryVT,
+ LD->getChain(), LD->getBasePtr(),
+ LD->getMemOperand());
+
+  // Instead of splitting, put all the elements back into a vector.
+  SmallVector<SDValue> Ops;
+  for (unsigned i = 0; i < NumElts; ++i) {
+SDValue Elt = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::i16, Atomic,
+  DAG.getVectorIdxConstant(i, dl));
+Elt = DAG.getBitcast(ElemV

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120598

>From a847ecfa366a17853e425a5fd94edb668396b446 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 11:19:39 -0500
Subject: [PATCH] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load

Vector types of 2 elements must be widened. This change does this
for vector types of atomic load in SelectionDAG
so that it can translate aligned vectors of >1 size. It also
adds Pats to remove an extra MOV.

commit-id:2894ccd1
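
Roughly, for the aligned case this lets a two-element atomic load come in as a
single wide atomic access whose value is then treated as a vector; as
equivalent IR (a sketch only, the legalizer works on DAG nodes):

  ; %ret = load atomic <2 x i32>, ptr %x acquire, align 8
  %i = load atomic i64, ptr %x acquire, align 8
  %ret = bitcast i64 %i to <2 x i32>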
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |  5 +-
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 36 -
 llvm/lib/Target/X86/X86InstrCompiler.td   |  7 ++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 81 +++
 llvm/test/CodeGen/X86/atomic-unordered.ll |  3 +-
 5 files changed, 124 insertions(+), 8 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 893fca233d191b..8923d5596e7ac4 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -1048,6 +1048,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N);
   SDValue WidenVecRes_INSERT_SUBVECTOR(SDNode *N);
   SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N);
+  SDValue WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue WidenVecRes_LOAD(SDNode* N);
   SDValue WidenVecRes_VP_LOAD(VPLoadSDNode *N);
   SDValue WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N);
@@ -1131,8 +1132,8 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   /// resulting wider type. It takes:
   ///   LdChain: list of chains for the load to be generated.
   ///   Ld:  load to widen
-  SDValue GenWidenVectorLoads(SmallVectorImpl<SDValue> &LdChain,
-  LoadSDNode *LD);
+  SDValue GenWidenVectorLoads(SmallVectorImpl<SDValue> &LdChain, MemSDNode *LD,
+  bool IsAtomic = false);
 
  /// Helper function to generate a set of extension loads to load a vector with
   /// a resulting wider type. It takes:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 980c157fe6f3f1..6f324e7ba1aa58 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -4521,6 +4521,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, unsigned ResNo) {
 break;
   case ISD::EXTRACT_SUBVECTOR: Res = WidenVecRes_EXTRACT_SUBVECTOR(N); break;
   case ISD::INSERT_VECTOR_ELT: Res = WidenVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+    Res = WidenVecRes_ATOMIC_LOAD(cast<AtomicSDNode>(N));
+break;
   case ISD::LOAD:  Res = WidenVecRes_LOAD(N); break;
   case ISD::STEP_VECTOR:
   case ISD::SPLAT_VECTOR:
@@ -5907,6 +5910,26 @@ SDValue DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) {
  N->getOperand(1), N->getOperand(2));
 }
 
+SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SmallVector<SDValue, 16> LdChain; // Chain for the series of load operations
+  SDValue Result = GenWidenVectorLoads(LdChain, N, /*IsAtomic=*/true);
+
+  if (Result) {
+// Build a factor node to remember the multiple loads are independent and
+// chain to that.
+SDValue NewChain =
+DAG.getNode(ISD::TokenFactor, SDLoc(N), MVT::Other, LdChain);
+
+// Modified the chain - switch anything that used the old chain to use
+// the new one.
+ReplaceValueWith(SDValue(N, 1), NewChain);
+
+return Result;
+  }
+
+  report_fatal_error("Unable to widen atomic vector load");
+}
+
 SDValue DAGTypeLegalizer::WidenVecRes_LOAD(SDNode *N) {
  LoadSDNode *LD = cast<LoadSDNode>(N);
   ISD::LoadExtType ExtType = LD->getExtensionType();
@@ -7706,7 +7729,7 @@ static SDValue BuildVectorFromScalar(SelectionDAG& DAG, EVT VecTy,
 }
 
 SDValue DAGTypeLegalizer::GenWidenVectorLoads(SmallVectorImpl<SDValue> &LdChain,
-  LoadSDNode *LD) {
+  MemSDNode *LD, bool IsAtomic) {
   // The strategy assumes that we can efficiently load power-of-two widths.
   // The routine chops the vector into the largest vector loads with the same
   // element type or scalar loads and then recombines it to the widen vector
@@ -7763,12 +7786,17 @@ SDValue DAGTypeLegalizer::GenWidenVectorLoads(SmallVectorImpl<SDValue> &LdChain,
 } while (TypeSize::isKnownGT(RemainingWidth, NewVTWidth));
   }
 
-  SDValue LdOp = DAG.getLoad(*FirstVT, dl, Chain, BasePtr, LD->getPointerInfo(),
- LD->getOriginalAlign(), MMOFlags, AAInfo);
+  SDValue LdOp;
+  if (IsAtomic)
+LdOp = DAG.getAtomic(ISD::ATOMIC_LOAD, dl, *FirstVT, *FirstVT, Chain,
+ BasePtr, LD->getMemOperand());
+  else
+LdOp = DAG.getLoad(*FirstVT, dl, Chain, BasePtr, LD->getPointerInfo(),
+   LD->getOrigin

[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120385

>From 02d1d2da5ed69cf3d577b11ad4563c5d1bfc2d22 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:37:17 -0500
Subject: [PATCH] [SelectionDAG] Legalize <1 x T> vector types for atomic load

`load atomic <1 x T>` is not valid. This change legalizes
vector types of atomic load via scalarization in SelectionDAG
so that it can, for example, translate from `v1i32` to `i32`.

commit-id:5c36cc8c
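
The transformation written out as IR for concreteness (the legalizer performs
this on DAG nodes, not on IR):

  ; before: %ret = load atomic <1 x i32>, ptr %x acquire, align 4
  ; after scalarization, conceptually:
  %s   = load atomic i32, ptr %x acquire, align 4
  %ret = insertelement <1 x i32> poison, i32 %s, i32 0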
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  |  15 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 121 +-
 3 files changed, 135 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index f13f70e66cfaa6..893fca233d191b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -863,6 +863,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue ScalarizeVecRes_UnaryOpWithExtraInput(SDNode *N);
   SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N);
   SDValue ScalarizeVecRes_LOAD(LoadSDNode *N);
+  SDValue ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue ScalarizeVecRes_SCALAR_TO_VECTOR(SDNode *N);
   SDValue ScalarizeVecRes_VSELECT(SDNode *N);
   SDValue ScalarizeVecRes_SELECT(SDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index f39d9ca15496a9..980c157fe6f3f1 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -64,6 +64,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, unsigned ResNo) {
 R = ScalarizeVecRes_UnaryOpWithExtraInput(N);
 break;
   case ISD::INSERT_VECTOR_ELT: R = ScalarizeVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+    R = ScalarizeVecRes_ATOMIC_LOAD(cast<AtomicSDNode>(N));
+break;
  case ISD::LOAD:   R = ScalarizeVecRes_LOAD(cast<LoadSDNode>(N));break;
   case ISD::SCALAR_TO_VECTOR:  R = ScalarizeVecRes_SCALAR_TO_VECTOR(N); break;
   case ISD::SIGN_EXTEND_INREG: R = ScalarizeVecRes_InregOp(N); break;
@@ -457,6 +460,18 @@ SDValue DAGTypeLegalizer::ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N) {
   return Op;
 }
 
+SDValue DAGTypeLegalizer::ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SDValue Result = DAG.getAtomic(
+  ISD::ATOMIC_LOAD, SDLoc(N), N->getMemoryVT().getVectorElementType(),
+      N->getValueType(0).getVectorElementType(), N->getChain(), N->getBasePtr(),
+  N->getMemOperand());
+
+  // Legalize the chain result - switch anything that used the old chain to
+  // use the new one.
+  ReplaceValueWith(SDValue(N, 1), Result.getValue(1));
+  return Result;
+}
+
 SDValue DAGTypeLegalizer::ScalarizeVecRes_LOAD(LoadSDNode *N) {
   assert(N->isUnindexed() && "Indexed vector load?");
 
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 5bce4401f7bdb0..d23cfb89f9fc87 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | FileCheck %s
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | FileCheck %s
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | FileCheck %s --check-prefixes=CHECK,CHECK3
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | FileCheck %s --check-prefixes=CHECK,CHECK0
 
 define void @test1(ptr %ptr, i32 %val1) {
 ; CHECK-LABEL: test1:
@@ -28,3 +28,120 @@ define i32 @test3(ptr %ptr) {
   %val = load atomic i32, ptr %ptr seq_cst, align 4
   ret i32 %val
 }
+
+define <1 x i32> @atomic_vec1_i32(ptr %x) {
+; CHECK-LABEL: atomic_vec1_i32:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movl (%rdi), %eax
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x i32>, ptr %x acquire, align 4
+  ret <1 x i32> %ret
+}
+
+define <1 x i8> @atomic_vec1_i8(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i8:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzbl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i8:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movb (%rdi), %al
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i8>, ptr %x acquire, align 1
+  ret <1 x i8> %ret
+}
+
+define <1 x i16> @atomic_vec1_i16(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i16:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i16:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %ax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i16>, ptr %x acquire, align 2
+  ret <1 x i16> %ret
+}
+
+define <1 x i32> @atomic_vec1_i8_zext(ptr %x) {
+; CHECK3-LABEL: atomic_

[llvm-branch-commits] [llvm] [X86] Add atomic vector tests for unaligned >1 sizes. (PR #120387)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120387

>From 2af4d7c2efd07dc96e93017c3684f6aeb73be9e8 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:40:32 -0500
Subject: [PATCH] [X86] Add atomic vector tests for unaligned >1 sizes.

Unaligned atomic vectors with size >1 are lowered to calls.
Adding their tests separately here.

commit-id:a06a5cc6
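
For reference, the shape of the lowering these tests pin down (a sketch; the
generic __atomic_load runtime call takes size, source, destination buffer, and
memory order, with acquire mapped to 2):

  ; %ret = load atomic <2 x i32>, ptr %x acquire, align 4   ; under-aligned
  %tmp = alloca i64, align 8
  call void @__atomic_load(i64 8, ptr %x, ptr %tmp, i32 2)
  %ret = load <2 x i32>, ptr %tmp, align 8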
---
 llvm/test/CodeGen/X86/atomic-load-store.ll | 253 +
 1 file changed, 253 insertions(+)

diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 6efcbb80c0ce6d..39e9fdfa5e62b0 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -146,6 +146,34 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   ret <1 x i64> %ret
 }
 
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_ptr:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movq (%rsp), %rax
+; CHECK3-NEXT:popq %rcx
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_ptr:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq (%rsp), %rax
+; CHECK0-NEXT:popq %rcx
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
+
 define <1 x half> @atomic_vec1_half(ptr %x) {
 ; CHECK3-LABEL: atomic_vec1_half:
 ; CHECK3:   ## %bb.0:
@@ -182,3 +210,228 @@ define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind {
   %ret = load atomic <1 x double>, ptr %x acquire, align 8
   ret <1 x double> %ret
 }
+
+define <1 x i64> @atomic_vec1_i64(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_i64:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movq (%rsp), %rax
+; CHECK3-NEXT:popq %rcx
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i64:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq (%rsp), %rax
+; CHECK0-NEXT:popq %rcx
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i64>, ptr %x acquire, align 4
+  ret <1 x i64> %ret
+}
+
+define <1 x double> @atomic_vec1_double(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_double:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK3-NEXT:popq %rax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_double:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK0-NEXT:popq %rax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 4
+  ret <1 x double> %ret
+}
+
+define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec2_i32:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK3-NEXT:popq %rax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec2_i32:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq {{.*#+}} xmm0 = mem[0],zero
+; CHECK0-NEXT:popq %rax
+; CHECK0-NEXT:retq
+  %ret = load atomic <2 x i32>, ptr %x acquire, align 4
+  ret <2 x i32> %ret
+}
+
+define <4 x float> @atomic_vec4_float_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec4_float_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEXT:movl $2, %esi
+; CHECK-NEXT:callq ___atomic_load_16
+; CHECK-NEXT:movq %rdx, %xmm1
+; CHECK-NEXT:movq %rax, %xmm0
+; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm

[llvm-branch-commits] [llvm] [X86] Manage atomic load of fp -> int promotion in DAG (PR #120386)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120386

>From 6f50ac7470ddea05e4cddfd51aab3d9a69736767 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:38:23 -0500
Subject: [PATCH] [X86] Manage atomic load of fp -> int promotion in DAG

When lowering atomic <1 x T> vector types with floats, selection can fail since
this pattern is unsupported. To support this, floats can be cast to
an integer type of the same size.

commit-id:f9d761c5
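
Written as IR for illustration (the promotion itself happens on DAG nodes,
after the <1 x T> load has been scalarized):

  ; before: %ret = load atomic <1 x float>, ptr %x acquire, align 4
  ; after promotion, conceptually:
  %i = load atomic i32, ptr %x acquire, align 4
  %f = bitcast i32 %i to float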
---
 llvm/lib/Target/X86/X86ISelLowering.cpp|  4 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll | 37 ++
 2 files changed, 41 insertions(+)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index a956074e50d86f..b681b80858de30 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -2639,6 +2639,10 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
 setOperationAction(Op, MVT::f32, Promote);
   }
 
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f16, MVT::i16);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f32, MVT::i32);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f64, MVT::i64);
+
   // We have target-specific dag combine patterns for the following nodes:
   setTargetDAGCombine({ISD::VECTOR_SHUFFLE,
ISD::SCALAR_TO_VECTOR,
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll
index d23cfb89f9fc87..6efcbb80c0ce6d 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -145,3 +145,40 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   %ret = load atomic <1 x i64>, ptr %x acquire, align 8
   ret <1 x i64> %ret
 }
+
+define <1 x half> @atomic_vec1_half(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_half:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_half:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %cx
+; CHECK0-NEXT:## implicit-def: $eax
+; CHECK0-NEXT:movw %cx, %ax
+; CHECK0-NEXT:## implicit-def: $xmm0
+; CHECK0-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x half>, ptr %x acquire, align 2
+  ret <1 x half> %ret
+}
+
+define <1 x float> @atomic_vec1_float(ptr %x) {
+; CHECK-LABEL: atomic_vec1_float:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x float>, ptr %x acquire, align 4
+  ret <1 x float> %ret
+}
+
+define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec1_double_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 8
+  ret <1 x double> %ret
+}

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120385

>From 02d1d2da5ed69cf3d577b11ad4563c5d1bfc2d22 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:37:17 -0500
Subject: [PATCH] [SelectionDAG] Legalize <1 x T> vector types for atomic load

`load atomic <1 x T>` is not valid. This change legalizes
vector types of atomic load via scalarization in SelectionDAG
so that it can, for example, translate from `v1i32` to `i32`.

commit-id:5c36cc8c
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  |  15 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 121 +-
 3 files changed, 135 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index f13f70e66cfaa6..893fca233d191b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -863,6 +863,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue ScalarizeVecRes_UnaryOpWithExtraInput(SDNode *N);
   SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N);
   SDValue ScalarizeVecRes_LOAD(LoadSDNode *N);
+  SDValue ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue ScalarizeVecRes_SCALAR_TO_VECTOR(SDNode *N);
   SDValue ScalarizeVecRes_VSELECT(SDNode *N);
   SDValue ScalarizeVecRes_SELECT(SDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index f39d9ca15496a9..980c157fe6f3f1 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -64,6 +64,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, 
unsigned ResNo) {
 R = ScalarizeVecRes_UnaryOpWithExtraInput(N);
 break;
   case ISD::INSERT_VECTOR_ELT: R = ScalarizeVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+R = ScalarizeVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:   R = 
ScalarizeVecRes_LOAD(cast(N));break;
   case ISD::SCALAR_TO_VECTOR:  R = ScalarizeVecRes_SCALAR_TO_VECTOR(N); break;
   case ISD::SIGN_EXTEND_INREG: R = ScalarizeVecRes_InregOp(N); break;
@@ -457,6 +460,18 @@ SDValue 
DAGTypeLegalizer::ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N) {
   return Op;
 }
 
+SDValue DAGTypeLegalizer::ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SDValue Result = DAG.getAtomic(
+  ISD::ATOMIC_LOAD, SDLoc(N), N->getMemoryVT().getVectorElementType(),
+  N->getValueType(0).getVectorElementType(), N->getChain(), 
N->getBasePtr(),
+  N->getMemOperand());
+
+  // Legalize the chain result - switch anything that used the old chain to
+  // use the new one.
+  ReplaceValueWith(SDValue(N, 1), Result.getValue(1));
+  return Result;
+}
+
 SDValue DAGTypeLegalizer::ScalarizeVecRes_LOAD(LoadSDNode *N) {
   assert(N->isUnindexed() && "Indexed vector load?");
 
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 5bce4401f7bdb0..d23cfb89f9fc87 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | 
FileCheck %s
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | 
FileCheck %s
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | 
FileCheck %s --check-prefixes=CHECK,CHECK3
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | 
FileCheck %s --check-prefixes=CHECK,CHECK0
 
 define void @test1(ptr %ptr, i32 %val1) {
 ; CHECK-LABEL: test1:
@@ -28,3 +28,120 @@ define i32 @test3(ptr %ptr) {
   %val = load atomic i32, ptr %ptr seq_cst, align 4
   ret i32 %val
 }
+
+define <1 x i32> @atomic_vec1_i32(ptr %x) {
+; CHECK-LABEL: atomic_vec1_i32:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movl (%rdi), %eax
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x i32>, ptr %x acquire, align 4
+  ret <1 x i32> %ret
+}
+
+define <1 x i8> @atomic_vec1_i8(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i8:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzbl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i8:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movb (%rdi), %al
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i8>, ptr %x acquire, align 1
+  ret <1 x i8> %ret
+}
+
+define <1 x i16> @atomic_vec1_i16(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i16:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i16:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %ax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i16>, ptr %x acquire, align 2
+  ret <1 x i16> %ret
+}
+
+define <1 x i32> @atomic_vec1_i8_zext(ptr %x) {
+; CHECK3-LABEL: atomic_

[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120385

>From 02d1d2da5ed69cf3d577b11ad4563c5d1bfc2d22 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:37:17 -0500
Subject: [PATCH] [SelectionDAG] Legalize <1 x T> vector types for atomic load

`load atomic <1 x T>` is not valid. This change legalizes
vector types of atomic load via scalarization in SelectionDAG
so that it can, for example, translate from `v1i32` to `i32`.

commit-id:5c36cc8c
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  |  15 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 121 +-
 3 files changed, 135 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index f13f70e66cfaa6..893fca233d191b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -863,6 +863,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue ScalarizeVecRes_UnaryOpWithExtraInput(SDNode *N);
   SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N);
   SDValue ScalarizeVecRes_LOAD(LoadSDNode *N);
+  SDValue ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue ScalarizeVecRes_SCALAR_TO_VECTOR(SDNode *N);
   SDValue ScalarizeVecRes_VSELECT(SDNode *N);
   SDValue ScalarizeVecRes_SELECT(SDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index f39d9ca15496a9..980c157fe6f3f1 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -64,6 +64,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, 
unsigned ResNo) {
 R = ScalarizeVecRes_UnaryOpWithExtraInput(N);
 break;
   case ISD::INSERT_VECTOR_ELT: R = ScalarizeVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+R = ScalarizeVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:   R = 
ScalarizeVecRes_LOAD(cast(N));break;
   case ISD::SCALAR_TO_VECTOR:  R = ScalarizeVecRes_SCALAR_TO_VECTOR(N); break;
   case ISD::SIGN_EXTEND_INREG: R = ScalarizeVecRes_InregOp(N); break;
@@ -457,6 +460,18 @@ SDValue 
DAGTypeLegalizer::ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N) {
   return Op;
 }
 
+SDValue DAGTypeLegalizer::ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SDValue Result = DAG.getAtomic(
+  ISD::ATOMIC_LOAD, SDLoc(N), N->getMemoryVT().getVectorElementType(),
+  N->getValueType(0).getVectorElementType(), N->getChain(), 
N->getBasePtr(),
+  N->getMemOperand());
+
+  // Legalize the chain result - switch anything that used the old chain to
+  // use the new one.
+  ReplaceValueWith(SDValue(N, 1), Result.getValue(1));
+  return Result;
+}
+
 SDValue DAGTypeLegalizer::ScalarizeVecRes_LOAD(LoadSDNode *N) {
   assert(N->isUnindexed() && "Indexed vector load?");
 
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 5bce4401f7bdb0..d23cfb89f9fc87 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | 
FileCheck %s
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | 
FileCheck %s
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | 
FileCheck %s --check-prefixes=CHECK,CHECK3
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | 
FileCheck %s --check-prefixes=CHECK,CHECK0
 
 define void @test1(ptr %ptr, i32 %val1) {
 ; CHECK-LABEL: test1:
@@ -28,3 +28,120 @@ define i32 @test3(ptr %ptr) {
   %val = load atomic i32, ptr %ptr seq_cst, align 4
   ret i32 %val
 }
+
+define <1 x i32> @atomic_vec1_i32(ptr %x) {
+; CHECK-LABEL: atomic_vec1_i32:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movl (%rdi), %eax
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x i32>, ptr %x acquire, align 4
+  ret <1 x i32> %ret
+}
+
+define <1 x i8> @atomic_vec1_i8(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i8:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzbl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i8:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movb (%rdi), %al
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i8>, ptr %x acquire, align 1
+  ret <1 x i8> %ret
+}
+
+define <1 x i16> @atomic_vec1_i16(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i16:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i16:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %ax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i16>, ptr %x acquire, align 2
+  ret <1 x i16> %ret
+}
+
+define <1 x i32> @atomic_vec1_i8_zext(ptr %x) {
+; CHECK3-LABEL: atomic_

[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120716

>From f4d372c660d9c5abe4e60aca91d455c5c376a25e Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Fri, 20 Dec 2024 06:14:28 -0500
Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector

AtomicExpand fails for aligned `load atomic ` because it
does not find a compatible library call. This change adds appropriate
bitcasts so that the call can be lowered.

commit-id:f430c1af
---
 llvm/lib/CodeGen/AtomicExpandPass.cpp | 25 +++--
 llvm/test/CodeGen/ARM/atomic-load-store.ll| 52 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 45 
 .../X86/expand-atomic-non-integer.ll  | 14 +
 4 files changed, 133 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp 
b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index a75fa688d87a8d..490cc9aa4c3c80 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -2060,9 +2060,28 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall(
 I->replaceAllUsesWith(V);
   } else if (HasResult) {
 Value *V;
-if (UseSizedLibcall)
-  V = Builder.CreateBitOrPointerCast(Result, I->getType());
-else {
+if (UseSizedLibcall) {
+  // Add bitcasts from Result's T scalar type to I's <2 x T/2> vector type
+  if (I->getType()->getScalarType()->isIntOrPtrTy() &&
+  I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) {
+TypeSize Size = Result->getType()->getPrimitiveSizeInBits();
+assert((unsigned)Size % 2 == 0);
+unsigned HalfSize = (unsigned)Size / 2;
+Value *Lo =
+Builder.CreateTrunc(Result, IntegerType::get(Ctx, HalfSize));
+Value *RS = Builder.CreateLShr(
+Result, ConstantInt::get(IntegerType::get(Ctx, Size), HalfSize));
+Value *Hi = Builder.CreateTrunc(RS, IntegerType::get(Ctx, HalfSize));
+Value *Vec = Builder.CreateInsertElement(
+VectorType::get(IntegerType::get(Ctx, HalfSize),
+cast(I->getType())->getElementCount()),
+Lo, ConstantInt::get(IntegerType::get(Ctx, 32), 0));
+Vec = Builder.CreateInsertElement(
+Vec, Hi, ConstantInt::get(IntegerType::get(Ctx, 32), 1));
+V = Builder.CreateBitOrPointerCast(Vec, I->getType());
+  } else
+V = Builder.CreateBitOrPointerCast(Result, I->getType());
+} else {
   V = Builder.CreateAlignedLoad(I->getType(), AllocaResult,
 AllocaAlignment);
   Builder.CreateLifetimeEnd(AllocaResult, SizeVal64);
diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll 
b/llvm/test/CodeGen/ARM/atomic-load-store.ll
index 560dfde356c29d..8736c5bd9a9451 100644
--- a/llvm/test/CodeGen/ARM/atomic-load-store.ll
+++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll
@@ -983,3 +983,55 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double 
%val1) {
   store atomic double %val1, ptr %ptr seq_cst, align 8
   ret void
 }
+
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 {
+; ARM-LABEL: atomic_vec1_ptr:
+; ARM:   @ %bb.0:
+; ARM-NEXT:ldr r0, [r0]
+; ARM-NEXT:dmb ish
+; ARM-NEXT:bx lr
+;
+; ARMOPTNONE-LABEL: atomic_vec1_ptr:
+; ARMOPTNONE:   @ %bb.0:
+; ARMOPTNONE-NEXT:ldr r0, [r0]
+; ARMOPTNONE-NEXT:dmb ish
+; ARMOPTNONE-NEXT:bx lr
+;
+; THUMBTWO-LABEL: atomic_vec1_ptr:
+; THUMBTWO:   @ %bb.0:
+; THUMBTWO-NEXT:ldr r0, [r0]
+; THUMBTWO-NEXT:dmb ish
+; THUMBTWO-NEXT:bx lr
+;
+; THUMBONE-LABEL: atomic_vec1_ptr:
+; THUMBONE:   @ %bb.0:
+; THUMBONE-NEXT:push {r7, lr}
+; THUMBONE-NEXT:movs r1, #0
+; THUMBONE-NEXT:mov r2, r1
+; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4
+; THUMBONE-NEXT:pop {r7, pc}
+;
+; ARMV4-LABEL: atomic_vec1_ptr:
+; ARMV4:   @ %bb.0:
+; ARMV4-NEXT:push {r11, lr}
+; ARMV4-NEXT:mov r1, #2
+; ARMV4-NEXT:bl __atomic_load_4
+; ARMV4-NEXT:mov r0, #0
+; ARMV4-NEXT:pop {r11, lr}
+; ARMV4-NEXT:mov pc, lr
+;
+; ARMV6-LABEL: atomic_vec1_ptr:
+; ARMV6:   @ %bb.0:
+; ARMV6-NEXT:mov r1, #0
+; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5
+; ARMV6-NEXT:ldr r0, [r0]
+; ARMV6-NEXT:bx lr
+;
+; THUMBM-LABEL: atomic_vec1_ptr:
+; THUMBM:   @ %bb.0:
+; THUMBM-NEXT:ldr r0, [r0]
+; THUMBM-NEXT:dmb sy
+; THUMBM-NEXT:bx lr
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 08d0405345f573..8384987ae3334b 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -371,6 +371,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
   ret <2 x i32> %ret
 }
 
+define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec2_ptr_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+;

[llvm-branch-commits] [llvm] [X86] Manage atomic load of fp -> int promotion in DAG (PR #120386)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120386

>From 6f50ac7470ddea05e4cddfd51aab3d9a69736767 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:38:23 -0500
Subject: [PATCH] [X86] Manage atomic load of fp -> int promotion in DAG

When lowering atomic <1 x T> vector types with floats, selection can fail since
this pattern is unsupported. To support this, floats can be casted to
an integer type of the same size.

commit-id:f9d761c5
---
 llvm/lib/Target/X86/X86ISelLowering.cpp|  4 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll | 37 ++
 2 files changed, 41 insertions(+)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp 
b/llvm/lib/Target/X86/X86ISelLowering.cpp
index a956074e50d86f..b681b80858de30 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -2639,6 +2639,10 @@ X86TargetLowering::X86TargetLowering(const 
X86TargetMachine &TM,
 setOperationAction(Op, MVT::f32, Promote);
   }
 
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f16, MVT::i16);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f32, MVT::i32);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f64, MVT::i64);
+
   // We have target-specific dag combine patterns for the following nodes:
   setTargetDAGCombine({ISD::VECTOR_SHUFFLE,
ISD::SCALAR_TO_VECTOR,
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index d23cfb89f9fc87..6efcbb80c0ce6d 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -145,3 +145,40 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   %ret = load atomic <1 x i64>, ptr %x acquire, align 8
   ret <1 x i64> %ret
 }
+
+define <1 x half> @atomic_vec1_half(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_half:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_half:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %cx
+; CHECK0-NEXT:## implicit-def: $eax
+; CHECK0-NEXT:movw %cx, %ax
+; CHECK0-NEXT:## implicit-def: $xmm0
+; CHECK0-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x half>, ptr %x acquire, align 2
+  ret <1 x half> %ret
+}
+
+define <1 x float> @atomic_vec1_float(ptr %x) {
+; CHECK-LABEL: atomic_vec1_float:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x float>, ptr %x acquire, align 4
+  ret <1 x float> %ret
+}
+
+define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec1_double_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 8
+  ret <1 x double> %ret
+}

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [X86] Add atomic vector tests for unaligned >1 sizes. (PR #120387)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120387

>From 2af4d7c2efd07dc96e93017c3684f6aeb73be9e8 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:40:32 -0500
Subject: [PATCH] [X86] Add atomic vector tests for unaligned >1 sizes.

Unaligned atomic vectors with size >1 are lowered to calls.
Adding their tests separately here.
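
As a rough sketch of the rule these tests exercise (assumed shape; the real
check lives in the generic atomic expansion logic and these names are
illustrative):

```cpp
// An atomic load whose alignment is smaller than its store size cannot be
// lowered lock-free, so it is expanded to a call to the generic
// __atomic_load runtime routine (the calls seen in the CHECK lines below).
bool loweredToLibcall(const llvm::LoadInst &LI, const llvm::DataLayout &DL) {
  uint64_t Size = DL.getTypeStoreSize(LI.getType()).getFixedValue();
  return LI.getAlign().value() < Size;
}
```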

commit-id:a06a5cc6
---
 llvm/test/CodeGen/X86/atomic-load-store.ll | 253 +
 1 file changed, 253 insertions(+)

diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 6efcbb80c0ce6d..39e9fdfa5e62b0 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -146,6 +146,34 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   ret <1 x i64> %ret
 }
 
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_ptr:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movq (%rsp), %rax
+; CHECK3-NEXT:popq %rcx
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_ptr:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq (%rsp), %rax
+; CHECK0-NEXT:popq %rcx
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
+
 define <1 x half> @atomic_vec1_half(ptr %x) {
 ; CHECK3-LABEL: atomic_vec1_half:
 ; CHECK3:   ## %bb.0:
@@ -182,3 +210,228 @@ define <1 x double> @atomic_vec1_double_align(ptr %x) 
nounwind {
   %ret = load atomic <1 x double>, ptr %x acquire, align 8
   ret <1 x double> %ret
 }
+
+define <1 x i64> @atomic_vec1_i64(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_i64:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movq (%rsp), %rax
+; CHECK3-NEXT:popq %rcx
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i64:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq (%rsp), %rax
+; CHECK0-NEXT:popq %rcx
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i64>, ptr %x acquire, align 4
+  ret <1 x i64> %ret
+}
+
+define <1 x double> @atomic_vec1_double(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_double:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK3-NEXT:popq %rax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_double:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK0-NEXT:popq %rax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 4
+  ret <1 x double> %ret
+}
+
+define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec2_i32:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK3-NEXT:popq %rax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec2_i32:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq {{.*#+}} xmm0 = mem[0],zero
+; CHECK0-NEXT:popq %rax
+; CHECK0-NEXT:retq
+  %ret = load atomic <2 x i32>, ptr %x acquire, align 4
+  ret <2 x i32> %ret
+}
+
+define <4 x float> @atomic_vec4_float_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec4_float_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEXT:movl $2, %esi
+; CHECK-NEXT:callq ___atomic_load_16
+; CHECK-NEXT:movq %rdx, %xmm1
+; CHECK-NEXT:movq %rax, %xmm0
+; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120598

>From a847ecfa366a17853e425a5fd94edb668396b446 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 11:19:39 -0500
Subject: [PATCH] [SelectionDAG][X86] Widen <2 x T> vector types for atomic
 load

Vector types of 2 elements must be widened. This change does this for
atomic load vector types in SelectionDAG so that it can translate aligned
vectors of size >1. It also adds Pats to remove an extra MOV.
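
As a rough illustration of what "widening" means here (sketch; `TLI` and
`Ctx` are assumed to be the usual TargetLowering reference and LLVMContext):

```cpp
// <2 x i32> is not a legal result type on x86-64, so the type legalizer
// widens it to the next legal vector type; the load itself stays a single
// 8-byte atomic operation and the extra lanes are undefined.
EVT WideVT = TLI.getTypeToTransformTo(Ctx, MVT::v2i32); // -> v4i32 with SSE2
assert(TLI.getTypeAction(Ctx, MVT::v2i32) ==
           TargetLowering::TypeWidenVector &&
       "v2i32 is expected to take the widening path");
```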

commit-id:2894ccd1
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |  5 +-
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 36 -
 llvm/lib/Target/X86/X86InstrCompiler.td   |  7 ++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 81 +++
 llvm/test/CodeGen/X86/atomic-unordered.ll |  3 +-
 5 files changed, 124 insertions(+), 8 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 893fca233d191b..8923d5596e7ac4 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -1048,6 +1048,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N);
   SDValue WidenVecRes_INSERT_SUBVECTOR(SDNode *N);
   SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N);
+  SDValue WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue WidenVecRes_LOAD(SDNode* N);
   SDValue WidenVecRes_VP_LOAD(VPLoadSDNode *N);
   SDValue WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N);
@@ -1131,8 +1132,8 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   /// resulting wider type. It takes:
   ///   LdChain: list of chains for the load to be generated.
   ///   Ld:  load to widen
-  SDValue GenWidenVectorLoads(SmallVectorImpl<SDValue> &LdChain,
-  LoadSDNode *LD);
+  SDValue GenWidenVectorLoads(SmallVectorImpl<SDValue> &LdChain, MemSDNode *LD,
+  bool IsAtomic = false);
 
   /// Helper function to generate a set of extension loads to load a vector 
with
   /// a resulting wider type. It takes:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 980c157fe6f3f1..6f324e7ba1aa58 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -4521,6 +4521,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, 
unsigned ResNo) {
 break;
   case ISD::EXTRACT_SUBVECTOR: Res = WidenVecRes_EXTRACT_SUBVECTOR(N); break;
   case ISD::INSERT_VECTOR_ELT: Res = WidenVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+Res = WidenVecRes_ATOMIC_LOAD(cast<AtomicSDNode>(N));
+break;
   case ISD::LOAD:  Res = WidenVecRes_LOAD(N); break;
   case ISD::STEP_VECTOR:
   case ISD::SPLAT_VECTOR:
@@ -5907,6 +5910,26 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) {
  N->getOperand(1), N->getOperand(2));
 }
 
+SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SmallVector<SDValue, 16> LdChain; // Chain for the series of loads
+  SDValue Result = GenWidenVectorLoads(LdChain, N, /*IsAtomic=*/true);
+
+  if (Result) {
+// Build a factor node to remember the multiple loads are independent and
+// chain to that.
+SDValue NewChain =
+DAG.getNode(ISD::TokenFactor, SDLoc(N), MVT::Other, LdChain);
+
+// Modified the chain - switch anything that used the old chain to use
+// the new one.
+ReplaceValueWith(SDValue(N, 1), NewChain);
+
+return Result;
+  }
+
+  report_fatal_error("Unable to widen atomic vector load");
+}
+
 SDValue DAGTypeLegalizer::WidenVecRes_LOAD(SDNode *N) {
  LoadSDNode *LD = cast<LoadSDNode>(N);
   ISD::LoadExtType ExtType = LD->getExtensionType();
@@ -7706,7 +7729,7 @@ static SDValue BuildVectorFromScalar(SelectionDAG& DAG, 
EVT VecTy,
 }
 
 SDValue DAGTypeLegalizer::GenWidenVectorLoads(SmallVectorImpl<SDValue> &LdChain,
-  LoadSDNode *LD) {
+  MemSDNode *LD, bool IsAtomic) {
   // The strategy assumes that we can efficiently load power-of-two widths.
   // The routine chops the vector into the largest vector loads with the same
   // element type or scalar loads and then recombines it to the widen vector
@@ -7763,12 +7786,17 @@ SDValue 
DAGTypeLegalizer::GenWidenVectorLoads(SmallVectorImpl &LdChain,
 } while (TypeSize::isKnownGT(RemainingWidth, NewVTWidth));
   }
 
-  SDValue LdOp = DAG.getLoad(*FirstVT, dl, Chain, BasePtr, 
LD->getPointerInfo(),
- LD->getOriginalAlign(), MMOFlags, AAInfo);
+  SDValue LdOp;
+  if (IsAtomic)
+LdOp = DAG.getAtomic(ISD::ATOMIC_LOAD, dl, *FirstVT, *FirstVT, Chain,
+ BasePtr, LD->getMemOperand());
+  else
+LdOp = DAG.getLoad(*FirstVT, dl, Chain, BasePtr, LD->getPointerInfo(),
+   LD->getOrigin

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)

2025-01-21 Thread via llvm-branch-commits


@@ -5907,6 +5910,30 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) {
  N->getOperand(1), N->getOperand(2));
 }
 
+SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SmallVector<SDValue, 16> LdChain; // Chain for the series of loads
+  SDValue Result = GenWidenVectorLoads(LdChain, N, /*IsAtomic=*/true);

jofrn wrote:

Within GenWidenVectorLoads, MemVTs should be empty and so we will generate a 
single atomic anyway. I've added a check `IsAtomic` to ensure we go down this 
path.
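
Condensed, the guarded path from the hunk above looks like this (trimmed
sketch; the final chain bookkeeping line is assumed from the surrounding
code):

```cpp
// With IsAtomic set, the whole value is loaded by one atomic node of the
// first (widest) legal VT instead of a series of partial loads, preserving
// the indivisibility of the access.
SDValue LdOp;
if (IsAtomic)
  LdOp = DAG.getAtomic(ISD::ATOMIC_LOAD, dl, *FirstVT, *FirstVT, Chain,
                       BasePtr, LD->getMemOperand());
else
  LdOp = DAG.getLoad(*FirstVT, dl, Chain, BasePtr, LD->getPointerInfo(),
                     LD->getOriginalAlign(), MMOFlags, AAInfo);
LdChain.push_back(LdOp.getValue(1)); // remember the output chain
```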

https://github.com/llvm/llvm-project/pull/120598
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [HLSL] Introduce address space `hlsl_constant(2)` for constant buffer declarations (PR #123411)

2025-01-21 Thread Helena Kotas via llvm-branch-commits

https://github.com/hekota edited 
https://github.com/llvm/llvm-project/pull/123411
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [HLSL] Introduce address space `hlsl_constant(2)` for constant buffer declarations (PR #123411)

2025-01-21 Thread Helena Kotas via llvm-branch-commits


@@ -42,6 +42,7 @@ static const unsigned DirectXAddrSpaceMap[] = {
 0, // ptr32_uptr
 0, // ptr64
 3, // hlsl_groupshared
+2, // hlsl_constant

hekota wrote:

Yes, not updating all of the maps produces a compile error.
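
A self-contained sketch of why that happens (assumed mechanism, mirroring how
clang sizes its per-target address-space tables; the names are invented for
illustration):

```cpp
#include <cstddef>

// The tables are fixed-size arrays whose length tracks the LangAS enum, so
// adding hlsl_constant without updating a table changes the expected element
// count and the table no longer matches the required array type.
enum class LangAS { Default, GroupShared, HlslConstant, Count };
using LangASMap = unsigned[(size_t)LangAS::Count];

constexpr unsigned DirectXAddrSpaces[] = {0, 3, 2}; // all entries present

static_assert(sizeof(DirectXAddrSpaces) == sizeof(LangASMap),
              "every target map must cover every LangAS value");

int main() { return DirectXAddrSpaces[(size_t)LangAS::HlslConstant]; }
```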

https://github.com/llvm/llvm-project/pull/123411
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [lld] [lld][LoongArch] Support relaxation during TLSDESC GD/LD to IE/LE conversion. (PR #123730)

2025-01-21 Thread Zhaoxin Yang via llvm-branch-commits

https://github.com/ylzsx updated 
https://github.com/llvm/llvm-project/pull/123730

>From 187759562d861034a79cd8c4ee4ab063bba5f4ff Mon Sep 17 00:00:00 2001
From: yangzhaoxin 
Date: Sat, 4 Jan 2025 15:03:47 +0800
Subject: [PATCH 1/2] Support relaxation during TLSDESC GD/LD to IE/LE
 conversion.

Complement https://. When relaxation is enabled, remove redundant NOPs.
---
 lld/ELF/Arch/LoongArch.cpp | 32 +---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/lld/ELF/Arch/LoongArch.cpp b/lld/ELF/Arch/LoongArch.cpp
index 37614c3e9615d6..5f49b23e8ffb1a 100644
--- a/lld/ELF/Arch/LoongArch.cpp
+++ b/lld/ELF/Arch/LoongArch.cpp
@@ -965,10 +965,16 @@ static bool relax(Ctx &ctx, InputSection &sec) {
 case R_LARCH_GOT_PC_HI20:
 case R_LARCH_TLS_GD_PC_HI20:
 case R_LARCH_TLS_LD_PC_HI20:
-case R_LARCH_TLS_DESC_PC_HI20:
   // The overflow check for i+2 will be carried out in isPairRelaxable.
-  if (r.expr != RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC &&
-  r.expr != R_RELAX_TLS_GD_TO_LE && isPairRelaxable(relocs, i))
+  if (isPairRelaxable(relocs, i))
+relaxPCHi20Lo12(ctx, sec, i, loc, r, relocs[i + 2], remove);
+  break;
+case R_LARCH_TLS_DESC_PC_HI20:
+  if (r.expr == RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC ||
+  r.expr == R_RELAX_TLS_GD_TO_LE) {
+if (relaxable(relocs, i))
+  remove = 4;
+  } else if (isPairRelaxable(relocs, i))
 relaxPCHi20Lo12(ctx, sec, i, loc, r, relocs[i + 2], remove);
   break;
 case R_LARCH_CALL36:
@@ -986,6 +992,17 @@ static bool relax(Ctx &ctx, InputSection &sec) {
   isUInt<12>(r.sym->getVA(ctx, r.addend)))
 remove = 4;
   break;
+case R_LARCH_TLS_DESC_PC_LO12:
+  if (relaxable(relocs, i) &&
+  (r.expr == RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC ||
+   r.expr == R_RELAX_TLS_GD_TO_LE))
+remove = 4;
+  break;
+case R_LARCH_TLS_DESC_LD:
+  if (relaxable(relocs, i) && r.expr == R_RELAX_TLS_GD_TO_LE &&
+  isUInt<12>(r.sym->getVA(ctx, r.addend)))
+remove = 4;
+  break;
 }
 
 // For all anchors whose offsets are <= r.offset, they are preceded by
@@ -1214,6 +1231,10 @@ void LoongArch::relocateAlloc(InputSectionBase &sec, 
uint8_t *buf) const {
bits);
 relocateNoSym(loc, rel.type, val);
   } else {
+isRelax = relaxable(relocs, i);
+if (isRelax && (rel.type == R_LARCH_TLS_DESC_PC_HI20 ||
+rel.type == R_LARCH_TLS_DESC_PC_LO12))
+  continue;
 tlsdescToIe(loc, rel, val);
   }
   continue;
@@ -1230,6 +1251,11 @@ void LoongArch::relocateAlloc(InputSectionBase &sec, 
uint8_t *buf) const {
bits);
 relocateNoSym(loc, rel.type, val);
   } else {
+isRelax = relaxable(relocs, i);
+if (isRelax && (rel.type == R_LARCH_TLS_DESC_PC_HI20 ||
+rel.type == R_LARCH_TLS_DESC_PC_LO12 ||
+(rel.type == R_LARCH_TLS_DESC_LD && isUInt<12>(val
+  continue;
 tlsdescToLe(loc, rel, val);
   }
   continue;

>From e024b7c5a85100627aa64a47dfc46221e709f400 Mon Sep 17 00:00:00 2001
From: yangzhaoxin 
Date: Sat, 4 Jan 2025 15:30:55 +0800
Subject: [PATCH 2/2] Modify loongarch-relax-tlsdesc.s.

---
 lld/test/ELF/loongarch-relax-tlsdesc.s | 45 +-
 1 file changed, 16 insertions(+), 29 deletions(-)

diff --git a/lld/test/ELF/loongarch-relax-tlsdesc.s 
b/lld/test/ELF/loongarch-relax-tlsdesc.s
index 5e538985d1402c..666ca6bd1e7243 100644
--- a/lld/test/ELF/loongarch-relax-tlsdesc.s
+++ b/lld/test/ELF/loongarch-relax-tlsdesc.s
@@ -9,7 +9,6 @@
 # RUN: llvm-readobj -r -x .got a.64.so | FileCheck --check-prefix=GD64-RELA %s
 # RUN: llvm-objdump --no-show-raw-insn -dr -h a.64.so | FileCheck %s 
--check-prefix=GD64
 
-## FIXME: IE/LE relaxation have not yet been implemented, --relax/--no-relax 
obtain the same results.
 ## Transition from TLSDESC to IE/LE. Also check --emit-relocs.
 # RUN: ld.lld --relax -e 0 -z now --emit-relocs a.64.o c.64.o -o a.64.le
 # RUN: llvm-readobj -r -x .got a.64.le 2>&1 | FileCheck 
--check-prefix=LE64-RELA %s
@@ -73,25 +72,21 @@
 # LE64-RELA: could not find section '.got'
 
 ## a@tprel = 0x8
-# LE64:20158: nop
+# LE64:20158: ori $a0, $zero, 8
 # LE64-NEXT:R_LARCH_TLS_DESC_PC_HI20 a
 # LE64-NEXT:R_LARCH_RELAX *ABS*
-# LE64-NEXT:  nop
 # LE64-NEXT:R_LARCH_TLS_DESC_PC_LO12 a
 # LE64-NEXT:R_LARCH_RELAX *ABS*
-# LE64-NEXT:  nop
 # LE64-NEXT:R_LARCH_TLS_DESC_LD a
 # LE64-NEXT:R_LARCH_RELAX *ABS*
-# LE64-NEXT:  ori $a0, $zero, 8
 # LE64-NEXT:R_LARCH_TLS_DESC_CALL a
 # LE64-NEXT:R_LARCH_RELAX *ABS*
 # LE64-NEXT:  add.d   $a1, $a0, $tp
 
 ## b@tprel = 0x7ff
-# LE64:2016c: nop
+# LE64:20160: 

[llvm-branch-commits] [lld] [lld][LoongArch] GOT indirection to PC relative optimization. (PR #123743)

2025-01-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-lld

Author: Zhaoxin Yang (ylzsx)


Changes

In LoongArch, this optimization is only supported when relaxation is enabled.
From:
* pcalau12i $a0, %got_pc_hi20(sym_got)
* ld.w/d $a0, $a0, %got_pc_lo12(sym_got)
To:
* pcalau12i $a0, %pc_hi20(sym)
* addi.w/d $a0, $a0, %pc_lo12(sym)

If the original code sequence can be relaxed into a single instruction 
`pcaddi`, this patch will not be taken (see 
https://github.com/llvm/llvm-project/pull/123566).
The implementation related to `got` is split into two locations because the
`relax()` function is part of an iterative fixed-point algorithm. We should
keep it minimal to achieve better linker performance.

FIXME: Although the optimization has been performed, the GOT entries still
exist, similarly to AArch64. Eliminating the entries may require additional
marking in the common code.
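
A small standalone sketch of the +/-2GB test described above (mirrors the
patch's page-offset check; the helper names are illustrative, not the exact
lld ones):

```cpp
#include <cstdint>

// pcalau12i addresses 4KiB pages via a signed 20-bit page delta, so the
// symbol's page must lie within +/-2GiB of the page containing the access.
static int64_t page(uint64_t addr) { return (int64_t)(addr & ~0xfffULL); }

bool pcRelReachable(uint64_t symVA, uint64_t pcVA) {
  int64_t pageOffset = page(symVA) - page(pcVA);
  int64_t pages = pageOffset >> 12;                    // signed page count
  return pages >= -(1LL << 19) && pages < (1LL << 19); // i.e. isInt<20>
}
```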

---
Full diff: https://github.com/llvm/llvm-project/pull/123743.diff


2 Files Affected:

- (modified) lld/ELF/Arch/LoongArch.cpp (+70) 
- (modified) lld/test/ELF/loongarch-relax-pc-hi20-lo12.s (+6-4) 


``diff
diff --git a/lld/ELF/Arch/LoongArch.cpp b/lld/ELF/Arch/LoongArch.cpp
index 5f49b23e8ffb1a..6ae45e109a6dec 100644
--- a/lld/ELF/Arch/LoongArch.cpp
+++ b/lld/ELF/Arch/LoongArch.cpp
@@ -47,6 +47,8 @@ class LoongArch final : public TargetInfo {
   void tlsIeToLe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
   void tlsdescToIe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
   void tlsdescToLe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
+  bool tryGotToPCRel(uint8_t *loc, const Relocation &rHi20,
+ const Relocation &rLo12, uint64_t secAddr) const;
 };
 } // end anonymous namespace
 
@@ -1150,6 +1152,58 @@ void LoongArch::tlsdescToLe(uint8_t *loc, const 
Relocation &rel,
   }
 }
 
+// Try GOT indirection to PC relative optimization when relaxation is enabled.
+// From:
+//  * pcalau12i $a0, %got_pc_hi20(sym_got)
+//  * ld.w/d$a0, $a0, %got_pc_lo12(sym_got)
+// To:
+//  * pcalau12i $a0, %pc_hi20(sym)
+//  * addi.w/d  $a0, $a0, %pc_lo12(sym)
+//
+// FIXME: Although the optimization has been performed, the GOT entries still
+// exist, similarly to AArch64. Eliminating the entries may require
+// additional marking in the common code.
+bool LoongArch::tryGotToPCRel(uint8_t *loc, const Relocation &rHi20,
+  const Relocation &rLo12, uint64_t secAddr) const 
{
+  if (!rHi20.sym->isDefined() || rHi20.sym->isPreemptible ||
+  rHi20.sym->isGnuIFunc() ||
+  (ctx.arg.isPic && !cast(*rHi20.sym).section))
+return false;
+
+  Symbol &sym = *rHi20.sym;
+  uint64_t symLocal = sym.getVA(ctx) + rHi20.addend;
+  // Check if the address difference is within +/-2GB range.
+  // For simplicity, the range mentioned here is an approximate estimate and is
+  // not fully equivalent to the entire region that PC-relative addressing can
+  // cover.
+  int64_t pageOffset =
+  getLoongArchPage(symLocal) - getLoongArchPage(secAddr + rHi20.offset);
+  if (!isInt<20>(pageOffset >> 12))
+return false;
+
+  Relocation newRHi20 = {RE_LOONGARCH_PAGE_PC, R_LARCH_PCALA_HI20, 
rHi20.offset,
+ rHi20.addend, &sym};
+  Relocation newRLo12 = {R_ABS, R_LARCH_PCALA_LO12, rLo12.offset, rLo12.addend,
+ &sym};
+
+  const uint32_t currInsn = read32le(loc);
+  const uint32_t nextInsn = read32le(loc + 4);
+  // Check if they use the same register.
+  if (getD5(currInsn) != getJ5(nextInsn) || getJ5(nextInsn) != getD5(nextInsn))
+return false;
+
+  uint64_t pageDelta =
+  getLoongArchPageDelta(symLocal, secAddr + rHi20.offset, rHi20.type);
+  // pcalau12i $a0, %pc_hi20
+  write32le(loc, insn(PCALAU12I, getD5(currInsn), 0, 0));
+  relocate(loc, newRHi20, pageDelta);
+  // addi.w/d $a0, $a0, %pc_lo12
+  write32le(loc + 4, insn(ctx.arg.is64 ? ADDI_D : ADDI_W, getD5(nextInsn),
+  getJ5(nextInsn), 0));
+  relocate(loc + 4, newRLo12, SignExtend64(symLocal, 64));
+  return true;
+}
+
 // During TLSDESC GD_TO_IE, the converted code sequence always includes an
 // instruction related to the Lo12 relocation (ld.[wd]). To obtain correct val
 // in `getRelocTargetVA`, expr of this instruction should be adjusted to
@@ -1259,6 +1313,22 @@ void LoongArch::relocateAlloc(InputSectionBase &sec, 
uint8_t *buf) const {
 tlsdescToLe(loc, rel, val);
   }
   continue;
+case RE_LOONGARCH_GOT_PAGE_PC:
+  // In LoongArch, we try GOT indirection to PC relative optimization only
+  // when relaxation is enabled. This approach avoids determining whether
+  // relocation types are paired and whether the destination register of
+  // pcalau12i is only used by the immediately following instruction.
+  // Moreover, if the original code sequence can be relaxed to a single
+  // instruction `pcaddi`, the first instruction will be removed and it 
will
+  // not reach here.
+  if (isP

[llvm-branch-commits] [lld] [lld][LoongArch] GOT indirection to PC relative optimization. (PR #123743)

2025-01-21 Thread Zhaoxin Yang via llvm-branch-commits

https://github.com/ylzsx created 
https://github.com/llvm/llvm-project/pull/123743

In LoongArch, this optimization is only supported when relaxation is enabled.
From:
* pcalau12i $a0, %got_pc_hi20(sym_got)
* ld.w/d $a0, $a0, %got_pc_lo12(sym_got)
To:
* pcalau12i $a0, %pc_hi20(sym)
* addi.w/d $a0, $a0, %pc_lo12(sym)

If the original code sequence can be relaxed into a single instruction 
`pcaddi`, this patch will not be taken (see 
https://github.com/llvm/llvm-project/pull/123566).
The implementation related to `got` is split into two locations because the
`relax()` function is part of an iterative fixed-point algorithm. We should
keep it minimal to achieve better linker performance.

FIXME: Although the optimization has been performed, the GOT entries still
exist, similarly to AArch64. Eliminating the entries may require additional
marking in the common code.

>From 6e05f360c33f6a1999032d104d020afa5191c21c Mon Sep 17 00:00:00 2001
From: yangzhaoxin 
Date: Tue, 14 Jan 2025 15:50:49 +0800
Subject: [PATCH 1/2] [lld][LoongArch] GOT indirection to PC relative
 optimization.

In LoongArch, this optimization is only supported when relaxation is enabled.
From:
 * pcalau12i $a0, %got_pc_hi20(sym_got)
 * ld.w/d $a0, $a0, %got_pc_lo12(sym_got)
To:
 * pcalau12i $a0, %pc_hi20(sym)
 * addi.w/d $a0, $a0, %pc_lo12(sym)

If the original code sequence can be relaxed into a single instruction
`pcaddi`, this patch will not be taken (see https://).
The implementation related to `got` is split into two locations because
the `relax()` function is part of an iterative fixed-point algorithm. We
should keep it minimal to achieve better linker performance.

FIXME: Although the optimization has been performed, the GOT entries still
exist, similarly to AArch64. Eliminating the entries may require
additional marking in the common code.
---
 lld/ELF/Arch/LoongArch.cpp  | 66 +
 lld/test/ELF/loongarch-relax-pc-hi20-lo12.s | 10 ++--
 2 files changed, 72 insertions(+), 4 deletions(-)

diff --git a/lld/ELF/Arch/LoongArch.cpp b/lld/ELF/Arch/LoongArch.cpp
index 5f49b23e8ffb1a..bd98831fba5257 100644
--- a/lld/ELF/Arch/LoongArch.cpp
+++ b/lld/ELF/Arch/LoongArch.cpp
@@ -47,6 +47,8 @@ class LoongArch final : public TargetInfo {
   void tlsIeToLe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
   void tlsdescToIe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
   void tlsdescToLe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
+  bool tryGotToPCRel(uint8_t *loc, const Relocation &rHi20,
+ const Relocation &rLo12, uint64_t secAddr) const;
 };
 } // end anonymous namespace
 
@@ -1150,6 +1152,54 @@ void LoongArch::tlsdescToLe(uint8_t *loc, const 
Relocation &rel,
   }
 }
 
+// Try GOT indirection to PC relative optimization when relaxation is enabled.
+// From:
+//  * pcalau12i $a0, %got_pc_hi20(sym_got)
+//  * ld.w/d$a0, $a0, %got_pc_lo12(sym_got)
+// To:
+//  * pcalau12i $a0, %pc_hi20(sym)
+//  * addi.w/d  $a0, $a0, %pc_lo12(sym)
+//
+// FIXME: Although the optimization has been performed, the GOT entries still
+// exist, similarly to AArch64. Eliminating the entries may require
+// additional marking in the common code.
+bool LoongArch::tryGotToPCRel(uint8_t *loc, const Relocation &rHi20,
+  const Relocation &rLo12, uint64_t secAddr) const 
{
+  if (!rHi20.sym->isDefined() || rHi20.sym->isPreemptible ||
+  rHi20.sym->isGnuIFunc() ||
+  (ctx.arg.isPic && !cast(*rHi20.sym).section))
+return false;
+
+  Symbol &sym = *rHi20.sym;
+  uint64_t symLocal = sym.getVA(ctx) + rHi20.addend;
+  // Check if the address difference is within +/-2GB range.
+  // For simplicity, the range mentioned here is an approximate estimate and is
+  // not fully equivalent to the entire region that PC-relative addressing can
+  // cover.
+  int64_t pageOffset =
+  getLoongArchPage(symLocal) - getLoongArchPage(secAddr + rHi20.offset);
+  if (!isInt<20>(pageOffset >> 12))
+return false;
+
+  Relocation newRHi20 = {RE_LOONGARCH_PAGE_PC, R_LARCH_PCALA_HI20, 
rHi20.offset,
+ rHi20.addend, &sym};
+  Relocation newRLo12 = {R_ABS, R_LARCH_PCALA_LO12, rLo12.offset, rLo12.addend,
+ &sym};
+
+  const uint32_t currInsn = read32le(loc);
+  const uint32_t nextInsn = read32le(loc + 4);
+  uint64_t pageDelta =
+  getLoongArchPageDelta(symLocal, secAddr + rHi20.offset, rHi20.type);
+  // pcalau12i $a0, %pc_hi20
+  write32le(loc, insn(PCALAU12I, getD5(currInsn), 0, 0));
+  relocate(loc, newRHi20, pageDelta);
+  // addi.w/d $a0, $a0, %pc_lo12
+  write32le(loc + 4, insn(ctx.arg.is64 ? ADDI_D : ADDI_W, getD5(nextInsn),
+  getJ5(nextInsn), 0));
+  relocate(loc + 4, newRLo12, SignExtend64(symLocal, 64));
+  return true;
+}
+
 // During TLSDESC GD_TO_IE, the converted code sequence always includes an
 // instruction related to the Lo12 relocation (ld.[wd]). To obtain correct 

[llvm-branch-commits] [lld] [lld][LoongArch] Support TLSDESC GD/LD to IE/LE. (PR #123715)

2025-01-21 Thread Zhaoxin Yang via llvm-branch-commits

https://github.com/ylzsx updated 
https://github.com/llvm/llvm-project/pull/123715

>From dff3031fdb2ca3755b73e3b81e56f8008a409470 Mon Sep 17 00:00:00 2001
From: yangzhaoxin 
Date: Fri, 3 Jan 2025 14:29:17 +0800
Subject: [PATCH 1/5] [lld][LoongArch] Implement TLSDESC GD/LD to IE/LE.

Support TLSDESC to initial-exec or local-exec optimizations. Introduce a new
hook RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC and use the existing
R_RELAX_TLS_GD_TO_IE_ABS to support TLSDESC => IE, while using the existing
R_RELAX_TLS_GD_TO_LE to support TLSDESC => LE.

In normal or medium code model, there are two forms of code sequences:
 * pcalau12i  $a0, %desc_pc_hi20(sym_desc)
 * addi.d $a0, $a0, %desc_pc_lo12(sym_desc)
 * ld.d   $ra, $a0, %desc_ld(sym_desc)
 * jirl   $ra, $ra, %desc_call(sym_desc)
 --
 * pcaddi $a0, %desc_pcrel_20(sym_desc)
 * ld.d   $ra, $a0, %desc_ld(sym_desc)
 * jirl   $ra, $ra, %desc_call(sym_desc)

The code sequence obtained is as follows:
 * pcalau12i $a0, %ie_pc_hi20(sym_ie)
 * ld.[wd]   $a0, $a0, %ie_pc_lo12(sym_ie)

For simplicity, whether tlsdescToIe or tlsdescToLe, we always tend to convert
the preceding instructions to NOPs, since both forms of code sequence
(corresponding to the relocation combinations
R_LARCH_TLS_DESC_PC_HI20+R_LARCH_TLS_DESC_PC_LO12 and
R_LARCH_TLS_DESC_PCREL20_S2) follow the same process.

FIXME: When relaxation is enabled, redundant NOPs can be removed. This will
be implemented in a future patch.

Note: the different forms of TLSDESC code sequences should not appear
interleaved in the normal, medium, or extreme code model; compilers do not
generate such code and lld does not support it. This is thanks to the guard
in PostRASchedulerList.cpp in llvm.
```
Calls are not scheduling boundaries before register allocation, but
post-ra we don't gain anything by scheduling across calls since we
don't need to worry about register pressure.
```
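
Schematically, the per-relocation rewrite for the DESC-to-IE case looks like
this (sketch only; it reuses the lld-style insn/write32le helpers from this
file, `NOP` stands for the andi $zero, $zero, 0 encoding, and relocating the
rewritten instructions is omitted):

```cpp
switch (rel.type) {
case R_LARCH_TLS_DESC_PC_HI20:  // pcalau12i $a0, %desc_pc_hi20   -> nop
case R_LARCH_TLS_DESC_PC_LO12:  // addi.d $a0, $a0, %desc_pc_lo12 -> nop
  write32le(loc, NOP);
  break;
case R_LARCH_TLS_DESC_LD:       // ld.d $ra, $a0, %desc_ld
  write32le(loc, insn(PCALAU12I, R_A0, 0, 0)); // pcalau12i $a0, %ie_pc_hi20
  break;
case R_LARCH_TLS_DESC_CALL:     // jirl $ra, $ra, %desc_call
  write32le(loc, insn(ctx.arg.is64 ? LD_D : LD_W, R_A0, R_A0, 0));
  break;                        // ld.[wd] $a0, $a0, %ie_pc_lo12
}
```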
---
 lld/ELF/Arch/LoongArch.cpp | 146 -
 lld/ELF/InputSection.cpp   |   1 +
 lld/ELF/Relocations.cpp|  38 ++
 lld/ELF/Relocations.h  |   1 +
 4 files changed, 169 insertions(+), 17 deletions(-)

diff --git a/lld/ELF/Arch/LoongArch.cpp b/lld/ELF/Arch/LoongArch.cpp
index ef25e741901d93..37614c3e9615d6 100644
--- a/lld/ELF/Arch/LoongArch.cpp
+++ b/lld/ELF/Arch/LoongArch.cpp
@@ -39,11 +39,14 @@ class LoongArch final : public TargetInfo {
   void relocate(uint8_t *loc, const Relocation &rel,
 uint64_t val) const override;
   bool relaxOnce(int pass) const override;
+  RelExpr adjustTlsExpr(RelType type, RelExpr expr) const override;
   void relocateAlloc(InputSectionBase &sec, uint8_t *buf) const override;
   void finalizeRelax(int passes) const override;
 
 private:
   void tlsIeToLe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
+  void tlsdescToIe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
+  void tlsdescToLe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
 };
 } // end anonymous namespace
 
@@ -61,6 +64,7 @@ enum Op {
   LU12I_W = 0x1400,
   PCADDI = 0x1800,
   PCADDU12I = 0x1c00,
+  PCALAU12I = 0x1a00,
   LD_W = 0x2880,
   LD_D = 0x28c0,
   JIRL = 0x4c00,
@@ -72,6 +76,7 @@ enum Reg {
   R_ZERO = 0,
   R_RA = 1,
   R_TP = 2,
+  R_A0 = 4,
   R_T0 = 12,
   R_T1 = 13,
   R_T2 = 14,
@@ -962,7 +967,8 @@ static bool relax(Ctx &ctx, InputSection &sec) {
 case R_LARCH_TLS_LD_PC_HI20:
 case R_LARCH_TLS_DESC_PC_HI20:
   // The overflow check for i+2 will be carried out in isPairRelaxable.
-  if (isPairRelaxable(relocs, i))
+  if (r.expr != RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC &&
+  r.expr != R_RELAX_TLS_GD_TO_LE && isPairRelaxable(relocs, i))
 relaxPCHi20Lo12(ctx, sec, i, loc, r, relocs[i + 2], remove);
   break;
 case R_LARCH_CALL36:
@@ -1047,6 +1053,103 @@ void LoongArch::tlsIeToLe(uint8_t *loc, const 
Relocation &rel,
   }
 }
 
+// Convert TLSDESC GD/LD to IE.
+// In normal or medium code model, there are two forms of code sequences:
+//  * pcalau12i  $a0, %desc_pc_hi20(sym_desc)
+//  * addi.d $a0, $a0, %desc_pc_lo12(sym_desc)
+//  * ld.d   $ra, $a0, %desc_ld(sym_desc)
+//  * jirl   $ra, $ra, %desc_call(sym_desc)
+//  --
+//  * pcaddi $a0, %desc_pcrel_20(a)
+//  * load $ra, $a0, %desc_ld(a)
+//  * jirl $ra, $ra, %desc_call(a)
+//
+// The code sequence obtained is as follows:
+//  * pcalau12i $a0, %ie_pc_hi20(sym_ie)
+//  * ld.[wd]   $a0, $a0, %ie_pc_lo12(sym_ie)
+//
+// For simplicity, whether tlsdescToIe or tlsdescToLe, we always tend to
+// convert the preceding instructions to NOPs, since both forms of code
+// sequence (corresponding to the relocation combinations
+// R_LARCH_TLS_DESC_PC_HI20+R_LARCH_TLS_DESC_PC_LO12 and
+// R_LARCH_TLS_DESC_PCREL20_S2) follow the same process.
+//
+// When relaxation is enabled, redundant NOPs can be removed.
+void LoongArch::tlsdescToIe(uint8_t *loc, const Relocation &rel,
+uint64_t val) const {
+  switch (rel.type) {
+

[llvm-branch-commits] [llvm] AMDGPU: Make vector_shuffle legal for v2i32 with v_pk_mov_b32 (PR #123684)

2025-01-21 Thread Matt Arsenault via llvm-branch-commits


@@ -489,6 +489,90 @@ void AMDGPUDAGToDAGISel::SelectBuildVector(SDNode *N, 
unsigned RegClassID) {
   CurDAG->SelectNodeTo(N, AMDGPU::REG_SEQUENCE, N->getVTList(), RegSeqArgs);
 }
 
+void AMDGPUDAGToDAGISel::SelectVectorShuffle(SDNode *N) {
+  EVT VT = N->getValueType(0);
+  EVT EltVT = VT.getVectorElementType();
+
+  // TODO: Handle 16-bit element vectors with even aligned masks.
+  if (!Subtarget->hasPkMovB32() || !EltVT.bitsEq(MVT::i32) ||
+  VT.getVectorNumElements() != 2) {
+SelectCode(N);
+return;
+  }
+
+  auto *SVN = cast<ShuffleVectorSDNode>(N);
+
+  SDValue Src0 = SVN->getOperand(0);
+  SDValue Src1 = SVN->getOperand(1);
+  ArrayRef<int> Mask = SVN->getMask();
+  SDLoc DL(N);
+
+  assert(Src0.getValueType().getVectorNumElements() == 2 && Mask.size() == 2 &&
+ Mask[0] < 4 && Mask[1] < 4);
+
+  SDValue VSrc0 = Mask[0] < 2 ? Src0 : Src1;
+  SDValue VSrc1 = Mask[1] < 2 ? Src0 : Src1;
+  unsigned Src0SubReg = Mask[0] & 1 ? AMDGPU::sub1 : AMDGPU::sub0;
+  unsigned Src1SubReg = Mask[1] & 1 ? AMDGPU::sub1 : AMDGPU::sub0;
+
+  if (Mask[0] < 0) {
+Src0SubReg = Src1SubReg;
+MachineSDNode *ImpDef =
+CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, VT);
+VSrc0 = SDValue(ImpDef, 0);
+  }
+
+  if (Mask[1] < 0) {
+Src1SubReg = Src0SubReg;
+MachineSDNode *ImpDef =
+CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, VT);
+VSrc1 = SDValue(ImpDef, 0);
+  }
+
+  // SGPR case needs to lower to copies.
+  //
+  // Also use subregister extract when we can directly blend the registers with
+  // a simple subregister copy.
+  //
+  // TODO: Maybe we should fold this out earlier
+  if (N->isDivergent() && Src0SubReg == AMDGPU::sub1 &&
+  Src1SubReg == AMDGPU::sub0) {
+// The low element of the result always comes from src0.
+// The high element of the result always comes from src1.
+// op_sel selects the high half of src0.
+// op_sel_hi selects the high half of src1.
+
+unsigned Src0OpSel =
+Src0SubReg == AMDGPU::sub1 ? SISrcMods::OP_SEL_0 : SISrcMods::NONE;
+unsigned Src1OpSel =
+Src1SubReg == AMDGPU::sub1 ? SISrcMods::OP_SEL_0 : SISrcMods::NONE;

arsenm wrote:

I'm not sure this is correctly encoded. I'm confused by how op_sel and 
op_sel_hi are supposed to be represented. We set fields in the source 
modifiers. I guess this should probably be OP_SEL_1?

https://github.com/llvm/llvm-project/pull/123684
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [lld] [lld][LoongArch] Support relaxation during TLSDESC GD/LD to IE/LE conversion. (PR #123730)

2025-01-21 Thread Zhaoxin Yang via llvm-branch-commits

https://github.com/ylzsx created 
https://github.com/llvm/llvm-project/pull/123730

Complement https://github.com/llvm/llvm-project/pull/123715. When relaxation
is enabled, remove redundant NOPs.

>From ddb64ee49845b302df8ea50546164cceb87cf288 Mon Sep 17 00:00:00 2001
From: yangzhaoxin 
Date: Sat, 4 Jan 2025 15:03:47 +0800
Subject: [PATCH 1/2] Support relaxation during TLSDESC GD/LD to IE/LE
 conversion.

Complement https://. When relaxation is enabled, remove redundant NOPs.
---
 lld/ELF/Arch/LoongArch.cpp | 32 +---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/lld/ELF/Arch/LoongArch.cpp b/lld/ELF/Arch/LoongArch.cpp
index 37614c3e9615d6..5f49b23e8ffb1a 100644
--- a/lld/ELF/Arch/LoongArch.cpp
+++ b/lld/ELF/Arch/LoongArch.cpp
@@ -965,10 +965,16 @@ static bool relax(Ctx &ctx, InputSection &sec) {
 case R_LARCH_GOT_PC_HI20:
 case R_LARCH_TLS_GD_PC_HI20:
 case R_LARCH_TLS_LD_PC_HI20:
-case R_LARCH_TLS_DESC_PC_HI20:
   // The overflow check for i+2 will be carried out in isPairRelaxable.
-  if (r.expr != RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC &&
-  r.expr != R_RELAX_TLS_GD_TO_LE && isPairRelaxable(relocs, i))
+  if (isPairRelaxable(relocs, i))
+relaxPCHi20Lo12(ctx, sec, i, loc, r, relocs[i + 2], remove);
+  break;
+case R_LARCH_TLS_DESC_PC_HI20:
+  if (r.expr == RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC ||
+  r.expr == R_RELAX_TLS_GD_TO_LE) {
+if (relaxable(relocs, i))
+  remove = 4;
+  } else if (isPairRelaxable(relocs, i))
 relaxPCHi20Lo12(ctx, sec, i, loc, r, relocs[i + 2], remove);
   break;
 case R_LARCH_CALL36:
@@ -986,6 +992,17 @@ static bool relax(Ctx &ctx, InputSection &sec) {
   isUInt<12>(r.sym->getVA(ctx, r.addend)))
 remove = 4;
   break;
+case R_LARCH_TLS_DESC_PC_LO12:
+  if (relaxable(relocs, i) &&
+  (r.expr == RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC ||
+   r.expr == R_RELAX_TLS_GD_TO_LE))
+remove = 4;
+  break;
+case R_LARCH_TLS_DESC_LD:
+  if (relaxable(relocs, i) && r.expr == R_RELAX_TLS_GD_TO_LE &&
+  isUInt<12>(r.sym->getVA(ctx, r.addend)))
+remove = 4;
+  break;
 }
 
 // For all anchors whose offsets are <= r.offset, they are preceded by
@@ -1214,6 +1231,10 @@ void LoongArch::relocateAlloc(InputSectionBase &sec, 
uint8_t *buf) const {
bits);
 relocateNoSym(loc, rel.type, val);
   } else {
+isRelax = relaxable(relocs, i);
+if (isRelax && (rel.type == R_LARCH_TLS_DESC_PC_HI20 ||
+rel.type == R_LARCH_TLS_DESC_PC_LO12))
+  continue;
 tlsdescToIe(loc, rel, val);
   }
   continue;
@@ -1230,6 +1251,11 @@ void LoongArch::relocateAlloc(InputSectionBase &sec, 
uint8_t *buf) const {
bits);
 relocateNoSym(loc, rel.type, val);
   } else {
+isRelax = relaxable(relocs, i);
+if (isRelax && (rel.type == R_LARCH_TLS_DESC_PC_HI20 ||
+rel.type == R_LARCH_TLS_DESC_PC_LO12 ||
+(rel.type == R_LARCH_TLS_DESC_LD && isUInt<12>(val
+  continue;
 tlsdescToLe(loc, rel, val);
   }
   continue;

>From ad372aff2088a03136424af2c845194a79ae7221 Mon Sep 17 00:00:00 2001
From: yangzhaoxin 
Date: Sat, 4 Jan 2025 15:30:55 +0800
Subject: [PATCH 2/2] Modify loongarch-relax-tlsdesc.s.

---
 lld/test/ELF/loongarch-relax-tlsdesc.s | 45 +-
 1 file changed, 16 insertions(+), 29 deletions(-)

diff --git a/lld/test/ELF/loongarch-relax-tlsdesc.s 
b/lld/test/ELF/loongarch-relax-tlsdesc.s
index 5e538985d1402c..666ca6bd1e7243 100644
--- a/lld/test/ELF/loongarch-relax-tlsdesc.s
+++ b/lld/test/ELF/loongarch-relax-tlsdesc.s
@@ -9,7 +9,6 @@
 # RUN: llvm-readobj -r -x .got a.64.so | FileCheck --check-prefix=GD64-RELA %s
 # RUN: llvm-objdump --no-show-raw-insn -dr -h a.64.so | FileCheck %s 
--check-prefix=GD64
 
-## FIXME: IE/LE relaxation have not yet been implemented, --relax/--no-relax 
obtain the same results.
 ## Transition from TLSDESC to IE/LE. Also check --emit-relocs.
 # RUN: ld.lld --relax -e 0 -z now --emit-relocs a.64.o c.64.o -o a.64.le
 # RUN: llvm-readobj -r -x .got a.64.le 2>&1 | FileCheck 
--check-prefix=LE64-RELA %s
@@ -73,25 +72,21 @@
 # LE64-RELA: could not find section '.got'
 
 ## a@tprel = 0x8
-# LE64:20158: nop
+# LE64:20158: ori $a0, $zero, 8
 # LE64-NEXT:R_LARCH_TLS_DESC_PC_HI20 a
 # LE64-NEXT:R_LARCH_RELAX *ABS*
-# LE64-NEXT:  nop
 # LE64-NEXT:R_LARCH_TLS_DESC_PC_LO12 a
 # LE64-NEXT:R_LARCH_RELAX *ABS*
-# LE64-NEXT:  nop
 # LE64-NEXT:R_LARCH_TLS_DESC_LD a
 # LE64-NEXT:R_LARCH_RELAX *ABS*
-# LE64-NEXT:  ori $a0, $zero, 8
 # LE64-NEXT:R_LARCH_TLS_DESC_CALL a
 # LE64-NEXT:R_LARCH_RELAX *ABS*
 # LE64

[llvm-branch-commits] [lld] [lld][LoongArch] Support relaxation during TLSDESC GD/LD to IE/LE conversion. (PR #123730)

2025-01-21 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-lld-elf

@llvm/pr-subscribers-lld

Author: Zhaoxin Yang (ylzsx)


Changes

Complement https://github.com/llvm/llvm-project/pull/123715. When relaxation
is enabled, remove redundant NOPs.

---
Full diff: https://github.com/llvm/llvm-project/pull/123730.diff


2 Files Affected:

- (modified) lld/ELF/Arch/LoongArch.cpp (+29-3) 
- (modified) lld/test/ELF/loongarch-relax-tlsdesc.s (+16-29) 


``diff
diff --git a/lld/ELF/Arch/LoongArch.cpp b/lld/ELF/Arch/LoongArch.cpp
index 37614c3e9615d6..5f49b23e8ffb1a 100644
--- a/lld/ELF/Arch/LoongArch.cpp
+++ b/lld/ELF/Arch/LoongArch.cpp
@@ -965,10 +965,16 @@ static bool relax(Ctx &ctx, InputSection &sec) {
 case R_LARCH_GOT_PC_HI20:
 case R_LARCH_TLS_GD_PC_HI20:
 case R_LARCH_TLS_LD_PC_HI20:
-case R_LARCH_TLS_DESC_PC_HI20:
   // The overflow check for i+2 will be carried out in isPairRelaxable.
-  if (r.expr != RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC &&
-  r.expr != R_RELAX_TLS_GD_TO_LE && isPairRelaxable(relocs, i))
+  if (isPairRelaxable(relocs, i))
+relaxPCHi20Lo12(ctx, sec, i, loc, r, relocs[i + 2], remove);
+  break;
+case R_LARCH_TLS_DESC_PC_HI20:
+  if (r.expr == RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC ||
+  r.expr == R_RELAX_TLS_GD_TO_LE) {
+if (relaxable(relocs, i))
+  remove = 4;
+  } else if (isPairRelaxable(relocs, i))
 relaxPCHi20Lo12(ctx, sec, i, loc, r, relocs[i + 2], remove);
   break;
 case R_LARCH_CALL36:
@@ -986,6 +992,17 @@ static bool relax(Ctx &ctx, InputSection &sec) {
   isUInt<12>(r.sym->getVA(ctx, r.addend)))
 remove = 4;
   break;
+case R_LARCH_TLS_DESC_PC_LO12:
+  if (relaxable(relocs, i) &&
+  (r.expr == RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC ||
+   r.expr == R_RELAX_TLS_GD_TO_LE))
+remove = 4;
+  break;
+case R_LARCH_TLS_DESC_LD:
+  if (relaxable(relocs, i) && r.expr == R_RELAX_TLS_GD_TO_LE &&
+  isUInt<12>(r.sym->getVA(ctx, r.addend)))
+remove = 4;
+  break;
 }
 
 // For all anchors whose offsets are <= r.offset, they are preceded by
@@ -1214,6 +1231,10 @@ void LoongArch::relocateAlloc(InputSectionBase &sec, 
uint8_t *buf) const {
bits);
 relocateNoSym(loc, rel.type, val);
   } else {
+isRelax = relaxable(relocs, i);
+if (isRelax && (rel.type == R_LARCH_TLS_DESC_PC_HI20 ||
+rel.type == R_LARCH_TLS_DESC_PC_LO12))
+  continue;
 tlsdescToIe(loc, rel, val);
   }
   continue;
@@ -1230,6 +1251,11 @@ void LoongArch::relocateAlloc(InputSectionBase &sec, 
uint8_t *buf) const {
bits);
 relocateNoSym(loc, rel.type, val);
   } else {
+isRelax = relaxable(relocs, i);
+if (isRelax && (rel.type == R_LARCH_TLS_DESC_PC_HI20 ||
+rel.type == R_LARCH_TLS_DESC_PC_LO12 ||
+(rel.type == R_LARCH_TLS_DESC_LD && isUInt<12>(val
+  continue;
 tlsdescToLe(loc, rel, val);
   }
   continue;
diff --git a/lld/test/ELF/loongarch-relax-tlsdesc.s 
b/lld/test/ELF/loongarch-relax-tlsdesc.s
index 5e538985d1402c..666ca6bd1e7243 100644
--- a/lld/test/ELF/loongarch-relax-tlsdesc.s
+++ b/lld/test/ELF/loongarch-relax-tlsdesc.s
@@ -9,7 +9,6 @@
 # RUN: llvm-readobj -r -x .got a.64.so | FileCheck --check-prefix=GD64-RELA %s
 # RUN: llvm-objdump --no-show-raw-insn -dr -h a.64.so | FileCheck %s 
--check-prefix=GD64
 
-## FIXME: IE/LE relaxation have not yet been implemented, --relax/--no-relax 
obtain the same results.
 ## Transition from TLSDESC to IE/LE. Also check --emit-relocs.
 # RUN: ld.lld --relax -e 0 -z now --emit-relocs a.64.o c.64.o -o a.64.le
 # RUN: llvm-readobj -r -x .got a.64.le 2>&1 | FileCheck 
--check-prefix=LE64-RELA %s
@@ -73,25 +72,21 @@
 # LE64-RELA: could not find section '.got'
 
 ## a@tprel = 0x8
-# LE64:20158: nop
+# LE64:20158: ori $a0, $zero, 8
 # LE64-NEXT:R_LARCH_TLS_DESC_PC_HI20 a
 # LE64-NEXT:R_LARCH_RELAX *ABS*
-# LE64-NEXT:  nop
 # LE64-NEXT:R_LARCH_TLS_DESC_PC_LO12 a
 # LE64-NEXT:R_LARCH_RELAX *ABS*
-# LE64-NEXT:  nop
 # LE64-NEXT:R_LARCH_TLS_DESC_LD a
 # LE64-NEXT:R_LARCH_RELAX *ABS*
-# LE64-NEXT:  ori $a0, $zero, 8
 # LE64-NEXT:R_LARCH_TLS_DESC_CALL a
 # LE64-NEXT:R_LARCH_RELAX *ABS*
 # LE64-NEXT:  add.d   $a1, $a0, $tp
 
 ## b@tprel = 0x7ff
-# LE64:2016c: nop
+# LE64:20160: nop
 # LE64-NEXT:R_LARCH_TLS_DESC_PC_HI20 b
 # LE64-NEXT:R_LARCH_RELAX *ABS*
-# LE64-NEXT:  nop
 # LE64-NEXT:R_LARCH_TLS_DESC_PC_LO12 b
 # LE64-NEXT:  nop
 # LE64-NEXT:R_LARCH_TLS_DESC_LD b
@@ -101,7 +96,7 @@
 
 ## c@tprel = 0x800
 ## Without R_LARCH_RELAX relocation. No relaxation.
-# LE64:

[llvm-branch-commits] [llvm] [mlir] [mlir][LLVM] add argument and result attributes to llvm.call (PR #123177)

2025-01-21 Thread via llvm-branch-commits

https://github.com/jeanPerier updated 
https://github.com/llvm/llvm-project/pull/123177

>From 137705661c184ea1530982c19163341933ab421e Mon Sep 17 00:00:00 2001
From: Jean Perier 
Date: Wed, 15 Jan 2025 09:09:53 -0800
Subject: [PATCH 1/4] [mlir][LLVM] add argument and result attributes to
 llvm.call

---
 llvm/include/llvm/IR/InstrTypes.h | 11 +++
 mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td   |  4 +-
 .../include/mlir/Target/LLVMIR/ModuleImport.h |  8 ++-
 .../mlir/Target/LLVMIR/ModuleTranslation.h|  9 ++-
 mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp| 67 +--
 .../LLVMIR/LLVMToLLVMIRTranslation.cpp| 21 ++
 mlir/lib/Target/LLVMIR/ModuleImport.cpp   | 35 ++
 mlir/lib/Target/LLVMIR/ModuleTranslation.cpp  | 52 ++
 mlir/test/Dialect/LLVMIR/invalid.mlir |  2 +
 mlir/test/Dialect/LLVMIR/roundtrip.mlir   | 20 ++
 .../LLVMIR/Import/call-argument-attributes.ll | 25 +++
 .../LLVMIR/call-argument-attributes.mlir  | 17 +
 12 files changed, 230 insertions(+), 41 deletions(-)
 create mode 100644 mlir/test/Target/LLVMIR/Import/call-argument-attributes.ll
 create mode 100644 mlir/test/Target/LLVMIR/call-argument-attributes.mlir

diff --git a/llvm/include/llvm/IR/InstrTypes.h 
b/llvm/include/llvm/IR/InstrTypes.h
index b8d9cc10292f4a..0e391325eebdce 100644
--- a/llvm/include/llvm/IR/InstrTypes.h
+++ b/llvm/include/llvm/IR/InstrTypes.h
@@ -1490,6 +1490,11 @@ class CallBase : public Instruction {
 Attrs = Attrs.addRetAttribute(getContext(), Attr);
   }
 
+  /// Adds attributes to the return value.
+  void addRetAttrs(const AttrBuilder &B) {
+Attrs = Attrs.addRetAttributes(getContext(), B);
+  }
+
   /// Adds the attribute to the indicated argument
   void addParamAttr(unsigned ArgNo, Attribute::AttrKind Kind) {
 assert(ArgNo < arg_size() && "Out of bounds");
@@ -1502,6 +1507,12 @@ class CallBase : public Instruction {
 Attrs = Attrs.addParamAttribute(getContext(), ArgNo, Attr);
   }
 
+  /// Adds attributes to the indicated argument
+  void addParamAttrs(unsigned ArgNo, const AttrBuilder &B) {
+assert(ArgNo < arg_size() && "Out of bounds");
+Attrs = Attrs.addParamAttributes(getContext(), ArgNo, B);
+  }
+
   /// removes the attribute from the list of attributes.
   void removeAttributeAtIndex(unsigned i, Attribute::AttrKind Kind) {
 Attrs = Attrs.removeAttributeAtIndex(getContext(), i, Kind);
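
Hypothetical usage of the two new helpers, e.g. from the MLIR-to-LLVM
translation once a call's arg/result attribute dictionaries have been
converted (sketch; `Ctx`, `Call`, and `I32Ty` are assumed to be in scope):

```cpp
llvm::AttrBuilder RetB(Ctx);
RetB.addAttribute(llvm::Attribute::NoAlias);
Call->addRetAttrs(RetB);                // attach to the return value

llvm::AttrBuilder ArgB(Ctx);
ArgB.addAttribute(llvm::Attribute::getWithByValType(Ctx, I32Ty));
Call->addParamAttrs(/*ArgNo=*/0, ArgB); // attach to the first argument
```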
diff --git a/mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td 
b/mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
index b2281536aa40b6..85f5c6cc8cca07 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
@@ -755,7 +755,9 @@ def LLVM_CallOp : LLVM_MemAccessOpBase<"call",
   VariadicOfVariadic<LLVM_Type, "op_bundle_sizes">:$op_bundle_operands,
   DenseI32ArrayAttr:$op_bundle_sizes,
-  OptionalAttr<ArrayAttr>:$op_bundle_tags);
+  OptionalAttr<ArrayAttr>:$op_bundle_tags,
+  OptionalAttr<DictArrayAttr>:$arg_attrs,
+  OptionalAttr<DictArrayAttr>:$res_attrs);
   // Append the aliasing related attributes defined in LLVM_MemAccessOpBase.
   let arguments = !con(args, aliasAttrs);
   let results = (outs Optional<LLVM_Type>:$result);
diff --git a/mlir/include/mlir/Target/LLVMIR/ModuleImport.h b/mlir/include/mlir/Target/LLVMIR/ModuleImport.h
index 33c9af7c6335a4..86e1d6a04cd096 100644
--- a/mlir/include/mlir/Target/LLVMIR/ModuleImport.h
+++ b/mlir/include/mlir/Target/LLVMIR/ModuleImport.h
@@ -326,14 +326,18 @@ class ModuleImport {
SmallVectorImpl<Type> &types,
SmallVectorImpl<Value> &operands,
bool allowInlineAsm = false);
-  /// Converts the parameter attributes attached to `func` and adds them to the
-  /// `funcOp`.
+  /// Converts the parameter and result attributes attached to `func` and adds
+  /// them to the `funcOp`.
   void convertParameterAttributes(llvm::Function *func, LLVMFuncOp funcOp,
   OpBuilder &builder);
   /// Converts the AttributeSet of one parameter in LLVM IR to a corresponding
   /// DictionaryAttr for the LLVM dialect.
   DictionaryAttr convertParameterAttribute(llvm::AttributeSet llvmParamAttrs,
OpBuilder &builder);
+  /// Converts the parameter and result attributes attached to `call` and adds
+  /// them to the `callOp`.
+  void convertParameterAttributes(llvm::CallBase *call, CallOpInterface callOp,
+  OpBuilder &builder);
   /// Returns the builtin type equivalent to the given LLVM dialect type or
   /// nullptr if there is no equivalent. The returned type can be used to 
create
   /// an attribute for a GlobalOp or a ConstantOp.
diff --git a/mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h b/mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h
index 1b62437761ed9d..88fc17ca4fda24 100644
--- a/mlir/include/mlir/Target/LL

[llvm-branch-commits] [lldb] 7aac81a - Revert "[LLDB] Add draft docstrings for SBSaveCoreOptions (#123132)"

2025-01-21 Thread via llvm-branch-commits

Author: Jacob Lalonde
Date: 2025-01-21T19:54:42-08:00
New Revision: 7aac81a4e30f9e46907a60009ec051473089319e

URL: 
https://github.com/llvm/llvm-project/commit/7aac81a4e30f9e46907a60009ec051473089319e
DIFF: 
https://github.com/llvm/llvm-project/commit/7aac81a4e30f9e46907a60009ec051473089319e.diff

LOG: Revert "[LLDB] Add draft docstrings for SBSaveCoreOptions (#123132)"

This reverts commit d33e33fde770214e134ed58f992a5a95a522f7ff.

Added: 


Modified: 
lldb/bindings/interface/SBSaveCoreOptionsDocstrings.i

Removed: 




diff --git a/lldb/bindings/interface/SBSaveCoreOptionsDocstrings.i b/lldb/bindings/interface/SBSaveCoreOptionsDocstrings.i
index 3e99101203ee9c..e69de29bb2d1d6 100644
--- a/lldb/bindings/interface/SBSaveCoreOptionsDocstrings.i
+++ b/lldb/bindings/interface/SBSaveCoreOptionsDocstrings.i
@@ -1,71 +0,0 @@
-%feature("docstring",
-"A container to specify how to save a core file.
-
-SBSaveCoreOptions includes APIs to specify the memory regions and threads to include
-when generating a core file. It extends the existing SaveCoreStyle option.
-
-* eSaveCoreFull will save off all threads and memory regions, ignoring the memory regions and threads in
-the options object.
-
-* eSaveCoreDirtyOnly will capture all threads and all rw- memory regions, in addition to the regions specified
-in the options object if they are not already captured.
-
-* eSaveCoreStackOnly will capture all threads, but no memory regions unless specified.
-
-* eSaveCoreCustomOnly defers entirely to the SBSaveCoreOptions object and will only save what is specified.
-  Picking custom and specifying nothing will result in an error being returned.
-
-Note that currently ELF Core files are not supported.
-")
-
-%feature("docstring", "
-Set the plugin name to save a Core file with. Only plugins registered with the Plugin manager will be accepted.
-Examples are Minidump and Mach-O."
-) lldb::SBSaveCoreOptions::SetPluginName
-
-%feature("docstring", "
-Get the specified plugin name, or None if the name is not set."
-) lldb::SBSaveCoreOptions::GetPluginName
-
-%feature("docstring", "
-Set the lldb.SaveCoreStyle."
-) lldb::SBSaveCoreOptions::SetStyle
-
-%feature("docstring", "
-Get the specified lldb.SaveCoreStyle, or eSaveCoreUnspecified if not set."
-) lldb::SBSaveCoreOptions::GetStyle
-
-%feature("docstring", "
-Set the file path to save the Core file at."
-) lldb::SBSaveCoreOptions::SetOutputFile
-
-%feature("docstring", "
-Get an SBFileSpec corresponding to the specified output path, or None if not set."
-) lldb::SBSaveCoreOptions::GetOutputFile
-
-%feature("docstring", "
-Set the process to save, or unset a process by providing a default SBProcess.
-Resetting will result in the reset of all process-specific options, such as threads to save."
-) lldb::SBSaveCoreOptions::SetProcess
-
-%feature("docstring", "
-Add an SBThread to be saved; an error will be returned if an SBThread from a different process is specified.
-The process is set either by the first SBThread added to the options container, or explicitly by the SetProcess call."
-) lldb::SBSaveCoreOptions::AddThread
-
-%feature("docstring", "
-Remove an SBThread if present in the container; returns true if a matching thread was found and removed."
-) lldb::SBSaveCoreOptions::RemoveThread
-
-%feature("docstring", "
-Add a memory region to save; an error will be returned if the region is invalid.
-Ranges that overlap will be unioned into a single region."
-) lldb::SBSaveCoreOptions::AddMemoryRegionToSave
-
-%feature("docstring", "
-Get an SBThreadCollection of all threads marked to be saved. This collection is not sorted according to insertion order."
-) lldb::SBSaveCoreOptions::GetThreadsToSave
-
-%feature("docstring", "
-Unset all options."
-) lldb::SBSaveCoreOptions::Clear
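
For reference, a minimal sketch of driving this API from C++, based only on the
docstrings above (the SBProcess::SaveCore(SBSaveCoreOptions &) overload is assumed
here, as are the plugin name and output path):

```
#include "lldb/API/SBError.h"
#include "lldb/API/SBFileSpec.h"
#include "lldb/API/SBProcess.h"
#include "lldb/API/SBSaveCoreOptions.h"

// Sketch only: save the stacks of a process to a minidump.
lldb::SBError SaveStacks(lldb::SBProcess &process) {
  lldb::SBSaveCoreOptions options;
  options.SetPluginName("minidump");          // must be a registered plugin
  options.SetStyle(lldb::eSaveCoreStackOnly); // all threads, no extra memory
  options.SetOutputFile(lldb::SBFileSpec("/tmp/stacks.dmp")); // assumed path
  return process.SaveCore(options);           // overload assumed above
}
```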



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Custom lower 32-bit element shuffles (PR #123711)

2025-01-21 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/123711
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [lld] [lld][LoongArch] Support TLSDESC GD/LD to IE/LE. (PR #123715)

2025-01-21 Thread Zhaoxin Yang via llvm-branch-commits

https://github.com/ylzsx updated 
https://github.com/llvm/llvm-project/pull/123715

>From dff3031fdb2ca3755b73e3b81e56f8008a409470 Mon Sep 17 00:00:00 2001
From: yangzhaoxin 
Date: Fri, 3 Jan 2025 14:29:17 +0800
Subject: [PATCH 1/7] [lld][LoongArch] Implement TLSDESC GD/LD to IE/LE.

Support TLSDESC to initial-exec or local-exec optimizations. Introduce a new hook,
RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC, and use the existing R_RELAX_TLS_GD_TO_IE_ABS
to support TLSDESC => IE, and the existing R_RELAX_TLS_GD_TO_LE to support
TLSDESC => LE.

In normal or medium code model, there are two forms of code sequences:
 * pcalau12i  $a0, %desc_pc_hi20(sym_desc)
 * addi.d $a0, $a0, %desc_pc_lo12(sym_desc)
 * ld.d   $ra, $a0, %desc_ld(sym_desc)
 * jirl   $ra, $ra, %desc_call(sym_desc)
 --
 * pcaddi $a0, %desc_pcrel_20(sym_desc)
 * ld.d   $ra, $a0, %desc_ld(sym_desc)
 * jirl   $ra, $ra, %desc_call(sym_desc)

The code sequence obtained is as follows:
 * pcalau12i $a0, %ie_pc_hi20(sym_ie)
 * ld.[wd]   $a0, $a0, %ie_pc_lo12(sym_ie)

For simplicity, in both tlsdescToIe and tlsdescToLe we always convert the preceding
instructions to NOPs, because both forms of the code sequence (corresponding to the
relocation combinations R_LARCH_TLS_DESC_PC_HI20+R_LARCH_TLS_DESC_PC_LO12 and
R_LARCH_TLS_DESC_PCREL20_S2) are handled the same way.

FIXME: When relaxation is enabled, the redundant NOPs can be removed. This will
be implemented in a future patch.

Note: The different forms of TLSDESC code sequences should not appear interleaved
in the normal, medium or extreme code model; compilers do not generate such code,
and lld does not support it.
This is thanks to the guard in PostRASchedulerList.cpp in llvm.
```
Calls are not scheduling boundaries before register allocation, but
post-ra we don't gain anything by scheduling across calls since we
don't need to worry about register pressure.
```
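
As a self-contained illustration of the rewrite described above (shown for the
four-instruction form, with simplified, assumed instruction encodings; this is not
lld's actual helper, and like lld's write32le it presumes a little-endian host):

```
#include <cstdint>
#include <cstring>

// Assumed LoongArch encodings, for illustration only.
constexpr uint32_t NOP       = 0x03400000; // andi $zero, $zero, 0
constexpr uint32_t PCALAU12I = 0x1a000000;
constexpr uint32_t LD_D      = 0x28c00000;

static void write32le(uint8_t *loc, uint32_t v) { std::memcpy(loc, &v, 4); }

// Rewrite the 16-byte TLSDESC GD/LD sequence to the IE form: the two
// preceding slots become NOPs and the last two become pcalau12i + ld.d.
// Immediates are left as zero; the IE relocations patch them later.
void tlsdescToIeSketch(uint8_t *loc) {
  constexpr uint32_t A0 = 4; // register number of $a0
  write32le(loc + 0,  NOP);
  write32le(loc + 4,  NOP);
  write32le(loc + 8,  PCALAU12I | A0);        // pcalau12i $a0, %ie_pc_hi20(sym)
  write32le(loc + 12, LD_D | (A0 << 5) | A0); // ld.d $a0, $a0, %ie_pc_lo12(sym)
}
```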
---
 lld/ELF/Arch/LoongArch.cpp | 146 -
 lld/ELF/InputSection.cpp   |   1 +
 lld/ELF/Relocations.cpp|  38 ++
 lld/ELF/Relocations.h  |   1 +
 4 files changed, 169 insertions(+), 17 deletions(-)

diff --git a/lld/ELF/Arch/LoongArch.cpp b/lld/ELF/Arch/LoongArch.cpp
index ef25e741901d93..37614c3e9615d6 100644
--- a/lld/ELF/Arch/LoongArch.cpp
+++ b/lld/ELF/Arch/LoongArch.cpp
@@ -39,11 +39,14 @@ class LoongArch final : public TargetInfo {
   void relocate(uint8_t *loc, const Relocation &rel,
 uint64_t val) const override;
   bool relaxOnce(int pass) const override;
+  RelExpr adjustTlsExpr(RelType type, RelExpr expr) const override;
   void relocateAlloc(InputSectionBase &sec, uint8_t *buf) const override;
   void finalizeRelax(int passes) const override;
 
 private:
   void tlsIeToLe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
+  void tlsdescToIe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
+  void tlsdescToLe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
 };
 } // end anonymous namespace
 
@@ -61,6 +64,7 @@ enum Op {
   LU12I_W = 0x1400,
   PCADDI = 0x1800,
   PCADDU12I = 0x1c00,
+  PCALAU12I = 0x1a00,
   LD_W = 0x2880,
   LD_D = 0x28c0,
   JIRL = 0x4c00,
@@ -72,6 +76,7 @@ enum Reg {
   R_ZERO = 0,
   R_RA = 1,
   R_TP = 2,
+  R_A0 = 4,
   R_T0 = 12,
   R_T1 = 13,
   R_T2 = 14,
@@ -962,7 +967,8 @@ static bool relax(Ctx &ctx, InputSection &sec) {
 case R_LARCH_TLS_LD_PC_HI20:
 case R_LARCH_TLS_DESC_PC_HI20:
   // The overflow check for i+2 will be carried out in isPairRelaxable.
-  if (isPairRelaxable(relocs, i))
+  if (r.expr != RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC &&
+  r.expr != R_RELAX_TLS_GD_TO_LE && isPairRelaxable(relocs, i))
 relaxPCHi20Lo12(ctx, sec, i, loc, r, relocs[i + 2], remove);
   break;
 case R_LARCH_CALL36:
@@ -1047,6 +1053,103 @@ void LoongArch::tlsIeToLe(uint8_t *loc, const Relocation &rel,
   }
 }
 
+// Convert TLSDESC GD/LD to IE.
+// In normal or medium code model, there are two forms of code sequences:
+//  * pcalau12i  $a0, %desc_pc_hi20(sym_desc)
+//  * addi.d $a0, $a0, %desc_pc_lo12(sym_desc)
+//  * ld.d   $ra, $a0, %desc_ld(sym_desc)
+//  * jirl   $ra, $ra, %desc_call(sym_desc)
+//  --
+//  * pcaddi $a0, %desc_pcrel_20(sym_desc)
+//  * ld.d   $ra, $a0, %desc_ld(sym_desc)
+//  * jirl   $ra, $ra, %desc_call(sym_desc)
+//
+// The code sequence obtained is as follows:
+//  * pcalau12i $a0, %ie_pc_hi20(sym_ie)
+//  * ld.[wd]   $a0, $a0, %ie_pc_lo12(sym_ie)
+//
+// For simplicity, in both tlsdescToIe and tlsdescToLe we always convert the
+// preceding instructions to NOPs, because both forms of the code sequence
+// (corresponding to the relocation combinations
+// R_LARCH_TLS_DESC_PC_HI20+R_LARCH_TLS_DESC_PC_LO12 and
+// R_LARCH_TLS_DESC_PCREL20_S2) are handled the same way.
+//
+// When relaxation is enabled, the redundant NOPs can be removed.
+void LoongArch::tlsdescToIe(uint8_t *loc, const Relocation &rel,
+uint64_t val) const {
+  switch (rel.type) {
+

[llvm-branch-commits] [lld] [lld][LoongArch] GOT indirection to PC relative optimization. (PR #123743)

2025-01-21 Thread Fangrui Song via llvm-branch-commits


@@ -1150,6 +1152,58 @@ void LoongArch::tlsdescToLe(uint8_t *loc, const Relocation &rel,
   }
 }
 
+// Try GOT indirection to PC relative optimization when relaxation is enabled.
+// From:
+//  * pcalau12i $a0, %got_pc_hi20(sym_got)
+//  * ld.w/d$a0, $a0, %got_pc_lo12(sym_got)
+// To:
+//  * pcalau12i $a0, %pc_hi20(sym)
+//  * addi.w/d  $a0, $a0, %pc_lo12(sym)
+//
+// FIXME: Although the optimization has been performed, the GOT entries still
+// exist, similarly to AArch64. Eliminating the entries may require
+// additional marking in the common code.
+bool LoongArch::tryGotToPCRel(uint8_t *loc, const Relocation &rHi20,
+  const Relocation &rLo12, uint64_t secAddr) const {
+  if (!rHi20.sym->isDefined() || rHi20.sym->isPreemptible ||

MaskRay wrote:

We need symbol tests to test each condition here. 
aarch64-adrp-ldr-got-symbols.s has an example

https://github.com/llvm/llvm-project/pull/123743
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [lld] [lld][LoongArch] GOT indirection to PC relative optimization. (PR #123743)

2025-01-21 Thread Fangrui Song via llvm-branch-commits


@@ -1150,6 +1152,58 @@ void LoongArch::tlsdescToLe(uint8_t *loc, const Relocation &rel,
   }
 }
 
+// Try GOT indirection to PC relative optimization when relaxation is enabled.
+// From:
+//  * pcalau12i $a0, %got_pc_hi20(sym_got)
+//  * ld.w/d$a0, $a0, %got_pc_lo12(sym_got)
+// To:
+//  * pcalau12i $a0, %pc_hi20(sym)
+//  * addi.w/d  $a0, $a0, %pc_lo12(sym)
+//
+// FIXME: Although the optimization has been performed, the GOT entries still

MaskRay wrote:

Delete FIXME. While the GOT entries are not eliminated, this might actually be 
desired to reduce code complexity.

https://github.com/llvm/llvm-project/pull/123743
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

