[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Add ISD::PTRADD DAG combines (PR #142739)

2025-06-05 Thread Fabian Ritter via llvm-branch-commits


@@ -2627,6 +2629,93 @@ SDValue DAGCombiner::foldSubToAvg(SDNode *N, const SDLoc &DL) {
   return SDValue();
 }
 
+/// Try to fold a pointer arithmetic node.
+/// This needs to be done separately from normal addition, because pointer
+/// addition is not commutative.
+SDValue DAGCombiner::visitPTRADD(SDNode *N) {
+  SDValue N0 = N->getOperand(0);
+  SDValue N1 = N->getOperand(1);
+  EVT PtrVT = N0.getValueType();
+  EVT IntVT = N1.getValueType();
+  SDLoc DL(N);
+
+  // This is already ensured by an assert in SelectionDAG::getNode(). Several
+  // combines here depend on this assumption.
+  assert(PtrVT == IntVT &&
+         "PTRADD with different operand types is not supported");
+
+  // fold (ptradd undef, y) -> undef
+  if (N0.isUndef())
+    return N0;
+
+  // fold (ptradd x, undef) -> undef
+  if (N1.isUndef())
+    return DAG.getUNDEF(PtrVT);
+
+  // fold (ptradd x, 0) -> x
+  if (isNullConstant(N1))
+    return N0;
+
+  // fold (ptradd 0, x) -> x
+  if (isNullConstant(N0))
+    return N1;
+
+  if (N0.getOpcode() == ISD::PTRADD &&
+      !reassociationCanBreakAddressingModePattern(ISD::PTRADD, DL, N, N0, N1)) {
+    SDValue X = N0.getOperand(0);
+    SDValue Y = N0.getOperand(1);
+    SDValue Z = N1;
+    bool N0OneUse = N0.hasOneUse();
+    bool YIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Y);
+    bool ZIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Z);
+
+    // (ptradd (ptradd x, y), z) -> (ptradd x, (add y, z)) if:
+    //   * y is a constant and (ptradd x, y) has one use; or
+    //   * y and z are both constants.

ritter-x2a wrote:

So that `y + z` can be folded into a single constant, which might be folded as 
an immediate offset into a memory instruction. `SeparateConstOffsetFromGEP` 
should do that for AMDGPU already in many cases when it's beneficial, but
- I don't think that every backend uses `SeparateConstOffsetFromGEP`, so this 
combine can be worthwhile to have anyway,
- There are cases where these are introduced after `SeparateConstOffsetFromGEP` 
runs, for example when a wide vector load/store with an offset is legalized to 
several loads/stores with nested offsets, as in `store_v16i32` in 
`ptradd-sdag-optimizations.ll`; with this reassociation we get the same code 
that the old non-PTRADD code path would produce, and
- while it's possible that this could lead to worse code, the 
`reassociationCanBreakAddressingModePattern` check above _should_ avoid such 
regressions (I'm not 100% convinced the logic in there is sound, but that seems 
like a different problem).

https://github.com/llvm/llvm-project/pull/142739
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Baseline fneg-fabs.bf16.ll tests. NFC. (PR #142910)

2025-06-05 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/142910
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [CI] Use LLVM_ENABLE_RUNTIMES for runtimes builds on Linux (PR #142694)

2025-06-05 Thread Vlad Serebrennikov via llvm-branch-commits

https://github.com/Endilll commented:

> We're using LLVM_ENABLE_RUNTIMES. It uses the just built clang to build the 
> runtimes specified.

That explains it, thank you.
There's still an outstanding question of unrelated changes to libc++ tests that 
are included in this PR.

https://github.com/llvm/llvm-project/pull/142694
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Patterns for <2 x bfloat> fneg (fabs) (PR #142911)

2025-06-05 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/142911
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [CI] Use LLVM_ENABLE_RUNTIMES for runtimes builds on Linux (PR #142694)

2025-06-05 Thread Aiden Grossman via llvm-branch-commits

boomanaiden154 wrote:

> There's still an outstanding question of unrelated changes to libc++ tests 
> that are included in this PR.

I'm still not sure how they're ending up in here. I haven't seen this before 
with `spr`. This will definitely be fixed before I end up landing the patch and 
I'm guessing will be resolved when I change the branch target to `main`.

https://github.com/llvm/llvm-project/pull/142694
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Add ISD::PTRADD DAG combines (PR #142739)

2025-06-05 Thread Fabian Ritter via llvm-branch-commits


@@ -2627,6 +2629,93 @@ SDValue DAGCombiner::foldSubToAvg(SDNode *N, const SDLoc &DL) {
   return SDValue();
 }
 
+/// Try to fold a pointer arithmetic node.
+/// This needs to be done separately from normal addition, because pointer
+/// addition is not commutative.
+SDValue DAGCombiner::visitPTRADD(SDNode *N) {
+  SDValue N0 = N->getOperand(0);
+  SDValue N1 = N->getOperand(1);
+  EVT PtrVT = N0.getValueType();
+  EVT IntVT = N1.getValueType();
+  SDLoc DL(N);
+
+  // This is already ensured by an assert in SelectionDAG::getNode(). Several
+  // combines here depend on this assumption.
+  assert(PtrVT == IntVT &&
+         "PTRADD with different operand types is not supported");
+
+  // fold (ptradd undef, y) -> undef
+  if (N0.isUndef())
+    return N0;
+
+  // fold (ptradd x, undef) -> undef
+  if (N1.isUndef())
+    return DAG.getUNDEF(PtrVT);
+
+  // fold (ptradd x, 0) -> x
+  if (isNullConstant(N1))
+    return N0;
+
+  // fold (ptradd 0, x) -> x
+  if (isNullConstant(N0))
+    return N1;
+
+  if (N0.getOpcode() == ISD::PTRADD &&
+      !reassociationCanBreakAddressingModePattern(ISD::PTRADD, DL, N, N0, N1)) {
+    SDValue X = N0.getOperand(0);
+    SDValue Y = N0.getOperand(1);
+    SDValue Z = N1;
+    bool N0OneUse = N0.hasOneUse();
+    bool YIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Y);
+    bool ZIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Z);
+
+    // (ptradd (ptradd x, y), z) -> (ptradd x, (add y, z)) if:
+    //   * y is a constant and (ptradd x, y) has one use; or
+    //   * y and z are both constants.
+    if ((YIsConstant && N0OneUse) || (YIsConstant && ZIsConstant)) {
+      SDNodeFlags Flags;
+      // If both additions in the original were NUW, the new ones are as well.
+      if (N->getFlags().hasNoUnsignedWrap() &&
+          N0->getFlags().hasNoUnsignedWrap())
+        Flags |= SDNodeFlags::NoUnsignedWrap;
+      SDValue Add = DAG.getNode(ISD::ADD, DL, IntVT, {Y, Z}, Flags);
+      AddToWorklist(Add.getNode());
+      return DAG.getMemBasePlusOffset(X, Add, DL, Flags);
+    }
+
+    // TODO: There is another possible fold here that was proven useful.
+    // It would be this:
+    //
+    // (ptradd (ptradd x, y), z) -> (ptradd (ptradd x, z), y) if:
+    //   * (ptradd x, y) has one use; and
+    //   * y is a constant; and
+    //   * z is not a constant.
+    //
+    // In some cases, specifically in AArch64's FEAT_CPA, it exposes the
+    // opportunity to select more complex instructions such as SUBPT and
+    // MSUBPT. However, a hypothetical corner case has been found that we could
+    // not avoid. Consider this (pseudo-POSIX C):
+    //
+    // char *foo(char *x, int z) {return (x + LARGE_CONSTANT) + z;}
+    // char *p = mmap(LARGE_CONSTANT);
+    // char *q = foo(p, -LARGE_CONSTANT);
+    //
+    // Then x + LARGE_CONSTANT is one-past-the-end, so valid, and a
+    // further + z takes it back to the start of the mapping, so valid,
+    // regardless of the address mmap gave back. However, if mmap gives you an
+    // address < LARGE_CONSTANT (ignoring high bits), x - LARGE_CONSTANT will
+    // borrow from the high bits (with the subsequent + z carrying back into
+    // the high bits to give you a well-defined pointer) and thus trip
+    // FEAT_CPA's pointer corruption checks.
+    //
+    // We leave this fold as an opportunity for future work, addressing the
+    // corner case for FEAT_CPA, as well as reconciling the solution with the
+    // more general application of pointer arithmetic in other future targets.

ritter-x2a wrote:

My vague idea of handling this properly in the future would be to
- have an `inbounds` flag on `PTRADD` nodes (see #131862),
- have backends generate instructions that break for out-of-bounds arithmetic 
only when the `inbounds` flag is present, and
- give targets an option to request that transformations that would generate 
`PTRADD`s without the `inbounds` flag are not applied.

I think that would give things like the CPA implementation a better semantic 
footing, since otherwise they would just be miscompiling the IR's 
`getelementptr`s without `inbounds` flags. However, at least the last point 
above is currently not on my critical path, so I'm open to adding the comment 
here or moving the other transform here.

https://github.com/llvm/llvm-project/pull/142739
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #142789)

2025-06-05 Thread Pierre van Houtryve via llvm-branch-commits


@@ -137,7 +138,123 @@ class AMDGPURegBankLegalizeCombiner {
     return {MatchMI, MatchMI->getOperand(1).getReg()};
   }
 
+  std::tuple tryMatchRALFromUnmerge(Register Src) {
+    auto *ReadAnyLane = MRI.getVRegDef(Src);
+    if (ReadAnyLane->getOpcode() == AMDGPU::G_AMDGPU_READANYLANE) {
+      Register RALSrc = ReadAnyLane->getOperand(1).getReg();
+      auto *UnMerge = getOpcodeDef(RALSrc, MRI);
+      if (UnMerge)
+        return {UnMerge, UnMerge->findRegisterDefOperandIdx(RALSrc, nullptr)};
+    }
+    return {nullptr, -1};
+  }
+
+  Register getReadAnyLaneSrc(Register Src) {
+    // Src = G_AMDGPU_READANYLANE RALSrc
+    auto [RAL, RALSrc] = tryMatch(Src, AMDGPU::G_AMDGPU_READANYLANE);
+    if (RAL)
+      return RALSrc;
+
+    // LoVgpr, HiVgpr = G_UNMERGE_VALUES UnmergeSrc
+    // LoSgpr = G_AMDGPU_READANYLANE LoVgpr
+    // HiSgpr = G_AMDGPU_READANYLANE HiVgpr
+    // Src G_MERGE_VALUES LoSgpr, HiSgpr
+    auto *Merge = getOpcodeDef(Src, MRI);
+    if (Merge) {
+      unsigned NumElts = Merge->getNumSources();
+      auto [Unmerge, Idx] = tryMatchRALFromUnmerge(Merge->getSourceReg(0));
+      if (!Unmerge || Unmerge->getNumDefs() != NumElts || Idx != 0)
+        return {};
+
+      // check if all elements are from same unmerge and there is no shuffling
+      for (unsigned i = 1; i < NumElts; ++i) {
+        auto [UnmergeI, IdxI] = tryMatchRALFromUnmerge(Merge->getSourceReg(i));
+        if (UnmergeI != Unmerge || (unsigned)IdxI != i)
+          return {};
+      }
+      return Unmerge->getSourceReg();
+    }
+
+    // ..., VgprI, ... = G_UNMERGE_VALUES VgprLarge
+    // SgprI = G_AMDGPU_READANYLANE VgprI
+    // SgprLarge G_MERGE_VALUES ..., SgprI, ...
+    // ..., Src, ... = G_UNMERGE_VALUES SgprLarge
+    auto *UnMerge = getOpcodeDef(Src, MRI);
+    if (UnMerge) {
+      int Idx = UnMerge->findRegisterDefOperandIdx(Src, nullptr);
+      auto *Merge = getOpcodeDef(UnMerge->getSourceReg(), MRI);
+      if (Merge) {
+        auto [RAL, RALSrc] =
+            tryMatch(Merge->getSourceReg(Idx), AMDGPU::G_AMDGPU_READANYLANE);
+        if (RAL)
+          return RALSrc;
+      }
+    }
+
+    return {};
+  }
+
+  bool tryEliminateReadAnyLane(MachineInstr &Copy) {
+    Register Dst = Copy.getOperand(0).getReg();
+    Register Src = Copy.getOperand(1).getReg();
+    if (!Src.isVirtual())
+      return false;
+
+    Register RALDst = Src;
+    MachineInstr &SrcMI = *MRI.getVRegDef(Src);
+    if (SrcMI.getOpcode() == AMDGPU::G_BITCAST) {
+      RALDst = SrcMI.getOperand(1).getReg();
+    }
+
+    Register RALSrc = getReadAnyLaneSrc(RALDst);
+    if (!RALSrc)
+      return false;
+
+    if (Dst.isVirtual()) {
+      if (SrcMI.getOpcode() != AMDGPU::G_BITCAST) {
+        // Src = READANYLANE RALSrc
+        // Dst = Copy Src
+        // ->
+        // Dst = RALSrc
+        MRI.replaceRegWith(Dst, RALSrc);
+      } else {
+        // RALDst = READANYLANE RALSrc
+        // Src = G_BITCAST RALDst
+        // Dst = Copy Src
+        // ->
+        // NewVgpr = G_BITCAST RALDst
+        // Dst = NewVgpr
+        auto Bitcast = B.buildBitcast({VgprRB, MRI.getType(Src)}, RALSrc);

Pierre-vh wrote:

Does this work as intended without the `B.setInstr(Copy)` call?

https://github.com/llvm/llvm-project/pull/142789
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)

2025-06-05 Thread Momchil Velikov via llvm-branch-commits

https://github.com/momchil-velikov updated 
https://github.com/llvm/llvm-project/pull/142422

>From 2eb6c95955dc22b6b59eb4e5ba269e4744bbdd2a Mon Sep 17 00:00:00 2001
From: Momchil Velikov 
Date: Mon, 2 Jun 2025 15:13:13 +
Subject: [PATCH 1/3] [MLIR] Fix incorrect slice contiguity inference in
 `vector::isContiguousSlice`

Previously, slices were sometimes marked as non-contiguous when
they were actually contiguous. This occurred when the vector type had
leading unit dimensions, e.g., `vector<1x1x...x1xd0xd1x...xdn-1xT>`.
In such cases, only the trailing n dimensions of the memref need to be
contiguous, not the entire vector rank.

This affects how `FlattenContiguousRowMajorTransfer{Read,Write}Pattern`
flattens `transfer_read` and `transfer_write` ops. The pattern used
to collapse a number of dimensions equal to the vector rank, which
may be incorrect when leading dimensions are unit-sized.

This patch fixes the issue by collapsing only as many trailing memref
dimensions as are actually contiguous.
---
 .../mlir/Dialect/Vector/Utils/VectorUtils.h   |  54 -
 .../Transforms/VectorTransferOpTransforms.cpp |   8 +-
 mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp |  25 ++--
 .../Vector/vector-transfer-flatten.mlir   | 108 +-
 4 files changed, 120 insertions(+), 75 deletions(-)

diff --git a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
index 6609b28d77b6c..ed06d7a029494 100644
--- a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
+++ b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
@@ -49,35 +49,37 @@ FailureOr> isTranspose2DSlice(vector::TransposeOp op);
 
 /// Return true if `vectorType` is a contiguous slice of `memrefType`.
 ///
-/// Only the N = vectorType.getRank() trailing dims of `memrefType` are
-/// checked (the other dims are not relevant). Note that for `vectorType` to be
-/// a contiguous slice of `memrefType`, the trailing dims of the latter have
-/// to be contiguous - this is checked by looking at the corresponding strides.
+/// The leading unit dimensions of the vector type are ignored as they
+/// are not relevant to the result. Let N be the number of the vector
+/// dimensions after ignoring a leading sequence of unit ones.
 ///
-/// There might be some restriction on the leading dim of `VectorType`:
+/// For `vectorType` to be a contiguous slice of `memrefType`
+///   a) the N trailing dimensions of the latter must be contiguous, and
+///   b) the trailing N dimensions of `vectorType` and `memrefType`,
+///  except the first of them, must match.
 ///
-/// Case 1. If all the trailing dims of `vectorType` match the trailing dims
-/// of `memrefType` then the leading dim of `vectorType` can be
-/// arbitrary.
-///
-///Ex. 1.1 contiguous slice, perfect match
-///  vector<4x3x2xi32> from memref<5x4x3x2xi32>
-///Ex. 1.2 contiguous slice, the leading dim does not match (2 != 4)
-///  vector<2x3x2xi32> from memref<5x4x3x2xi32>
-///
-/// Case 2. If an "internal" dim of `vectorType` does not match the
-/// corresponding trailing dim in `memrefType` then the remaining
-/// leading dims of `vectorType` have to be 1 (the first non-matching
-/// dim can be arbitrary).
+/// Examples:
 ///
-///Ex. 2.1 non-contiguous slice, 2 != 3 and the leading dim != <1>
-///  vector<2x2x2xi32> from memref<5x4x3x2xi32>
-///Ex. 2.2  contiguous slice, 2 != 3 and the leading dim == <1>
-///  vector<1x2x2xi32> from memref<5x4x3x2xi32>
-///Ex. 2.3. contiguous slice, 2 != 3 and the leading dims == <1x1>
-///  vector<1x1x2x2xi32> from memref<5x4x3x2xi32>
-///Ex. 2.4. non-contiguous slice, 2 != 3 and the leading dims != <1x1>
-/// vector<2x1x2x2xi32> from memref<5x4x3x2xi32>)
+///   Ex.1 contiguous slice, perfect match
+///     vector<4x3x2xi32> from memref<5x4x3x2xi32>
+///   Ex.2 contiguous slice, the leading dim does not match (2 != 4)
+///     vector<2x3x2xi32> from memref<5x4x3x2xi32>
+///   Ex.3 non-contiguous slice, 2 != 3
+///     vector<2x2x2xi32> from memref<5x4x3x2xi32>
+///   Ex.4 contiguous slice, leading unit dimension of the vector ignored,
+///        2 != 3 (allowed)
+///     vector<1x2x2xi32> from memref<5x4x3x2xi32>
+///   Ex.5 contiguous slice, leading two unit dims of the vector ignored,
+///        2 != 3 (allowed)
+///     vector<1x1x2x2xi32> from memref<5x4x3x2xi32>
+///   Ex.6 non-contiguous slice, 2 != 3, no leading sequence of unit dims
+///     vector<2x1x2x2xi32> from memref<5x4x3x2xi32>)
+///   Ex.7 contiguous slice, memref needs to be contiguous only on the last
+///        dimension
+///     vector<1x1x2xi32> from memref<2x2x2xi32, strided<[8, 4, 1]>>
+///   Ex.8 non-contiguous slice, memref needs to be contiguous on the last
+///        two dimensions, and it isn't
+///     vector<1x2x2xi32> from memref<2x2x2xi32, strided<[8, 4, 1]>>
 bool isContiguo

[llvm-branch-commits] [CI] Use LLVM_ENABLE_RUNTIMES for runtimes builds on Linux (PR #142694)

2025-06-05 Thread Vlad Serebrennikov via llvm-branch-commits

https://github.com/Endilll approved this pull request.


https://github.com/llvm/llvm-project/pull/142694
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #142789)

2025-06-05 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/142789

>From 64d7853a9edefabe8de40748e01348d2d5c017c5 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Thu, 5 Jun 2025 12:17:13 +0200
Subject: [PATCH] AMDGPU/GlobalISel: Improve readanylane combines in
 regbanklegalize

---
 .../Target/AMDGPU/AMDGPURegBankLegalize.cpp   | 122 +++---
 .../AMDGPU/GlobalISel/readanylane-combines.ll |  25 +---
 .../GlobalISel/readanylane-combines.mir   |  78 +++
 3 files changed, 125 insertions(+), 100 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
index ba661348ca5b5..6707b641b0d25 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
@@ -23,6 +23,7 @@
 #include "GCNSubtarget.h"
 #include "llvm/CodeGen/GlobalISel/CSEInfo.h"
 #include "llvm/CodeGen/GlobalISel/CSEMIRBuilder.h"
+#include "llvm/CodeGen/GlobalISel/GenericMachineInstrs.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/CodeGen/MachineUniformityAnalysis.h"
 #include "llvm/CodeGen/TargetPassConfig.h"
@@ -137,7 +138,109 @@ class AMDGPURegBankLegalizeCombiner {
     return {MatchMI, MatchMI->getOperand(1).getReg()};
   }
 
+  std::pair tryMatchRALFromUnmerge(Register Src) {
+    MachineInstr *ReadAnyLane = MRI.getVRegDef(Src);
+    if (ReadAnyLane->getOpcode() == AMDGPU::G_AMDGPU_READANYLANE) {
+      Register RALSrc = ReadAnyLane->getOperand(1).getReg();
+      if (auto *UnMerge = getOpcodeDef(RALSrc, MRI))
+        return {UnMerge, UnMerge->findRegisterDefOperandIdx(RALSrc, nullptr)};
+    }
+    return {nullptr, -1};
+  }
+
+  Register getReadAnyLaneSrc(Register Src) {
+    // Src = G_AMDGPU_READANYLANE RALSrc
+    auto [RAL, RALSrc] = tryMatch(Src, AMDGPU::G_AMDGPU_READANYLANE);
+    if (RAL)
+      return RALSrc;
+
+    // LoVgpr, HiVgpr = G_UNMERGE_VALUES UnmergeSrc
+    // LoSgpr = G_AMDGPU_READANYLANE LoVgpr
+    // HiSgpr = G_AMDGPU_READANYLANE HiVgpr
+    // Src G_MERGE_VALUES LoSgpr, HiSgpr
+    auto *Merge = getOpcodeDef(Src, MRI);
+    if (Merge) {
+      unsigned NumElts = Merge->getNumSources();
+      auto [Unmerge, Idx] = tryMatchRALFromUnmerge(Merge->getSourceReg(0));
+      if (!Unmerge || Unmerge->getNumDefs() != NumElts || Idx != 0)
+        return {};
+
+      // check if all elements are from same unmerge and there is no shuffling
+      for (unsigned i = 1; i < NumElts; ++i) {
+        auto [UnmergeI, IdxI] = tryMatchRALFromUnmerge(Merge->getSourceReg(i));
+        if (UnmergeI != Unmerge || (unsigned)IdxI != i)
+          return {};
+      }
+      return Unmerge->getSourceReg();
+    }
+
+    // ..., VgprI, ... = G_UNMERGE_VALUES VgprLarge
+    // SgprI = G_AMDGPU_READANYLANE VgprI
+    // SgprLarge G_MERGE_VALUES ..., SgprI, ...
+    // ..., Src, ... = G_UNMERGE_VALUES SgprLarge
+    auto *UnMerge = getOpcodeDef(Src, MRI);
+    if (UnMerge) {
+      int Idx = UnMerge->findRegisterDefOperandIdx(Src, nullptr);
+      auto *Merge = getOpcodeDef(UnMerge->getSourceReg(), MRI);
+      if (Merge) {
+        auto [RAL, RALSrc] =
+            tryMatch(Merge->getSourceReg(Idx), AMDGPU::G_AMDGPU_READANYLANE);
+        if (RAL)
+          return RALSrc;
+      }
+    }
+
+    return {};
+  }
+
+  void replaceRegWithOrBuildCopy(Register Dst, Register Src) {
+    if (Dst.isVirtual())
+      MRI.replaceRegWith(Dst, Src);
+    else
+      B.buildCopy(Dst, Src);
+  }
+
+  bool tryEliminateReadAnyLane(MachineInstr &Copy) {
+    Register Dst = Copy.getOperand(0).getReg();
+    Register Src = Copy.getOperand(1).getReg();
+    if (!Src.isVirtual())
+      return false;
+
+    Register RALDst = Src;
+    MachineInstr &SrcMI = *MRI.getVRegDef(Src);
+    if (SrcMI.getOpcode() == AMDGPU::G_BITCAST)
+      RALDst = SrcMI.getOperand(1).getReg();
+
+    Register RALSrc = getReadAnyLaneSrc(RALDst);
+    if (!RALSrc)
+      return false;
+
+    B.setInstr(Copy);
+    if (SrcMI.getOpcode() != AMDGPU::G_BITCAST) {
+      // Src = READANYLANE RALSrc        Src = READANYLANE RALSrc
+      // Dst = Copy Src                  $Dst = Copy Src
+      // ->                              ->
+      // Dst = RALSrc                    $Dst = Copy RALSrc
+      replaceRegWithOrBuildCopy(Dst, RALSrc);
+    } else {
+      // RALDst = READANYLANE RALSrc     RALDst = READANYLANE RALSrc
+      // Src = G_BITCAST RALDst          Src = G_BITCAST RALDst
+      // Dst = Copy Src                  Dst = Copy Src
+      // ->                              ->
+      // NewVgpr = G_BITCAST RALDst      NewVgpr = G_BITCAST RALDst
+      // Dst = NewVgpr                   $Dst = Copy NewVgpr
+      auto Bitcast = B.buildBitcast({VgprRB, MRI.getType(Src)}, RALSrc);
+      replaceRegWithOrBuildCopy(Dst, Bitcast.getReg(0));
+    }
+
+    eraseInstr(Copy, MRI, nullptr);
+    return true;
+  }
+
   void tryCombineCopy(MachineInstr &MI) {
+    if (tryEliminateReadAnyLane(MI))
+      return;
+
     Register Dst = MI.get


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)

2025-06-05 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/142790

>From ae9621601118004cc6b363be7fad70092e401cad Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Thu, 5 Jun 2025 12:43:04 +0200
Subject: [PATCH] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize

Add rules for G_AMDGPU_BUFFER_LOAD and implement waterfall lowering
for divergent operands that must be sgpr.
---
 .../Target/AMDGPU/AMDGPUGlobalISelUtils.cpp   |  53 +++-
 .../lib/Target/AMDGPU/AMDGPUGlobalISelUtils.h |   2 +
 .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 239 +-
 .../AMDGPU/AMDGPURegBankLegalizeHelper.h  |   1 +
 .../AMDGPU/AMDGPURegBankLegalizeRules.cpp |  28 +-
 .../AMDGPU/AMDGPURegBankLegalizeRules.h   |   6 +-
 .../AMDGPU/GlobalISel/buffer-schedule.ll  |   2 +-
 .../llvm.amdgcn.make.buffer.rsrc.ll   |   2 +-
 .../regbankselect-amdgcn.raw.buffer.load.ll   |  59 ++---
 ...egbankselect-amdgcn.raw.ptr.buffer.load.ll |  59 ++---
 ...regbankselect-amdgcn.struct.buffer.load.ll |  59 ++---
 ...ankselect-amdgcn.struct.ptr.buffer.load.ll |  59 ++---
 .../llvm.amdgcn.buffer.load-last-use.ll   |   2 +-
 .../llvm.amdgcn.raw.atomic.buffer.load.ll |  42 +--
 .../llvm.amdgcn.raw.ptr.atomic.buffer.load.ll |  42 +--
 .../llvm.amdgcn.struct.atomic.buffer.load.ll  |  48 ++--
 ...vm.amdgcn.struct.ptr.atomic.buffer.load.ll |  48 ++--
 .../CodeGen/AMDGPU/swizzle.bit.extract.ll |   4 +-
 18 files changed, 513 insertions(+), 242 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp b/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
index 00979f44f9d34..d8be3aee1f410 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
@@ -117,45 +117,72 @@ static LLT getReadAnyLaneSplitTy(LLT Ty) {
   return LLT::scalar(32);
 }
 
-static Register buildReadAnyLane(MachineIRBuilder &B, Register VgprSrc,
-                                 const RegisterBankInfo &RBI);
+using ReadLaneFnTy =
+    function_ref;
+
+static Register buildReadLane(MachineIRBuilder &, Register,
+                              const RegisterBankInfo &, ReadLaneFnTy);
 
 static void unmergeReadAnyLane(MachineIRBuilder &B,
                                SmallVectorImpl &SgprDstParts,
                                LLT UnmergeTy, Register VgprSrc,
-                               const RegisterBankInfo &RBI) {
+                               const RegisterBankInfo &RBI,
+                               ReadLaneFnTy BuildRL) {
   const RegisterBank *VgprRB = &RBI.getRegBank(AMDGPU::VGPRRegBankID);
   auto Unmerge = B.buildUnmerge({VgprRB, UnmergeTy}, VgprSrc);
   for (unsigned i = 0; i < Unmerge->getNumOperands() - 1; ++i) {
-    SgprDstParts.push_back(buildReadAnyLane(B, Unmerge.getReg(i), RBI));
+    SgprDstParts.push_back(buildReadLane(B, Unmerge.getReg(i), RBI, BuildRL));
   }
 }
 
-static Register buildReadAnyLane(MachineIRBuilder &B, Register VgprSrc,
-                                 const RegisterBankInfo &RBI) {
+static Register buildReadLane(MachineIRBuilder &B, Register VgprSrc,
+                              const RegisterBankInfo &RBI,
+                              ReadLaneFnTy BuildRL) {
   LLT Ty = B.getMRI()->getType(VgprSrc);
   const RegisterBank *SgprRB = &RBI.getRegBank(AMDGPU::SGPRRegBankID);
   if (Ty.getSizeInBits() == 32) {
-    return B.buildInstr(AMDGPU::G_AMDGPU_READANYLANE, {{SgprRB, Ty}}, {VgprSrc})
-        .getReg(0);
+    Register SgprDst = B.getMRI()->createVirtualRegister({SgprRB, Ty});
+    return BuildRL(B, SgprDst, VgprSrc).getReg(0);
   }
 
   SmallVector SgprDstParts;
-  unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI);
+  unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI,
+                     BuildRL);
 
   return B.buildMergeLikeInstr({SgprRB, Ty}, SgprDstParts).getReg(0);
 }
 
-void AMDGPU::buildReadAnyLane(MachineIRBuilder &B, Register SgprDst,
-                              Register VgprSrc, const RegisterBankInfo &RBI) {
+static void buildReadLane(MachineIRBuilder &B, Register SgprDst,
+                          Register VgprSrc, const RegisterBankInfo &RBI,
+                          ReadLaneFnTy BuildReadLane) {
   LLT Ty = B.getMRI()->getType(VgprSrc);
   if (Ty.getSizeInBits() == 32) {
-    B.buildInstr(AMDGPU::G_AMDGPU_READANYLANE, {SgprDst}, {VgprSrc});
+    BuildReadLane(B, SgprDst, VgprSrc);
     return;
   }
 
   SmallVector SgprDstParts;
-  unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI);
+  unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI,
+                     BuildReadLane);
 
   B.buildMergeLikeInstr(SgprDst, SgprDstParts).getReg(0);
 }
+
+void AMDGPU::buildReadAnyLane(MachineIRBuilder &B, Register SgprDst,
+                              Register VgprSrc, const RegisterBankInfo &RBI) {
+  return buildReadLane(
+      B, SgprDst, VgprSrc, RBI,
+      [](

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)

2025-06-05 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated https://github.com/llvm/llvm-project/pull/142790

>From ae9621601118004cc6b363be7fad70092e401cad Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Thu, 5 Jun 2025 12:43:04 +0200
Subject: [PATCH] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize

Add rules for G_AMDGPU_BUFFER_LOAD and implement waterfall lowering
for divergent operands that must be sgpr.
---
 .../Target/AMDGPU/AMDGPUGlobalISelUtils.cpp   |  53 +++-
 .../lib/Target/AMDGPU/AMDGPUGlobalISelUtils.h |   2 +
 .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 239 +-
 .../AMDGPU/AMDGPURegBankLegalizeHelper.h  |   1 +
 .../AMDGPU/AMDGPURegBankLegalizeRules.cpp |  28 +-
 .../AMDGPU/AMDGPURegBankLegalizeRules.h   |   6 +-
 .../AMDGPU/GlobalISel/buffer-schedule.ll  |   2 +-
 .../llvm.amdgcn.make.buffer.rsrc.ll   |   2 +-
 .../regbankselect-amdgcn.raw.buffer.load.ll   |  59 ++---
 ...egbankselect-amdgcn.raw.ptr.buffer.load.ll |  59 ++---
 ...regbankselect-amdgcn.struct.buffer.load.ll |  59 ++---
 ...ankselect-amdgcn.struct.ptr.buffer.load.ll |  59 ++---
 .../llvm.amdgcn.buffer.load-last-use.ll   |   2 +-
 .../llvm.amdgcn.raw.atomic.buffer.load.ll |  42 +--
 .../llvm.amdgcn.raw.ptr.atomic.buffer.load.ll |  42 +--
 .../llvm.amdgcn.struct.atomic.buffer.load.ll  |  48 ++--
 ...vm.amdgcn.struct.ptr.atomic.buffer.load.ll |  48 ++--
 .../CodeGen/AMDGPU/swizzle.bit.extract.ll |   4 +-
 18 files changed, 513 insertions(+), 242 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp b/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
index 00979f44f9d34..d8be3aee1f410 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
@@ -117,45 +117,72 @@ static LLT getReadAnyLaneSplitTy(LLT Ty) {
   return LLT::scalar(32);
 }
 
-static Register buildReadAnyLane(MachineIRBuilder &B, Register VgprSrc,
- const RegisterBankInfo &RBI);
+using ReadLaneFnTy =
+function_ref;
+
+static Register buildReadLane(MachineIRBuilder &, Register,
+  const RegisterBankInfo &, ReadLaneFnTy);
 
 static void unmergeReadAnyLane(MachineIRBuilder &B,
SmallVectorImpl &SgprDstParts,
LLT UnmergeTy, Register VgprSrc,
-   const RegisterBankInfo &RBI) {
+   const RegisterBankInfo &RBI,
+   ReadLaneFnTy BuildRL) {
   const RegisterBank *VgprRB = &RBI.getRegBank(AMDGPU::VGPRRegBankID);
   auto Unmerge = B.buildUnmerge({VgprRB, UnmergeTy}, VgprSrc);
   for (unsigned i = 0; i < Unmerge->getNumOperands() - 1; ++i) {
-SgprDstParts.push_back(buildReadAnyLane(B, Unmerge.getReg(i), RBI));
+SgprDstParts.push_back(buildReadLane(B, Unmerge.getReg(i), RBI, BuildRL));
   }
 }
 
-static Register buildReadAnyLane(MachineIRBuilder &B, Register VgprSrc,
- const RegisterBankInfo &RBI) {
+static Register buildReadLane(MachineIRBuilder &B, Register VgprSrc,
+  const RegisterBankInfo &RBI,
+  ReadLaneFnTy BuildRL) {
   LLT Ty = B.getMRI()->getType(VgprSrc);
   const RegisterBank *SgprRB = &RBI.getRegBank(AMDGPU::SGPRRegBankID);
   if (Ty.getSizeInBits() == 32) {
-return B.buildInstr(AMDGPU::G_AMDGPU_READANYLANE, {{SgprRB, Ty}}, {VgprSrc})
-.getReg(0);
+Register SgprDst = B.getMRI()->createVirtualRegister({SgprRB, Ty});
+return BuildRL(B, SgprDst, VgprSrc).getReg(0);
   }
 
   SmallVector SgprDstParts;
-  unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI);
+  unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI,
+ BuildRL);
 
   return B.buildMergeLikeInstr({SgprRB, Ty}, SgprDstParts).getReg(0);
 }
 
-void AMDGPU::buildReadAnyLane(MachineIRBuilder &B, Register SgprDst,
-  Register VgprSrc, const RegisterBankInfo &RBI) {
+static void buildReadLane(MachineIRBuilder &B, Register SgprDst,
+  Register VgprSrc, const RegisterBankInfo &RBI,
+  ReadLaneFnTy BuildReadLane) {
   LLT Ty = B.getMRI()->getType(VgprSrc);
   if (Ty.getSizeInBits() == 32) {
-B.buildInstr(AMDGPU::G_AMDGPU_READANYLANE, {SgprDst}, {VgprSrc});
+BuildReadLane(B, SgprDst, VgprSrc);
 return;
   }
 
   SmallVector SgprDstParts;
-  unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI);
+  unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI,
+ BuildReadLane);
 
   B.buildMergeLikeInstr(SgprDst, SgprDstParts).getReg(0);
 }
+
+void AMDGPU::buildReadAnyLane(MachineIRBuilder &B, Register SgprDst,
+  Register VgprSrc, const RegisterBankInfo &RBI) {
+  return buildReadLane(
+  B, SgprDst, VgprSrc, RBI,
+  [](

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #142789)

2025-06-05 Thread Petar Avramovic via llvm-branch-commits


@@ -137,7 +138,123 @@ class AMDGPURegBankLegalizeCombiner {
 return {MatchMI, MatchMI->getOperand(1).getReg()};
   }
 
+  std::tuple tryMatchRALFromUnmerge(Register Src) {
+auto *ReadAnyLane = MRI.getVRegDef(Src);
+if (ReadAnyLane->getOpcode() == AMDGPU::G_AMDGPU_READANYLANE) {
+  Register RALSrc = ReadAnyLane->getOperand(1).getReg();
+  auto *UnMerge = getOpcodeDef(RALSrc, MRI);
+  if (UnMerge)
+return {UnMerge, UnMerge->findRegisterDefOperandIdx(RALSrc, nullptr)};
+}
+return {nullptr, -1};
+  }
+
+  Register getReadAnyLaneSrc(Register Src) {
+// Src = G_AMDGPU_READANYLANE RALSrc
+auto [RAL, RALSrc] = tryMatch(Src, AMDGPU::G_AMDGPU_READANYLANE);
+if (RAL)
+  return RALSrc;
+
+// LoVgpr, HiVgpr = G_UNMERGE_VALUES UnmergeSrc
+// LoSgpr = G_AMDGPU_READANYLANE LoVgpr
+// HiSgpr = G_AMDGPU_READANYLANE HiVgpr
+// Src G_MERGE_VALUES LoSgpr, HiSgpr
+auto *Merge = getOpcodeDef(Src, MRI);
+if (Merge) {
+  unsigned NumElts = Merge->getNumSources();
+  auto [Unmerge, Idx] = tryMatchRALFromUnmerge(Merge->getSourceReg(0));
+  if (!Unmerge || Unmerge->getNumDefs() != NumElts || Idx != 0)
+return {};
+
+  // check if all elements are from same unmerge and there is no shuffling
+  for (unsigned i = 1; i < NumElts; ++i) {
+auto [UnmergeI, IdxI] = tryMatchRALFromUnmerge(Merge->getSourceReg(i));
+if (UnmergeI != Unmerge || (unsigned)IdxI != i)
+  return {};
+  }
+  return Unmerge->getSourceReg();
+}
+
+// ..., VgprI, ... = G_UNMERGE_VALUES VgprLarge
+// SgprI = G_AMDGPU_READANYLANE VgprI
+// SgprLarge G_MERGE_VALUES ..., SgprI, ...
+// ..., Src, ... = G_UNMERGE_VALUES SgprLarge
+auto *UnMerge = getOpcodeDef(Src, MRI);
+if (UnMerge) {
+  int Idx = UnMerge->findRegisterDefOperandIdx(Src, nullptr);
+  auto *Merge = getOpcodeDef(UnMerge->getSourceReg(), MRI);
+  if (Merge) {
+auto [RAL, RALSrc] =
+tryMatch(Merge->getSourceReg(Idx), AMDGPU::G_AMDGPU_READANYLANE);
+if (RAL)
+  return RALSrc;
+  }
+}
+
+return {};
+  }
+
+  bool tryEliminateReadAnyLane(MachineInstr &Copy) {
+Register Dst = Copy.getOperand(0).getReg();
+Register Src = Copy.getOperand(1).getReg();
+if (!Src.isVirtual())
+  return false;
+
+Register RALDst = Src;
+MachineInstr &SrcMI = *MRI.getVRegDef(Src);
+if (SrcMI.getOpcode() == AMDGPU::G_BITCAST) {
+  RALDst = SrcMI.getOperand(1).getReg();
+}
+
+Register RALSrc = getReadAnyLaneSrc(RALDst);
+if (!RALSrc)
+  return false;
+
+if (Dst.isVirtual()) {
+  if (SrcMI.getOpcode() != AMDGPU::G_BITCAST) {
+// Src = READANYLANE RALSrc
+// Dst = Copy Src
+// ->
+// Dst = RALSrc
+MRI.replaceRegWith(Dst, RALSrc);
+  } else {
+// RALDst = READANYLANE RALSrc
+// Src = G_BITCAST RALDst
+// Dst = Copy Src
+// ->
+// NewVgpr = G_BITCAST RALDst
+// Dst = NewVgpr
+auto Bitcast = B.buildBitcast({VgprRB, MRI.getType(Src)}, RALSrc);

petar-avramovic wrote:

No, it has to be set manually before using the builder; that was a bug.

https://github.com/llvm/llvm-project/pull/142789
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #142789)

2025-06-05 Thread Petar Avramovic via llvm-branch-commits


@@ -137,7 +138,123 @@ class AMDGPURegBankLegalizeCombiner {
 return {MatchMI, MatchMI->getOperand(1).getReg()};
   }
 
+  std::tuple tryMatchRALFromUnmerge(Register Src) {
+auto *ReadAnyLane = MRI.getVRegDef(Src);
+if (ReadAnyLane->getOpcode() == AMDGPU::G_AMDGPU_READANYLANE) {
+  Register RALSrc = ReadAnyLane->getOperand(1).getReg();
+  auto *UnMerge = getOpcodeDef(RALSrc, MRI);
+  if (UnMerge)
+return {UnMerge, UnMerge->findRegisterDefOperandIdx(RALSrc, nullptr)};
+}
+return {nullptr, -1};
+  }
+
+  Register getReadAnyLaneSrc(Register Src) {
+// Src = G_AMDGPU_READANYLANE RALSrc
+auto [RAL, RALSrc] = tryMatch(Src, AMDGPU::G_AMDGPU_READANYLANE);
+if (RAL)
+  return RALSrc;
+
+// LoVgpr, HiVgpr = G_UNMERGE_VALUES UnmergeSrc
+// LoSgpr = G_AMDGPU_READANYLANE LoVgpr
+// HiSgpr = G_AMDGPU_READANYLANE HiVgpr
+// Src G_MERGE_VALUES LoSgpr, HiSgpr
+auto *Merge = getOpcodeDef(Src, MRI);
+if (Merge) {
+  unsigned NumElts = Merge->getNumSources();
+  auto [Unmerge, Idx] = tryMatchRALFromUnmerge(Merge->getSourceReg(0));
+  if (!Unmerge || Unmerge->getNumDefs() != NumElts || Idx != 0)
+return {};
+
+  // check if all elements are from same unmerge and there is no shuffling
+  for (unsigned i = 1; i < NumElts; ++i) {
+auto [UnmergeI, IdxI] = tryMatchRALFromUnmerge(Merge->getSourceReg(i));
+if (UnmergeI != Unmerge || (unsigned)IdxI != i)
+  return {};
+  }
+  return Unmerge->getSourceReg();
+}
+
+// ..., VgprI, ... = G_UNMERGE_VALUES VgprLarge
+// SgprI = G_AMDGPU_READANYLANE VgprI
+// SgprLarge G_MERGE_VALUES ..., SgprI, ...
+// ..., Src, ... = G_UNMERGE_VALUES SgprLarge
+auto *UnMerge = getOpcodeDef(Src, MRI);
+if (UnMerge) {
+  int Idx = UnMerge->findRegisterDefOperandIdx(Src, nullptr);
+  auto *Merge = getOpcodeDef(UnMerge->getSourceReg(), MRI);
+  if (Merge) {
+auto [RAL, RALSrc] =
+tryMatch(Merge->getSourceReg(Idx), AMDGPU::G_AMDGPU_READANYLANE);
+if (RAL)
+  return RALSrc;
+  }
+}
+
+return {};
+  }
+
+  bool tryEliminateReadAnyLane(MachineInstr &Copy) {
+Register Dst = Copy.getOperand(0).getReg();
+Register Src = Copy.getOperand(1).getReg();
+if (!Src.isVirtual())
+  return false;
+
+Register RALDst = Src;
+MachineInstr &SrcMI = *MRI.getVRegDef(Src);
+if (SrcMI.getOpcode() == AMDGPU::G_BITCAST) {
+  RALDst = SrcMI.getOperand(1).getReg();
+}

petar-avramovic wrote:

Not sure; I have not seen any such cases yet.

https://github.com/llvm/llvm-project/pull/142789


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)

2025-06-05 Thread Petar Avramovic via llvm-branch-commits


@@ -165,6 +165,8 @@ enum RegBankLLTMappingApplyID {
   Sgpr32Trunc,
 
   // Src only modifiers: waterfalls, extends
+  Sgpr32_W,
+  SgprV4S32_W,

petar-avramovic wrote:

Added one above; is it clear now?

https://github.com/llvm/llvm-project/pull/142790


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)

2025-06-05 Thread Petar Avramovic via llvm-branch-commits


@@ -57,6 +57,226 @@ void RegBankLegalizeHelper::findRuleAndApplyMapping(MachineInstr &MI) {
   lower(MI, Mapping, WaterfallSgprs);
 }
 
+bool RegBankLegalizeHelper::executeInWaterfallLoop(
+MachineIRBuilder &B, iterator_range Range,
+SmallSet &SGPROperandRegs) {
+  // Track use registers which have already been expanded with a readfirstlane
+  // sequence. This may have multiple uses if moving a sequence.
+  DenseMap WaterfalledRegMap;
+
+  MachineBasicBlock &MBB = B.getMBB();
+  MachineFunction &MF = B.getMF();
+
+  const SIRegisterInfo *TRI = ST.getRegisterInfo();
+  const TargetRegisterClass *WaveRC = TRI->getWaveMaskRegClass();
+  unsigned MovExecOpc, MovExecTermOpc, XorTermOpc, AndSaveExecOpc, ExecReg;
+  if (ST.isWave32()) {

petar-avramovic wrote:

It is instantiated per (ST, MRI) pair, not per function.

https://github.com/llvm/llvm-project/pull/142790


[llvm-branch-commits] [llvm] [CI] Migrate to runtimes build (PR #142696)

2025-06-05 Thread Vlad Serebrennikov via llvm-branch-commits


@@ -49,8 +49,7 @@
 },
 "lld": {"bolt", "cross-project-tests"},
 # TODO(issues/132795): LLDB should be enabled on clang changes.
-"clang": {"clang-tools-extra", "compiler-rt", "cross-project-tests"},
-"clang-tools-extra": {"libc"},

Endilll wrote:

I see that `clang-tools-extra` used to depend on `libc`, but I can't find that dependency anywhere now.

https://github.com/llvm/llvm-project/pull/142696


[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)

2025-06-05 Thread Nikita Popov via llvm-branch-commits




nikic wrote:

The way FileCheck works this will pass even if the metadata is not dropped. You 
could try whether `FileCheck --match-full-lines` works. Otherwise you could use 
explicit `CHECK-NOT` or `{{$}}`.

https://github.com/llvm/llvm-project/pull/87573


[llvm-branch-commits] [llvm] DAG: Move soft float predicate management into RuntimeLibcalls (PR #142905)

2025-06-05 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/142905

>From a3cb3a4361182158b16e85952309c2ebbe9dfb32 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Thu, 5 Jun 2025 14:22:55 +0900
Subject: [PATCH] DAG: Move soft float predicate management into
 RuntimeLibcalls

Work towards making RuntimeLibcalls the centralized location for
all libcall information. This requires changing the encoding from
tracking the ISD::CondCode to using CmpInst::Predicate.
---
 llvm/include/llvm/CodeGen/TargetLowering.h|  14 +-
 llvm/include/llvm/IR/RuntimeLibcalls.h|  25 +++
 .../CodeGen/SelectionDAG/TargetLowering.cpp   |   5 +-
 llvm/lib/IR/RuntimeLibcalls.cpp   |  36 
 llvm/lib/Target/ARM/ARMISelLowering.cpp   | 178 +-
 llvm/lib/Target/MSP430/MSP430ISelLowering.cpp | 130 ++---
 6 files changed, 224 insertions(+), 164 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index 9c453f51e129d..0d157de479141 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -3572,20 +3572,18 @@ class LLVM_ABI TargetLoweringBase {
 
   /// Override the default CondCode to be used to test the result of the
   /// comparison libcall against zero.
-  /// FIXME: This can't be merged with 'RuntimeLibcallsInfo' because of the ISD.
-  void setCmpLibcallCC(RTLIB::Libcall Call, ISD::CondCode CC) {
-CmpLibcallCCs[Call] = CC;
+  /// FIXME: This should be removed
+  void setCmpLibcallCC(RTLIB::Libcall Call, CmpInst::Predicate Pred) {
+Libcalls.setSoftFloatCmpLibcallPredicate(Call, Pred);
   }
 
-
   /// Get the CondCode that's to be used to test the result of the comparison
   /// libcall against zero.
-  /// FIXME: This can't be merged with 'RuntimeLibcallsInfo' because of the ISD.
-  ISD::CondCode getCmpLibcallCC(RTLIB::Libcall Call) const {
-return CmpLibcallCCs[Call];
+  CmpInst::Predicate
+  getSoftFloatCmpLibcallPredicate(RTLIB::Libcall Call) const {
+return Libcalls.getSoftFloatCmpLibcallPredicate(Call);
   }
 
-
   /// Set the CallingConv that should be used for the specified libcall.
   void setLibcallCallingConv(RTLIB::Libcall Call, CallingConv::ID CC) {
 Libcalls.setLibcallCallingConv(Call, CC);
diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.h b/llvm/include/llvm/IR/RuntimeLibcalls.h
index 26c085031a48a..6cc65fabfcc99 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.h
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.h
@@ -16,6 +16,7 @@
 
 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/IR/CallingConv.h"
+#include "llvm/IR/InstrTypes.h"
 #include "llvm/Support/AtomicOrdering.h"
 #include "llvm/Support/Compiler.h"
 #include "llvm/TargetParser/Triple.h"
@@ -73,6 +74,20 @@ struct RuntimeLibcallsInfo {
 LibcallRoutineNames + RTLIB::UNKNOWN_LIBCALL);
   }
 
+  /// Get the comparison predicate that's to be used to test the result of the
+  /// comparison libcall against zero. This should only be used with
+  /// floating-point compare libcalls.
+  CmpInst::Predicate
+  getSoftFloatCmpLibcallPredicate(RTLIB::Libcall Call) const {
+return SoftFloatCompareLibcallPredicates[Call];
+  }
+
+  // FIXME: This should be removed. This should be private constant.
+  void setSoftFloatCmpLibcallPredicate(RTLIB::Libcall Call,
+   CmpInst::Predicate Pred) {
+SoftFloatCompareLibcallPredicates[Call] = Pred;
+  }
+
 private:
   /// Stores the name each libcall.
   const char *LibcallRoutineNames[RTLIB::UNKNOWN_LIBCALL + 1];
@@ -80,6 +95,14 @@ struct RuntimeLibcallsInfo {
   /// Stores the CallingConv that should be used for each libcall.
   CallingConv::ID LibcallCallingConvs[RTLIB::UNKNOWN_LIBCALL];
 
+  /// The condition type that should be used to test the result of each of the
+  /// soft floating-point comparison libcall against integer zero.
+  ///
+  // FIXME: This is only relevant for the handful of floating-point comparison
+  // runtime calls; it's excessive to have a table entry for every single
+  // opcode.
+  CmpInst::Predicate SoftFloatCompareLibcallPredicates[RTLIB::UNKNOWN_LIBCALL];
+
   static bool darwinHasSinCos(const Triple &TT) {
 assert(TT.isOSDarwin() && "should be called with darwin triple");
 // Don't bother with 32 bit x86.
@@ -95,6 +118,8 @@ struct RuntimeLibcallsInfo {
 return true;
   }
 
+  void initSoftFloatCmpLibcallPredicates();
+
   /// Set default libcall names. If a target wants to opt-out of a libcall it
   /// should be placed here.
   LLVM_ABI void initLibcalls(const Triple &TT);
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 4472a031c39f6..5105c4a515fbe 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -14,6 +14,7 @@
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/Analysis/ValueTrackin

[llvm-branch-commits] [llvm] RuntimeLibcalls: Cleanup sincos predicate functions (PR #143081)

2025-06-05 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/143081

The darwinHasSinCos wasn't actually used for sincos, only the stret
variant. Rename this to reflect that, and introduce a new one for
enabling sincos.

>From ee79ca11029ca60e9b6062cde3d0f468c2d5a7b3 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Fri, 6 Jun 2025 15:15:53 +0900
Subject: [PATCH] RuntimeLibcalls: Cleanup sincos predicate functions

The darwinHasSinCos wasn't actually used for sincos, only the stret
variant. Rename this to reflect that, and introduce a new one for
enabling sincos.
---
 llvm/include/llvm/IR/RuntimeLibcalls.h | 8 +++-
 llvm/lib/IR/RuntimeLibcalls.cpp| 5 ++---
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.h b/llvm/include/llvm/IR/RuntimeLibcalls.h
index 6cc65fabfcc99..d2704d5aa2616 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.h
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.h
@@ -103,7 +103,7 @@ struct RuntimeLibcallsInfo {
   // opcode.
   CmpInst::Predicate SoftFloatCompareLibcallPredicates[RTLIB::UNKNOWN_LIBCALL];
 
-  static bool darwinHasSinCos(const Triple &TT) {
+  static bool darwinHasSinCosStret(const Triple &TT) {
 assert(TT.isOSDarwin() && "should be called with darwin triple");
 // Don't bother with 32 bit x86.
 if (TT.getArch() == Triple::x86)
@@ -118,6 +118,12 @@ struct RuntimeLibcallsInfo {
 return true;
   }
 
+  /// Return true if the target has sincosf/sincos/sincosl functions
+  static bool hasSinCos(const Triple &TT) {
+return TT.isGNUEnvironment() || TT.isOSFuchsia() ||
+   (TT.isAndroid() && !TT.isAndroidVersionLT(9));
+  }
+
   void initSoftFloatCmpLibcallPredicates();
 
   /// Set default libcall names. If a target wants to opt-out of a libcall it
diff --git a/llvm/lib/IR/RuntimeLibcalls.cpp b/llvm/lib/IR/RuntimeLibcalls.cpp
index 91f303c9e3d3c..a6fda0cfeadd2 100644
--- a/llvm/lib/IR/RuntimeLibcalls.cpp
+++ b/llvm/lib/IR/RuntimeLibcalls.cpp
@@ -170,7 +170,7 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT) {
   break;
 }
 
-if (darwinHasSinCos(TT)) {
+if (darwinHasSinCosStret(TT)) {
   setLibcallName(RTLIB::SINCOS_STRET_F32, "__sincosf_stret");
   setLibcallName(RTLIB::SINCOS_STRET_F64, "__sincos_stret");
   if (TT.isWatchABI()) {
@@ -214,8 +214,7 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT) {
 setLibcallName(RTLIB::EXP10_F64, "__exp10");
   }
 
-  if (TT.isGNUEnvironment() || TT.isOSFuchsia() ||
-  (TT.isAndroid() && !TT.isAndroidVersionLT(9))) {
+  if (hasSinCos(TT)) {
 setLibcallName(RTLIB::SINCOS_F32, "sincosf");
 setLibcallName(RTLIB::SINCOS_F64, "sincos");
 setLibcallName(RTLIB::SINCOS_F80, "sincosl");



[llvm-branch-commits] [llvm] RuntimeLibcalls: Cleanup sincos predicate functions (PR #143081)

2025-06-05 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/143081
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#143081** (this PR)
* **#142905**: 1 other dependent PR (#142912)
* **#142898**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev).
Learn more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/143081


[llvm-branch-commits] [llvm] RuntimeLibcalls: Use array initializers for default values (PR #143082)

2025-06-05 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/143082
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#143082** (this PR)
* **#143081**
* **#142905**: 1 other dependent PR (#142912)
* **#142898**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev).
Learn more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/143082


[llvm-branch-commits] [llvm] RuntimeLibcalls: Use array initializers for default values (PR #143082)

2025-06-05 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/143082

None

>From 8aa7850d9ddd50d57c9d9fbbef07b9ad00ffe202 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Fri, 6 Jun 2025 14:50:57 +0900
Subject: [PATCH] RuntimeLibcalls: Use array initializers for default values

---
 llvm/include/llvm/IR/RuntimeLibcalls.h |  8 +---
 llvm/lib/IR/RuntimeLibcalls.cpp| 10 --
 2 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.h b/llvm/include/llvm/IR/RuntimeLibcalls.h
index d2704d5aa2616..d67430968edf1 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.h
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.h
@@ -90,10 +90,11 @@ struct RuntimeLibcallsInfo {
 
 private:
   /// Stores the name each libcall.
-  const char *LibcallRoutineNames[RTLIB::UNKNOWN_LIBCALL + 1];
+  const char *LibcallRoutineNames[RTLIB::UNKNOWN_LIBCALL + 1] = {nullptr};
 
   /// Stores the CallingConv that should be used for each libcall.
-  CallingConv::ID LibcallCallingConvs[RTLIB::UNKNOWN_LIBCALL];
+  CallingConv::ID LibcallCallingConvs[RTLIB::UNKNOWN_LIBCALL] = {
+  CallingConv::C};
 
   /// The condition type that should be used to test the result of each of the
   /// soft floating-point comparison libcall against integer zero.
@@ -101,7 +102,8 @@ struct RuntimeLibcallsInfo {
   // FIXME: This is only relevant for the handful of floating-point comparison
   // runtime calls; it's excessive to have a table entry for every single
   // opcode.
-  CmpInst::Predicate SoftFloatCompareLibcallPredicates[RTLIB::UNKNOWN_LIBCALL];
+  CmpInst::Predicate SoftFloatCompareLibcallPredicates[RTLIB::UNKNOWN_LIBCALL] =
+  {CmpInst::BAD_ICMP_PREDICATE};
 
   static bool darwinHasSinCosStret(const Triple &TT) {
 assert(TT.isOSDarwin() && "should be called with darwin triple");
diff --git a/llvm/lib/IR/RuntimeLibcalls.cpp b/llvm/lib/IR/RuntimeLibcalls.cpp
index a6fda0cfeadd2..01978b7ae39e3 100644
--- a/llvm/lib/IR/RuntimeLibcalls.cpp
+++ b/llvm/lib/IR/RuntimeLibcalls.cpp
@@ -12,9 +12,6 @@ using namespace llvm;
 using namespace RTLIB;
 
 void RuntimeLibcallsInfo::initSoftFloatCmpLibcallPredicates() {
-  std::fill(SoftFloatCompareLibcallPredicates,
-SoftFloatCompareLibcallPredicates + RTLIB::UNKNOWN_LIBCALL,
-CmpInst::BAD_ICMP_PREDICATE);
   SoftFloatCompareLibcallPredicates[RTLIB::OEQ_F32] = CmpInst::ICMP_EQ;
   SoftFloatCompareLibcallPredicates[RTLIB::OEQ_F64] = CmpInst::ICMP_EQ;
   SoftFloatCompareLibcallPredicates[RTLIB::OEQ_F128] = CmpInst::ICMP_EQ;
@@ -48,19 +45,12 @@ void RuntimeLibcallsInfo::initSoftFloatCmpLibcallPredicates() {
 /// Set default libcall names. If a target wants to opt-out of a libcall it
 /// should be placed here.
 void RuntimeLibcallsInfo::initLibcalls(const Triple &TT) {
-  std::fill(std::begin(LibcallRoutineNames), std::end(LibcallRoutineNames),
-nullptr);
-
   initSoftFloatCmpLibcallPredicates();
 
 #define HANDLE_LIBCALL(code, name) setLibcallName(RTLIB::code, name);
 #include "llvm/IR/RuntimeLibcalls.def"
 #undef HANDLE_LIBCALL
 
-  // Initialize calling conventions to their default.
-  for (int LC = 0; LC < RTLIB::UNKNOWN_LIBCALL; ++LC)
-setLibcallCallingConv((RTLIB::Libcall)LC, CallingConv::C);
-
   // Use the f128 variants of math functions on x86
   if (TT.isX86() && TT.isGNUEnvironment()) {
 setLibcallName(RTLIB::REM_F128, "fmodf128");



[llvm-branch-commits] [llvm] RuntimeLibcalls: Cleanup sincos predicate functions (PR #143081)

2025-06-05 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/143081


[llvm-branch-commits] [llvm] RuntimeLibcalls: Use array initializers for default values (PR #143082)

2025-06-05 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-ir

Author: Matt Arsenault (arsenm)


Changes



---
Full diff: https://github.com/llvm/llvm-project/pull/143082.diff


2 Files Affected:

- (modified) llvm/include/llvm/IR/RuntimeLibcalls.h (+5-3) 
- (modified) llvm/lib/IR/RuntimeLibcalls.cpp (-10) 


``diff
diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.h b/llvm/include/llvm/IR/RuntimeLibcalls.h
index d2704d5aa2616..d67430968edf1 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.h
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.h
@@ -90,10 +90,11 @@ struct RuntimeLibcallsInfo {
 
 private:
   /// Stores the name each libcall.
-  const char *LibcallRoutineNames[RTLIB::UNKNOWN_LIBCALL + 1];
+  const char *LibcallRoutineNames[RTLIB::UNKNOWN_LIBCALL + 1] = {nullptr};
 
   /// Stores the CallingConv that should be used for each libcall.
-  CallingConv::ID LibcallCallingConvs[RTLIB::UNKNOWN_LIBCALL];
+  CallingConv::ID LibcallCallingConvs[RTLIB::UNKNOWN_LIBCALL] = {
+  CallingConv::C};
 
   /// The condition type that should be used to test the result of each of the
   /// soft floating-point comparison libcall against integer zero.
@@ -101,7 +102,8 @@ struct RuntimeLibcallsInfo {
   // FIXME: This is only relevant for the handful of floating-point comparison
   // runtime calls; it's excessive to have a table entry for every single
   // opcode.
-  CmpInst::Predicate SoftFloatCompareLibcallPredicates[RTLIB::UNKNOWN_LIBCALL];
+  CmpInst::Predicate SoftFloatCompareLibcallPredicates[RTLIB::UNKNOWN_LIBCALL] =
+  {CmpInst::BAD_ICMP_PREDICATE};
 
   static bool darwinHasSinCosStret(const Triple &TT) {
 assert(TT.isOSDarwin() && "should be called with darwin triple");
diff --git a/llvm/lib/IR/RuntimeLibcalls.cpp b/llvm/lib/IR/RuntimeLibcalls.cpp
index a6fda0cfeadd2..01978b7ae39e3 100644
--- a/llvm/lib/IR/RuntimeLibcalls.cpp
+++ b/llvm/lib/IR/RuntimeLibcalls.cpp
@@ -12,9 +12,6 @@ using namespace llvm;
 using namespace RTLIB;
 
 void RuntimeLibcallsInfo::initSoftFloatCmpLibcallPredicates() {
-  std::fill(SoftFloatCompareLibcallPredicates,
-SoftFloatCompareLibcallPredicates + RTLIB::UNKNOWN_LIBCALL,
-CmpInst::BAD_ICMP_PREDICATE);
   SoftFloatCompareLibcallPredicates[RTLIB::OEQ_F32] = CmpInst::ICMP_EQ;
   SoftFloatCompareLibcallPredicates[RTLIB::OEQ_F64] = CmpInst::ICMP_EQ;
   SoftFloatCompareLibcallPredicates[RTLIB::OEQ_F128] = CmpInst::ICMP_EQ;
@@ -48,19 +45,12 @@ void 
RuntimeLibcallsInfo::initSoftFloatCmpLibcallPredicates() {
 /// Set default libcall names. If a target wants to opt-out of a libcall it
 /// should be placed here.
 void RuntimeLibcallsInfo::initLibcalls(const Triple &TT) {
-  std::fill(std::begin(LibcallRoutineNames), std::end(LibcallRoutineNames),
-nullptr);
-
   initSoftFloatCmpLibcallPredicates();
 
 #define HANDLE_LIBCALL(code, name) setLibcallName(RTLIB::code, name);
 #include "llvm/IR/RuntimeLibcalls.def"
 #undef HANDLE_LIBCALL
 
-  // Initialize calling conventions to their default.
-  for (int LC = 0; LC < RTLIB::UNKNOWN_LIBCALL; ++LC)
-setLibcallCallingConv((RTLIB::Libcall)LC, CallingConv::C);
-
   // Use the f128 variants of math functions on x86
   if (TT.isX86() && TT.isGNUEnvironment()) {
 setLibcallName(RTLIB::REM_F128, "fmodf128");

```




https://github.com/llvm/llvm-project/pull/143082
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] RuntimeLibcalls: Use array initializers for default values (PR #143082)

2025-06-05 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/143082
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] RuntimeLibcalls: Cleanup sincos predicate functions (PR #143081)

2025-06-05 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-ir

Author: Matt Arsenault (arsenm)


Changes

The darwinHasSinCos wasn't actually used for sincos, only the stret
variant. Rename this to reflect that, and introduce a new one for
enabling sincos.

---
Full diff: https://github.com/llvm/llvm-project/pull/143081.diff


2 Files Affected:

- (modified) llvm/include/llvm/IR/RuntimeLibcalls.h (+7-1) 
- (modified) llvm/lib/IR/RuntimeLibcalls.cpp (+2-3) 


```diff
diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.h 
b/llvm/include/llvm/IR/RuntimeLibcalls.h
index 6cc65fabfcc99..d2704d5aa2616 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.h
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.h
@@ -103,7 +103,7 @@ struct RuntimeLibcallsInfo {
   // opcode.
   CmpInst::Predicate SoftFloatCompareLibcallPredicates[RTLIB::UNKNOWN_LIBCALL];
 
-  static bool darwinHasSinCos(const Triple &TT) {
+  static bool darwinHasSinCosStret(const Triple &TT) {
 assert(TT.isOSDarwin() && "should be called with darwin triple");
 // Don't bother with 32 bit x86.
 if (TT.getArch() == Triple::x86)
@@ -118,6 +118,12 @@ struct RuntimeLibcallsInfo {
 return true;
   }
 
+  /// Return true if the target has sincosf/sincos/sincosl functions
+  static bool hasSinCos(const Triple &TT) {
+return TT.isGNUEnvironment() || TT.isOSFuchsia() ||
+   (TT.isAndroid() && !TT.isAndroidVersionLT(9));
+  }
+
   void initSoftFloatCmpLibcallPredicates();
 
   /// Set default libcall names. If a target wants to opt-out of a libcall it
diff --git a/llvm/lib/IR/RuntimeLibcalls.cpp b/llvm/lib/IR/RuntimeLibcalls.cpp
index 91f303c9e3d3c..a6fda0cfeadd2 100644
--- a/llvm/lib/IR/RuntimeLibcalls.cpp
+++ b/llvm/lib/IR/RuntimeLibcalls.cpp
@@ -170,7 +170,7 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT) {
   break;
 }
 
-if (darwinHasSinCos(TT)) {
+if (darwinHasSinCosStret(TT)) {
   setLibcallName(RTLIB::SINCOS_STRET_F32, "__sincosf_stret");
   setLibcallName(RTLIB::SINCOS_STRET_F64, "__sincos_stret");
   if (TT.isWatchABI()) {
@@ -214,8 +214,7 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT) {
 setLibcallName(RTLIB::EXP10_F64, "__exp10");
   }
 
-  if (TT.isGNUEnvironment() || TT.isOSFuchsia() ||
-  (TT.isAndroid() && !TT.isAndroidVersionLT(9))) {
+  if (hasSinCos(TT)) {
 setLibcallName(RTLIB::SINCOS_F32, "sincosf");
 setLibcallName(RTLIB::SINCOS_F64, "sincos");
 setLibcallName(RTLIB::SINCOS_F80, "sincosl");

```




https://github.com/llvm/llvm-project/pull/143081
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)

2025-06-05 Thread Nikita Popov via llvm-branch-commits


@@ -1302,6 +1302,24 @@ static void addRange(SmallVectorImpl 
&EndPoints,
   EndPoints.push_back(High);
 }
 
+MDNode *MDNode::getMergedCalleeTypeMetadata(LLVMContext &Ctx, MDNode *A,
+MDNode *B) {
+  SmallVector AB;
+  SmallSet MergedCallees;
+  auto AddUniqueCallees = [&AB, &MergedCallees](llvm::MDNode *N) {
+if (!N)
+  return;
+for (const MDOperand &Op : N->operands()) {
+  Metadata *MD = Op.get();
+  if (MergedCallees.insert(MD).second)
+AB.push_back(MD);
+}
+  };
+  AddUniqueCallees(A);
+  AddUniqueCallees(B);
+  return llvm::MDNode::get(Ctx, AB);

nikic wrote:

```suggestion
  return MDNode::get(Ctx, AB);
```

https://github.com/llvm/llvm-project/pull/87573
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
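
The merge in `getMergedCalleeTypeMetadata` above is an order-preserving union of two operand lists. A minimal standalone sketch of that technique (an analogy only, not the LLVM implementation: `mergeUnique` is a hypothetical name, and `std::string` stands in for `Metadata *` so the example is self-contained):

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Concatenate two operand lists, dropping duplicates and keeping
// first-seen order -- the same shape as the SmallSet-guarded loop in
// getMergedCalleeTypeMetadata quoted above.
static std::vector<std::string>
mergeUnique(const std::vector<std::string> &a,
            const std::vector<std::string> &b) {
  std::vector<std::string> merged;
  std::set<std::string> seen;
  for (const auto *list : {&a, &b})
    for (const std::string &op : *list)
      if (seen.insert(op).second) // true only on first occurrence
        merged.push_back(op);
  return merged;
}
```

The real code keys the set on `Metadata *` pointers, which is why uniqued metadata nodes deduplicate correctly there.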


[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)

2025-06-05 Thread Nikita Popov via llvm-branch-commits


@@ -1252,6 +1252,12 @@ class MDNode : public Metadata {
   bool isReplaceable() const { return isTemporary() || isAlwaysReplaceable(); }
   bool isAlwaysReplaceable() const { return getMetadataID() == DIAssignIDKind; 
}
 
+  bool hasGeneralizedMDString() const {

nikic wrote:

This looks too specific to be part of the main Metadata API.

https://github.com/llvm/llvm-project/pull/87573
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)

2025-06-05 Thread Nikita Popov via llvm-branch-commits


@@ -5096,6 +5097,19 @@ void Verifier::visitCallsiteMetadata(Instruction &I, 
MDNode *MD) {
   visitCallStackMetadata(MD);
 }
 
+void Verifier::visitCalleeTypeMetadata(Instruction &I, MDNode *MD) {
+  Check(isa(I), "!callee_type metadata should only exist on calls",
+&I);
+  for (const MDOperand &Op : MD->operands()) {
+Check(isa(Op.get()),
+  "The callee_type metadata must be a list of type metadata nodes");
+auto *TypeMD = cast(Op.get());
+Check(TypeMD->hasGeneralizedMDString(),
+  "Only generalized type metadata can be part of the callee_type "
+  "metadata list");

nikic wrote:

The CalleeTypeMetadata.rst could be clearer on this requirement. 
Generalizations are mentioned, but not what this means for the metadata.

https://github.com/llvm/llvm-project/pull/87573
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)

2025-06-05 Thread Nikita Popov via llvm-branch-commits


@@ -1302,6 +1302,24 @@ static void addRange(SmallVectorImpl 
&EndPoints,
   EndPoints.push_back(High);
 }
 
+MDNode *MDNode::getMergedCalleeTypeMetadata(LLVMContext &Ctx, MDNode *A,
+MDNode *B) {
+  SmallVector AB;
+  SmallSet MergedCallees;
+  auto AddUniqueCallees = [&AB, &MergedCallees](llvm::MDNode *N) {

nikic wrote:

```suggestion
  auto AddUniqueCallees = [&AB, &MergedCallees](MDNode *N) {
```

https://github.com/llvm/llvm-project/pull/87573
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)

2025-06-05 Thread Nikita Popov via llvm-branch-commits


@@ -3377,6 +3377,11 @@ static void combineMetadata(Instruction *K, const 
Instruction *J,
   K->setMetadata(Kind,
 MDNode::getMostGenericAlignmentOrDereferenceable(JMD, KMD));
 break;
+  case LLVMContext::MD_callee_type:
+if (!AAOnly)
+  K->setMetadata(Kind, MDNode::getMergedCalleeTypeMetadata(
+   K->getContext(), KMD, JMD));

nikic wrote:

This code appears to be untested. Check out existing metadata tests in 
SimplifyCFG.

https://github.com/llvm/llvm-project/pull/87573
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)

2025-06-05 Thread Nikita Popov via llvm-branch-commits


@@ -4161,6 +4161,11 @@ Instruction *InstCombinerImpl::visitCallBase(CallBase 
&Call) {
 Call, Builder.CreateBitOrPointerCast(ReturnedArg, CallTy));
 }
 
+  // Drop unnecessary callee_type metadata from calls that were converted
+  // into direct calls.
+  if (Call.getMetadata(LLVMContext::MD_callee_type) && !Call.isIndirectCall())
+Call.setMetadata(LLVMContext::MD_callee_type, nullptr);

nikic wrote:

Should indicate IR change.

https://github.com/llvm/llvm-project/pull/87573
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [mlir] [OpenMP] Add directive spellings introduced in spec v6.0 (PR #141772)

2025-06-05 Thread Krzysztof Parzyszek via llvm-branch-commits

https://github.com/kparzysz reopened 
https://github.com/llvm/llvm-project/pull/141772
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] CodeGen: Move ABI option enums to support (PR #142912)

2025-06-05 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/142912

>From f8721bd055a0fb775543df2059d0979d9c3487de Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Thu, 5 Jun 2025 16:08:26 +0900
Subject: [PATCH] CodeGen: Move ABI option enums to support

Move these out of TargetOptions and into Support to avoid
the dependency on Target. There are similar ABI options
already in Support/CodeGen.h.
---
 llvm/include/llvm/Support/CodeGen.h  | 16 
 llvm/include/llvm/Target/TargetOptions.h | 17 +
 2 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/llvm/include/llvm/Support/CodeGen.h 
b/llvm/include/llvm/Support/CodeGen.h
index 0e42789ba932e..b7896ae5d0f83 100644
--- a/llvm/include/llvm/Support/CodeGen.h
+++ b/llvm/include/llvm/Support/CodeGen.h
@@ -50,6 +50,22 @@ namespace llvm {
 };
   }
 
+  namespace FloatABI {
+  enum ABIType {
+Default, // Target-specific (either soft or hard depending on triple, etc).
+Soft,// Soft float.
+Hard // Hard float.
+  };
+  }
+
+  enum class EABI {
+Unknown,
+Default, // Default means not specified
+EABI4,   // Target-specific (either 4, 5 or gnu depending on triple).
+EABI5,
+GNU
+  };
+
   /// Code generation optimization level.
   enum class CodeGenOptLevel {
 None = 0,  ///< -O0
diff --git a/llvm/include/llvm/Target/TargetOptions.h 
b/llvm/include/llvm/Target/TargetOptions.h
index fd8dad4f6f791..08d6aa36e19d8 100644
--- a/llvm/include/llvm/Target/TargetOptions.h
+++ b/llvm/include/llvm/Target/TargetOptions.h
@@ -16,6 +16,7 @@
 
 #include "llvm/ADT/FloatingPointMode.h"
 #include "llvm/MC/MCTargetOptions.h"
+#include "llvm/Support/CodeGen.h"
 
 #include 
 
@@ -24,14 +25,6 @@ namespace llvm {
   class MachineFunction;
   class MemoryBuffer;
 
-  namespace FloatABI {
-enum ABIType {
-  Default, // Target-specific (either soft or hard depending on triple, 
etc).
-  Soft,// Soft float.
-  Hard // Hard float.
-};
-  }
-
   namespace FPOpFusion {
 enum FPOpFusionMode {
   Fast, // Enable fusion of FP ops wherever it's profitable.
@@ -70,14 +63,6 @@ namespace llvm {
 None// Do not use Basic Block Sections.
   };
 
-  enum class EABI {
-Unknown,
-Default, // Default means not specified
-EABI4,   // Target-specific (either 4, 5 or gnu depending on triple).
-EABI5,
-GNU
-  };
-
   /// Identify a debugger for "tuning" the debug info.
   ///
   /// The "debugger tuning" concept allows us to present a more intuitive

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
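
The layering idea in the patch above is that the enums move into a low-level header so consumers no longer need anything from Target. A hedged, self-contained sketch of that pattern (the `support` namespace and `pickDefaultEABI` helper are illustrative assumptions, not LLVM code):

```cpp
#include <cassert>

// A "Support"-level header would hold only the option enums, with no
// dependency on Target; any consumer can then use them directly.
namespace support {
enum class EABI { Unknown, Default, EABI4, EABI5, GNU };
namespace FloatABI {
enum ABIType { Default, Soft, Hard };
} // namespace FloatABI
} // namespace support

// A consumer picks a default without including any Target headers.
static support::EABI pickDefaultEABI() { return support::EABI::Default; }
```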


[llvm-branch-commits] [llvm] DAG: Move soft float predicate management into RuntimeLibcalls (PR #142905)

2025-06-05 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/142905

>From a3cb3a4361182158b16e85952309c2ebbe9dfb32 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Thu, 5 Jun 2025 14:22:55 +0900
Subject: [PATCH] DAG: Move soft float predicate management into
 RuntimeLibcalls

Work towards making RuntimeLibcalls the centralized location for
all libcall information. This requires changing the encoding from
tracking the ISD::CondCode to using CmpInst::Predicate.
---
 llvm/include/llvm/CodeGen/TargetLowering.h|  14 +-
 llvm/include/llvm/IR/RuntimeLibcalls.h|  25 +++
 .../CodeGen/SelectionDAG/TargetLowering.cpp   |   5 +-
 llvm/lib/IR/RuntimeLibcalls.cpp   |  36 
 llvm/lib/Target/ARM/ARMISelLowering.cpp   | 178 +-
 llvm/lib/Target/MSP430/MSP430ISelLowering.cpp | 130 ++---
 6 files changed, 224 insertions(+), 164 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h 
b/llvm/include/llvm/CodeGen/TargetLowering.h
index 9c453f51e129d..0d157de479141 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -3572,20 +3572,18 @@ class LLVM_ABI TargetLoweringBase {
 
   /// Override the default CondCode to be used to test the result of the
   /// comparison libcall against zero.
-  /// FIXME: This can't be merged with 'RuntimeLibcallsInfo' because of the 
ISD.
-  void setCmpLibcallCC(RTLIB::Libcall Call, ISD::CondCode CC) {
-CmpLibcallCCs[Call] = CC;
+  /// FIXME: This should be removed
+  void setCmpLibcallCC(RTLIB::Libcall Call, CmpInst::Predicate Pred) {
+Libcalls.setSoftFloatCmpLibcallPredicate(Call, Pred);
   }
 
-
   /// Get the CondCode that's to be used to test the result of the comparison
   /// libcall against zero.
-  /// FIXME: This can't be merged with 'RuntimeLibcallsInfo' because of the 
ISD.
-  ISD::CondCode getCmpLibcallCC(RTLIB::Libcall Call) const {
-return CmpLibcallCCs[Call];
+  CmpInst::Predicate
+  getSoftFloatCmpLibcallPredicate(RTLIB::Libcall Call) const {
+return Libcalls.getSoftFloatCmpLibcallPredicate(Call);
   }
 
-
   /// Set the CallingConv that should be used for the specified libcall.
   void setLibcallCallingConv(RTLIB::Libcall Call, CallingConv::ID CC) {
 Libcalls.setLibcallCallingConv(Call, CC);
diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.h 
b/llvm/include/llvm/IR/RuntimeLibcalls.h
index 26c085031a48a..6cc65fabfcc99 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.h
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.h
@@ -16,6 +16,7 @@
 
 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/IR/CallingConv.h"
+#include "llvm/IR/InstrTypes.h"
 #include "llvm/Support/AtomicOrdering.h"
 #include "llvm/Support/Compiler.h"
 #include "llvm/TargetParser/Triple.h"
@@ -73,6 +74,20 @@ struct RuntimeLibcallsInfo {
 LibcallRoutineNames + RTLIB::UNKNOWN_LIBCALL);
   }
 
+  /// Get the comparison predicate that's to be used to test the result of the
+  /// comparison libcall against zero. This should only be used with
+  /// floating-point compare libcalls.
+  CmpInst::Predicate
+  getSoftFloatCmpLibcallPredicate(RTLIB::Libcall Call) const {
+return SoftFloatCompareLibcallPredicates[Call];
+  }
+
+  // FIXME: This should be removed. This should be private constant.
+  void setSoftFloatCmpLibcallPredicate(RTLIB::Libcall Call,
+   CmpInst::Predicate Pred) {
+SoftFloatCompareLibcallPredicates[Call] = Pred;
+  }
+
 private:
   /// Stores the name each libcall.
   const char *LibcallRoutineNames[RTLIB::UNKNOWN_LIBCALL + 1];
@@ -80,6 +95,14 @@ struct RuntimeLibcallsInfo {
   /// Stores the CallingConv that should be used for each libcall.
   CallingConv::ID LibcallCallingConvs[RTLIB::UNKNOWN_LIBCALL];
 
+  /// The condition type that should be used to test the result of each of the
+  /// soft floating-point comparison libcall against integer zero.
+  ///
+  // FIXME: This is only relevant for the handful of floating-point comparison
+  // runtime calls; it's excessive to have a table entry for every single
+  // opcode.
+  CmpInst::Predicate SoftFloatCompareLibcallPredicates[RTLIB::UNKNOWN_LIBCALL];
+
   static bool darwinHasSinCos(const Triple &TT) {
 assert(TT.isOSDarwin() && "should be called with darwin triple");
 // Don't bother with 32 bit x86.
@@ -95,6 +118,8 @@ struct RuntimeLibcallsInfo {
 return true;
   }
 
+  void initSoftFloatCmpLibcallPredicates();
+
   /// Set default libcall names. If a target wants to opt-out of a libcall it
   /// should be placed here.
   LLVM_ABI void initLibcalls(const Triple &TT);
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp 
b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 4472a031c39f6..5105c4a515fbe 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -14,6 +14,7 @@
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/Analysis/ValueTrackin


[llvm-branch-commits] [libcxx] [libcxx] Include __fwd/span.h in (PR #142925)

2025-06-05 Thread Nikolas Klauser via llvm-branch-commits


@@ -451,6 +451,7 @@ namespace std {
 
 #  if _LIBCPP_STD_VER >= 23
 #include <__fwd/mdspan.h>
+#include <__fwd/span.h>

philnik777 wrote:

Can you add a comment with the LWG issue number? If the answer is that we 
indeed expect users to include `` we should remove the include again. I 
don't expect it, but it's better to have a comment that this is technically an 
extension currently.

https://github.com/llvm/llvm-project/pull/142925
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
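
For context, the `__fwd/` headers discussed above exist so that other headers can name a template without pulling in its full definition. A minimal single-file sketch of the pattern (the `sketch` namespace and simplified `span` are assumptions for illustration, not libc++ code):

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {
// What a __fwd/span.h-style header provides: the declaration only.
template <class T, std::size_t Extent> class span;

// The declaration is already enough to form pointers in an interface.
struct Interface {
  span<int, 4> *data = nullptr;
};

// The full definition lives in the "real" header.
template <class T, std::size_t Extent> class span {
public:
  constexpr std::size_t size() const { return Extent; }
};
} // namespace sketch
```

This keeps headers that only *mention* the type from paying for (and re-exporting) its whole definition.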


[llvm-branch-commits] [libcxx] [libcxx] Include __fwd/span.h in (PR #142925)

2025-06-05 Thread Nikolas Klauser via llvm-branch-commits

https://github.com/philnik777 edited 
https://github.com/llvm/llvm-project/pull/142925
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libcxx] [libcxx] Include __fwd/span.h in (PR #142925)

2025-06-05 Thread Nikolas Klauser via llvm-branch-commits

https://github.com/philnik777 approved this pull request.

LGTM with added comment.

https://github.com/llvm/llvm-project/pull/142925
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)

2025-06-05 Thread James Newling via llvm-branch-commits


@@ -582,6 +582,15 @@ static SmallVector getCollapsedIndices(RewriterBase 
&rewriter,
 
 namespace {
 
+/// Helper functon to return the index of the last dynamic dimension in 
`shape`.

newling wrote:

```suggestion
/// Helper function to return the index of the last dynamic dimension in `shape`, or -1 if there are no dynamic dimensions.
```
... if I understand correctly, although it might be static_cast(0ULL - 1); not sure what that is

https://github.com/llvm/llvm-project/pull/142422
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)

2025-06-05 Thread James Newling via llvm-branch-commits


@@ -49,35 +49,37 @@ FailureOr> 
isTranspose2DSlice(vector::TransposeOp op);
 
 /// Return true if `vectorType` is a contiguous slice of `memrefType`.
 ///
-/// Only the N = vectorType.getRank() trailing dims of `memrefType` are
-/// checked (the other dims are not relevant). Note that for `vectorType` to be
-/// a contiguous slice of `memrefType`, the trailing dims of the latter have
-/// to be contiguous - this is checked by looking at the corresponding strides.
+/// The leading unit dimensions of the vector type are ignored as they
+/// are not relevant to the result. Let N be the number of the vector
+/// dimensions after ignoring a leading sequence of unit ones.
 ///
-/// There might be some restriction on the leading dim of `VectorType`:
+/// For `vectorType` to be a contiguous slice of `memrefType`
+///   a) the N trailing dimensions of the latter must be contiguous, and
+///   b) the trailing N dimensions of `vectorType` and `memrefType`,
+///  except the first of them, must match.
 ///
-/// Case 1. If all the trailing dims of `vectorType` match the trailing dims
-/// of `memrefType` then the leading dim of `vectorType` can be
-/// arbitrary.
-///
-///Ex. 1.1 contiguous slice, perfect match
-///  vector<4x3x2xi32> from memref<5x4x3x2xi32>
-///Ex. 1.2 contiguous slice, the leading dim does not match (2 != 4)
-///  vector<2x3x2xi32> from memref<5x4x3x2xi32>
-///
-/// Case 2. If an "internal" dim of `vectorType` does not match the
-/// corresponding trailing dim in `memrefType` then the remaining
-/// leading dims of `vectorType` have to be 1 (the first non-matching
-/// dim can be arbitrary).
+/// Examples:
 ///
-///Ex. 2.1 non-contiguous slice, 2 != 3 and the leading dim != <1>
-///  vector<2x2x2xi32> from memref<5x4x3x2xi32>
-///Ex. 2.2  contiguous slice, 2 != 3 and the leading dim == <1>
-///  vector<1x2x2xi32> from memref<5x4x3x2xi32>
-///Ex. 2.3. contiguous slice, 2 != 3 and the leading dims == <1x1>
-///  vector<1x1x2x2xi32> from memref<5x4x3x2xi32>
-///Ex. 2.4. non-contiguous slice, 2 != 3 and the leading dims != <1x1>
-/// vector<2x1x2x2xi32> from memref<5x4x3x2xi32>)
+///   Ex.1 contiguous slice, perfect match
+/// vector<4x3x2xi32> from memref<5x4x3x2xi32>
+///   Ex.2 contiguous slice, the leading dim does not match (2 != 4)
+/// vector<2x3x2xi32> from memref<5x4x3x2xi32>
+///   Ex.3 non-contiguous slice, 2 != 3
+/// vector<2x2x2xi32> from memref<5x4x3x2xi32>
+///   Ex.4 contiguous slice, leading unit dimension of the vector ignored,
+///2 != 3 (allowed)
+/// vector<1x2x2xi32> from memref<5x4x3x2xi32>
+///   Ex.5. contiguous slice, leasing two unit dims of the vector ignored,

newling wrote:

```suggestion
///   Ex.5. contiguous slice, leading two unit dims of the vector ignored,
```

https://github.com/llvm/llvm-project/pull/142422
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)

2025-06-05 Thread James Newling via llvm-branch-commits

https://github.com/newling edited 
https://github.com/llvm/llvm-project/pull/142422
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)

2025-06-05 Thread James Newling via llvm-branch-commits


@@ -203,21 +206,21 @@ func.func @transfer_read_dynamic_dim_to_flatten(
   return %res : vector<1x2x6xi32>
 }
 
-// CHECK: #[[$MAP:.*]] = affine_map<()[s0, s1] -> (s0 * 24 + s1 * 6)>
+// CHECK: #[[$MAP:.+]] = affine_map<()[s0, s1] -> (s0 * 24 + s1 * 6)>
 
 // CHECK-LABEL: func.func @transfer_read_dynamic_dim_to_flatten
 // CHECK-SAME:%[[IDX_1:arg0]]
 // CHECK-SAME:%[[IDX_2:arg1]]
 // CHECK-SAME:%[[MEM:arg2]]
-// CHECK:  %[[C0_I32:.*]] = arith.constant 0 : i32

newling wrote:

For my own learning, is there an advantage to using + over * ? Maybe lit can 
process/match this faster?

https://github.com/llvm/llvm-project/pull/142422
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)

2025-06-05 Thread James Newling via llvm-branch-commits


@@ -49,35 +49,37 @@ FailureOr> 
isTranspose2DSlice(vector::TransposeOp op);
 
 /// Return true if `vectorType` is a contiguous slice of `memrefType`.
 ///
-/// Only the N = vectorType.getRank() trailing dims of `memrefType` are
-/// checked (the other dims are not relevant). Note that for `vectorType` to be
-/// a contiguous slice of `memrefType`, the trailing dims of the latter have
-/// to be contiguous - this is checked by looking at the corresponding strides.
+/// The leading unit dimensions of the vector type are ignored as they
+/// are not relevant to the result. Let N be the number of the vector
+/// dimensions after ignoring a leading sequence of unit ones.
 ///
-/// There might be some restriction on the leading dim of `VectorType`:
+/// For `vectorType` to be a contiguous slice of `memrefType`
+///   a) the N trailing dimensions of the latter must be contiguous, and
+///   b) the trailing N dimensions of `vectorType` and `memrefType`,
+///  except the first of them, must match.

newling wrote:

```suggestion

```

https://github.com/llvm/llvm-project/pull/142422
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)

2025-06-05 Thread James Newling via llvm-branch-commits


@@ -49,35 +49,37 @@ FailureOr> 
isTranspose2DSlice(vector::TransposeOp op);
 
 /// Return true if `vectorType` is a contiguous slice of `memrefType`.
 ///
-/// Only the N = vectorType.getRank() trailing dims of `memrefType` are
-/// checked (the other dims are not relevant). Note that for `vectorType` to be
-/// a contiguous slice of `memrefType`, the trailing dims of the latter have
-/// to be contiguous - this is checked by looking at the corresponding strides.
+/// The leading unit dimensions of the vector type are ignored as they
+/// are not relevant to the result. Let N be the number of the vector
+/// dimensions after ignoring a leading sequence of unit ones.
 ///
-/// There might be some restriction on the leading dim of `VectorType`:
+/// For `vectorType` to be a contiguous slice of `memrefType`
+///   a) the N trailing dimensions of the latter must be contiguous, and
+///   b) the trailing N dimensions of `vectorType` and `memrefType`,
+///  except the first of them, must match.
 ///
-/// Case 1. If all the trailing dims of `vectorType` match the trailing dims
-/// of `memrefType` then the leading dim of `vectorType` can be
-/// arbitrary.
-///
-///Ex. 1.1 contiguous slice, perfect match
-///  vector<4x3x2xi32> from memref<5x4x3x2xi32>
-///Ex. 1.2 contiguous slice, the leading dim does not match (2 != 4)
-///  vector<2x3x2xi32> from memref<5x4x3x2xi32>
-///
-/// Case 2. If an "internal" dim of `vectorType` does not match the
-/// corresponding trailing dim in `memrefType` then the remaining
-/// leading dims of `vectorType` have to be 1 (the first non-matching
-/// dim can be arbitrary).
+/// Examples:
 ///
-///Ex. 2.1 non-contiguous slice, 2 != 3 and the leading dim != <1>
-///  vector<2x2x2xi32> from memref<5x4x3x2xi32>
-///Ex. 2.2  contiguous slice, 2 != 3 and the leading dim == <1>
-///  vector<1x2x2xi32> from memref<5x4x3x2xi32>
-///Ex. 2.3. contiguous slice, 2 != 3 and the leading dims == <1x1>
-///  vector<1x1x2x2xi32> from memref<5x4x3x2xi32>
-///Ex. 2.4. non-contiguous slice, 2 != 3 and the leading dims != <1x1>
-/// vector<2x1x2x2xi32> from memref<5x4x3x2xi32>)
+///   Ex.1 contiguous slice, perfect match
+/// vector<4x3x2xi32> from memref<5x4x3x2xi32>
+///   Ex.2 contiguous slice, the leading dim does not match (2 != 4)
+/// vector<2x3x2xi32> from memref<5x4x3x2xi32>
+///   Ex.3 non-contiguous slice, 2 != 3
+/// vector<2x2x2xi32> from memref<5x4x3x2xi32>
+///   Ex.4 contiguous slice, leading unit dimension of the vector ignored,
+///2 != 3 (allowed)
+/// vector<1x2x2xi32> from memref<5x4x3x2xi32>
+///   Ex.5. contiguous slice, leasing two unit dims of the vector ignored,
+/// 2 != 3 (allowed)
+/// vector<1x1x2x2xi32> from memref<5x4x3x2xi32>
+///   Ex.6. non-contiguous slice, 2 != 3, no leading sequence of unit dims
+/// vector<2x1x2x2xi32> from memref<5x4x3x2xi32>)
+///   Ex.7 contiguous slice, memref needs to be contiguous only on the last
+///dimension
+/// vector<1x1x2xi32> from memref<2x2x2xi32, strided<[8, 4, 1]>>
+///   Ex.8 non-contiguous slice, memref needs to be contiguous one the last

newling wrote:

```suggestion
///   Ex.8 non-contiguous slice, memref needs to be contiguous in the last
```

https://github.com/llvm/llvm-project/pull/142422
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
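
The rule the rewritten doc comment describes (ignore the vector's leading unit dims; the trailing N memref dims must be contiguous; the trailing N dims of both types must match except the outermost) can be checked against the examples with a small standalone sketch. This is a hypothetical helper over plain shape/stride vectors, not the MLIR implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if a vector of shape `vecShape` is a contiguous slice of a
// memref with shape `memShape` and strides `strides`, per the rule above.
static bool isContiguousSliceSketch(const std::vector<std::int64_t> &vecShape,
                                    const std::vector<std::int64_t> &memShape,
                                    const std::vector<std::int64_t> &strides) {
  // Ignore the vector's leading unit dimensions.
  std::size_t lead = 0;
  while (lead < vecShape.size() && vecShape[lead] == 1)
    ++lead;
  std::size_t n = vecShape.size() - lead;
  if (n == 0)
    return true; // all-unit vector: a single element, trivially contiguous
  if (n > memShape.size())
    return false;
  // a) The trailing n memref dims must be contiguous (row-major strides).
  std::int64_t expected = 1;
  for (std::size_t i = 0; i < n; ++i) {
    std::size_t d = memShape.size() - 1 - i;
    if (strides[d] != expected)
      return false;
    expected *= memShape[d];
  }
  // b) The trailing n dims must match, except the outermost of them.
  for (std::size_t i = 1; i < n; ++i)
    if (vecShape[lead + i] != memShape[memShape.size() - n + i])
      return false;
  return true;
}
```

Running this over Ex.1 through Ex.8 from the doc comment reproduces the stated contiguous/non-contiguous classifications.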


[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)

2025-06-05 Thread James Newling via llvm-branch-commits


@@ -630,7 +639,10 @@ class FlattenContiguousRowMajorTransferReadPattern
 if (transferReadOp.getMask())
   return failure();
 
-int64_t firstDimToCollapse = sourceType.getRank() - vectorType.getRank();

newling wrote:

Why does this need to change? 

If memref is rank n+2 and vector is rank n, isn't it always fine to flatten the 
memref from index 2? So that memref becomes rank 3 and vector becomes rank 1. 
Isn't having a rank-1 vector the goal here? 
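(For readers following the archive: the flattening under discussion can be sketched in NumPy terms. The shapes below are hypothetical, chosen only to mirror the rank-n+2 memref / rank-n vector case, and are not taken from the PR.)

```python
import numpy as np

# Hypothetical rank-4 buffer ("memref") and a rank-2 vector slice.
buf = np.arange(5 * 4 * 3 * 2).reshape(5, 4, 3, 2)

# Collapsing the two trailing (vector-rank) dims leaves a rank-3 buffer
# whose last dimension is contiguous, so a rank-1 vector of length
# 3 * 2 = 6 can be transferred from it.
flat = buf.reshape(5, 4, 3 * 2)

assert flat.shape == (5, 4, 6)
# Row-major contiguity: the collapsed view reads the same elements.
assert flat[2, 1].tolist() == buf[2, 1].reshape(-1).tolist()
```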


https://github.com/llvm/llvm-project/pull/142422


[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)

2025-06-05 Thread James Newling via llvm-branch-commits


@@ -49,35 +49,37 @@ FailureOr<std::pair<int, int>>
isTranspose2DSlice(vector::TransposeOp op);
 
 /// Return true if `vectorType` is a contiguous slice of `memrefType`.
 ///
-/// Only the N = vectorType.getRank() trailing dims of `memrefType` are
-/// checked (the other dims are not relevant). Note that for `vectorType` to be
-/// a contiguous slice of `memrefType`, the trailing dims of the latter have
-/// to be contiguous - this is checked by looking at the corresponding strides.
+/// The leading unit dimensions of the vector type are ignored as they
+/// are not relevant to the result. Let N be the number of the vector
+/// dimensions after ignoring a leading sequence of unit ones.
 ///
-/// There might be some restriction on the leading dim of `VectorType`:
+/// For `vectorType` to be a contiguous slice of `memrefType`
+///   a) the N trailing dimensions of the latter must be contiguous, and

newling wrote:

```suggestion
///   a) the N trailing dimensions of `memrefType` must be contiguous, and
```

https://github.com/llvm/llvm-project/pull/142422


[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)

2025-06-05 Thread James Newling via llvm-branch-commits


@@ -49,35 +49,37 @@ FailureOr<std::pair<int, int>>
isTranspose2DSlice(vector::TransposeOp op);
 
 /// Return true if `vectorType` is a contiguous slice of `memrefType`.
 ///
-/// Only the N = vectorType.getRank() trailing dims of `memrefType` are
-/// checked (the other dims are not relevant). Note that for `vectorType` to be
-/// a contiguous slice of `memrefType`, the trailing dims of the latter have
-/// to be contiguous - this is checked by looking at the corresponding strides.
+/// The leading unit dimensions of the vector type are ignored as they
+/// are not relevant to the result. Let N be the number of the vector
+/// dimensions after ignoring a leading sequence of unit ones.
 ///
-/// There might be some restriction on the leading dim of `VectorType`:
+/// For `vectorType` to be a contiguous slice of `memrefType`
+///   a) the N trailing dimensions of the latter must be contiguous, and
+///   b) the trailing N dimensions of `vectorType` and `memrefType`,
+///  except the first of them, must match.
 ///
-/// Case 1. If all the trailing dims of `vectorType` match the trailing dims
-/// of `memrefType` then the leading dim of `vectorType` can be
-/// arbitrary.
-///
-///Ex. 1.1 contiguous slice, perfect match
-///  vector<4x3x2xi32> from memref<5x4x3x2xi32>
-///Ex. 1.2 contiguous slice, the leading dim does not match (2 != 4)
-///  vector<2x3x2xi32> from memref<5x4x3x2xi32>
-///
-/// Case 2. If an "internal" dim of `vectorType` does not match the
-/// corresponding trailing dim in `memrefType` then the remaining
-/// leading dims of `vectorType` have to be 1 (the first non-matching
-/// dim can be arbitrary).
+/// Examples:
 ///
-///Ex. 2.1 non-contiguous slice, 2 != 3 and the leading dim != <1>
-///  vector<2x2x2xi32> from memref<5x4x3x2xi32>
-///Ex. 2.2  contiguous slice, 2 != 3 and the leading dim == <1>
-///  vector<1x2x2xi32> from memref<5x4x3x2xi32>
-///Ex. 2.3. contiguous slice, 2 != 3 and the leading dims == <1x1>
-///  vector<1x1x2x2xi32> from memref<5x4x3x2xi32>
-///Ex. 2.4. non-contiguous slice, 2 != 3 and the leading dims != <1x1>
-/// vector<2x1x2x2xi32> from memref<5x4x3x2xi32>)
+///   Ex.1 contiguous slice, perfect match
+/// vector<4x3x2xi32> from memref<5x4x3x2xi32>
+///   Ex.2 contiguous slice, the leading dim does not match (2 != 4)
+/// vector<2x3x2xi32> from memref<5x4x3x2xi32>
+///   Ex.3 non-contiguous slice, 2 != 3
+/// vector<2x2x2xi32> from memref<5x4x3x2xi32>
+///   Ex.4 contiguous slice, leading unit dimension of the vector ignored,
+///2 != 3 (allowed)
+/// vector<1x2x2xi32> from memref<5x4x3x2xi32>
+///   Ex.5. contiguous slice, leading two unit dims of the vector ignored,
+/// 2 != 3 (allowed)
+/// vector<1x1x2x2xi32> from memref<5x4x3x2xi32>
+///   Ex.6. non-contiguous slice, 2 != 3, no leading sequence of unit dims
+/// vector<2x1x2x2xi32> from memref<5x4x3x2xi32>)
+///   Ex.7 contiguous slice, memref needs to be contiguous only on the last
+///dimension
+/// vector<1x1x2xi32> from memref<2x2x2xi32, strided<[8, 4, 1]>>
+///   Ex.8 non-contiguous slice, memref needs to be contiguous one the last
+///two dimensions, and it isn't
+/// vector<1x2x2xi32> from memref<2x2x2xi32, strided<[8, 4, 1]>>

newling wrote:

These 8 examples cover all the situations I can think of, other than where
the memref has a dynamic size. Can you please confirm that they're all tested?
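As a side note, the rule in the updated doc comment is mechanical enough to model outside MLIR. The following Python sketch is an illustration of the documented rule only (not the actual `isContiguousSlice` implementation), and it reproduces the verdicts of all eight examples:

```python
def is_contiguous_slice(memref_shape, memref_strides, vector_shape):
    """Model of the documented rule: drop the vector's leading unit dims,
    then require (a) the trailing N memref dims to be contiguous and
    (b) the trailing N dims to match, except the first of them."""
    v = list(vector_shape)
    while len(v) > 1 and v[0] == 1:
        v.pop(0)                          # ignore leading unit dims
    n = len(v)
    if n > len(memref_shape):
        return False
    # (a) trailing n memref dims must have row-major (contiguous) strides.
    expected = 1
    for size, stride in zip(reversed(memref_shape[-n:]),
                            reversed(memref_strides[-n:])):
        if stride != expected:
            return False
        expected *= size
    # (b) trailing n dims must match, except the first of them.
    return n == 1 or v[1:] == list(memref_shape[len(memref_shape) - n + 1:])

rm = (24, 6, 2, 1)                        # row-major strides for 5x4x3x2
assert is_contiguous_slice((5, 4, 3, 2), rm, (4, 3, 2))         # Ex.1
assert is_contiguous_slice((5, 4, 3, 2), rm, (2, 3, 2))         # Ex.2
assert not is_contiguous_slice((5, 4, 3, 2), rm, (2, 2, 2))     # Ex.3
assert is_contiguous_slice((5, 4, 3, 2), rm, (1, 2, 2))         # Ex.4
assert is_contiguous_slice((5, 4, 3, 2), rm, (1, 1, 2, 2))      # Ex.5
assert not is_contiguous_slice((5, 4, 3, 2), rm, (2, 1, 2, 2))  # Ex.6
assert is_contiguous_slice((2, 2, 2), (8, 4, 1), (1, 1, 2))     # Ex.7
assert not is_contiguous_slice((2, 2, 2), (8, 4, 1), (1, 2, 2)) # Ex.8
```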

https://github.com/llvm/llvm-project/pull/142422


[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)

2025-06-05 Thread James Newling via llvm-branch-commits


@@ -49,35 +49,37 @@ FailureOr<std::pair<int, int>>
isTranspose2DSlice(vector::TransposeOp op);
 
 /// Return true if `vectorType` is a contiguous slice of `memrefType`.
 ///
-/// Only the N = vectorType.getRank() trailing dims of `memrefType` are
-/// checked (the other dims are not relevant). Note that for `vectorType` to be
-/// a contiguous slice of `memrefType`, the trailing dims of the latter have
-/// to be contiguous - this is checked by looking at the corresponding strides.
+/// The leading unit dimensions of the vector type are ignored as they
+/// are not relevant to the result. Let N be the number of the vector
+/// dimensions after ignoring a leading sequence of unit ones.
 ///
-/// There might be some restriction on the leading dim of `VectorType`:
+/// For `vectorType` to be a contiguous slice of `memrefType`
+///   a) the N trailing dimensions of the latter must be contiguous, and
+///   b) the trailing N dimensions of `vectorType` and `memrefType`,

newling wrote:

```suggestion
///   b) the trailing N-1 dimensions of `vectorType` and `memrefType` must 
match.
```

https://github.com/llvm/llvm-project/pull/142422


[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)

2025-06-05 Thread James Newling via llvm-branch-commits

https://github.com/newling commented:

Thanks!  Other than my question about the change to the first dimension of the
memref that gets collapsed, my comments are all quite minor.

https://github.com/llvm/llvm-project/pull/142422


[llvm-branch-commits] [BOLT] Expose external entry count for functions (PR #141674)

2025-06-05 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov updated 
https://github.com/llvm/llvm-project/pull/141674






[llvm-branch-commits] [llvm] [BOLT] Expose external entry count for functions (PR #141674)

2025-06-05 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov edited 
https://github.com/llvm/llvm-project/pull/141674


[llvm-branch-commits] LowerTypeTests: Shrink check size by 1 instruction on x86. (PR #142887)

2025-06-05 Thread Florian Mayer via llvm-branch-commits

https://github.com/fmayer commented:

Could we have a test that demonstrates the new, better instruction sequence (by
precommitting to show the diff here)?

https://github.com/llvm/llvm-project/pull/142887


[llvm-branch-commits] [llvm] WebAssembly: Stop directly using RuntimeLibcalls.def (PR #143054)

2025-06-05 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-webassembly

Author: Matt Arsenault (arsenm)


Changes

Construct RuntimeLibcallsInfo instead of manually creating a map.
This was repeating the setting of the RETURN_ADDRESS. This removes
an obstacle to generating libcall information with tablegen.

This is also not great, since it's setting a static map which
would be broken if there were ever a triple with a different libcall
configuration.
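The "broken if there were ever a triple with a different libcall configuration" caveat comes from a general property of function-local statics: they are constructed once, from the arguments of the first call, and later calls with different arguments silently reuse the first result. A minimal Python model of that C++ pitfall (illustrative only; names are hypothetical, not the actual WebAssembly code):

```python
def libcall_map_for(triple, _cache=[]):
    """Model of a C++ function-local static: the 'map' is built once,
    from the arguments of the *first* call, then reused forever."""
    if not _cache:                       # first call only
        _cache.append({"triple": triple})
    return _cache[0]

first = libcall_map_for("wasm32-unknown-unknown")
assert first["triple"] == "wasm32-unknown-unknown"
# A later call with a different triple silently gets the first map --
# the breakage the FIXME in this patch warns about.
later = libcall_map_for("wasm64-unknown-unknown")
assert later["triple"] == "wasm32-unknown-unknown"
```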

---
Full diff: https://github.com/llvm/llvm-project/pull/143054.diff


1 Files Affected:

- (modified) 
llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp (+12-15) 


```diff
diff --git 
a/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp 
b/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp
index ce795d3dedc6a..9622b5a54dc62 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp
@@ -528,23 +528,20 @@ RuntimeLibcallSignatureTable 
&getRuntimeLibcallSignatures() {
 // constructor for use with a static variable
 struct StaticLibcallNameMap {
  StringMap<RTLIB::Libcall> Map;
-  StaticLibcallNameMap() {
-static const std::pair<const char *, RTLIB::Libcall> NameLibcalls[] = {
-#define HANDLE_LIBCALL(code, name) {(const char *)name, RTLIB::code},
-#include "llvm/IR/RuntimeLibcalls.def"
-#undef HANDLE_LIBCALL
-};
-for (const auto &NameLibcall : NameLibcalls) {
-  if (NameLibcall.first != nullptr &&
-  getRuntimeLibcallSignatures().Table[NameLibcall.second] !=
-  unsupported) {
-assert(!Map.contains(NameLibcall.first) &&
+  StaticLibcallNameMap(const Triple &TT) {
+// FIXME: This is broken if there are ever different triples compiled with
+// different libcalls.
+RTLIB::RuntimeLibcallsInfo RTCI(TT);
+for (int I = 0; I < RTLIB::UNKNOWN_LIBCALL; ++I) {
+  RTLIB::Libcall LC = static_cast<RTLIB::Libcall>(I);
+  const char *NameLibcall = RTCI.getLibcallName(LC);
+  if (NameLibcall != nullptr &&
+  getRuntimeLibcallSignatures().Table[LC] != unsupported) {
+assert(!Map.contains(NameLibcall) &&
"duplicate libcall names in name map");
-Map[NameLibcall.first] = NameLibcall.second;
+Map[NameLibcall] = LC;
   }
 }
-
-Map["emscripten_return_address"] = RTLIB::RETURN_ADDRESS;
   }
 };
 
@@ -940,7 +937,7 @@ void WebAssembly::getLibcallSignature(const 
WebAssemblySubtarget &Subtarget,
   StringRef Name,
   SmallVectorImpl<wasm::ValType> &Rets,
   SmallVectorImpl<wasm::ValType> &Params) {
-  static StaticLibcallNameMap LibcallNameMap;
+  static StaticLibcallNameMap LibcallNameMap(Subtarget.getTargetTriple());
   auto &Map = LibcallNameMap.Map;
   auto Val = Map.find(Name);
 #ifndef NDEBUG

```




https://github.com/llvm/llvm-project/pull/143054


[llvm-branch-commits] [llvm] WebAssembly: Stop directly using RuntimeLibcalls.def (PR #143054)

2025-06-05 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/143054

Construct RuntimeLibcallsInfo instead of manually creating a map.
This was repeating the setting of the RETURN_ADDRESS. This removes
an obstacle to generating libcall information with tablegen.

This is also not great, since it's setting a static map which
would be broken if there were ever a triple with a different libcall
configuration.

>From 9405d81822edcfc0071c8de5c1d09dcb8ea22910 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Fri, 6 Jun 2025 10:01:59 +0900
Subject: [PATCH] WebAssembly: Stop directly using RuntimeLibcalls.def

Construct RuntimeLibcallsInfo instead of manually creating a map.
This was repeating the setting of the RETURN_ADDRESS. This removes
an obstacle to generating libcall information with tablegen.

This is also not great, since it's setting a static map which
would be broken if there were ever a triple with a different libcall
configuration.
---
 .../WebAssemblyRuntimeLibcallSignatures.cpp   | 27 +--
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git 
a/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp 
b/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp
index ce795d3dedc6a..9622b5a54dc62 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp
@@ -528,23 +528,20 @@ RuntimeLibcallSignatureTable 
&getRuntimeLibcallSignatures() {
 // constructor for use with a static variable
 struct StaticLibcallNameMap {
  StringMap<RTLIB::Libcall> Map;
-  StaticLibcallNameMap() {
-static const std::pair<const char *, RTLIB::Libcall> NameLibcalls[] = {
-#define HANDLE_LIBCALL(code, name) {(const char *)name, RTLIB::code},
-#include "llvm/IR/RuntimeLibcalls.def"
-#undef HANDLE_LIBCALL
-};
-for (const auto &NameLibcall : NameLibcalls) {
-  if (NameLibcall.first != nullptr &&
-  getRuntimeLibcallSignatures().Table[NameLibcall.second] !=
-  unsupported) {
-assert(!Map.contains(NameLibcall.first) &&
+  StaticLibcallNameMap(const Triple &TT) {
+// FIXME: This is broken if there are ever different triples compiled with
+// different libcalls.
+RTLIB::RuntimeLibcallsInfo RTCI(TT);
+for (int I = 0; I < RTLIB::UNKNOWN_LIBCALL; ++I) {
+  RTLIB::Libcall LC = static_cast<RTLIB::Libcall>(I);
+  const char *NameLibcall = RTCI.getLibcallName(LC);
+  if (NameLibcall != nullptr &&
+  getRuntimeLibcallSignatures().Table[LC] != unsupported) {
+assert(!Map.contains(NameLibcall) &&
"duplicate libcall names in name map");
-Map[NameLibcall.first] = NameLibcall.second;
+Map[NameLibcall] = LC;
   }
 }
-
-Map["emscripten_return_address"] = RTLIB::RETURN_ADDRESS;
   }
 };
 
@@ -940,7 +937,7 @@ void WebAssembly::getLibcallSignature(const 
WebAssemblySubtarget &Subtarget,
   StringRef Name,
   SmallVectorImpl<wasm::ValType> &Rets,
   SmallVectorImpl<wasm::ValType> &Params) {
-  static StaticLibcallNameMap LibcallNameMap;
+  static StaticLibcallNameMap LibcallNameMap(Subtarget.getTargetTriple());
   auto &Map = LibcallNameMap.Map;
   auto Val = Map.find(Name);
 #ifndef NDEBUG



[llvm-branch-commits] [llvm] WebAssembly: Stop directly using RuntimeLibcalls.def (PR #143054)

2025-06-05 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/143054


[llvm-branch-commits] [llvm] WebAssembly: Stop directly using RuntimeLibcalls.def (PR #143054)

2025-06-05 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

* **#143054** (this PR, view in Graphite)
* **#142624**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev).


https://github.com/llvm/llvm-project/pull/143054


[llvm-branch-commits] [llvm] Add SimplifyTypeTests pass. (PR #141327)

2025-06-05 Thread Teresa Johnson via llvm-branch-commits

https://github.com/teresajohnson approved this pull request.

lgtm but I think there is a code formatting error reported that should be fixed 
before merging.

https://github.com/llvm/llvm-project/pull/141327


[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)

2025-06-05 Thread Momchil Velikov via llvm-branch-commits

https://github.com/momchil-velikov updated 
https://github.com/llvm/llvm-project/pull/142422

>From 8f9a4002820dcd3de2a5986d53749386a2507eab Mon Sep 17 00:00:00 2001
From: Momchil Velikov 
Date: Mon, 2 Jun 2025 15:13:13 +
Subject: [PATCH 1/4] [MLIR] Fix incorrect slice contiguity inference in
 `vector::isContiguousSlice`

Previously, slices were sometimes marked as non-contiguous when
they were actually contiguous. This occurred when the vector type had
leading unit dimensions, e.g., `vector<1x1x...x1xd0xd1x...xdn-1xT>`.
In such cases, only the trailing n dimensions of the memref need to be
contiguous, not the entire vector rank.

This affects how `FlattenContiguousRowMajorTransfer{Read,Write}Pattern`
flattens `transfer_read` and `transfer_write` ops. The pattern used
to collapse a number of dimensions equal to the vector rank, which
may be incorrect when leading dimensions are unit-sized.

This patch fixes the issue by collapsing only as many trailing memref
dimensions as are actually contiguous.
---
 .../mlir/Dialect/Vector/Utils/VectorUtils.h   |  54 -
 .../Transforms/VectorTransferOpTransforms.cpp |   8 +-
 mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp |  25 ++--
 .../Vector/vector-transfer-flatten.mlir   | 108 +-
 4 files changed, 120 insertions(+), 75 deletions(-)

diff --git a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h 
b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
index 6609b28d77b6c..ed06d7a029494 100644
--- a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
+++ b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
@@ -49,35 +49,37 @@ FailureOr<std::pair<int, int>>
isTranspose2DSlice(vector::TransposeOp op);
 
 /// Return true if `vectorType` is a contiguous slice of `memrefType`.
 ///
-/// Only the N = vectorType.getRank() trailing dims of `memrefType` are
-/// checked (the other dims are not relevant). Note that for `vectorType` to be
-/// a contiguous slice of `memrefType`, the trailing dims of the latter have
-/// to be contiguous - this is checked by looking at the corresponding strides.
+/// The leading unit dimensions of the vector type are ignored as they
+/// are not relevant to the result. Let N be the number of the vector
+/// dimensions after ignoring a leading sequence of unit ones.
 ///
-/// There might be some restriction on the leading dim of `VectorType`:
+/// For `vectorType` to be a contiguous slice of `memrefType`
+///   a) the N trailing dimensions of the latter must be contiguous, and
+///   b) the trailing N dimensions of `vectorType` and `memrefType`,
+///  except the first of them, must match.
 ///
-/// Case 1. If all the trailing dims of `vectorType` match the trailing dims
-/// of `memrefType` then the leading dim of `vectorType` can be
-/// arbitrary.
-///
-///Ex. 1.1 contiguous slice, perfect match
-///  vector<4x3x2xi32> from memref<5x4x3x2xi32>
-///Ex. 1.2 contiguous slice, the leading dim does not match (2 != 4)
-///  vector<2x3x2xi32> from memref<5x4x3x2xi32>
-///
-/// Case 2. If an "internal" dim of `vectorType` does not match the
-/// corresponding trailing dim in `memrefType` then the remaining
-/// leading dims of `vectorType` have to be 1 (the first non-matching
-/// dim can be arbitrary).
+/// Examples:
 ///
-///Ex. 2.1 non-contiguous slice, 2 != 3 and the leading dim != <1>
-///  vector<2x2x2xi32> from memref<5x4x3x2xi32>
-///Ex. 2.2  contiguous slice, 2 != 3 and the leading dim == <1>
-///  vector<1x2x2xi32> from memref<5x4x3x2xi32>
-///Ex. 2.3. contiguous slice, 2 != 3 and the leading dims == <1x1>
-///  vector<1x1x2x2xi32> from memref<5x4x3x2xi32>
-///Ex. 2.4. non-contiguous slice, 2 != 3 and the leading dims != <1x1>
-/// vector<2x1x2x2xi32> from memref<5x4x3x2xi32>)
+///   Ex.1 contiguous slice, perfect match
+/// vector<4x3x2xi32> from memref<5x4x3x2xi32>
+///   Ex.2 contiguous slice, the leading dim does not match (2 != 4)
+/// vector<2x3x2xi32> from memref<5x4x3x2xi32>
+///   Ex.3 non-contiguous slice, 2 != 3
+/// vector<2x2x2xi32> from memref<5x4x3x2xi32>
+///   Ex.4 contiguous slice, leading unit dimension of the vector ignored,
+///2 != 3 (allowed)
+/// vector<1x2x2xi32> from memref<5x4x3x2xi32>
+///   Ex.5. contiguous slice, leading two unit dims of the vector ignored,
+/// 2 != 3 (allowed)
+/// vector<1x1x2x2xi32> from memref<5x4x3x2xi32>
+///   Ex.6. non-contiguous slice, 2 != 3, no leading sequence of unit dims
+/// vector<2x1x2x2xi32> from memref<5x4x3x2xi32>)
+///   Ex.7 contiguous slice, memref needs to be contiguous only on the last
+///dimension
+/// vector<1x1x2xi32> from memref<2x2x2xi32, strided<[8, 4, 1]>>
+///   Ex.8 non-contiguous slice, memref needs to be contiguous one the last
+///two dimensions, and it isn't
+/// vector<1x2x2xi32> from memref<2x2x2xi32, strided<[8, 4, 1]>>
 bool isContiguo

[llvm-branch-commits] [flang] [Flang][OpenMP] - When mapping a `fir.boxchar`, map the underlying data pointer as a member (PR #141715)

2025-06-05 Thread Pranav Bhandarkar via llvm-branch-commits

https://github.com/bhandarkar-pranav updated 
https://github.com/llvm/llvm-project/pull/141715

>From 2d411fc5d24c7e3e933447307fc958b7e544490b Mon Sep 17 00:00:00 2001
From: Pranav Bhandarkar 
Date: Fri, 23 May 2025 10:26:14 -0500
Subject: [PATCH 1/5] Fix boxchar with firstprivate

---
 .../Optimizer/Builder/DirectivesCommon.h  | 85 +-
 flang/lib/Optimizer/Dialect/FIRType.cpp   |  3 +
 .../Optimizer/OpenMP/MapInfoFinalization.cpp  | 88 ++-
 .../OpenMP/MapsForPrivatizedSymbols.cpp   | 67 --
 .../Fir/convert-to-llvm-openmp-and-fir.fir| 27 ++
 flang/test/Lower/OpenMP/map-character.f90 | 23 +++--
 .../Lower/OpenMP/optional-argument-map-2.f90  | 63 +++--
 7 files changed, 297 insertions(+), 59 deletions(-)

diff --git a/flang/include/flang/Optimizer/Builder/DirectivesCommon.h 
b/flang/include/flang/Optimizer/Builder/DirectivesCommon.h
index 3f30c761acb4e..be11b9b5ede7c 100644
--- a/flang/include/flang/Optimizer/Builder/DirectivesCommon.h
+++ b/flang/include/flang/Optimizer/Builder/DirectivesCommon.h
@@ -91,6 +91,16 @@ inline AddrAndBoundsInfo 
getDataOperandBaseAddr(fir::FirOpBuilder &builder,
 
 return AddrAndBoundsInfo(symAddr, rawInput, isPresent, boxTy);
   }
+  // For boxchar references, do the same as what is done above for box
+  // references - Load the boxchar so that it is easier to retrieve the length
+  // of the underlying character and the data pointer.
+  if (auto boxCharType = mlir::dyn_cast<fir::BoxCharType>(
+  fir::unwrapRefType(symAddr.getType()))) {
+if (!isOptional && mlir::isa<fir::ReferenceType>(symAddr.getType())) {
+  mlir::Value boxChar = builder.create<fir::LoadOp>(loc, symAddr);
+  return AddrAndBoundsInfo(boxChar, rawInput, isPresent);
+}
+  }
   return AddrAndBoundsInfo(symAddr, rawInput, isPresent);
 }
 
@@ -137,26 +147,61 @@ template <typename BoundsOp, typename BoundsType>
 mlir::Value
 genBoundsOpFromBoxChar(fir::FirOpBuilder &builder, mlir::Location loc,
fir::ExtendedValue dataExv, AddrAndBoundsInfo &info) {
-  // TODO: Handle info.isPresent.
-  if (auto boxCharType =
-  mlir::dyn_cast<fir::BoxCharType>(info.addr.getType())) {
-mlir::Type idxTy = builder.getIndexType();
-mlir::Type lenType = builder.getCharacterLengthType();
+
+  if (!mlir::isa(fir::unwrapRefType(info.addr.getType(
+return mlir::Value{};
+
+  mlir::Type idxTy = builder.getIndexType();
+  mlir::Type lenType = builder.getCharacterLengthType();
+  mlir::Value zero = builder.createIntegerConstant(loc, idxTy, 0);
+  mlir::Value one = builder.createIntegerConstant(loc, idxTy, 1);
+  using ExtentAndStride = std::tuple;
+  auto [extent, stride] = [&]() -> ExtentAndStride {
+if (info.isPresent) {
+  llvm::SmallVector<mlir::Type> resTypes = {idxTy, idxTy};
+  mlir::Operation::result_range ifRes =
+  builder.genIfOp(loc, resTypes, info.isPresent, 
/*withElseRegion=*/true)
+  .genThen([&]() {
+mlir::Value boxChar =
+fir::isa_ref_type(info.addr.getType())
+? builder.create<fir::LoadOp>(loc, info.addr)
+: info.addr;
+fir::BoxCharType boxCharType =
+mlir::cast<fir::BoxCharType>(boxChar.getType());
+mlir::Type refType = 
builder.getRefType(boxCharType.getEleTy());
+auto unboxed = builder.create<fir::UnboxCharOp>(
+loc, refType, lenType, boxChar);
+mlir::SmallVector<mlir::Value> results = 
{unboxed.getResult(1), one};
+builder.create<fir::ResultOp>(loc, results);
+  })
+  .genElse([&]() {
+mlir::SmallVector<mlir::Value> results = {zero, zero};
+builder.create<fir::ResultOp>(loc, results); })
+  .getResults();
+  return {ifRes[0], ifRes[1]};
+}
+// We have already established that info.addr.getType() is a boxchar
+// or a boxchar address. If an address, load the boxchar.
+mlir::Value boxChar = fir::isa_ref_type(info.addr.getType())
+  ? builder.create<fir::LoadOp>(loc, info.addr)
+  : info.addr;
+fir::BoxCharType boxCharType =
+mlir::cast<fir::BoxCharType>(boxChar.getType());
 mlir::Type refType = builder.getRefType(boxCharType.getEleTy());
 auto unboxed =
-builder.create<fir::UnboxCharOp>(loc, refType, lenType, info.addr);
-mlir::Value zero = builder.createIntegerConstant(loc, idxTy, 0);
-mlir::Value one = builder.createIntegerConstant(loc, idxTy, 1);
-mlir::Value extent = unboxed.getResult(1);
-mlir::Value stride = one;
-mlir::Value ub = builder.create(loc, extent, one);
-mlir::Type boundTy = builder.getType<BoundsType>();
-return builder.create<BoundsOp>(
-loc, boundTy, /*lower_bound=*/zero,
-/*upper_bound=*/ub, /*extent=*/extent, /*stride=*/stride,
-/*stride_in_bytes=*/true, /*start_idx=*/zero);
-  }
-  return mlir::Value{};
+builder.create(loc, refType, lenType, boxChar);
+return {unboxed.getResult(1), one};
+  }();
+
+  mlir::Value ub = builder.create(loc, extent, one);
+  mlir::Type boundTy = builder.getType()
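As an aside for readers of the hunk above, the present/absent logic reduces to a small scalar rule: extent is the unboxed character length and stride is 1 when the argument is present, both zero otherwise, with bounds [0, extent-1]. A minimal standalone sketch (plain C++, not FIR; names are illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <tuple>

// Scalar model of the bounds computed above: lb = 0, stride = 1 and
// extent = character length when the dummy argument is present;
// extent = stride = 0 when it is absent. ub is always extent - 1.
struct CharBounds {
  int64_t lb, ub, extent, stride;
};

inline CharBounds charBounds(std::optional<int64_t> lenIfPresent) {
  auto [extent, stride] =
      lenIfPresent ? std::tuple<int64_t, int64_t>{*lenIfPresent, 1}
                   : std::tuple<int64_t, int64_t>{0, 0};
  return {/*lb=*/0, /*ub=*/extent - 1, extent, stride};
}
```

Note that, as in the lowering, ub is computed unconditionally, so the absent case yields the degenerate range [0, -1].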

[llvm-branch-commits] [llvm] [AMDGPU] Patterns for <2 x bfloat> fneg (fabs) (PR #142911)

2025-06-05 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/142911

>From c8524591999f495dd86261daecc44071737a227b Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Wed, 4 Jun 2025 23:49:43 -0700
Subject: [PATCH] [AMDGPU] Patterns for <2 x bfloat> fneg (fabs)

---
 llvm/lib/Target/AMDGPU/SIInstructions.td   | 11 +++
 llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll | 38 +-
 2 files changed, 21 insertions(+), 28 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIInstructions.td 
b/llvm/lib/Target/AMDGPU/SIInstructions.td
index a0285e3512a08..360fd05cb3d96 100644
--- a/llvm/lib/Target/AMDGPU/SIInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SIInstructions.td
@@ -1840,22 +1840,21 @@ def : GCNPat <
   (UniformUnaryFrag<fabs> (v2fp16vt SReg_32:$src)),
   (S_AND_B32 SReg_32:$src, (S_MOV_B32 (i32 0x7fff7fff)))
 >;
-}
 
 // This is really (fneg (fabs v2f16:$src))
 //
 // fabs is not reported as free because there is modifier for it in
 // VOP3P instructions, so it is turned into the bit op.
 def : GCNPat <
-  (UniformUnaryFrag<fneg> (v2f16 (bitconvert (and_oneuse (i32 SReg_32:$src), 0x7fff7fff)))),
+  (UniformUnaryFrag<fneg> (v2fp16vt (bitconvert (and_oneuse (i32 SReg_32:$src), 0x7fff7fff)))),
   (S_OR_B32 SReg_32:$src, (S_MOV_B32 (i32 0x80008000))) // Set sign bit
 >;
 
 def : GCNPat <
-  (UniformUnaryFrag<fneg> (v2f16 (fabs SReg_32:$src))),
+  (UniformUnaryFrag<fneg> (v2fp16vt (fabs SReg_32:$src))),
   (S_OR_B32 SReg_32:$src, (S_MOV_B32 (i32 0x80008000))) // Set sign bit
 >;
-
+}
 
 // COPY_TO_REGCLASS is needed to avoid using SCC from S_XOR_B32 instead
 // of the real value.
@@ -1986,12 +1985,12 @@ def : GCNPat <
   (fabs (v2fp16vt VGPR_32:$src)),
   (V_AND_B32_e64 (S_MOV_B32 (i32 0x7fff7fff)), VGPR_32:$src)
 >;
-}
 
 def : GCNPat <
-  (fneg (v2f16 (fabs VGPR_32:$src))),
+  (fneg (v2fp16vt (fabs VGPR_32:$src))),
   (V_OR_B32_e64 (S_MOV_B32 (i32 0x80008000)), VGPR_32:$src)
 >;
+}
 
 def : GCNPat <
   (fabs (f64 VReg_64:$src)),
diff --git a/llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll 
b/llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll
index 243469d39cc11..d189b6d4c1e83 100644
--- a/llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll
+++ b/llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll
@@ -523,8 +523,7 @@ define amdgpu_kernel void 
@s_fneg_fabs_v2bf16_non_bc_src(ptr addrspace(1) %out,
 ; VI-NEXT:v_cndmask_b32_e32 v1, v2, v3, vcc
 ; VI-NEXT:v_lshrrev_b32_e32 v1, 16, v1
 ; VI-NEXT:v_alignbit_b32 v0, v1, v0, 16
-; VI-NEXT:v_and_b32_e32 v0, 0x7fff7fff, v0
-; VI-NEXT:v_xor_b32_e32 v2, 0x80008000, v0
+; VI-NEXT:v_or_b32_e32 v2, 0x80008000, v0
 ; VI-NEXT:v_mov_b32_e32 v0, s0
 ; VI-NEXT:v_mov_b32_e32 v1, s1
 ; VI-NEXT:flat_store_dword v[0:1], v2
@@ -556,8 +555,7 @@ define amdgpu_kernel void 
@s_fneg_fabs_v2bf16_non_bc_src(ptr addrspace(1) %out,
 ; GFX9-NEXT:v_lshrrev_b32_e32 v1, 16, v1
 ; GFX9-NEXT:v_and_b32_sdwa v2, v3, v2 dst_sel:DWORD dst_unused:UNUSED_PAD 
src0_sel:DWORD src1_sel:WORD_1
 ; GFX9-NEXT:v_lshl_or_b32 v1, v1, 16, v2
-; GFX9-NEXT:v_and_b32_e32 v1, 0x7fff7fff, v1
-; GFX9-NEXT:v_xor_b32_e32 v1, 0x80008000, v1
+; GFX9-NEXT:v_or_b32_e32 v1, 0x80008000, v1
 ; GFX9-NEXT:global_store_dword v0, v1, s[0:1]
 ; GFX9-NEXT:s_endpgm
 ;
@@ -590,9 +588,9 @@ define amdgpu_kernel void 
@s_fneg_fabs_v2bf16_non_bc_src(ptr addrspace(1) %out,
 ; GFX11-NEXT:s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | 
instid1(VALU_DEP_1)
 ; GFX11-NEXT:v_lshrrev_b32_e32 v1, 16, v1
 ; GFX11-NEXT:v_lshl_or_b32 v0, v1, 16, v0
-; GFX11-NEXT:s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | 
instid1(VALU_DEP_1)
-; GFX11-NEXT:v_dual_mov_b32 v1, 0 :: v_dual_and_b32 v0, 0x7fff7fff, v0
-; GFX11-NEXT:v_xor_b32_e32 v0, 0x80008000, v0
+; GFX11-NEXT:v_mov_b32_e32 v1, 0
+; GFX11-NEXT:s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT:v_or_b32_e32 v0, 0x80008000, v0
 ; GFX11-NEXT:s_waitcnt lgkmcnt(0)
 ; GFX11-NEXT:global_store_b32 v1, v0, s[0:1]
 ; GFX11-NEXT:s_endpgm
@@ -634,8 +632,7 @@ define amdgpu_kernel void @s_fneg_fabs_v2bf16_bc_src(ptr 
addrspace(1) %out, <2 x
 ; VI-NEXT:s_mov_b32 flat_scratch_lo, s13
 ; VI-NEXT:s_lshr_b32 flat_scratch_hi, s12, 8
 ; VI-NEXT:s_waitcnt lgkmcnt(0)
-; VI-NEXT:s_and_b32 s2, s2, 0x7fff7fff
-; VI-NEXT:s_xor_b32 s2, s2, 0x80008000
+; VI-NEXT:s_or_b32 s2, s2, 0x80008000
 ; VI-NEXT:v_mov_b32_e32 v0, s0
 ; VI-NEXT:v_mov_b32_e32 v1, s1
 ; VI-NEXT:v_mov_b32_e32 v2, s2
@@ -648,8 +645,7 @@ define amdgpu_kernel void @s_fneg_fabs_v2bf16_bc_src(ptr 
addrspace(1) %out, <2 x
 ; GFX9-NEXT:s_load_dwordx2 s[0:1], s[8:9], 0x0
 ; GFX9-NEXT:v_mov_b32_e32 v0, 0
 ; GFX9-NEXT:s_waitcnt lgkmcnt(0)
-; GFX9-NEXT:s_and_b32 s2, s2, 0x7fff7fff
-; GFX9-NEXT:s_xor_b32 s2, s2, 0x80008000
+; GFX9-NEXT:s_or_b32 s2, s2, 0x80008000
 ; GFX9-NEXT:v_mov_b32_e32 v1, s2
 ; GFX9-NEXT:global_store_dword v0, v1, s[0:1]
 ; GFX9-NEXT:s_endpgm
@@ -660,9 +656,8 @@ define amdgpu_kernel voi

[llvm-branch-commits] [llvm] [AMDGPU] Baseline fneg-fabs.bf16.ll tests. NFC. (PR #142910)

2025-06-05 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/142910

>From 641fb5022daeca9b71527e18ea2df7982856a105 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Wed, 4 Jun 2025 23:46:28 -0700
Subject: [PATCH] [AMDGPU] Baseline fneg-fabs.bf16.ll tests. NFC.

---
 llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll | 1223 
 1 file changed, 1223 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll

diff --git a/llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll 
b/llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll
new file mode 100644
index 0..243469d39cc11
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll
@@ -0,0 +1,1223 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py 
UTC_ARGS: --version 2
+; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=kaveri < %s | FileCheck 
--check-prefixes=CIVI,CI %s
+; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=tonga < %s | FileCheck 
--check-prefixes=CIVI,VI %s
+; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx900 < %s | FileCheck 
--check-prefixes=GFX9 %s
+; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx1100 -mattr=+real-true16 < %s | 
FileCheck --check-prefixes=GFX11,GFX11-TRUE16 %s
+; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx1100 -mattr=-real-true16 < %s | 
FileCheck --check-prefixes=GFX11,GFX11-FAKE16 %s
+
+define amdgpu_kernel void @fneg_fabs_fadd_bf16(ptr addrspace(1) %out, bfloat 
%x, bfloat %y) {
+; CI-LABEL: fneg_fabs_fadd_bf16:
+; CI:   ; %bb.0:
+; CI-NEXT:s_load_dword s2, s[8:9], 0x2
+; CI-NEXT:s_load_dwordx2 s[0:1], s[8:9], 0x0
+; CI-NEXT:s_add_i32 s12, s12, s17
+; CI-NEXT:s_mov_b32 flat_scratch_lo, s13
+; CI-NEXT:s_lshr_b32 flat_scratch_hi, s12, 8
+; CI-NEXT:s_waitcnt lgkmcnt(0)
+; CI-NEXT:s_and_b32 s3, s2, 0x7fff
+; CI-NEXT:s_lshl_b32 s3, s3, 16
+; CI-NEXT:s_and_b32 s2, s2, 0x
+; CI-NEXT:v_mov_b32_e32 v0, s3
+; CI-NEXT:v_sub_f32_e32 v0, s2, v0
+; CI-NEXT:v_lshrrev_b32_e32 v2, 16, v0
+; CI-NEXT:v_mov_b32_e32 v0, s0
+; CI-NEXT:v_mov_b32_e32 v1, s1
+; CI-NEXT:flat_store_short v[0:1], v2
+; CI-NEXT:s_endpgm
+;
+; VI-LABEL: fneg_fabs_fadd_bf16:
+; VI:   ; %bb.0:
+; VI-NEXT:s_load_dword s2, s[8:9], 0x8
+; VI-NEXT:s_load_dwordx2 s[0:1], s[8:9], 0x0
+; VI-NEXT:s_add_i32 s12, s12, s17
+; VI-NEXT:s_mov_b32 flat_scratch_lo, s13
+; VI-NEXT:s_lshr_b32 flat_scratch_hi, s12, 8
+; VI-NEXT:s_waitcnt lgkmcnt(0)
+; VI-NEXT:s_and_b32 s3, s2, 0x7fff
+; VI-NEXT:s_lshl_b32 s3, s3, 16
+; VI-NEXT:s_and_b32 s2, s2, 0x
+; VI-NEXT:v_mov_b32_e32 v0, s3
+; VI-NEXT:v_sub_f32_e32 v0, s2, v0
+; VI-NEXT:v_bfe_u32 v1, v0, 16, 1
+; VI-NEXT:v_add_u32_e32 v1, vcc, v1, v0
+; VI-NEXT:v_add_u32_e32 v1, vcc, 0x7fff, v1
+; VI-NEXT:v_or_b32_e32 v2, 0x40, v0
+; VI-NEXT:v_cmp_u_f32_e32 vcc, v0, v0
+; VI-NEXT:v_cndmask_b32_e32 v0, v1, v2, vcc
+; VI-NEXT:v_lshrrev_b32_e32 v2, 16, v0
+; VI-NEXT:v_mov_b32_e32 v0, s0
+; VI-NEXT:v_mov_b32_e32 v1, s1
+; VI-NEXT:flat_store_short v[0:1], v2
+; VI-NEXT:s_endpgm
+;
+; GFX9-LABEL: fneg_fabs_fadd_bf16:
+; GFX9:   ; %bb.0:
+; GFX9-NEXT:s_load_dword s2, s[8:9], 0x8
+; GFX9-NEXT:s_load_dwordx2 s[0:1], s[8:9], 0x0
+; GFX9-NEXT:v_mov_b32_e32 v0, 0
+; GFX9-NEXT:s_waitcnt lgkmcnt(0)
+; GFX9-NEXT:s_and_b32 s3, s2, 0x7fff
+; GFX9-NEXT:s_lshl_b32 s3, s3, 16
+; GFX9-NEXT:s_and_b32 s2, s2, 0x
+; GFX9-NEXT:v_mov_b32_e32 v1, s3
+; GFX9-NEXT:v_sub_f32_e32 v1, s2, v1
+; GFX9-NEXT:v_bfe_u32 v2, v1, 16, 1
+; GFX9-NEXT:v_add_u32_e32 v2, v2, v1
+; GFX9-NEXT:v_or_b32_e32 v3, 0x40, v1
+; GFX9-NEXT:v_add_u32_e32 v2, 0x7fff, v2
+; GFX9-NEXT:v_cmp_u_f32_e32 vcc, v1, v1
+; GFX9-NEXT:v_cndmask_b32_e32 v1, v2, v3, vcc
+; GFX9-NEXT:global_store_short_d16_hi v0, v1, s[0:1]
+; GFX9-NEXT:s_endpgm
+;
+; GFX11-TRUE16-LABEL: fneg_fabs_fadd_bf16:
+; GFX11-TRUE16:   ; %bb.0:
+; GFX11-TRUE16-NEXT:s_load_b32 s0, s[4:5], 0x8
+; GFX11-TRUE16-NEXT:s_waitcnt lgkmcnt(0)
+; GFX11-TRUE16-NEXT:s_mov_b32 s1, s0
+; GFX11-TRUE16-NEXT:s_and_b32 s0, s0, 0x
+; GFX11-TRUE16-NEXT:s_and_b32 s1, s1, 0x7fff
+; GFX11-TRUE16-NEXT:s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | 
instid1(SALU_CYCLE_1)
+; GFX11-TRUE16-NEXT:s_lshl_b32 s1, s1, 16
+; GFX11-TRUE16-NEXT:v_sub_f32_e64 v0, s0, s1
+; GFX11-TRUE16-NEXT:s_load_b64 s[0:1], s[4:5], 0x0
+; GFX11-TRUE16-NEXT:s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | 
instid1(VALU_DEP_3)
+; GFX11-TRUE16-NEXT:v_bfe_u32 v1, v0, 16, 1
+; GFX11-TRUE16-NEXT:v_or_b32_e32 v2, 0x40, v0
+; GFX11-TRUE16-NEXT:v_cmp_u_f32_e32 vcc_lo, v0, v0
+; GFX11-TRUE16-NEXT:v_add_nc_u32_e32 v1, v1, v0
+; GFX11-TRUE16-NEXT:s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | 
instid1(VALU_DEP_1)
+; GFX11-TRUE16-NEXT:v_add_nc_u32_e32 v1, 0x7fff, v1
+; GFX11-TRUE16-NEXT:v_dual_mov_b32

[llvm-branch-commits] [llvm] [CI] Migrate to runtimes build (PR #142696)

2025-06-05 Thread Vlad Serebrennikov via llvm-branch-commits

Endilll wrote:

> > It doesn't relate to multilib, I understand that, but does it mean we're 
> > going to test more than one runtime or that we'll test the same runtime 
> > multiple ways?
> 
> It's runtimes that we test in multiple ways (`-std=c++26` and 
> `enable_modules=clang` currently). I felt multiconfig covered that and 
> couldn't really think of a better name. If anyone else has better ideas I'd 
> be happy to change it up.

Multiconfig in this context has some strong associations with CMake's Ninja 
Multi-Config generator for me. My suggestion is `needs_reconfig`.

https://github.com/llvm/llvm-project/pull/142696
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #142789)

2025-06-05 Thread Pierre van Houtryve via llvm-branch-commits


@@ -137,7 +138,123 @@ class AMDGPURegBankLegalizeCombiner {
 return {MatchMI, MatchMI->getOperand(1).getReg()};
   }
 
+  std::tuple<MachineInstr *, int> tryMatchRALFromUnmerge(Register Src) {
+auto *ReadAnyLane = MRI.getVRegDef(Src);

Pierre-vh wrote:

```suggestion
MachineInstr *ReadAnyLane = MRI.getVRegDef(Src);
```
I think we generally use `auto` only if the type is already in the RHS

https://github.com/llvm/llvm-project/pull/142789


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #142789)

2025-06-05 Thread Pierre van Houtryve via llvm-branch-commits


@@ -137,7 +138,123 @@ class AMDGPURegBankLegalizeCombiner {
 return {MatchMI, MatchMI->getOperand(1).getReg()};
   }
 
+  std::tuple<MachineInstr *, int> tryMatchRALFromUnmerge(Register Src) {

Pierre-vh wrote:

```suggestion
  std::pair<MachineInstr *, int> tryMatchRALFromUnmerge(Register Src) {
```

https://github.com/llvm/llvm-project/pull/142789


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #142789)

2025-06-05 Thread Pierre van Houtryve via llvm-branch-commits


@@ -137,7 +138,123 @@ class AMDGPURegBankLegalizeCombiner {
 return {MatchMI, MatchMI->getOperand(1).getReg()};
   }
 
+  std::tuple<MachineInstr *, int> tryMatchRALFromUnmerge(Register Src) {
+auto *ReadAnyLane = MRI.getVRegDef(Src);
+if (ReadAnyLane->getOpcode() == AMDGPU::G_AMDGPU_READANYLANE) {
+  Register RALSrc = ReadAnyLane->getOperand(1).getReg();
+  auto *UnMerge = getOpcodeDef<GUnmerge>(RALSrc, MRI);
+  if (UnMerge)
+return {UnMerge, UnMerge->findRegisterDefOperandIdx(RALSrc, nullptr)};
+}
+return {nullptr, -1};
+  }
+
+  Register getReadAnyLaneSrc(Register Src) {
+// Src = G_AMDGPU_READANYLANE RALSrc
+auto [RAL, RALSrc] = tryMatch(Src, AMDGPU::G_AMDGPU_READANYLANE);
+if (RAL)
+  return RALSrc;
+
+// LoVgpr, HiVgpr = G_UNMERGE_VALUES UnmergeSrc
+// LoSgpr = G_AMDGPU_READANYLANE LoVgpr
+// HiSgpr = G_AMDGPU_READANYLANE HiVgpr
+// Src G_MERGE_VALUES LoSgpr, HiSgpr
+auto *Merge = getOpcodeDef<GMergeLikeInstr>(Src, MRI);
+if (Merge) {
+  unsigned NumElts = Merge->getNumSources();
+  auto [Unmerge, Idx] = tryMatchRALFromUnmerge(Merge->getSourceReg(0));
+  if (!Unmerge || Unmerge->getNumDefs() != NumElts || Idx != 0)
+return {};
+
+  // check if all elements are from same unmerge and there is no shuffling
+  for (unsigned i = 1; i < NumElts; ++i) {
+auto [UnmergeI, IdxI] = tryMatchRALFromUnmerge(Merge->getSourceReg(i));
+if (UnmergeI != Unmerge || (unsigned)IdxI != i)
+  return {};
+  }
+  return Unmerge->getSourceReg();
+}
+
+// ..., VgprI, ... = G_UNMERGE_VALUES VgprLarge
+// SgprI = G_AMDGPU_READANYLANE VgprI
+// SgprLarge G_MERGE_VALUES ..., SgprI, ...
+// ..., Src, ... = G_UNMERGE_VALUES SgprLarge
+auto *UnMerge = getOpcodeDef<GUnmerge>(Src, MRI);
+if (UnMerge) {
+  int Idx = UnMerge->findRegisterDefOperandIdx(Src, nullptr);
+  auto *Merge = getOpcodeDef<GMergeLikeInstr>(UnMerge->getSourceReg(), MRI);
+  if (Merge) {
+auto [RAL, RALSrc] =
+tryMatch(Merge->getSourceReg(Idx), AMDGPU::G_AMDGPU_READANYLANE);
+if (RAL)
+  return RALSrc;
+  }
+}
+
+return {};
+  }
+
+  bool tryEliminateReadAnyLane(MachineInstr &Copy) {
+Register Dst = Copy.getOperand(0).getReg();
+Register Src = Copy.getOperand(1).getReg();
+if (!Src.isVirtual())
+  return false;
+
+Register RALDst = Src;
+MachineInstr &SrcMI = *MRI.getVRegDef(Src);
+if (SrcMI.getOpcode() == AMDGPU::G_BITCAST) {
+  RALDst = SrcMI.getOperand(1).getReg();
+}

Pierre-vh wrote:

```suggestion
if (SrcMI.getOpcode() == AMDGPU::G_BITCAST) 
  RALDst = SrcMI.getOperand(1).getReg();
```

nit: can we have other opcodes than bitcast and that'd matter, like inreg 
extensions, assert exts ?
It feels like we should have a helper for this somewhere

https://github.com/llvm/llvm-project/pull/142789
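For readers following the combine being reviewed above: the shape matched is a merge whose operands are readanylanes of consecutive results of a single unmerge, which folds back to the unmerged source. A toy model of that matching rule (illustrative C++, not LLVM's MachineInstr API):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Inst;
struct Value {
  const Inst *def = nullptr; // defining instruction (nullptr = opaque input)
  int defIdx = 0;            // which result of `def` this value is
};
struct Inst {
  enum Kind { Unmerge, ReadAnyLane, Merge } kind;
  std::vector<const Value *> srcs; // operands
  int numDefs = 1;                 // number of results (used for Unmerge)
};

// Returns the unmerge's source when `v` is a merge whose i-th operand is a
// readanylane of result i of one and the same unmerge (no shuffling, all
// results consumed); otherwise returns nullptr.
const Value *getReadAnyLaneSrc(const Value &v) {
  if (!v.def || v.def->kind != Inst::Merge)
    return nullptr;
  const Inst *merge = v.def;
  const Inst *unmerge = nullptr;
  for (std::size_t i = 0; i < merge->srcs.size(); ++i) {
    const Inst *ral = merge->srcs[i]->def;
    if (!ral || ral->kind != Inst::ReadAnyLane)
      return nullptr;
    const Value *lane = ral->srcs[0];
    if (!lane->def || lane->def->kind != Inst::Unmerge ||
        (std::size_t)lane->defIdx != i) // reject shuffled elements
      return nullptr;
    if (i == 0) {
      unmerge = lane->def;
      if ((std::size_t)unmerge->numDefs != merge->srcs.size())
        return nullptr; // the merge must consume the whole unmerge
    } else if (lane->def != unmerge) {
      return nullptr; // all elements must come from the same unmerge
    }
  }
  return unmerge ? unmerge->srcs[0] : nullptr;
}
```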


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #142789)

2025-06-05 Thread Pierre van Houtryve via llvm-branch-commits


@@ -137,7 +138,123 @@ class AMDGPURegBankLegalizeCombiner {
 return {MatchMI, MatchMI->getOperand(1).getReg()};
   }
 
+  std::tuple<MachineInstr *, int> tryMatchRALFromUnmerge(Register Src) {
+auto *ReadAnyLane = MRI.getVRegDef(Src);
+if (ReadAnyLane->getOpcode() == AMDGPU::G_AMDGPU_READANYLANE) {
+  Register RALSrc = ReadAnyLane->getOperand(1).getReg();
+  auto *UnMerge = getOpcodeDef<GUnmerge>(RALSrc, MRI);
+  if (UnMerge)

Pierre-vh wrote:

```suggestion
  if (auto *UnMerge = getOpcodeDef<GUnmerge>(RALSrc, MRI))
```

https://github.com/llvm/llvm-project/pull/142789


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)

2025-06-05 Thread Pierre van Houtryve via llvm-branch-commits


@@ -117,45 +117,73 @@ static LLT getReadAnyLaneSplitTy(LLT Ty) {
   return LLT::scalar(32);
 }
 
-static Register buildReadAnyLane(MachineIRBuilder &B, Register VgprSrc,
- const RegisterBankInfo &RBI);
+typedef std::function

https://github.com/llvm/llvm-project/pull/142790


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)

2025-06-05 Thread Pierre van Houtryve via llvm-branch-commits


@@ -117,45 +117,73 @@ static LLT getReadAnyLaneSplitTy(LLT Ty) {
   return LLT::scalar(32);
 }
 
-static Register buildReadAnyLane(MachineIRBuilder &B, Register VgprSrc,
- const RegisterBankInfo &RBI);
+typedef std::function<MachineInstrBuilder(MachineIRBuilder &, Register, Register)>
+ReadLaneFnTy;
+
+static Register buildReadLane(MachineIRBuilder &, Register,
+  const RegisterBankInfo &, ReadLaneFnTy);
 
 static void unmergeReadAnyLane(MachineIRBuilder &B,
SmallVectorImpl &SgprDstParts,
LLT UnmergeTy, Register VgprSrc,
-   const RegisterBankInfo &RBI) {
+   const RegisterBankInfo &RBI,
+   ReadLaneFnTy BuildRL) {
   const RegisterBank *VgprRB = &RBI.getRegBank(AMDGPU::VGPRRegBankID);
   auto Unmerge = B.buildUnmerge({VgprRB, UnmergeTy}, VgprSrc);
   for (unsigned i = 0; i < Unmerge->getNumOperands() - 1; ++i) {
-SgprDstParts.push_back(buildReadAnyLane(B, Unmerge.getReg(i), RBI));
+SgprDstParts.push_back(buildReadLane(B, Unmerge.getReg(i), RBI, BuildRL));
   }
 }
 
-static Register buildReadAnyLane(MachineIRBuilder &B, Register VgprSrc,
- const RegisterBankInfo &RBI) {
+static Register buildReadLane(MachineIRBuilder &B, Register VgprSrc,
+  const RegisterBankInfo &RBI,
+  ReadLaneFnTy BuildRL) {
   LLT Ty = B.getMRI()->getType(VgprSrc);
   const RegisterBank *SgprRB = &RBI.getRegBank(AMDGPU::SGPRRegBankID);
   if (Ty.getSizeInBits() == 32) {
-return B.buildInstr(AMDGPU::G_AMDGPU_READANYLANE, {{SgprRB, Ty}}, 
{VgprSrc})
-.getReg(0);
+Register SgprDst = B.getMRI()->createVirtualRegister({SgprRB, Ty});
+return BuildRL(B, SgprDst, VgprSrc).getReg(0);
   }
 
   SmallVector SgprDstParts;
-  unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI);
+  unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI,
+ BuildRL);
 
   return B.buildMergeLikeInstr({SgprRB, Ty}, SgprDstParts).getReg(0);
 }
 
-void AMDGPU::buildReadAnyLane(MachineIRBuilder &B, Register SgprDst,
-  Register VgprSrc, const RegisterBankInfo &RBI) {
+static void buildReadLane(MachineIRBuilder &B, Register SgprDst,
+  Register VgprSrc, const RegisterBankInfo &RBI,
+  ReadLaneFnTy BuildReadLane) {
   LLT Ty = B.getMRI()->getType(VgprSrc);
   if (Ty.getSizeInBits() == 32) {
-B.buildInstr(AMDGPU::G_AMDGPU_READANYLANE, {SgprDst}, {VgprSrc});
+BuildReadLane(B, SgprDst, VgprSrc);
 return;
   }
 
   SmallVector SgprDstParts;
-  unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI);
+  unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI,
+ BuildReadLane);
 
   B.buildMergeLikeInstr(SgprDst, SgprDstParts).getReg(0);
 }
+
+void AMDGPU::buildReadAnyLane(MachineIRBuilder &B, Register SgprDst,
+  Register VgprSrc, const RegisterBankInfo &RBI) {
+  return buildReadLane(
+  B, SgprDst, VgprSrc, RBI,
+  [](MachineIRBuilder &B, Register SgprDst, Register VgprSrc) {
+return B.buildInstr(AMDGPU::G_AMDGPU_READANYLANE, {SgprDst}, 
{VgprSrc});
+  });
+}
+
+void AMDGPU::buildReadFirstLane(MachineIRBuilder &B, Register SgprDst,
+Register VgprSrc, const RegisterBankInfo &RBI) 
{
+  return buildReadLane(
+  B, SgprDst, VgprSrc, RBI,
+  [](MachineIRBuilder &B, Register SgprDst, Register VgprSrc) {
+return B.buildIntrinsic(Intrinsic::amdgcn_readfirstlane, SgprDst)

Pierre-vh wrote:

Not for this PR, but we should really have an opcode for this too instead of 
having one being an intrinsic and one being a generic opcode

https://github.com/llvm/llvm-project/pull/142790
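The refactoring above threads a builder callback through the shared unmerge/merge recursion so readanylane and readfirstlane reuse one walk. A minimal sketch of that pattern (hypothetical names; strings stand in for MIR):

```cpp
#include <cassert>
#include <functional>
#include <string>

// The per-leaf operation (readanylane vs. readfirstlane) is injected as a
// callback while the split/recurse/merge structure stays in one place.
using LeafFn = std::function<std::string(const std::string &)>;

std::string buildReadLane(const std::string &v, unsigned pieces, LeafFn leaf) {
  if (pieces == 1)
    return leaf(v); // 32-bit leaf: emit one read op
  std::string out = "merge(";
  for (unsigned i = 0; i < pieces; ++i) // unmerge, recurse per piece, merge
    out += (i ? "," : "") +
           buildReadLane(v + "[" + std::to_string(i) + "]", 1, leaf);
  return out + ")";
}
```

Swapping the lambda swaps the emitted leaf op without duplicating the recursion, which is the point of the ReadLaneFnTy parameter in the patch.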


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)

2025-06-05 Thread Pierre van Houtryve via llvm-branch-commits


@@ -57,6 +57,226 @@ void 
RegBankLegalizeHelper::findRuleAndApplyMapping(MachineInstr &MI) {
   lower(MI, Mapping, WaterfallSgprs);
 }
 
+bool RegBankLegalizeHelper::executeInWaterfallLoop(
+    MachineIRBuilder &B, iterator_range<MachineBasicBlock::iterator> Range,
+    SmallSet<Register, 4> &SGPROperandRegs) {
+  // Track use registers which have already been expanded with a readfirstlane
+  // sequence. This may have multiple uses if moving a sequence.
+  DenseMap<Register, Register> WaterfalledRegMap;
+
+  MachineBasicBlock &MBB = B.getMBB();
+  MachineFunction &MF = B.getMF();
+
+  const SIRegisterInfo *TRI = ST.getRegisterInfo();
+  const TargetRegisterClass *WaveRC = TRI->getWaveMaskRegClass();
+  unsigned MovExecOpc, MovExecTermOpc, XorTermOpc, AndSaveExecOpc, ExecReg;
+  if (ST.isWave32()) {
+MovExecOpc = AMDGPU::S_MOV_B32;
+MovExecTermOpc = AMDGPU::S_MOV_B32_term;
+XorTermOpc = AMDGPU::S_XOR_B32_term;
+AndSaveExecOpc = AMDGPU::S_AND_SAVEEXEC_B32;
+ExecReg = AMDGPU::EXEC_LO;
+  } else {
+MovExecOpc = AMDGPU::S_MOV_B64;
+MovExecTermOpc = AMDGPU::S_MOV_B64_term;
+XorTermOpc = AMDGPU::S_XOR_B64_term;
+AndSaveExecOpc = AMDGPU::S_AND_SAVEEXEC_B64;
+ExecReg = AMDGPU::EXEC;
+  }
+
+#ifndef NDEBUG
+  const int OrigRangeSize = std::distance(Range.begin(), Range.end());
+#endif
+
+  MachineRegisterInfo &MRI = *B.getMRI();
+  Register SaveExecReg = MRI.createVirtualRegister(WaveRC);
+  Register InitSaveExecReg = MRI.createVirtualRegister(WaveRC);
+
+  // Don't bother using generic instructions/registers for the exec mask.
+  B.buildInstr(TargetOpcode::IMPLICIT_DEF).addDef(InitSaveExecReg);
+
+  Register SavedExec = MRI.createVirtualRegister(WaveRC);
+
+  // To insert the loop we need to split the block. Move everything before
+  // this point to a new block, and insert a new empty block before this
+  // instruction.
+  MachineBasicBlock *LoopBB = MF.CreateMachineBasicBlock();
+  MachineBasicBlock *BodyBB = MF.CreateMachineBasicBlock();
+  MachineBasicBlock *RestoreExecBB = MF.CreateMachineBasicBlock();
+  MachineBasicBlock *RemainderBB = MF.CreateMachineBasicBlock();
+  MachineFunction::iterator MBBI(MBB);
+  ++MBBI;
+  MF.insert(MBBI, LoopBB);
+  MF.insert(MBBI, BodyBB);
+  MF.insert(MBBI, RestoreExecBB);
+  MF.insert(MBBI, RemainderBB);
+
+  LoopBB->addSuccessor(BodyBB);
+  BodyBB->addSuccessor(RestoreExecBB);
+  BodyBB->addSuccessor(LoopBB);
+
+  // Move the rest of the block into a new block.
+  RemainderBB->transferSuccessorsAndUpdatePHIs(&MBB);
+  RemainderBB->splice(RemainderBB->begin(), &MBB, Range.end(), MBB.end());
+
+  MBB.addSuccessor(LoopBB);
+  RestoreExecBB->addSuccessor(RemainderBB);
+
+  B.setInsertPt(*LoopBB, LoopBB->end());
+
+  // +-MBB:------------+
+  // | ...             |
+  // | %0 = G_INST_1   |
+  // | %Dst = MI %Vgpr |
+  // | %1 = G_INST_2   |
+  // | ...             |
+  // +-----------------+
+  // ->
+  // +-MBB-------------------------------+
+  // | ...                               |
+  // | %0 = G_INST_1                     |
+  // | %SaveExecReg = S_MOV_B32 $exec_lo |
+  // +-|---------------------------------+
+  //   |                                          /---------------------------|
+  //   V                                          V                           |
+  // +-LoopBB---------------------------------------------------------------+ |
+  // | %CurrentLaneReg:sgpr(s32) = READFIRSTLANE %Vgpr                      | |
+  // |   instead of executing for each lane, see if other lanes had         | |
+  // |   same value for %Vgpr and execute for them also.                    | |
+  // | %CondReg:vcc(s1) = G_ICMP eq %CurrentLaneReg, %Vgpr                  | |
+  // | %CondRegLM:sreg_32 = ballot %CondReg // copy vcc to sreg32 lane mask | |
+  // | %SavedExec = S_AND_SAVEEXEC_B32 %CondRegLM                           | |
+  // |   exec is active for lanes with the same "CurrentLane value" in Vgpr | |
+  // +-|--------------------------------------------------------------------+ |
+  //   V                                                                      |
+  // +-BodyBB------------------------------------------------------------+   |
+  // | %Dst = MI %CurrentLaneReg:sgpr(s32)                                |   |
+  // |   executed only for active lanes and written to Dst                |   |
+  // | $exec = S_XOR_B32 $exec, %SavedExec                                |   |
+  // |   set active lanes to 0 in SavedExec, lanes that did not write to  |   |
+  // |   Dst yet, and set this as new exec (for READFIRSTLANE and ICMP)   |   |
+  // | SI_WATERFALL_LOOP LoopBB ------------------------------------------------|
+  // +-|------------------------------------------------------------------+
+  //   V
+  // +-RestoreExecBB--------------------------+
+  // | $exec_lo = S_MOV_B32_term %SaveExecReg |
+  // +-|--------------------------------------+
+  //   V
+  // +-RemainderBB:----------------------+
+  // ...
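The control flow in the diagram above can be modeled outside of MIR. The Python sketch below simulates the waterfall strategy: repeatedly read the first active lane's value (READFIRSTLANE), ballot the lanes that hold the same value, execute the operation once with that now-uniform scalar, and clear those lanes from the exec mask until none remain. The lane values and the `op` callback are made-up stand-ins for illustration, not part of the patch.

```python
def waterfall_loop(vgpr, op):
    """Simulate the waterfall loop over per-lane values in `vgpr`.

    Returns the per-lane results and the number of loop iterations,
    i.e. the number of distinct values encountered.
    """
    results = [None] * len(vgpr)
    active = set(range(len(vgpr)))  # the exec mask: indices of active lanes
    iterations = 0
    while active:
        # READFIRSTLANE: take the value from the first still-active lane.
        current = vgpr[min(active)]
        # G_ICMP + ballot: all active lanes that share this value.
        same = {lane for lane in active if vgpr[lane] == current}
        # BodyBB: execute the instruction once with the uniform scalar.
        out = op(current)
        for lane in same:
            results[lane] = out
        # S_XOR on the saved exec: retire the lanes that just executed.
        active -= same
        iterations += 1
    return results, iterations
```

With lane values `[3, 3, 7, 3]`, lanes 0, 1 and 3 execute together in the first iteration and lane 2 in the second, so the loop runs twice regardless of wave size.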

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)

2025-06-05 Thread Pierre van Houtryve via llvm-branch-commits


@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
-; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -stop-after=regbankselect -regbankselect-fast -o - %s | FileCheck %s
-; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -stop-after=regbankselect -regbankselect-greedy -o - %s | FileCheck %s
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn-mesa-mesa3d -stop-after=amdgpu-regbanklegalize -regbankselect-fast -o - %s | FileCheck %s

Pierre-vh wrote:

@arsenm Is it fine to move tests entirely to this new RBSelect, or should we 
keep coverage for both until the old RB is removed?

https://github.com/llvm/llvm-project/pull/142790
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)

2025-06-05 Thread Pierre van Houtryve via llvm-branch-commits


@@ -165,6 +165,8 @@ enum RegBankLLTMappingApplyID {
   Sgpr32Trunc,
 
   // Src only modifiers: waterfalls, extends
+  Sgpr32_W,
+  SgprV4S32_W,

Pierre-vh wrote:

Can you add a trailing comment or rename this ? The `_W` suffix is not 
immediately clear to me

https://github.com/llvm/llvm-project/pull/142790
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)

2025-06-05 Thread Pierre van Houtryve via llvm-branch-commits


@@ -57,6 +57,226 @@ void 
RegBankLegalizeHelper::findRuleAndApplyMapping(MachineInstr &MI) {
   lower(MI, Mapping, WaterfallSgprs);
 }
 
+bool RegBankLegalizeHelper::executeInWaterfallLoop(
+    MachineIRBuilder &B, iterator_range<MachineBasicBlock::iterator> Range,
+    SmallSet<Register, 4> &SGPROperandRegs) {
+  // Track use registers which have already been expanded with a readfirstlane
+  // sequence. This may have multiple uses if moving a sequence.
+  DenseMap<Register, Register> WaterfalledRegMap;
+
+  MachineBasicBlock &MBB = B.getMBB();
+  MachineFunction &MF = B.getMF();
+
+  const SIRegisterInfo *TRI = ST.getRegisterInfo();
+  const TargetRegisterClass *WaveRC = TRI->getWaveMaskRegClass();
+  unsigned MovExecOpc, MovExecTermOpc, XorTermOpc, AndSaveExecOpc, ExecReg;
+  if (ST.isWave32()) {

Pierre-vh wrote:

nit: I think those could go in the class directly so this isn't repeated
every time, no? The class is instantiated per function anyway.

https://github.com/llvm/llvm-project/pull/142790
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)

2025-06-05 Thread Pierre van Houtryve via llvm-branch-commits


@@ -894,6 +1121,15 @@ void RegBankLegalizeHelper::applyMappingSrc(
   }
   break;
 }
+// sgpr waterfall, scalars and vectors
+case Sgpr32_W:
+case SgprV4S32_W: {
+  assert(Ty == getTyFromID(MethodIDs[i]));
+  if (RB != SgprRB) {
+SgprWaterfallOperandRegs.insert(Reg);
+  }

Pierre-vh wrote:

```suggestion
  if (RB != SgprRB)
SgprWaterfallOperandRegs.insert(Reg);
```

https://github.com/llvm/llvm-project/pull/142790
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [CI] Migrate to runtimes build (PR #142696)

2025-06-05 Thread Aiden Grossman via llvm-branch-commits

https://github.com/boomanaiden154 updated 
https://github.com/llvm/llvm-project/pull/142696

>From 360e723b51ee201603f72b56859cd7c6d6faec24 Mon Sep 17 00:00:00 2001
From: Aiden Grossman 
Date: Thu, 5 Jun 2025 06:51:37 +
Subject: [PATCH 1/2] feedback

Created using spr 1.3.4
---
 .ci/compute_projects.py | 17 +
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/.ci/compute_projects.py b/.ci/compute_projects.py
index b12b729eadd3f..8134e1e2c29fb 100644
--- a/.ci/compute_projects.py
+++ b/.ci/compute_projects.py
@@ -145,22 +145,15 @@ def _add_dependencies(projects: Set[str], runtimes: Set[str]) -> Set[str]:
 
 
 def _exclude_projects(current_projects: Set[str], platform: str) -> Set[str]:
-new_project_set = set(current_projects)
 if platform == "Linux":
-for to_exclude in EXCLUDE_LINUX:
-if to_exclude in new_project_set:
-new_project_set.remove(to_exclude)
+to_exclude = EXCLUDE_LINUX
 elif platform == "Windows":
-for to_exclude in EXCLUDE_WINDOWS:
-if to_exclude in new_project_set:
-new_project_set.remove(to_exclude)
+to_exclude = EXCLUDE_WINDOWS
 elif platform == "Darwin":
-for to_exclude in EXCLUDE_MAC:
-if to_exclude in new_project_set:
-new_project_set.remove(to_exclude)
+to_exclude = EXCLUDE_MAC
 else:
-raise ValueError("Unexpected platform.")
-return new_project_set
+raise ValueError(f"Unexpected platform: {platform}")
+return current_projects.difference(to_exclude)
 
 
def _compute_projects_to_test(modified_projects: Set[str], platform: str) -> Set[str]:
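The refactored `_exclude_projects` in the hunk above replaces three near-identical per-platform removal loops with a single set difference. A standalone sketch of the same idea follows; the exclusion-set contents here are placeholders for illustration, not the real `.ci` configuration.

```python
from typing import Set

# Hypothetical exclusion sets -- the actual contents live in .ci/compute_projects.py.
EXCLUDE_LINUX = {"openmp"}
EXCLUDE_WINDOWS = {"openmp", "libc"}
EXCLUDE_MAC = {"libc"}


def exclude_projects(current_projects: Set[str], platform: str) -> Set[str]:
    """Drop platform-specific exclusions via set.difference instead of
    mutating a copy element by element."""
    excludes = {
        "Linux": EXCLUDE_LINUX,
        "Windows": EXCLUDE_WINDOWS,
        "Darwin": EXCLUDE_MAC,
    }
    if platform not in excludes:
        raise ValueError(f"Unexpected platform: {platform}")
    # difference() ignores members of the exclusion set that are absent,
    # which is why the old "if to_exclude in ..." guard is no longer needed.
    return current_projects.difference(excludes[platform])
```

Note that `set.difference` silently skips excluded entries that are not present, matching the behavior of the removed membership checks.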

>From 26a48b3ba70c829862788335f4b5b610dfd5dd3a Mon Sep 17 00:00:00 2001
From: Aiden Grossman 
Date: Thu, 5 Jun 2025 08:55:00 +
Subject: [PATCH 2/2] feedback

Created using spr 1.3.4
---
 .ci/compute_projects.py | 20 ++--
 .ci/compute_projects_test.py| 32 
 .ci/monolithic-linux.sh |  8 
 .github/workflows/premerge.yaml |  4 ++--
 4 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/.ci/compute_projects.py b/.ci/compute_projects.py
index 8134e1e2c29fb..50a64cb15a937 100644
--- a/.ci/compute_projects.py
+++ b/.ci/compute_projects.py
@@ -66,7 +66,7 @@
 DEPENDENT_RUNTIMES_TO_TEST = {
 "clang": {"compiler-rt"},
 }
-DEPENDENT_RUNTIMES_TO_TEST_MULTICONFIG = {
+DEPENDENT_RUNTIMES_TO_TEST_NEEDS_RECONFIG = {
 "llvm": {"libcxx", "libcxxabi", "libunwind"},
 "clang": {"libcxx", "libcxxabi", "libunwind"},
 ".ci": {"libcxx", "libcxxabi", "libunwind"},
@@ -201,15 +201,15 @@ def _compute_runtimes_to_test(modified_projects: Set[str], platform: str) -> Set
 return _exclude_projects(runtimes_to_test, platform)
 
 
-def _compute_runtimes_to_test_multiconfig(
+def _compute_runtimes_to_test_needs_reconfig(
 modified_projects: Set[str], platform: str
 ) -> Set[str]:
 runtimes_to_test = set()
 for modified_project in modified_projects:
-if modified_project not in DEPENDENT_RUNTIMES_TO_TEST_MULTICONFIG:
+if modified_project not in DEPENDENT_RUNTIMES_TO_TEST_NEEDS_RECONFIG:
 continue
 runtimes_to_test.update(
-DEPENDENT_RUNTIMES_TO_TEST_MULTICONFIG[modified_project]
+DEPENDENT_RUNTIMES_TO_TEST_NEEDS_RECONFIG[modified_project]
 )
 return _exclude_projects(runtimes_to_test, platform)
 
@@ -246,17 +246,17 @@ def get_env_variables(modified_files: list[str], platform: str) -> Set[str]:
 modified_projects = _get_modified_projects(modified_files)
 projects_to_test = _compute_projects_to_test(modified_projects, platform)
 runtimes_to_test = _compute_runtimes_to_test(modified_projects, platform)
-runtimes_to_test_multiconfig = _compute_runtimes_to_test_multiconfig(
+runtimes_to_test_needs_reconfig = _compute_runtimes_to_test_needs_reconfig(
 modified_projects, platform
 )
 runtimes_to_build = _compute_runtimes_to_build(
-        runtimes_to_test | runtimes_to_test_multiconfig, modified_projects, platform
+        runtimes_to_test | runtimes_to_test_needs_reconfig, modified_projects, platform
 )
    projects_to_build = _compute_projects_to_build(projects_to_test, runtimes_to_build)
 projects_check_targets = _compute_project_check_targets(projects_to_test)
 runtimes_check_targets = _compute_project_check_targets(runtimes_to_test)
-runtimes_check_targets_multiconfig = _compute_project_check_targets(
-runtimes_to_test_multiconfig
+runtimes_check_targets_needs_reconfig = _compute_project_check_targets(
+runtimes_to_test_needs_reconfig
 )
 # We use a semicolon to separate the projects/runtimes as they get passed
 # to the CMake invocation and thus we need to use the CMake list separator
@@ -267,8 +267,8 @@ def get_env_variables(modified_files: list[str], platform: str) -> Set[str]:
 "project_chec

[llvm-branch-commits] [llvm] [CI] Migrate to runtimes build (PR #142696)

2025-06-05 Thread Aiden Grossman via llvm-branch-commits

boomanaiden154 wrote:

> Multiconfig in this context has some strong associations with CMake's Ninja 
> Multi-Config generator for me. My suggestion is needs_reconfig.

> Agree with needs_reconfig.

Updated. Thanks for the suggestion!

https://github.com/llvm/llvm-project/pull/142696
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [CI] Use LLVM_ENABLE_RUNTIMES for runtimes builds on Linux (PR #142694)

2025-06-05 Thread Aiden Grossman via llvm-branch-commits

boomanaiden154 wrote:

Branch seems to be cleaned up now.

https://github.com/llvm/llvm-project/pull/142694
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Add ISD::PTRADD DAG combines (PR #142739)

2025-06-05 Thread Fabian Ritter via llvm-branch-commits


@@ -2627,6 +2629,93 @@ SDValue DAGCombiner::foldSubToAvg(SDNode *N, const SDLoc 
&DL) {
   return SDValue();
 }
 
+/// Try to fold a pointer arithmetic node.
+/// This needs to be done separately from normal addition, because pointer
+/// addition is not commutative.
+SDValue DAGCombiner::visitPTRADD(SDNode *N) {
+  SDValue N0 = N->getOperand(0);
+  SDValue N1 = N->getOperand(1);
+  EVT PtrVT = N0.getValueType();
+  EVT IntVT = N1.getValueType();
+  SDLoc DL(N);
+
+  // This is already ensured by an assert in SelectionDAG::getNode(). Several
+  // combines here depend on this assumption.
+  assert(PtrVT == IntVT &&
+ "PTRADD with different operand types is not supported");
+
+  // fold (ptradd undef, y) -> undef
+  if (N0.isUndef())
+return N0;
+
+  // fold (ptradd x, undef) -> undef
+  if (N1.isUndef())
+return DAG.getUNDEF(PtrVT);
+
+  // fold (ptradd x, 0) -> x
+  if (isNullConstant(N1))
+return N0;
+
+  // fold (ptradd 0, x) -> x
+  if (isNullConstant(N0))
+return N1;
+
+  if (N0.getOpcode() == ISD::PTRADD &&

ritter-x2a wrote:

Indeed, I'll do that if we don't land on moving the target-specific combine
below this one in [the other thread](https://app.graphite.dev/github/pr/llvm/llvm-project/142739/%5BAMDGPU%5D%5BSDAG%5D-Add-ISD-PTRADD-DAG-combines#comment-PRRC_kwDOBITxeM5-x5pQ).

https://github.com/llvm/llvm-project/pull/142739
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
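The reassociation this review discusses — `(ptradd (ptradd x, y), z) -> (ptradd x, (add y, z))` when `y` and `z` are both constants, so that `y + z` can later be folded as an immediate offset — can be illustrated with a toy expression rewriter. This is a hedged sketch only: the real combine operates on SelectionDAG nodes and first checks that the rewrite cannot break an addressing-mode pattern.

```python
def fold_ptradd(node):
    """Fold nested constant offsets in a toy ptradd expression.

    A node is a tuple ("ptradd", base, offset); bases are strings
    (symbolic pointers or nested ptradds), offsets are ints (constants)
    or strings (symbolic values).
    """
    op, n0, n1 = node
    assert op == "ptradd"
    # fold (ptradd x, 0) -> x
    if n1 == 0:
        return n0
    # (ptradd (ptradd x, y), z) -> (ptradd x, y + z) when y and z are
    # both constants, mirroring one case of the combine in visitPTRADD.
    if isinstance(n0, tuple) and n0[0] == "ptradd":
        _, x, y = n0
        if isinstance(y, int) and isinstance(n1, int):
            return fold_ptradd(("ptradd", x, y + n1))
    return node
```

For example, `(p + 8) + 4` collapses to `p + 12`, a single constant that a backend could then match as an immediate offset in a memory instruction.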


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #142789)

2025-06-05 Thread Pierre van Houtryve via llvm-branch-commits


@@ -137,7 +138,123 @@ class AMDGPURegBankLegalizeCombiner {
 return {MatchMI, MatchMI->getOperand(1).getReg()};
   }
 
+  std::tuple<GUnmerge *, int> tryMatchRALFromUnmerge(Register Src) {
+    auto *ReadAnyLane = MRI.getVRegDef(Src);
+    if (ReadAnyLane->getOpcode() == AMDGPU::G_AMDGPU_READANYLANE) {
+      Register RALSrc = ReadAnyLane->getOperand(1).getReg();
+      auto *UnMerge = getOpcodeDef<GUnmerge>(RALSrc, MRI);
+      if (UnMerge)
+        return {UnMerge, UnMerge->findRegisterDefOperandIdx(RALSrc, nullptr)};
+    }
+    return {nullptr, -1};
+  }
+
+  Register getReadAnyLaneSrc(Register Src) {
+// Src = G_AMDGPU_READANYLANE RALSrc
+auto [RAL, RALSrc] = tryMatch(Src, AMDGPU::G_AMDGPU_READANYLANE);
+if (RAL)
+  return RALSrc;
+
+// LoVgpr, HiVgpr = G_UNMERGE_VALUES UnmergeSrc
+// LoSgpr = G_AMDGPU_READANYLANE LoVgpr
+// HiSgpr = G_AMDGPU_READANYLANE HiVgpr
+// Src G_MERGE_VALUES LoSgpr, HiSgpr
+    auto *Merge = getOpcodeDef<GMergeLikeInstr>(Src, MRI);
+if (Merge) {
+  unsigned NumElts = Merge->getNumSources();
+  auto [Unmerge, Idx] = tryMatchRALFromUnmerge(Merge->getSourceReg(0));
+  if (!Unmerge || Unmerge->getNumDefs() != NumElts || Idx != 0)
+return {};
+
+  // check if all elements are from same unmerge and there is no shuffling
+  for (unsigned i = 1; i < NumElts; ++i) {
+auto [UnmergeI, IdxI] = tryMatchRALFromUnmerge(Merge->getSourceReg(i));
+if (UnmergeI != Unmerge || (unsigned)IdxI != i)
+  return {};
+  }
+  return Unmerge->getSourceReg();
+}
+
+// ..., VgprI, ... = G_UNMERGE_VALUES VgprLarge
+// SgprI = G_AMDGPU_READANYLANE VgprI
+// SgprLarge G_MERGE_VALUES ..., SgprI, ...
+// ..., Src, ... = G_UNMERGE_VALUES SgprLarge
+    auto *UnMerge = getOpcodeDef<GUnmerge>(Src, MRI);
+    if (UnMerge) {
+      int Idx = UnMerge->findRegisterDefOperandIdx(Src, nullptr);
+      auto *Merge = getOpcodeDef<GMergeLikeInstr>(UnMerge->getSourceReg(), MRI);
+  if (Merge) {
+auto [RAL, RALSrc] =
+tryMatch(Merge->getSourceReg(Idx), AMDGPU::G_AMDGPU_READANYLANE);
+if (RAL)
+  return RALSrc;
+  }
+}
+
+return {};
+  }
+
+  bool tryEliminateReadAnyLane(MachineInstr &Copy) {
+Register Dst = Copy.getOperand(0).getReg();
+Register Src = Copy.getOperand(1).getReg();
+if (!Src.isVirtual())
+  return false;
+
+Register RALDst = Src;
+MachineInstr &SrcMI = *MRI.getVRegDef(Src);
+if (SrcMI.getOpcode() == AMDGPU::G_BITCAST) {
+  RALDst = SrcMI.getOperand(1).getReg();
+}
+
+Register RALSrc = getReadAnyLaneSrc(RALDst);
+if (!RALSrc)
+  return false;
+
+if (Dst.isVirtual()) {
+  if (SrcMI.getOpcode() != AMDGPU::G_BITCAST) {
+// Src = READANYLANE RALSrc
+// Dst = Copy Src
+// ->
+// Dst = RALSrc
+MRI.replaceRegWith(Dst, RALSrc);

Pierre-vh wrote:

Just wondering, can we just emit a COPY instead and let another combine take 
care of the folding?
The two branches are very similar, it'd be nice to make this more terse. Maybe 
we could use a helper like `copyOrReplace` for `Dst` that does the right thing 
depending on whether `Dst` is virtual or not?

https://github.com/llvm/llvm-project/pull/142789
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
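The basic `Dst = COPY (G_AMDGPU_READANYLANE RALSrc) -> Dst = RALSrc` elimination this thread discusses can be sketched over a toy def-use map. This is illustrative only; the real combine works on `MachineRegisterInfo`, handles bitcasts, and also matches the unmerge/merge shapes quoted above.

```python
def eliminate_readanylane(defs, dst):
    """Return the register that should replace `dst`, or None if the
    copy-of-readanylane pattern does not match.

    `defs` maps a register name to (opcode, operand_registers).
    """
    op, operands = defs.get(dst, (None, []))
    if op != "COPY":
        return None
    src = operands[0]
    src_op, src_operands = defs.get(src, (None, []))
    if src_op == "READANYLANE":
        # Src = READANYLANE RALSrc; Dst = COPY Src  ->  Dst replaced by RALSrc.
        return src_operands[0]
    return None
```

A `copyOrReplace`-style helper, as suggested in the review, would then either rewrite all uses of `dst` (virtual register) or emit a plain COPY from the returned register (physical destination).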


[llvm-branch-commits] [clang] release/20.x: [clang-repl] Ensure clang-repl accepts all C keywords supported in all language models (#142749) (PR #142909)

2025-06-05 Thread Anutosh Bhat via llvm-branch-commits

anutosh491 wrote:

See https://github.com/llvm/llvm-project/pull/142933#issuecomment-2943354247 :(

https://github.com/llvm/llvm-project/pull/142909
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] LowerTypeTests: Avoid zext of ptrtoint ConstantExpr. (PR #142886)

2025-06-05 Thread Nikita Popov via llvm-branch-commits

https://github.com/nikic approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/142886
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/20.x: [clang-repl] Ensure clang-repl accepts all C keywords supported in all language models (#142749) (PR #142909)

2025-06-05 Thread Anutosh Bhat via llvm-branch-commits

https://github.com/anutosh491 converted_to_draft 
https://github.com/llvm/llvm-project/pull/142909
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [CI] Use LLVM_ENABLE_RUNTIMES for runtimes builds on Linux (PR #142694)

2025-06-05 Thread Louis Dionne via llvm-branch-commits


@@ -102,51 +102,25 @@ if [[ "${runtimes}" != "" ]]; then
 exit 1
   fi
 
-  echo "--- ninja install-clang"
-
-  ninja -C ${BUILD_DIR} install-clang install-clang-resource-headers
-
-  RUNTIMES_BUILD_DIR="${MONOREPO_ROOT}/build-runtimes"
-  INSTALL_DIR="${BUILD_DIR}/install"
-  mkdir -p ${RUNTIMES_BUILD_DIR}
-
   echo "--- cmake runtimes C++26"
 
-  rm -rf "${RUNTIMES_BUILD_DIR}"
-  cmake -S "${MONOREPO_ROOT}/runtimes" -B "${RUNTIMES_BUILD_DIR}" -GNinja \
-  -D CMAKE_C_COMPILER="${INSTALL_DIR}/bin/clang" \
-  -D CMAKE_CXX_COMPILER="${INSTALL_DIR}/bin/clang++" \
-  -D LLVM_ENABLE_RUNTIMES="${runtimes}" \
-  -D LIBCXX_CXX_ABI=libcxxabi \
-  -D CMAKE_BUILD_TYPE=RelWithDebInfo \
-  -D CMAKE_INSTALL_PREFIX="${INSTALL_DIR}" \
-  -D LIBCXX_TEST_PARAMS="std=c++26" \
-  -D LIBCXXABI_TEST_PARAMS="std=c++26" \
-  -D LLVM_LIT_ARGS="${lit_args}"
+  cmake \

ldionne wrote:

I think I don't quite understand what this change does. You're basically just 
re-generating the CMake cache and re-running `ninja` every time?

https://github.com/llvm/llvm-project/pull/142694
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [mlir] [OpenMP] Add directive spellings introduced in spec v6.0 (PR #141772)

2025-06-05 Thread Krzysztof Parzyszek via llvm-branch-commits

https://github.com/kparzysz closed 
https://github.com/llvm/llvm-project/pull/141772
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Add SimplifyTypeTests pass. (PR #141327)

2025-06-05 Thread Peter Collingbourne via llvm-branch-commits

https://github.com/pcc updated https://github.com/llvm/llvm-project/pull/141327

>From b36c74c344ed47b99e9bfdc28f9081c3c704d8c7 Mon Sep 17 00:00:00 2001
From: Peter Collingbourne 
Date: Tue, 27 May 2025 23:08:59 -0700
Subject: [PATCH] Format

Created using spr 1.3.6-beta.1
---
 llvm/lib/Transforms/IPO/LowerTypeTests.cpp | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/llvm/lib/Transforms/IPO/LowerTypeTests.cpp 
b/llvm/lib/Transforms/IPO/LowerTypeTests.cpp
index 907a664b0f936..26238acbb3f4d 100644
--- a/llvm/lib/Transforms/IPO/LowerTypeTests.cpp
+++ b/llvm/lib/Transforms/IPO/LowerTypeTests.cpp
@@ -2508,8 +2508,8 @@ PreservedAnalyses SimplifyTypeTestsPass::run(Module &M,
 };
 for (User *U : make_early_inc_range(GV.users())) {
   if (auto *CI = dyn_cast<ICmpInst>(U)) {
-if (CI->getPredicate() == CmpInst::ICMP_EQ && 
-MaySimplifyPtr(CI->getOperand(0))) {
+if (CI->getPredicate() == CmpInst::ICMP_EQ &&
+MaySimplifyPtr(CI->getOperand(0))) {
   // This is an equality comparison (TypeTestResolution::Single case in
   // lowerTypeTestCall). In this case we just replace the comparison
   // with true.
@@ -2538,8 +2538,8 @@ PreservedAnalyses SimplifyTypeTestsPass::run(Module &M,
 if (U.getOperandNo() == 1 && CI &&
 CI->getPredicate() == CmpInst::ICMP_EQ &&
 MaySimplifyInt(CI->getOperand(0))) {
-  // This is an equality comparison. Unlike in the case above it remained
-  // as an integer compare.
+  // This is an equality comparison. Unlike in the case above it
+  // remained as an integer compare.
   CI->replaceAllUsesWith(ConstantInt::getTrue(M.getContext()));
   CI->eraseFromParent();
   Changed = true;

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] DAG: Move soft float predicate management into RuntimeLibcalls (PR #142905)

2025-06-05 Thread Craig Topper via llvm-branch-commits


@@ -11,12 +11,48 @@
 using namespace llvm;
 using namespace RTLIB;
 
+void RuntimeLibcallsInfo::initSoftFloatCmpLibcallPredicates() {
+  std::fill(SoftFloatCompareLibcallPredicates,

topperc wrote:

Should we be using `std::begin(SoftFloatCompareLibcallPredicates)` and 
`std::end(SoftFloatCompareLibcallPredicates)` rather than repeating the size?

https://github.com/llvm/llvm-project/pull/142905
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] LowerTypeTests: Avoid zext of ptrtoint ConstantExpr. (PR #142886)

2025-06-05 Thread Peter Collingbourne via llvm-branch-commits

https://github.com/pcc updated https://github.com/llvm/llvm-project/pull/142886


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)

2025-06-05 Thread Nikita Popov via llvm-branch-commits


@@ -1302,6 +1302,24 @@ static void addRange(SmallVectorImpl 
&EndPoints,
   EndPoints.push_back(High);
 }
 
+MDNode *MDNode::getMergedCalleeTypeMetadata(LLVMContext &Ctx, MDNode *A,
+MDNode *B) {
+  SmallVector AB;
+  SmallSet MergedCallees;

nikic wrote:

```suggestion
  SmallPtrSet MergedCallees;
```

https://github.com/llvm/llvm-project/pull/87573
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
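Merging two `callee_type` metadata lists, as `getMergedCalleeTypeMetadata` does above, amounts to concatenating the two operand lists while using a set to drop duplicate callee entries and preserve first-seen order. A minimal Python sketch of that dedup-preserving merge, with strings standing in for metadata nodes:

```python
def merge_callee_type(a, b):
    """Concatenate callee lists `a` and `b`, keeping the first occurrence
    of each entry -- the role played by MergedCallees in the C++ code."""
    seen = set()
    merged = []
    for callee in (a or []) + (b or []):
        if callee not in seen:
            seen.add(callee)
            merged.append(callee)
    return merged
```

Using a set for membership keeps the merge linear in the combined list length, which matters when the same call site accumulates callee types from many merged functions.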


[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)

2025-06-05 Thread Nikita Popov via llvm-branch-commits


@@ -0,0 +1,24 @@
+;; Test if the callee_type metadata attached to indirect call sites adheres to the expected format.
+
+; RUN: llvm-as < %s | llvm-dis | FileCheck %s
+define i32 @_Z13call_indirectPFicEc(ptr %func, i8 signext %x) !type !0 {
+entry:
+  %func.addr = alloca ptr, align 8
+  %x.addr = alloca i8, align 1
+  store ptr %func, ptr %func.addr, align 8
+  store i8 %x, ptr %x.addr, align 1
+  %fptr = load ptr, ptr %func.addr, align 8
+  %x_val = load i8, ptr %x.addr, align 1
+  ; CHECK: %call = call i32 %fptr(i8 signext %x_val), !callee_type !1
+  %call = call i32 %fptr(i8 signext %x_val), !callee_type !1
+  ret i32 %call
+}
+
+declare !type !2 i32 @_Z3barc(i8 signext)
+
+!0 = !{i64 0, !"_ZTSFiPvcE.generalized"}
+!1 = !{!2}
+!2 = !{i64 0, !"_ZTSFicE.generalized"}
+!3 = !{i64 0, !"_ZTSFicE"}
+!4 = !{!3}
+!8 = !{i64 0, !"_ZTSFicE.generalized"}

nikic wrote:

Looks like there's a bunch of unused metadata here?

https://github.com/llvm/llvm-project/pull/87573
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)

2025-06-05 Thread Momchil Velikov via llvm-branch-commits

https://github.com/momchil-velikov updated 
https://github.com/llvm/llvm-project/pull/142422

>From 2eb6c95955dc22b6b59eb4e5ba269e4744bbdd2a Mon Sep 17 00:00:00 2001
From: Momchil Velikov 
Date: Mon, 2 Jun 2025 15:13:13 +
Subject: [PATCH 1/3] [MLIR] Fix incorrect slice contiguity inference in
 `vector::isContiguousSlice`

Previously, slices were sometimes marked as non-contiguous when
they were actually contiguous. This occurred when the vector type had
leading unit dimensions, e.g., `vector<1x1x...x1xd0xd1x...xdn-1xT>`.
In such cases, only the trailing n dimensions of the memref need to be
contiguous, not the entire vector rank.

This affects how `FlattenContiguousRowMajorTransfer{Read,Write}Pattern`
flattens `transfer_read` and `transfer_write` ops. The pattern used
to collapse a number of dimensions equal to the vector rank, which
may be incorrect when leading dimensions are unit-sized.

This patch fixes the issue by collapsing only as many trailing memref
dimensions as are actually contiguous.
---
 .../mlir/Dialect/Vector/Utils/VectorUtils.h   |  54 -
 .../Transforms/VectorTransferOpTransforms.cpp |   8 +-
 mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp |  25 ++--
 .../Vector/vector-transfer-flatten.mlir   | 108 +-
 4 files changed, 120 insertions(+), 75 deletions(-)

diff --git a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h 
b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
index 6609b28d77b6c..ed06d7a029494 100644
--- a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
+++ b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
@@ -49,35 +49,37 @@ FailureOr> 
isTranspose2DSlice(vector::TransposeOp op);
 
 /// Return true if `vectorType` is a contiguous slice of `memrefType`.
 ///
-/// Only the N = vectorType.getRank() trailing dims of `memrefType` are
-/// checked (the other dims are not relevant). Note that for `vectorType` to be
-/// a contiguous slice of `memrefType`, the trailing dims of the latter have
-/// to be contiguous - this is checked by looking at the corresponding strides.
+/// The leading unit dimensions of the vector type are ignored as they
+/// are not relevant to the result. Let N be the number of vector
+/// dimensions remaining after ignoring a leading sequence of unit ones.
 ///
-/// There might be some restriction on the leading dim of `VectorType`:
+/// For `vectorType` to be a contiguous slice of `memrefType`
+///   a) the N trailing dimensions of the latter must be contiguous, and
+///   b) the trailing N dimensions of `vectorType` and `memrefType`,
+///  except the first of them, must match.
 ///
-/// Case 1. If all the trailing dims of `vectorType` match the trailing dims
-/// of `memrefType` then the leading dim of `vectorType` can be
-/// arbitrary.
-///
-///Ex. 1.1 contiguous slice, perfect match
-///  vector<4x3x2xi32> from memref<5x4x3x2xi32>
-///Ex. 1.2 contiguous slice, the leading dim does not match (2 != 4)
-///  vector<2x3x2xi32> from memref<5x4x3x2xi32>
-///
-/// Case 2. If an "internal" dim of `vectorType` does not match the
-/// corresponding trailing dim in `memrefType` then the remaining
-/// leading dims of `vectorType` have to be 1 (the first non-matching
-/// dim can be arbitrary).
+/// Examples:
 ///
-///Ex. 2.1 non-contiguous slice, 2 != 3 and the leading dim != <1>
-///  vector<2x2x2xi32> from memref<5x4x3x2xi32>
-///Ex. 2.2  contiguous slice, 2 != 3 and the leading dim == <1>
-///  vector<1x2x2xi32> from memref<5x4x3x2xi32>
-///Ex. 2.3. contiguous slice, 2 != 3 and the leading dims == <1x1>
-///  vector<1x1x2x2xi32> from memref<5x4x3x2xi32>
-///Ex. 2.4. non-contiguous slice, 2 != 3 and the leading dims != <1x1>
-/// vector<2x1x2x2xi32> from memref<5x4x3x2xi32>)
+///   Ex.1 contiguous slice, perfect match
+/// vector<4x3x2xi32> from memref<5x4x3x2xi32>
+///   Ex.2 contiguous slice, the leading dim does not match (2 != 4)
+/// vector<2x3x2xi32> from memref<5x4x3x2xi32>
+///   Ex.3 non-contiguous slice, 2 != 3
+/// vector<2x2x2xi32> from memref<5x4x3x2xi32>
+///   Ex.4 contiguous slice, leading unit dimension of the vector ignored,
+///2 != 3 (allowed)
+/// vector<1x2x2xi32> from memref<5x4x3x2xi32>
+///   Ex.5 contiguous slice, leading two unit dims of the vector ignored,
+/// 2 != 3 (allowed)
+/// vector<1x1x2x2xi32> from memref<5x4x3x2xi32>
+///   Ex.6 non-contiguous slice, 2 != 3, no leading sequence of unit dims
+/// vector<2x1x2x2xi32> from memref<5x4x3x2xi32>)
+///   Ex.7 contiguous slice, memref needs to be contiguous only on the last
+///dimension
+/// vector<1x1x2xi32> from memref<2x2x2xi32, strided<[8, 4, 1]>>
+///   Ex.8 non-contiguous slice, memref needs to be contiguous on the last
+///two dimensions, and it isn't
+/// vector<1x2x2xi32> from memref<2x2x2xi32, strided<[8, 4, 1]>>
 bool isContiguo