[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Add ISD::PTRADD DAG combines (PR #142739)
@@ -2627,6 +2629,93 @@ SDValue DAGCombiner::foldSubToAvg(SDNode *N, const SDLoc &DL) { return SDValue(); } +/// Try to fold a pointer arithmetic node. +/// This needs to be done separately from normal addition, because pointer +/// addition is not commutative. +SDValue DAGCombiner::visitPTRADD(SDNode *N) { + SDValue N0 = N->getOperand(0); + SDValue N1 = N->getOperand(1); + EVT PtrVT = N0.getValueType(); + EVT IntVT = N1.getValueType(); + SDLoc DL(N); + + // This is already ensured by an assert in SelectionDAG::getNode(). Several + // combines here depend on this assumption. + assert(PtrVT == IntVT && + "PTRADD with different operand types is not supported"); + + // fold (ptradd undef, y) -> undef + if (N0.isUndef()) +return N0; + + // fold (ptradd x, undef) -> undef + if (N1.isUndef()) +return DAG.getUNDEF(PtrVT); + + // fold (ptradd x, 0) -> x + if (isNullConstant(N1)) +return N0; + + // fold (ptradd 0, x) -> x + if (isNullConstant(N0)) +return N1; + + if (N0.getOpcode() == ISD::PTRADD && + !reassociationCanBreakAddressingModePattern(ISD::PTRADD, DL, N, N0, N1)) { +SDValue X = N0.getOperand(0); +SDValue Y = N0.getOperand(1); +SDValue Z = N1; +bool N0OneUse = N0.hasOneUse(); +bool YIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Y); +bool ZIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Z); + +// (ptradd (ptradd x, y), z) -> (ptradd x, (add y, z)) if: +// * y is a constant and (ptradd x, y) has one use; or +// * y and z are both constants. ritter-x2a wrote: So that `y + z` can be folded into a single constant, which might be folded as an immediate offset into a memory instruction. 
`SeparateConstOffsetFromGEP` should do that for AMDGPU already in many cases when it's beneficial, but:
- I don't think that every backend uses `SeparateConstOffsetFromGEP`, so it can be worthwhile to have anyway,
- there are cases where these are introduced after `SeparateConstOffsetFromGEP` runs, for example when a wide vector load/store with an offset is legalized to several loads/stores with nested offsets, as in `store_v16i32` in `ptradd-sdag-optimizations.ll`; with this reassociation we get the code that we would get with the old non-PTRADD code path, and
- while it's probably possible that this could lead to worse code, the `reassociationCanBreakAddressingModePattern` check above _should_ avoid these (I'm not 100% convinced the logic in there is sound, but that seems like a different problem).

https://github.com/llvm/llvm-project/pull/142739 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Baseline fneg-fabs.bf16.ll tests. NFC. (PR #142910)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/142910
[llvm-branch-commits] [CI] Use LLVM_ENABLE_RUNTIMES for runtimes builds on Linux (PR #142694)
https://github.com/Endilll commented: > We're using LLVM_ENABLE_RUNTIMES. It uses the just built clang to build the runtimes specified. That explains it, thank you. There's still an outstanding question of unrelated changes to libc++ tests that are included in this PR. https://github.com/llvm/llvm-project/pull/142694
[llvm-branch-commits] [llvm] [AMDGPU] Patterns for <2 x bfloat> fneg (fabs) (PR #142911)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/142911
[llvm-branch-commits] [CI] Use LLVM_ENABLE_RUNTIMES for runtimes builds on Linux (PR #142694)
boomanaiden154 wrote: > There's still an outstanding question of unrelated changes to libc++ tests that are included in this PR. I'm still not sure how they're ending up in here. I haven't seen this before with `spr`. This will definitely be fixed before I end up landing the patch and I'm guessing will be resolved when I change the branch target to `main`. https://github.com/llvm/llvm-project/pull/142694
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Add ISD::PTRADD DAG combines (PR #142739)
@@ -2627,6 +2629,93 @@ SDValue DAGCombiner::foldSubToAvg(SDNode *N, const SDLoc &DL) { return SDValue(); } +/// Try to fold a pointer arithmetic node. +/// This needs to be done separately from normal addition, because pointer +/// addition is not commutative. +SDValue DAGCombiner::visitPTRADD(SDNode *N) { + SDValue N0 = N->getOperand(0); + SDValue N1 = N->getOperand(1); + EVT PtrVT = N0.getValueType(); + EVT IntVT = N1.getValueType(); + SDLoc DL(N); + + // This is already ensured by an assert in SelectionDAG::getNode(). Several + // combines here depend on this assumption. + assert(PtrVT == IntVT && + "PTRADD with different operand types is not supported"); + + // fold (ptradd undef, y) -> undef + if (N0.isUndef()) +return N0; + + // fold (ptradd x, undef) -> undef + if (N1.isUndef()) +return DAG.getUNDEF(PtrVT); + + // fold (ptradd x, 0) -> x + if (isNullConstant(N1)) +return N0; + + // fold (ptradd 0, x) -> x + if (isNullConstant(N0)) +return N1; + + if (N0.getOpcode() == ISD::PTRADD && + !reassociationCanBreakAddressingModePattern(ISD::PTRADD, DL, N, N0, N1)) { +SDValue X = N0.getOperand(0); +SDValue Y = N0.getOperand(1); +SDValue Z = N1; +bool N0OneUse = N0.hasOneUse(); +bool YIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Y); +bool ZIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Z); + +// (ptradd (ptradd x, y), z) -> (ptradd x, (add y, z)) if: +// * y is a constant and (ptradd x, y) has one use; or +// * y and z are both constants. +if ((YIsConstant && N0OneUse) || (YIsConstant && ZIsConstant)) { + SDNodeFlags Flags; + // If both additions in the original were NUW, the new ones are as well. + if (N->getFlags().hasNoUnsignedWrap() && + N0->getFlags().hasNoUnsignedWrap()) +Flags |= SDNodeFlags::NoUnsignedWrap; + SDValue Add = DAG.getNode(ISD::ADD, DL, IntVT, {Y, Z}, Flags); + AddToWorklist(Add.getNode()); + return DAG.getMemBasePlusOffset(X, Add, DL, Flags); +} + +// TODO: There is another possible fold here that was proven useful. 
+// It would be this: +// +// (ptradd (ptradd x, y), z) -> (ptradd (ptradd x, z), y) if: +// * (ptradd x, y) has one use; and +// * y is a constant; and +// * z is not a constant. +// +// In some cases, specifically in AArch64's FEAT_CPA, it exposes the +// opportunity to select more complex instructions such as SUBPT and +// MSUBPT. However, a hypothetical corner case has been found that we could +// not avoid. Consider this (pseudo-POSIX C): +// +// char *foo(char *x, int z) {return (x + LARGE_CONSTANT) + z;} +// char *p = mmap(LARGE_CONSTANT); +// char *q = foo(p, -LARGE_CONSTANT); +// +// Then x + LARGE_CONSTANT is one-past-the-end, so valid, and a +// further + z takes it back to the start of the mapping, so valid, +// regardless of the address mmap gave back. However, if mmap gives you an +// address < LARGE_CONSTANT (ignoring high bits), x - LARGE_CONSTANT will +// borrow from the high bits (with the subsequent + z carrying back into +// the high bits to give you a well-defined pointer) and thus trip +// FEAT_CPA's pointer corruption checks. +// +// We leave this fold as an opportunity for future work, addressing the +// corner case for FEAT_CPA, as well as reconciling the solution with the +// more general application of pointer arithmetic in other future targets. ritter-x2a wrote: My vague idea of handling this properly in the future would be to - have an `inbounds` flag on `PTRADD` nodes (see #131862), - have backends generate instructions that break for out-of-bounds arithmetic only when the `inbounds` flag is present, and - give targets an option to request that transformations that would generate `PTRADD`s without `inbounds` flag are not applied. I think that would give things like the CPA implementation a better semantic footing, since otherwise they would just be miscompiling the IR's `getelementptr`s without `inbounds` flags. 
However, at least the last point above is currently not on my critical path, so I'm open to adding the comment here or moving the other transform here. https://github.com/llvm/llvm-project/pull/142739
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #142789)
@@ -137,7 +138,123 @@ class AMDGPURegBankLegalizeCombiner { return {MatchMI, MatchMI->getOperand(1).getReg()}; } + std::tuple tryMatchRALFromUnmerge(Register Src) { +auto *ReadAnyLane = MRI.getVRegDef(Src); +if (ReadAnyLane->getOpcode() == AMDGPU::G_AMDGPU_READANYLANE) { + Register RALSrc = ReadAnyLane->getOperand(1).getReg(); + auto *UnMerge = getOpcodeDef(RALSrc, MRI); + if (UnMerge) +return {UnMerge, UnMerge->findRegisterDefOperandIdx(RALSrc, nullptr)}; +} +return {nullptr, -1}; + } + + Register getReadAnyLaneSrc(Register Src) { +// Src = G_AMDGPU_READANYLANE RALSrc +auto [RAL, RALSrc] = tryMatch(Src, AMDGPU::G_AMDGPU_READANYLANE); +if (RAL) + return RALSrc; + +// LoVgpr, HiVgpr = G_UNMERGE_VALUES UnmergeSrc +// LoSgpr = G_AMDGPU_READANYLANE LoVgpr +// HiSgpr = G_AMDGPU_READANYLANE HiVgpr +// Src G_MERGE_VALUES LoSgpr, HiSgpr +auto *Merge = getOpcodeDef(Src, MRI); +if (Merge) { + unsigned NumElts = Merge->getNumSources(); + auto [Unmerge, Idx] = tryMatchRALFromUnmerge(Merge->getSourceReg(0)); + if (!Unmerge || Unmerge->getNumDefs() != NumElts || Idx != 0) +return {}; + + // check if all elements are from same unmerge and there is no shuffling + for (unsigned i = 1; i < NumElts; ++i) { +auto [UnmergeI, IdxI] = tryMatchRALFromUnmerge(Merge->getSourceReg(i)); +if (UnmergeI != Unmerge || (unsigned)IdxI != i) + return {}; + } + return Unmerge->getSourceReg(); +} + +// ..., VgprI, ... = G_UNMERGE_VALUES VgprLarge +// SgprI = G_AMDGPU_READANYLANE VgprI +// SgprLarge G_MERGE_VALUES ..., SgprI, ... +// ..., Src, ... 
= G_UNMERGE_VALUES SgprLarge +auto *UnMerge = getOpcodeDef(Src, MRI); +if (UnMerge) { + int Idx = UnMerge->findRegisterDefOperandIdx(Src, nullptr); + auto *Merge = getOpcodeDef(UnMerge->getSourceReg(), MRI); + if (Merge) { +auto [RAL, RALSrc] = +tryMatch(Merge->getSourceReg(Idx), AMDGPU::G_AMDGPU_READANYLANE); +if (RAL) + return RALSrc; + } +} + +return {}; + } + + bool tryEliminateReadAnyLane(MachineInstr &Copy) { +Register Dst = Copy.getOperand(0).getReg(); +Register Src = Copy.getOperand(1).getReg(); +if (!Src.isVirtual()) + return false; + +Register RALDst = Src; +MachineInstr &SrcMI = *MRI.getVRegDef(Src); +if (SrcMI.getOpcode() == AMDGPU::G_BITCAST) { + RALDst = SrcMI.getOperand(1).getReg(); +} + +Register RALSrc = getReadAnyLaneSrc(RALDst); +if (!RALSrc) + return false; + +if (Dst.isVirtual()) { + if (SrcMI.getOpcode() != AMDGPU::G_BITCAST) { +// Src = READANYLANE RALSrc +// Dst = Copy Src +// -> +// Dst = RALSrc +MRI.replaceRegWith(Dst, RALSrc); + } else { +// RALDst = READANYLANE RALSrc +// Src = G_BITCAST RALDst +// Dst = Copy Src +// -> +// NewVgpr = G_BITCAST RALDst +// Dst = NewVgpr +auto Bitcast = B.buildBitcast({VgprRB, MRI.getType(Src)}, RALSrc); Pierre-vh wrote: Does this work as intended without the `B.setInstr(Copy)` call? https://github.com/llvm/llvm-project/pull/142789
[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)
https://github.com/momchil-velikov updated https://github.com/llvm/llvm-project/pull/142422 >From 2eb6c95955dc22b6b59eb4e5ba269e4744bbdd2a Mon Sep 17 00:00:00 2001 From: Momchil Velikov Date: Mon, 2 Jun 2025 15:13:13 + Subject: [PATCH 1/3] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` Previously, slices were sometimes marked as non-contiguous when they were actually contiguous. This occurred when the vector type had leading unit dimensions, e.g., `vector<1x1x...x1xd0xd1x...xdn-1xT>`. In such cases, only the trailing n dimensions of the memref need to be contiguous, not the entire vector rank. This affects how `FlattenContiguousRowMajorTransfer{Read,Write}Pattern` flattens `transfer_read` and `transfer_write` ops. The pattern used to collapse a number of dimensions equal to the vector rank, which may be incorrect when leading dimensions are unit-sized. This patch fixes the issue by collapsing only as many trailing memref dimensions as are actually contiguous. --- .../mlir/Dialect/Vector/Utils/VectorUtils.h | 54 - .../Transforms/VectorTransferOpTransforms.cpp | 8 +- mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp | 25 ++-- .../Vector/vector-transfer-flatten.mlir | 108 +- 4 files changed, 120 insertions(+), 75 deletions(-) diff --git a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h index 6609b28d77b6c..ed06d7a029494 100644 --- a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h +++ b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h @@ -49,35 +49,37 @@ FailureOr> isTranspose2DSlice(vector::TransposeOp op); /// Return true if `vectorType` is a contiguous slice of `memrefType`. /// -/// Only the N = vectorType.getRank() trailing dims of `memrefType` are -/// checked (the other dims are not relevant).
Note that for `vectorType` to be -/// a contiguous slice of `memrefType`, the trailing dims of the latter have -/// to be contiguous - this is checked by looking at the corresponding strides. +/// The leading unit dimensions of the vector type are ignored as they +/// are not relevant to the result. Let N be the number of the vector +/// dimensions after ignoring a leading sequence of unit ones. /// -/// There might be some restriction on the leading dim of `VectorType`: +/// For `vectorType` to be a contiguous slice of `memrefType` +/// a) the N trailing dimensions of the latter must be contiguous, and +/// b) the trailing N dimensions of `vectorType` and `memrefType`, +/// except the first of them, must match. /// -/// Case 1. If all the trailing dims of `vectorType` match the trailing dims -/// of `memrefType` then the leading dim of `vectorType` can be -/// arbitrary. -/// -///Ex. 1.1 contiguous slice, perfect match -/// vector<4x3x2xi32> from memref<5x4x3x2xi32> -///Ex. 1.2 contiguous slice, the leading dim does not match (2 != 4) -/// vector<2x3x2xi32> from memref<5x4x3x2xi32> -/// -/// Case 2. If an "internal" dim of `vectorType` does not match the -/// corresponding trailing dim in `memrefType` then the remaining -/// leading dims of `vectorType` have to be 1 (the first non-matching -/// dim can be arbitrary). +/// Examples: /// -///Ex. 2.1 non-contiguous slice, 2 != 3 and the leading dim != <1> -/// vector<2x2x2xi32> from memref<5x4x3x2xi32> -///Ex. 2.2 contiguous slice, 2 != 3 and the leading dim == <1> -/// vector<1x2x2xi32> from memref<5x4x3x2xi32> -///Ex. 2.3. contiguous slice, 2 != 3 and the leading dims == <1x1> -/// vector<1x1x2x2xi32> from memref<5x4x3x2xi32> -///Ex. 2.4. 
non-contiguous slice, 2 != 3 and the leading dims != <1x1> -/// vector<2x1x2x2xi32> from memref<5x4x3x2xi32>) +/// Ex.1 contiguous slice, perfect match +/// vector<4x3x2xi32> from memref<5x4x3x2xi32> +/// Ex.2 contiguous slice, the leading dim does not match (2 != 4) +/// vector<2x3x2xi32> from memref<5x4x3x2xi32> +/// Ex.3 non-contiguous slice, 2 != 3 +/// vector<2x2x2xi32> from memref<5x4x3x2xi32> +/// Ex.4 contiguous slice, leading unit dimension of the vector ignored, +///2 != 3 (allowed) +/// vector<1x2x2xi32> from memref<5x4x3x2xi32> +/// Ex.5. contiguous slice, leading two unit dims of the vector ignored, +/// 2 != 3 (allowed) +/// vector<1x1x2x2xi32> from memref<5x4x3x2xi32> +/// Ex.6. non-contiguous slice, 2 != 3, no leading sequence of unit dims +/// vector<2x1x2x2xi32> from memref<5x4x3x2xi32>) +/// Ex.7 contiguous slice, memref needs to be contiguous only on the last +///dimension +/// vector<1x1x2xi32> from memref<2x2x2xi32, strided<[8, 4, 1]>> +/// Ex.8 non-contiguous slice, memref needs to be contiguous on the last +///two dimensions, and it isn't +/// vector<1x2x2xi32> from memref<2x2x2xi32, strided<[8, 4, 1]>> bool isContiguo
[llvm-branch-commits] [CI] Use LLVM_ENABLE_RUNTIMES for runtimes builds on Linux (PR #142694)
https://github.com/Endilll approved this pull request. https://github.com/llvm/llvm-project/pull/142694
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #142789)
https://github.com/petar-avramovic updated https://github.com/llvm/llvm-project/pull/142789 >From 64d7853a9edefabe8de40748e01348d2d5c017c5 Mon Sep 17 00:00:00 2001 From: Petar Avramovic Date: Thu, 5 Jun 2025 12:17:13 +0200 Subject: [PATCH] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize --- .../Target/AMDGPU/AMDGPURegBankLegalize.cpp | 122 +++--- .../AMDGPU/GlobalISel/readanylane-combines.ll | 25 +--- .../GlobalISel/readanylane-combines.mir | 78 +++ 3 files changed, 125 insertions(+), 100 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp index ba661348ca5b5..6707b641b0d25 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp @@ -23,6 +23,7 @@ #include "GCNSubtarget.h" #include "llvm/CodeGen/GlobalISel/CSEInfo.h" #include "llvm/CodeGen/GlobalISel/CSEMIRBuilder.h" +#include "llvm/CodeGen/GlobalISel/GenericMachineInstrs.h" #include "llvm/CodeGen/MachineFunctionPass.h" #include "llvm/CodeGen/MachineUniformityAnalysis.h" #include "llvm/CodeGen/TargetPassConfig.h" @@ -137,7 +138,109 @@ class AMDGPURegBankLegalizeCombiner { return {MatchMI, MatchMI->getOperand(1).getReg()}; } + std::pair tryMatchRALFromUnmerge(Register Src) { +MachineInstr *ReadAnyLane = MRI.getVRegDef(Src); +if (ReadAnyLane->getOpcode() == AMDGPU::G_AMDGPU_READANYLANE) { + Register RALSrc = ReadAnyLane->getOperand(1).getReg(); + if (auto *UnMerge = getOpcodeDef(RALSrc, MRI)) +return {UnMerge, UnMerge->findRegisterDefOperandIdx(RALSrc, nullptr)}; +} +return {nullptr, -1}; + } + + Register getReadAnyLaneSrc(Register Src) { +// Src = G_AMDGPU_READANYLANE RALSrc +auto [RAL, RALSrc] = tryMatch(Src, AMDGPU::G_AMDGPU_READANYLANE); +if (RAL) + return RALSrc; + +// LoVgpr, HiVgpr = G_UNMERGE_VALUES UnmergeSrc +// LoSgpr = G_AMDGPU_READANYLANE LoVgpr +// HiSgpr = G_AMDGPU_READANYLANE HiVgpr +// Src G_MERGE_VALUES LoSgpr, HiSgpr +auto *Merge = 
getOpcodeDef(Src, MRI); +if (Merge) { + unsigned NumElts = Merge->getNumSources(); + auto [Unmerge, Idx] = tryMatchRALFromUnmerge(Merge->getSourceReg(0)); + if (!Unmerge || Unmerge->getNumDefs() != NumElts || Idx != 0) +return {}; + + // check if all elements are from same unmerge and there is no shuffling + for (unsigned i = 1; i < NumElts; ++i) { +auto [UnmergeI, IdxI] = tryMatchRALFromUnmerge(Merge->getSourceReg(i)); +if (UnmergeI != Unmerge || (unsigned)IdxI != i) + return {}; + } + return Unmerge->getSourceReg(); +} + +// ..., VgprI, ... = G_UNMERGE_VALUES VgprLarge +// SgprI = G_AMDGPU_READANYLANE VgprI +// SgprLarge G_MERGE_VALUES ..., SgprI, ... +// ..., Src, ... = G_UNMERGE_VALUES SgprLarge +auto *UnMerge = getOpcodeDef(Src, MRI); +if (UnMerge) { + int Idx = UnMerge->findRegisterDefOperandIdx(Src, nullptr); + auto *Merge = getOpcodeDef(UnMerge->getSourceReg(), MRI); + if (Merge) { +auto [RAL, RALSrc] = +tryMatch(Merge->getSourceReg(Idx), AMDGPU::G_AMDGPU_READANYLANE); +if (RAL) + return RALSrc; + } +} + +return {}; + } + + void replaceRegWithOrBuildCopy(Register Dst, Register Src) { +if (Dst.isVirtual()) + MRI.replaceRegWith(Dst, Src); +else + B.buildCopy(Dst, Src); + } + + bool tryEliminateReadAnyLane(MachineInstr &Copy) { +Register Dst = Copy.getOperand(0).getReg(); +Register Src = Copy.getOperand(1).getReg(); +if (!Src.isVirtual()) + return false; + +Register RALDst = Src; +MachineInstr &SrcMI = *MRI.getVRegDef(Src); +if (SrcMI.getOpcode() == AMDGPU::G_BITCAST) + RALDst = SrcMI.getOperand(1).getReg(); + +Register RALSrc = getReadAnyLaneSrc(RALDst); +if (!RALSrc) + return false; + +B.setInstr(Copy); +if (SrcMI.getOpcode() != AMDGPU::G_BITCAST) { + // Src = READANYLANE RALSrc Src = READANYLANE RALSrc + // Dst = Copy Src $Dst = Copy Src + // -> -> + // Dst = RALSrc $Dst = Copy RALSrc + replaceRegWithOrBuildCopy(Dst, RALSrc); +} else { + // RALDst = READANYLANE RALSrc RALDst = READANYLANE RALSrc + // Src = G_BITCAST RALDst Src = G_BITCAST RALDst + // Dst = 
Copy Src Dst = Copy Src + // -> -> + // NewVgpr = G_BITCAST RALDst NewVgpr = G_BITCAST RALDst + // Dst = NewVgpr$Dst = Copy NewVgpr + auto Bitcast = B.buildBitcast({VgprRB, MRI.getType(Src)}, RALSrc); + replaceRegWithOrBuildCopy(Dst, Bitcast.getReg(0)); +} + +eraseInstr(Copy, MRI, nullptr); +return true; + } + void tryCombineCopy(MachineInstr &MI) { +if (tryEliminateReadAnyLane(MI)) + return; + Register Dst = MI.get
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)
https://github.com/petar-avramovic updated https://github.com/llvm/llvm-project/pull/142790 >From ae9621601118004cc6b363be7fad70092e401cad Mon Sep 17 00:00:00 2001 From: Petar Avramovic Date: Thu, 5 Jun 2025 12:43:04 +0200 Subject: [PATCH] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize Add rules for G_AMDGPU_BUFFER_LOAD and implement waterfall lowering for divergent operands that must be sgpr. --- .../Target/AMDGPU/AMDGPUGlobalISelUtils.cpp | 53 +++- .../lib/Target/AMDGPU/AMDGPUGlobalISelUtils.h | 2 + .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 239 +- .../AMDGPU/AMDGPURegBankLegalizeHelper.h | 1 + .../AMDGPU/AMDGPURegBankLegalizeRules.cpp | 28 +- .../AMDGPU/AMDGPURegBankLegalizeRules.h | 6 +- .../AMDGPU/GlobalISel/buffer-schedule.ll | 2 +- .../llvm.amdgcn.make.buffer.rsrc.ll | 2 +- .../regbankselect-amdgcn.raw.buffer.load.ll | 59 ++--- ...egbankselect-amdgcn.raw.ptr.buffer.load.ll | 59 ++--- ...regbankselect-amdgcn.struct.buffer.load.ll | 59 ++--- ...ankselect-amdgcn.struct.ptr.buffer.load.ll | 59 ++--- .../llvm.amdgcn.buffer.load-last-use.ll | 2 +- .../llvm.amdgcn.raw.atomic.buffer.load.ll | 42 +-- .../llvm.amdgcn.raw.ptr.atomic.buffer.load.ll | 42 +-- .../llvm.amdgcn.struct.atomic.buffer.load.ll | 48 ++-- ...vm.amdgcn.struct.ptr.atomic.buffer.load.ll | 48 ++-- .../CodeGen/AMDGPU/swizzle.bit.extract.ll | 4 +- 18 files changed, 513 insertions(+), 242 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp b/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp index 00979f44f9d34..d8be3aee1f410 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp @@ -117,45 +117,72 @@ static LLT getReadAnyLaneSplitTy(LLT Ty) { return LLT::scalar(32); } -static Register buildReadAnyLane(MachineIRBuilder &B, Register VgprSrc, - const RegisterBankInfo &RBI); +using ReadLaneFnTy = +function_ref; + +static Register buildReadLane(MachineIRBuilder &, Register, + const RegisterBankInfo &, 
ReadLaneFnTy); static void unmergeReadAnyLane(MachineIRBuilder &B, SmallVectorImpl &SgprDstParts, LLT UnmergeTy, Register VgprSrc, - const RegisterBankInfo &RBI) { + const RegisterBankInfo &RBI, + ReadLaneFnTy BuildRL) { const RegisterBank *VgprRB = &RBI.getRegBank(AMDGPU::VGPRRegBankID); auto Unmerge = B.buildUnmerge({VgprRB, UnmergeTy}, VgprSrc); for (unsigned i = 0; i < Unmerge->getNumOperands() - 1; ++i) { -SgprDstParts.push_back(buildReadAnyLane(B, Unmerge.getReg(i), RBI)); +SgprDstParts.push_back(buildReadLane(B, Unmerge.getReg(i), RBI, BuildRL)); } } -static Register buildReadAnyLane(MachineIRBuilder &B, Register VgprSrc, - const RegisterBankInfo &RBI) { +static Register buildReadLane(MachineIRBuilder &B, Register VgprSrc, + const RegisterBankInfo &RBI, + ReadLaneFnTy BuildRL) { LLT Ty = B.getMRI()->getType(VgprSrc); const RegisterBank *SgprRB = &RBI.getRegBank(AMDGPU::SGPRRegBankID); if (Ty.getSizeInBits() == 32) { -return B.buildInstr(AMDGPU::G_AMDGPU_READANYLANE, {{SgprRB, Ty}}, {VgprSrc}) -.getReg(0); +Register SgprDst = B.getMRI()->createVirtualRegister({SgprRB, Ty}); +return BuildRL(B, SgprDst, VgprSrc).getReg(0); } SmallVector SgprDstParts; - unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI); + unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI, + BuildRL); return B.buildMergeLikeInstr({SgprRB, Ty}, SgprDstParts).getReg(0); } -void AMDGPU::buildReadAnyLane(MachineIRBuilder &B, Register SgprDst, - Register VgprSrc, const RegisterBankInfo &RBI) { +static void buildReadLane(MachineIRBuilder &B, Register SgprDst, + Register VgprSrc, const RegisterBankInfo &RBI, + ReadLaneFnTy BuildReadLane) { LLT Ty = B.getMRI()->getType(VgprSrc); if (Ty.getSizeInBits() == 32) { -B.buildInstr(AMDGPU::G_AMDGPU_READANYLANE, {SgprDst}, {VgprSrc}); +BuildReadLane(B, SgprDst, VgprSrc); return; } SmallVector SgprDstParts; - unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI); + unmergeReadAnyLane(B, 
SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI, + BuildReadLane); B.buildMergeLikeInstr(SgprDst, SgprDstParts).getReg(0); } + +void AMDGPU::buildReadAnyLane(MachineIRBuilder &B, Register SgprDst, + Register VgprSrc, const RegisterBankInfo &RBI) { + return buildReadLane( + B, SgprDst, VgprSrc, RBI, + [](
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #142789)
@@ -137,7 +138,123 @@ class AMDGPURegBankLegalizeCombiner { return {MatchMI, MatchMI->getOperand(1).getReg()}; } + std::tuple tryMatchRALFromUnmerge(Register Src) { +auto *ReadAnyLane = MRI.getVRegDef(Src); +if (ReadAnyLane->getOpcode() == AMDGPU::G_AMDGPU_READANYLANE) { + Register RALSrc = ReadAnyLane->getOperand(1).getReg(); + auto *UnMerge = getOpcodeDef(RALSrc, MRI); + if (UnMerge) +return {UnMerge, UnMerge->findRegisterDefOperandIdx(RALSrc, nullptr)}; +} +return {nullptr, -1}; + } + + Register getReadAnyLaneSrc(Register Src) { +// Src = G_AMDGPU_READANYLANE RALSrc +auto [RAL, RALSrc] = tryMatch(Src, AMDGPU::G_AMDGPU_READANYLANE); +if (RAL) + return RALSrc; + +// LoVgpr, HiVgpr = G_UNMERGE_VALUES UnmergeSrc +// LoSgpr = G_AMDGPU_READANYLANE LoVgpr +// HiSgpr = G_AMDGPU_READANYLANE HiVgpr +// Src G_MERGE_VALUES LoSgpr, HiSgpr +auto *Merge = getOpcodeDef(Src, MRI); +if (Merge) { + unsigned NumElts = Merge->getNumSources(); + auto [Unmerge, Idx] = tryMatchRALFromUnmerge(Merge->getSourceReg(0)); + if (!Unmerge || Unmerge->getNumDefs() != NumElts || Idx != 0) +return {}; + + // check if all elements are from same unmerge and there is no shuffling + for (unsigned i = 1; i < NumElts; ++i) { +auto [UnmergeI, IdxI] = tryMatchRALFromUnmerge(Merge->getSourceReg(i)); +if (UnmergeI != Unmerge || (unsigned)IdxI != i) + return {}; + } + return Unmerge->getSourceReg(); +} + +// ..., VgprI, ... = G_UNMERGE_VALUES VgprLarge +// SgprI = G_AMDGPU_READANYLANE VgprI +// SgprLarge G_MERGE_VALUES ..., SgprI, ... +// ..., Src, ... 
= G_UNMERGE_VALUES SgprLarge +auto *UnMerge = getOpcodeDef(Src, MRI); +if (UnMerge) { + int Idx = UnMerge->findRegisterDefOperandIdx(Src, nullptr); + auto *Merge = getOpcodeDef(UnMerge->getSourceReg(), MRI); + if (Merge) { +auto [RAL, RALSrc] = +tryMatch(Merge->getSourceReg(Idx), AMDGPU::G_AMDGPU_READANYLANE); +if (RAL) + return RALSrc; + } +} + +return {}; + } + + bool tryEliminateReadAnyLane(MachineInstr &Copy) { +Register Dst = Copy.getOperand(0).getReg(); +Register Src = Copy.getOperand(1).getReg(); +if (!Src.isVirtual()) + return false; + +Register RALDst = Src; +MachineInstr &SrcMI = *MRI.getVRegDef(Src); +if (SrcMI.getOpcode() == AMDGPU::G_BITCAST) { + RALDst = SrcMI.getOperand(1).getReg(); +} + +Register RALSrc = getReadAnyLaneSrc(RALDst); +if (!RALSrc) + return false; + +if (Dst.isVirtual()) { + if (SrcMI.getOpcode() != AMDGPU::G_BITCAST) { +// Src = READANYLANE RALSrc +// Dst = Copy Src +// -> +// Dst = RALSrc +MRI.replaceRegWith(Dst, RALSrc); + } else { +// RALDst = READANYLANE RALSrc +// Src = G_BITCAST RALDst +// Dst = Copy Src +// -> +// NewVgpr = G_BITCAST RALDst +// Dst = NewVgpr +auto Bitcast = B.buildBitcast({VgprRB, MRI.getType(Src)}, RALSrc); petar-avramovic wrote: No, have to set it manually before using the builder, it was a bug. https://github.com/llvm/llvm-project/pull/142789 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #142789)
@@ -137,7 +138,123 @@ class AMDGPURegBankLegalizeCombiner { return {MatchMI, MatchMI->getOperand(1).getReg()}; } + std::tuple tryMatchRALFromUnmerge(Register Src) { +auto *ReadAnyLane = MRI.getVRegDef(Src); +if (ReadAnyLane->getOpcode() == AMDGPU::G_AMDGPU_READANYLANE) { + Register RALSrc = ReadAnyLane->getOperand(1).getReg(); + auto *UnMerge = getOpcodeDef(RALSrc, MRI); + if (UnMerge) +return {UnMerge, UnMerge->findRegisterDefOperandIdx(RALSrc, nullptr)}; +} +return {nullptr, -1}; + } + + Register getReadAnyLaneSrc(Register Src) { +// Src = G_AMDGPU_READANYLANE RALSrc +auto [RAL, RALSrc] = tryMatch(Src, AMDGPU::G_AMDGPU_READANYLANE); +if (RAL) + return RALSrc; + +// LoVgpr, HiVgpr = G_UNMERGE_VALUES UnmergeSrc +// LoSgpr = G_AMDGPU_READANYLANE LoVgpr +// HiSgpr = G_AMDGPU_READANYLANE HiVgpr +// Src G_MERGE_VALUES LoSgpr, HiSgpr +auto *Merge = getOpcodeDef(Src, MRI); +if (Merge) { + unsigned NumElts = Merge->getNumSources(); + auto [Unmerge, Idx] = tryMatchRALFromUnmerge(Merge->getSourceReg(0)); + if (!Unmerge || Unmerge->getNumDefs() != NumElts || Idx != 0) +return {}; + + // check if all elements are from same unmerge and there is no shuffling + for (unsigned i = 1; i < NumElts; ++i) { +auto [UnmergeI, IdxI] = tryMatchRALFromUnmerge(Merge->getSourceReg(i)); +if (UnmergeI != Unmerge || (unsigned)IdxI != i) + return {}; + } + return Unmerge->getSourceReg(); +} + +// ..., VgprI, ... = G_UNMERGE_VALUES VgprLarge +// SgprI = G_AMDGPU_READANYLANE VgprI +// SgprLarge G_MERGE_VALUES ..., SgprI, ... +// ..., Src, ... 
= G_UNMERGE_VALUES SgprLarge +auto *UnMerge = getOpcodeDef(Src, MRI); +if (UnMerge) { + int Idx = UnMerge->findRegisterDefOperandIdx(Src, nullptr); + auto *Merge = getOpcodeDef(UnMerge->getSourceReg(), MRI); + if (Merge) { +auto [RAL, RALSrc] = +tryMatch(Merge->getSourceReg(Idx), AMDGPU::G_AMDGPU_READANYLANE); +if (RAL) + return RALSrc; + } +} + +return {}; + } + + bool tryEliminateReadAnyLane(MachineInstr &Copy) { +Register Dst = Copy.getOperand(0).getReg(); +Register Src = Copy.getOperand(1).getReg(); +if (!Src.isVirtual()) + return false; + +Register RALDst = Src; +MachineInstr &SrcMI = *MRI.getVRegDef(Src); +if (SrcMI.getOpcode() == AMDGPU::G_BITCAST) { + RALDst = SrcMI.getOperand(1).getReg(); +} petar-avramovic wrote: Not sure, did not see any cases yet https://github.com/llvm/llvm-project/pull/142789 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)
@@ -165,6 +165,8 @@ enum RegBankLLTMappingApplyID { Sgpr32Trunc, // Src only modifiers: waterfalls, extends + Sgpr32_W, + SgprV4S32_W, petar-avramovic wrote: Added one above, is it clear now? https://github.com/llvm/llvm-project/pull/142790 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)
@@ -57,6 +57,226 @@ void RegBankLegalizeHelper::findRuleAndApplyMapping(MachineInstr &MI) { lower(MI, Mapping, WaterfallSgprs); } +bool RegBankLegalizeHelper::executeInWaterfallLoop( +MachineIRBuilder &B, iterator_range Range, +SmallSet &SGPROperandRegs) { + // Track use registers which have already been expanded with a readfirstlane + // sequence. This may have multiple uses if moving a sequence. + DenseMap WaterfalledRegMap; + + MachineBasicBlock &MBB = B.getMBB(); + MachineFunction &MF = B.getMF(); + + const SIRegisterInfo *TRI = ST.getRegisterInfo(); + const TargetRegisterClass *WaveRC = TRI->getWaveMaskRegClass(); + unsigned MovExecOpc, MovExecTermOpc, XorTermOpc, AndSaveExecOpc, ExecReg; + if (ST.isWave32()) { petar-avramovic wrote: it is instantiated per ST, MRI pair, not per function https://github.com/llvm/llvm-project/pull/142790 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [CI] Migrate to runtimes build (PR #142696)
@@ -49,8 +49,7 @@ }, "lld": {"bolt", "cross-project-tests"}, # TODO(issues/132795): LLDB should be enabled on clang changes. -"clang": {"clang-tools-extra", "compiler-rt", "cross-project-tests"}, -"clang-tools-extra": {"libc"}, Endilll wrote: I see that `clang-tools-extra` used to depend on `libc`, but I can't find it anywhere now https://github.com/llvm/llvm-project/pull/142696 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)
nikic wrote: The way FileCheck works, this will pass even if the metadata is not dropped. You could try whether `FileCheck --match-full-lines` works. Otherwise you could use an explicit `CHECK-NOT` or `{{$}}`. https://github.com/llvm/llvm-project/pull/87573 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
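A hypothetical sketch of the suggestion (the call and type-id here are illustrative, not taken from the PR's tests). To prove metadata is dropped, either match the full line or forbid the attachment explicitly:

```llvm
; With --match-full-lines the CHECK must match the whole output line,
; so trailing ", !callee_type !0" would cause a failure:
; CHECK: call void @f()

; Without that flag, anchor end-of-line explicitly:
; CHECK: call void @f(){{$}}

; Or forbid the metadata outright on the following lines:
; CHECK: call void @f()
; CHECK-NOT: !callee_type
```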
[llvm-branch-commits] [llvm] DAG: Move soft float predicate management into RuntimeLibcalls (PR #142905)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/142905 >From a3cb3a4361182158b16e85952309c2ebbe9dfb32 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Thu, 5 Jun 2025 14:22:55 +0900 Subject: [PATCH] DAG: Move soft float predicate management into RuntimeLibcalls Work towards making RuntimeLibcalls the centralized location for all libcall information. This requires changing the encoding from tracking the ISD::CondCode to using CmpInst::Predicate. --- llvm/include/llvm/CodeGen/TargetLowering.h| 14 +- llvm/include/llvm/IR/RuntimeLibcalls.h| 25 +++ .../CodeGen/SelectionDAG/TargetLowering.cpp | 5 +- llvm/lib/IR/RuntimeLibcalls.cpp | 36 llvm/lib/Target/ARM/ARMISelLowering.cpp | 178 +- llvm/lib/Target/MSP430/MSP430ISelLowering.cpp | 130 ++--- 6 files changed, 224 insertions(+), 164 deletions(-) diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h index 9c453f51e129d..0d157de479141 100644 --- a/llvm/include/llvm/CodeGen/TargetLowering.h +++ b/llvm/include/llvm/CodeGen/TargetLowering.h @@ -3572,20 +3572,18 @@ class LLVM_ABI TargetLoweringBase { /// Override the default CondCode to be used to test the result of the /// comparison libcall against zero. - /// FIXME: This can't be merged with 'RuntimeLibcallsInfo' because of the ISD. - void setCmpLibcallCC(RTLIB::Libcall Call, ISD::CondCode CC) { -CmpLibcallCCs[Call] = CC; + /// FIXME: This should be removed + void setCmpLibcallCC(RTLIB::Libcall Call, CmpInst::Predicate Pred) { +Libcalls.setSoftFloatCmpLibcallPredicate(Call, Pred); } - /// Get the CondCode that's to be used to test the result of the comparison /// libcall against zero. - /// FIXME: This can't be merged with 'RuntimeLibcallsInfo' because of the ISD. 
- ISD::CondCode getCmpLibcallCC(RTLIB::Libcall Call) const { -return CmpLibcallCCs[Call]; + CmpInst::Predicate + getSoftFloatCmpLibcallPredicate(RTLIB::Libcall Call) const { +return Libcalls.getSoftFloatCmpLibcallPredicate(Call); } - /// Set the CallingConv that should be used for the specified libcall. void setLibcallCallingConv(RTLIB::Libcall Call, CallingConv::ID CC) { Libcalls.setLibcallCallingConv(Call, CC); diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.h b/llvm/include/llvm/IR/RuntimeLibcalls.h index 26c085031a48a..6cc65fabfcc99 100644 --- a/llvm/include/llvm/IR/RuntimeLibcalls.h +++ b/llvm/include/llvm/IR/RuntimeLibcalls.h @@ -16,6 +16,7 @@ #include "llvm/ADT/ArrayRef.h" #include "llvm/IR/CallingConv.h" +#include "llvm/IR/InstrTypes.h" #include "llvm/Support/AtomicOrdering.h" #include "llvm/Support/Compiler.h" #include "llvm/TargetParser/Triple.h" @@ -73,6 +74,20 @@ struct RuntimeLibcallsInfo { LibcallRoutineNames + RTLIB::UNKNOWN_LIBCALL); } + /// Get the comparison predicate that's to be used to test the result of the + /// comparison libcall against zero. This should only be used with + /// floating-point compare libcalls. + CmpInst::Predicate + getSoftFloatCmpLibcallPredicate(RTLIB::Libcall Call) const { +return SoftFloatCompareLibcallPredicates[Call]; + } + + // FIXME: This should be removed. This should be private constant. + void setSoftFloatCmpLibcallPredicate(RTLIB::Libcall Call, + CmpInst::Predicate Pred) { +SoftFloatCompareLibcallPredicates[Call] = Pred; + } + private: /// Stores the name each libcall. const char *LibcallRoutineNames[RTLIB::UNKNOWN_LIBCALL + 1]; @@ -80,6 +95,14 @@ struct RuntimeLibcallsInfo { /// Stores the CallingConv that should be used for each libcall. CallingConv::ID LibcallCallingConvs[RTLIB::UNKNOWN_LIBCALL]; + /// The condition type that should be used to test the result of each of the + /// soft floating-point comparison libcall against integer zero. 
+ /// + // FIXME: This is only relevant for the handful of floating-point comparison + // runtime calls; it's excessive to have a table entry for every single + // opcode. + CmpInst::Predicate SoftFloatCompareLibcallPredicates[RTLIB::UNKNOWN_LIBCALL]; + static bool darwinHasSinCos(const Triple &TT) { assert(TT.isOSDarwin() && "should be called with darwin triple"); // Don't bother with 32 bit x86. @@ -95,6 +118,8 @@ struct RuntimeLibcallsInfo { return true; } + void initSoftFloatCmpLibcallPredicates(); + /// Set default libcall names. If a target wants to opt-out of a libcall it /// should be placed here. LLVM_ABI void initLibcalls(const Triple &TT); diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp index 4472a031c39f6..5105c4a515fbe 100644 --- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp @@ -14,6 +14,7 @@ #include "llvm/ADT/STLExtras.h" #include "llvm/Analysis/ValueTrackin
[llvm-branch-commits] [llvm] RuntimeLibcalls: Cleanup sincos predicate functions (PR #143081)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/143081 The darwinHasSinCos wasn't actually used for sincos, only the stret variant. Rename this to reflect that, and introduce a new one for enabling sincos. >From ee79ca11029ca60e9b6062cde3d0f468c2d5a7b3 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Fri, 6 Jun 2025 15:15:53 +0900 Subject: [PATCH] RuntimeLibcalls: Cleanup sincos predicate functions The darwinHasSinCos wasn't actually used for sincos, only the stret variant. Rename this to reflect that, and introduce a new one for enabling sincos. --- llvm/include/llvm/IR/RuntimeLibcalls.h | 8 +++- llvm/lib/IR/RuntimeLibcalls.cpp| 5 ++--- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.h b/llvm/include/llvm/IR/RuntimeLibcalls.h index 6cc65fabfcc99..d2704d5aa2616 100644 --- a/llvm/include/llvm/IR/RuntimeLibcalls.h +++ b/llvm/include/llvm/IR/RuntimeLibcalls.h @@ -103,7 +103,7 @@ struct RuntimeLibcallsInfo { // opcode. CmpInst::Predicate SoftFloatCompareLibcallPredicates[RTLIB::UNKNOWN_LIBCALL]; - static bool darwinHasSinCos(const Triple &TT) { + static bool darwinHasSinCosStret(const Triple &TT) { assert(TT.isOSDarwin() && "should be called with darwin triple"); // Don't bother with 32 bit x86. if (TT.getArch() == Triple::x86) @@ -118,6 +118,12 @@ struct RuntimeLibcallsInfo { return true; } + /// Return true if the target has sincosf/sincos/sincosl functions + static bool hasSinCos(const Triple &TT) { +return TT.isGNUEnvironment() || TT.isOSFuchsia() || + (TT.isAndroid() && !TT.isAndroidVersionLT(9)); + } + void initSoftFloatCmpLibcallPredicates(); /// Set default libcall names. 
If a target wants to opt-out of a libcall it diff --git a/llvm/lib/IR/RuntimeLibcalls.cpp b/llvm/lib/IR/RuntimeLibcalls.cpp index 91f303c9e3d3c..a6fda0cfeadd2 100644 --- a/llvm/lib/IR/RuntimeLibcalls.cpp +++ b/llvm/lib/IR/RuntimeLibcalls.cpp @@ -170,7 +170,7 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT) { break; } -if (darwinHasSinCos(TT)) { +if (darwinHasSinCosStret(TT)) { setLibcallName(RTLIB::SINCOS_STRET_F32, "__sincosf_stret"); setLibcallName(RTLIB::SINCOS_STRET_F64, "__sincos_stret"); if (TT.isWatchABI()) { @@ -214,8 +214,7 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT) { setLibcallName(RTLIB::EXP10_F64, "__exp10"); } - if (TT.isGNUEnvironment() || TT.isOSFuchsia() || - (TT.isAndroid() && !TT.isAndroidVersionLT(9))) { + if (hasSinCos(TT)) { setLibcallName(RTLIB::SINCOS_F32, "sincosf"); setLibcallName(RTLIB::SINCOS_F64, "sincos"); setLibcallName(RTLIB::SINCOS_F80, "sincosl"); ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] RuntimeLibcalls: Cleanup sincos predicate functions (PR #143081)
arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

* **#143081** 👈 (this PR; view in Graphite)
* **#142905**: 1 other dependent PR ([#142912](https://github.com/llvm/llvm-project/pull/142912))
* **#142898**
* `main`

This stack of pull requests is managed by Graphite. Learn more about stacking.

https://github.com/llvm/llvm-project/pull/143081
[llvm-branch-commits] [llvm] RuntimeLibcalls: Use array initializers for default values (PR #143082)
arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

* **#143082** 👈 (this PR; view in Graphite)
* **#143081**
* **#142905**: 1 other dependent PR ([#142912](https://github.com/llvm/llvm-project/pull/142912))
* **#142898**
* `main`

This stack of pull requests is managed by Graphite. Learn more about stacking.

https://github.com/llvm/llvm-project/pull/143082
[llvm-branch-commits] [llvm] RuntimeLibcalls: Use array initializers for default values (PR #143082)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/143082 None >From 8aa7850d9ddd50d57c9d9fbbef07b9ad00ffe202 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Fri, 6 Jun 2025 14:50:57 +0900 Subject: [PATCH] RuntimeLibcalls: Use array initializers for default values --- llvm/include/llvm/IR/RuntimeLibcalls.h | 8 +--- llvm/lib/IR/RuntimeLibcalls.cpp| 10 -- 2 files changed, 5 insertions(+), 13 deletions(-) diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.h b/llvm/include/llvm/IR/RuntimeLibcalls.h index d2704d5aa2616..d67430968edf1 100644 --- a/llvm/include/llvm/IR/RuntimeLibcalls.h +++ b/llvm/include/llvm/IR/RuntimeLibcalls.h @@ -90,10 +90,11 @@ struct RuntimeLibcallsInfo { private: /// Stores the name each libcall. - const char *LibcallRoutineNames[RTLIB::UNKNOWN_LIBCALL + 1]; + const char *LibcallRoutineNames[RTLIB::UNKNOWN_LIBCALL + 1] = {nullptr}; /// Stores the CallingConv that should be used for each libcall. - CallingConv::ID LibcallCallingConvs[RTLIB::UNKNOWN_LIBCALL]; + CallingConv::ID LibcallCallingConvs[RTLIB::UNKNOWN_LIBCALL] = { + CallingConv::C}; /// The condition type that should be used to test the result of each of the /// soft floating-point comparison libcall against integer zero. @@ -101,7 +102,8 @@ struct RuntimeLibcallsInfo { // FIXME: This is only relevant for the handful of floating-point comparison // runtime calls; it's excessive to have a table entry for every single // opcode. 
- CmpInst::Predicate SoftFloatCompareLibcallPredicates[RTLIB::UNKNOWN_LIBCALL]; + CmpInst::Predicate SoftFloatCompareLibcallPredicates[RTLIB::UNKNOWN_LIBCALL] = + {CmpInst::BAD_ICMP_PREDICATE}; static bool darwinHasSinCosStret(const Triple &TT) { assert(TT.isOSDarwin() && "should be called with darwin triple"); diff --git a/llvm/lib/IR/RuntimeLibcalls.cpp b/llvm/lib/IR/RuntimeLibcalls.cpp index a6fda0cfeadd2..01978b7ae39e3 100644 --- a/llvm/lib/IR/RuntimeLibcalls.cpp +++ b/llvm/lib/IR/RuntimeLibcalls.cpp @@ -12,9 +12,6 @@ using namespace llvm; using namespace RTLIB; void RuntimeLibcallsInfo::initSoftFloatCmpLibcallPredicates() { - std::fill(SoftFloatCompareLibcallPredicates, -SoftFloatCompareLibcallPredicates + RTLIB::UNKNOWN_LIBCALL, -CmpInst::BAD_ICMP_PREDICATE); SoftFloatCompareLibcallPredicates[RTLIB::OEQ_F32] = CmpInst::ICMP_EQ; SoftFloatCompareLibcallPredicates[RTLIB::OEQ_F64] = CmpInst::ICMP_EQ; SoftFloatCompareLibcallPredicates[RTLIB::OEQ_F128] = CmpInst::ICMP_EQ; @@ -48,19 +45,12 @@ void RuntimeLibcallsInfo::initSoftFloatCmpLibcallPredicates() { /// Set default libcall names. If a target wants to opt-out of a libcall it /// should be placed here. void RuntimeLibcallsInfo::initLibcalls(const Triple &TT) { - std::fill(std::begin(LibcallRoutineNames), std::end(LibcallRoutineNames), -nullptr); - initSoftFloatCmpLibcallPredicates(); #define HANDLE_LIBCALL(code, name) setLibcallName(RTLIB::code, name); #include "llvm/IR/RuntimeLibcalls.def" #undef HANDLE_LIBCALL - // Initialize calling conventions to their default. - for (int LC = 0; LC < RTLIB::UNKNOWN_LIBCALL; ++LC) -setLibcallCallingConv((RTLIB::Libcall)LC, CallingConv::C); - // Use the f128 variants of math functions on x86 if (TT.isX86() && TT.isGNUEnvironment()) { setLibcallName(RTLIB::REM_F128, "fmodf128"); ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] RuntimeLibcalls: Cleanup sincos predicate functions (PR #143081)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/143081 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] RuntimeLibcalls: Use array initializers for default values (PR #143082)
llvmbot wrote:

@llvm/pr-subscribers-llvm-ir

Author: Matt Arsenault (arsenm)

Changes

---

Full diff: https://github.com/llvm/llvm-project/pull/143082.diff

2 Files Affected:
- (modified) llvm/include/llvm/IR/RuntimeLibcalls.h (+5-3)
- (modified) llvm/lib/IR/RuntimeLibcalls.cpp (-10)

https://github.com/llvm/llvm-project/pull/143082
[llvm-branch-commits] [llvm] RuntimeLibcalls: Use array initializers for default values (PR #143082)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/143082 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] RuntimeLibcalls: Cleanup sincos predicate functions (PR #143081)
llvmbot wrote:

@llvm/pr-subscribers-llvm-ir

Author: Matt Arsenault (arsenm)

Changes

The darwinHasSinCos wasn't actually used for sincos, only the stret variant. Rename this to reflect that, and introduce a new one for enabling sincos.

---

Full diff: https://github.com/llvm/llvm-project/pull/143081.diff

2 Files Affected:
- (modified) llvm/include/llvm/IR/RuntimeLibcalls.h (+7-1)
- (modified) llvm/lib/IR/RuntimeLibcalls.cpp (+2-3)

https://github.com/llvm/llvm-project/pull/143081
[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)
@@ -1302,6 +1302,24 @@ static void addRange(SmallVectorImpl<ConstantInt *> &EndPoints, EndPoints.push_back(High); } +MDNode *MDNode::getMergedCalleeTypeMetadata(LLVMContext &Ctx, MDNode *A, +MDNode *B) { + SmallVector<Metadata *> AB; + SmallSet MergedCallees; + auto AddUniqueCallees = [&AB, &MergedCallees](llvm::MDNode *N) { +if (!N) + return; +for (const MDOperand &Op : N->operands()) { + Metadata *MD = Op.get(); + if (MergedCallees.insert(MD).second) +AB.push_back(MD); +} + }; + AddUniqueCallees(A); + AddUniqueCallees(B); + return llvm::MDNode::get(Ctx, AB); nikic wrote: ```suggestion return MDNode::get(Ctx, AB); ``` https://github.com/llvm/llvm-project/pull/87573
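For readers following the thread: the dedup-merge that `getMergedCalleeTypeMetadata` performs is order-preserving set insertion over the operand lists of two nodes. Below is a hedged sketch with plain STL containers — strings stand in for `Metadata *` operands and the names are invented for illustration; this is not the LLVM API.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// A metadata node's operand list, with strings standing in for Metadata*.
using Node = std::vector<std::string>;

// Merge the operands of two nodes, keeping the first occurrence of each
// callee type and preserving order; either input may be null.
static Node mergeCalleeTypes(const Node *A, const Node *B) {
  Node Merged;
  std::unordered_set<std::string> Seen;
  auto AddUnique = [&](const Node *N) {
    if (!N)
      return;
    for (const auto &Op : *N)
      if (Seen.insert(Op).second) // true only on first insertion
        Merged.push_back(Op);
  };
  AddUnique(A);
  AddUnique(B);
  return Merged;
}
```

With two lists sharing one entry, the merge keeps three unique entries in first-seen order, mirroring the `insert(...).second` pattern in the patch.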
[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)
@@ -1252,6 +1252,12 @@ class MDNode : public Metadata { bool isReplaceable() const { return isTemporary() || isAlwaysReplaceable(); } bool isAlwaysReplaceable() const { return getMetadataID() == DIAssignIDKind; } + bool hasGeneralizedMDString() const { nikic wrote: This looks too specific to be part of the main Metadata API. https://github.com/llvm/llvm-project/pull/87573 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)
@@ -5096,6 +5097,19 @@ void Verifier::visitCallsiteMetadata(Instruction &I, MDNode *MD) { visitCallStackMetadata(MD); } +void Verifier::visitCalleeTypeMetadata(Instruction &I, MDNode *MD) { + Check(isa<CallBase>(I), "!callee_type metadata should only exist on calls", +&I); + for (const MDOperand &Op : MD->operands()) { +Check(isa<MDNode>(Op.get()), + "The callee_type metadata must be a list of type metadata nodes"); +auto *TypeMD = cast<MDNode>(Op.get()); +Check(TypeMD->hasGeneralizedMDString(), + "Only generalized type metadata can be part of the callee_type " + "metadata list"); nikic wrote: The CalleeTypeMetadata.rst could be clearer on this requirement. Generalizations are mentioned, but not what this means for the metadata. https://github.com/llvm/llvm-project/pull/87573
[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)
@@ -1302,6 +1302,24 @@ static void addRange(SmallVectorImpl<ConstantInt *> &EndPoints, EndPoints.push_back(High); } +MDNode *MDNode::getMergedCalleeTypeMetadata(LLVMContext &Ctx, MDNode *A, +MDNode *B) { + SmallVector<Metadata *> AB; + SmallSet MergedCallees; + auto AddUniqueCallees = [&AB, &MergedCallees](llvm::MDNode *N) { nikic wrote: ```suggestion auto AddUniqueCallees = [&AB, &MergedCallees](MDNode *N) { ``` https://github.com/llvm/llvm-project/pull/87573
[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)
@@ -3377,6 +3377,11 @@ static void combineMetadata(Instruction *K, const Instruction *J, K->setMetadata(Kind, MDNode::getMostGenericAlignmentOrDereferenceable(JMD, KMD)); break; + case LLVMContext::MD_callee_type: +if (!AAOnly) + K->setMetadata(Kind, MDNode::getMergedCalleeTypeMetadata( + K->getContext(), KMD, JMD)); nikic wrote: This code appears to be untested. Check out existing metadata tests in SimplifyCFG. https://github.com/llvm/llvm-project/pull/87573 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)
@@ -4161,6 +4161,11 @@ Instruction *InstCombinerImpl::visitCallBase(CallBase &Call) { Call, Builder.CreateBitOrPointerCast(ReturnedArg, CallTy)); } + // Drop unnecessary callee_type metadata from calls that were converted + // into direct calls. + if (Call.getMetadata(LLVMContext::MD_callee_type) && !Call.isIndirectCall()) +Call.setMetadata(LLVMContext::MD_callee_type, nullptr); nikic wrote: Should indicate IR change. https://github.com/llvm/llvm-project/pull/87573 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [mlir] [OpenMP] Add directive spellings introduced in spec v6.0 (PR #141772)
https://github.com/kparzysz reopened https://github.com/llvm/llvm-project/pull/141772
[llvm-branch-commits] [llvm] CodeGen: Move ABI option enums to support (PR #142912)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/142912 >From f8721bd055a0fb775543df2059d0979d9c3487de Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Thu, 5 Jun 2025 16:08:26 +0900 Subject: [PATCH] CodeGen: Move ABI option enums to support Move these out of TargetOptions and into Support to avoid the dependency on Target. There are similar ABI options already in Support/CodeGen.h. --- llvm/include/llvm/Support/CodeGen.h | 16 llvm/include/llvm/Target/TargetOptions.h | 17 + 2 files changed, 17 insertions(+), 16 deletions(-) diff --git a/llvm/include/llvm/Support/CodeGen.h b/llvm/include/llvm/Support/CodeGen.h index 0e42789ba932e..b7896ae5d0f83 100644 --- a/llvm/include/llvm/Support/CodeGen.h +++ b/llvm/include/llvm/Support/CodeGen.h @@ -50,6 +50,22 @@ namespace llvm { }; } + namespace FloatABI { + enum ABIType { +Default, // Target-specific (either soft or hard depending on triple, etc). +Soft,// Soft float. +Hard // Hard float. + }; + } + + enum class EABI { +Unknown, +Default, // Default means not specified +EABI4, // Target-specific (either 4, 5 or gnu depending on triple). +EABI5, +GNU + }; + /// Code generation optimization level. enum class CodeGenOptLevel { None = 0, ///< -O0 diff --git a/llvm/include/llvm/Target/TargetOptions.h b/llvm/include/llvm/Target/TargetOptions.h index fd8dad4f6f791..08d6aa36e19d8 100644 --- a/llvm/include/llvm/Target/TargetOptions.h +++ b/llvm/include/llvm/Target/TargetOptions.h @@ -16,6 +16,7 @@ #include "llvm/ADT/FloatingPointMode.h" #include "llvm/MC/MCTargetOptions.h" +#include "llvm/Support/CodeGen.h" #include @@ -24,14 +25,6 @@ namespace llvm { class MachineFunction; class MemoryBuffer; - namespace FloatABI { -enum ABIType { - Default, // Target-specific (either soft or hard depending on triple, etc). - Soft,// Soft float. - Hard // Hard float. -}; - } - namespace FPOpFusion { enum FPOpFusionMode { Fast, // Enable fusion of FP ops wherever it's profitable. 
@@ -70,14 +63,6 @@ namespace llvm { None// Do not use Basic Block Sections. }; - enum class EABI { -Unknown, -Default, // Default means not specified -EABI4, // Target-specific (either 4, 5 or gnu depending on triple). -EABI5, -GNU - }; - /// Identify a debugger for "tuning" the debug info. /// /// The "debugger tuning" concept allows us to present a more intuitive ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] DAG: Move soft float predicate management into RuntimeLibcalls (PR #142905)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/142905 >From a3cb3a4361182158b16e85952309c2ebbe9dfb32 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Thu, 5 Jun 2025 14:22:55 +0900 Subject: [PATCH] DAG: Move soft float predicate management into RuntimeLibcalls Work towards making RuntimeLibcalls the centralized location for all libcall information. This requires changing the encoding from tracking the ISD::CondCode to using CmpInst::Predicate. --- llvm/include/llvm/CodeGen/TargetLowering.h| 14 +- llvm/include/llvm/IR/RuntimeLibcalls.h| 25 +++ .../CodeGen/SelectionDAG/TargetLowering.cpp | 5 +- llvm/lib/IR/RuntimeLibcalls.cpp | 36 llvm/lib/Target/ARM/ARMISelLowering.cpp | 178 +- llvm/lib/Target/MSP430/MSP430ISelLowering.cpp | 130 ++--- 6 files changed, 224 insertions(+), 164 deletions(-) diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h index 9c453f51e129d..0d157de479141 100644 --- a/llvm/include/llvm/CodeGen/TargetLowering.h +++ b/llvm/include/llvm/CodeGen/TargetLowering.h @@ -3572,20 +3572,18 @@ class LLVM_ABI TargetLoweringBase { /// Override the default CondCode to be used to test the result of the /// comparison libcall against zero. - /// FIXME: This can't be merged with 'RuntimeLibcallsInfo' because of the ISD. - void setCmpLibcallCC(RTLIB::Libcall Call, ISD::CondCode CC) { -CmpLibcallCCs[Call] = CC; + /// FIXME: This should be removed + void setCmpLibcallCC(RTLIB::Libcall Call, CmpInst::Predicate Pred) { +Libcalls.setSoftFloatCmpLibcallPredicate(Call, Pred); } - /// Get the CondCode that's to be used to test the result of the comparison /// libcall against zero. - /// FIXME: This can't be merged with 'RuntimeLibcallsInfo' because of the ISD. 
- ISD::CondCode getCmpLibcallCC(RTLIB::Libcall Call) const { -return CmpLibcallCCs[Call]; + CmpInst::Predicate + getSoftFloatCmpLibcallPredicate(RTLIB::Libcall Call) const { +return Libcalls.getSoftFloatCmpLibcallPredicate(Call); } - /// Set the CallingConv that should be used for the specified libcall. void setLibcallCallingConv(RTLIB::Libcall Call, CallingConv::ID CC) { Libcalls.setLibcallCallingConv(Call, CC); diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.h b/llvm/include/llvm/IR/RuntimeLibcalls.h index 26c085031a48a..6cc65fabfcc99 100644 --- a/llvm/include/llvm/IR/RuntimeLibcalls.h +++ b/llvm/include/llvm/IR/RuntimeLibcalls.h @@ -16,6 +16,7 @@ #include "llvm/ADT/ArrayRef.h" #include "llvm/IR/CallingConv.h" +#include "llvm/IR/InstrTypes.h" #include "llvm/Support/AtomicOrdering.h" #include "llvm/Support/Compiler.h" #include "llvm/TargetParser/Triple.h" @@ -73,6 +74,20 @@ struct RuntimeLibcallsInfo { LibcallRoutineNames + RTLIB::UNKNOWN_LIBCALL); } + /// Get the comparison predicate that's to be used to test the result of the + /// comparison libcall against zero. This should only be used with + /// floating-point compare libcalls. + CmpInst::Predicate + getSoftFloatCmpLibcallPredicate(RTLIB::Libcall Call) const { +return SoftFloatCompareLibcallPredicates[Call]; + } + + // FIXME: This should be removed. This should be private constant. + void setSoftFloatCmpLibcallPredicate(RTLIB::Libcall Call, + CmpInst::Predicate Pred) { +SoftFloatCompareLibcallPredicates[Call] = Pred; + } + private: /// Stores the name each libcall. const char *LibcallRoutineNames[RTLIB::UNKNOWN_LIBCALL + 1]; @@ -80,6 +95,14 @@ struct RuntimeLibcallsInfo { /// Stores the CallingConv that should be used for each libcall. CallingConv::ID LibcallCallingConvs[RTLIB::UNKNOWN_LIBCALL]; + /// The condition type that should be used to test the result of each of the + /// soft floating-point comparison libcall against integer zero. 
+ /// + // FIXME: This is only relevant for the handful of floating-point comparison + // runtime calls; it's excessive to have a table entry for every single + // opcode. + CmpInst::Predicate SoftFloatCompareLibcallPredicates[RTLIB::UNKNOWN_LIBCALL]; + static bool darwinHasSinCos(const Triple &TT) { assert(TT.isOSDarwin() && "should be called with darwin triple"); // Don't bother with 32 bit x86. @@ -95,6 +118,8 @@ struct RuntimeLibcallsInfo { return true; } + void initSoftFloatCmpLibcallPredicates(); + /// Set default libcall names. If a target wants to opt-out of a libcall it /// should be placed here. LLVM_ABI void initLibcalls(const Triple &TT); diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp index 4472a031c39f6..5105c4a515fbe 100644 --- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp @@ -14,6 +14,7 @@ #include "llvm/ADT/STLExtras.h" #include "llvm/Analysis/ValueTrackin
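The table being moved above pairs each soft-float comparison libcall with the integer predicate used to test its result against zero. A minimal stand-alone sketch of that scheme follows; the enum values and the libgcc-style return conventions noted in the comments are illustrative assumptions, not the actual RTLIB API.

```cpp
#include <cassert>

// Illustrative stand-ins for a few soft-float comparison libcalls and for
// the integer predicates used on their results.
enum Libcall { OEQ_F32, OLT_F32, OGE_F32, NUM_CALLS };
enum Predicate { ICMP_EQ, ICMP_NE, ICMP_SLT, ICMP_SGE };

// Per-libcall predicate table, in the spirit of
// SoftFloatCompareLibcallPredicates in the patch.
struct SoftFloatPredicates {
  Predicate Preds[NUM_CALLS];
  SoftFloatPredicates() {
    // Assumed conventions: __eqsf2 returns 0 iff equal; __ltsf2 returns a
    // value < 0 iff a < b; __gesf2 returns a value >= 0 iff a >= b.
    Preds[OEQ_F32] = ICMP_EQ;
    Preds[OLT_F32] = ICMP_SLT;
    Preds[OGE_F32] = ICMP_SGE;
  }
  Predicate get(Libcall LC) const { return Preds[LC]; }
};

// Test the libcall's integer result against zero with the given predicate.
static bool applyPredicate(Predicate P, int LibcallResult) {
  switch (P) {
  case ICMP_EQ:  return LibcallResult == 0;
  case ICMP_NE:  return LibcallResult != 0;
  case ICMP_SLT: return LibcallResult < 0;
  case ICMP_SGE: return LibcallResult >= 0;
  }
  return false;
}
```

Centralizing this table in `RuntimeLibcallsInfo` (rather than per-target `CmpLibcallCCs` arrays) is the point of the patch; the sketch only shows the lookup-then-compare shape.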
[llvm-branch-commits] [libcxx] [libcxx] Include __fwd/span.h in <mdspan> (PR #142925)
@@ -451,6 +451,7 @@ namespace std { # if _LIBCPP_STD_VER >= 23 #include <__fwd/mdspan.h> +#include <__fwd/span.h> philnik777 wrote: Can you add a comment with the LWG issue number? If the answer is that we indeed expect users to include `<span>` we should remove the include again. I don't expect it, but it's better to have a comment that this is technically an extension currently. https://github.com/llvm/llvm-project/pull/142925
[llvm-branch-commits] [libcxx] [libcxx] Include __fwd/span.h in <mdspan> (PR #142925)
https://github.com/philnik777 edited https://github.com/llvm/llvm-project/pull/142925
[llvm-branch-commits] [libcxx] [libcxx] Include __fwd/span.h in <mdspan> (PR #142925)
https://github.com/philnik777 approved this pull request. LGTM with added comment. https://github.com/llvm/llvm-project/pull/142925
[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)
@@ -582,6 +582,15 @@ static SmallVector getCollapsedIndices(RewriterBase &rewriter, namespace { +/// Helper functon to return the index of the last dynamic dimension in `shape`. newling wrote: ```suggestion /// Helper functon to return the index of the last dynamic dimension in `shape`. or -1 if there are no dynamic dimensions ``` ... if I understand correctly, although it might be static_cast(0ULL - 1), not sure what that is https://github.com/llvm/llvm-project/pull/142422 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)
@@ -49,35 +49,37 @@ FailureOr> isTranspose2DSlice(vector::TransposeOp op); /// Return true if `vectorType` is a contiguous slice of `memrefType`. /// -/// Only the N = vectorType.getRank() trailing dims of `memrefType` are -/// checked (the other dims are not relevant). Note that for `vectorType` to be -/// a contiguous slice of `memrefType`, the trailing dims of the latter have -/// to be contiguous - this is checked by looking at the corresponding strides. +/// The leading unit dimensions of the vector type are ignored as they +/// are not relevant to the result. Let N be the number of the vector +/// dimensions after ignoring a leading sequence of unit ones. /// -/// There might be some restriction on the leading dim of `VectorType`: +/// For `vectorType` to be a contiguous slice of `memrefType` +/// a) the N trailing dimensions of the latter must be contiguous, and +/// b) the trailing N dimensions of `vectorType` and `memrefType`, +/// except the first of them, must match. /// -/// Case 1. If all the trailing dims of `vectorType` match the trailing dims -/// of `memrefType` then the leading dim of `vectorType` can be -/// arbitrary. -/// -///Ex. 1.1 contiguous slice, perfect match -/// vector<4x3x2xi32> from memref<5x4x3x2xi32> -///Ex. 1.2 contiguous slice, the leading dim does not match (2 != 4) -/// vector<2x3x2xi32> from memref<5x4x3x2xi32> -/// -/// Case 2. If an "internal" dim of `vectorType` does not match the -/// corresponding trailing dim in `memrefType` then the remaining -/// leading dims of `vectorType` have to be 1 (the first non-matching -/// dim can be arbitrary). +/// Examples: /// -///Ex. 2.1 non-contiguous slice, 2 != 3 and the leading dim != <1> -/// vector<2x2x2xi32> from memref<5x4x3x2xi32> -///Ex. 2.2 contiguous slice, 2 != 3 and the leading dim == <1> -/// vector<1x2x2xi32> from memref<5x4x3x2xi32> -///Ex. 2.3. contiguous slice, 2 != 3 and the leading dims == <1x1> -/// vector<1x1x2x2xi32> from memref<5x4x3x2xi32> -///Ex. 2.4. 
non-contiguous slice, 2 != 3 and the leading dims != <1x1> -/// vector<2x1x2x2xi32> from memref<5x4x3x2xi32>) +/// Ex.1 contiguous slice, perfect match +/// vector<4x3x2xi32> from memref<5x4x3x2xi32> +/// Ex.2 contiguous slice, the leading dim does not match (2 != 4) +/// vector<2x3x2xi32> from memref<5x4x3x2xi32> +/// Ex.3 non-contiguous slice, 2 != 3 +/// vector<2x2x2xi32> from memref<5x4x3x2xi32> +/// Ex.4 contiguous slice, leading unit dimension of the vector ignored, +///2 != 3 (allowed) +/// vector<1x2x2xi32> from memref<5x4x3x2xi32> +/// Ex.5. contiguous slice, leasing two unit dims of the vector ignored, newling wrote: ```suggestion /// Ex.5. contiguous slice, leading two unit dims of the vector ignored, ``` https://github.com/llvm/llvm-project/pull/142422 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)
https://github.com/newling edited https://github.com/llvm/llvm-project/pull/142422
[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)
@@ -203,21 +206,21 @@ func.func @transfer_read_dynamic_dim_to_flatten( return %res : vector<1x2x6xi32> } -// CHECK: #[[$MAP:.*]] = affine_map<()[s0, s1] -> (s0 * 24 + s1 * 6)> +// CHECK: #[[$MAP:.+]] = affine_map<()[s0, s1] -> (s0 * 24 + s1 * 6)> // CHECK-LABEL: func.func @transfer_read_dynamic_dim_to_flatten // CHECK-SAME:%[[IDX_1:arg0]] // CHECK-SAME:%[[IDX_2:arg1]] // CHECK-SAME:%[[MEM:arg2]] -// CHECK: %[[C0_I32:.*]] = arith.constant 0 : i32 newling wrote: For my own learning, is there an advantage to using + over * ? Maybe lit can process/match this faster? https://github.com/llvm/llvm-project/pull/142422 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)
@@ -49,35 +49,37 @@ FailureOr> isTranspose2DSlice(vector::TransposeOp op); /// Return true if `vectorType` is a contiguous slice of `memrefType`. /// -/// Only the N = vectorType.getRank() trailing dims of `memrefType` are -/// checked (the other dims are not relevant). Note that for `vectorType` to be -/// a contiguous slice of `memrefType`, the trailing dims of the latter have -/// to be contiguous - this is checked by looking at the corresponding strides. +/// The leading unit dimensions of the vector type are ignored as they +/// are not relevant to the result. Let N be the number of the vector +/// dimensions after ignoring a leading sequence of unit ones. /// -/// There might be some restriction on the leading dim of `VectorType`: +/// For `vectorType` to be a contiguous slice of `memrefType` +/// a) the N trailing dimensions of the latter must be contiguous, and +/// b) the trailing N dimensions of `vectorType` and `memrefType`, +/// except the first of them, must match. newling wrote: ```suggestion ``` https://github.com/llvm/llvm-project/pull/142422 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)
@@ -49,35 +49,37 @@ FailureOr> isTranspose2DSlice(vector::TransposeOp op); /// Return true if `vectorType` is a contiguous slice of `memrefType`. /// -/// Only the N = vectorType.getRank() trailing dims of `memrefType` are -/// checked (the other dims are not relevant). Note that for `vectorType` to be -/// a contiguous slice of `memrefType`, the trailing dims of the latter have -/// to be contiguous - this is checked by looking at the corresponding strides. +/// The leading unit dimensions of the vector type are ignored as they +/// are not relevant to the result. Let N be the number of the vector +/// dimensions after ignoring a leading sequence of unit ones. /// -/// There might be some restriction on the leading dim of `VectorType`: +/// For `vectorType` to be a contiguous slice of `memrefType` +/// a) the N trailing dimensions of the latter must be contiguous, and +/// b) the trailing N dimensions of `vectorType` and `memrefType`, +/// except the first of them, must match. /// -/// Case 1. If all the trailing dims of `vectorType` match the trailing dims -/// of `memrefType` then the leading dim of `vectorType` can be -/// arbitrary. -/// -///Ex. 1.1 contiguous slice, perfect match -/// vector<4x3x2xi32> from memref<5x4x3x2xi32> -///Ex. 1.2 contiguous slice, the leading dim does not match (2 != 4) -/// vector<2x3x2xi32> from memref<5x4x3x2xi32> -/// -/// Case 2. If an "internal" dim of `vectorType` does not match the -/// corresponding trailing dim in `memrefType` then the remaining -/// leading dims of `vectorType` have to be 1 (the first non-matching -/// dim can be arbitrary). +/// Examples: /// -///Ex. 2.1 non-contiguous slice, 2 != 3 and the leading dim != <1> -/// vector<2x2x2xi32> from memref<5x4x3x2xi32> -///Ex. 2.2 contiguous slice, 2 != 3 and the leading dim == <1> -/// vector<1x2x2xi32> from memref<5x4x3x2xi32> -///Ex. 2.3. contiguous slice, 2 != 3 and the leading dims == <1x1> -/// vector<1x1x2x2xi32> from memref<5x4x3x2xi32> -///Ex. 2.4. 
non-contiguous slice, 2 != 3 and the leading dims != <1x1> -/// vector<2x1x2x2xi32> from memref<5x4x3x2xi32>) +/// Ex.1 contiguous slice, perfect match +/// vector<4x3x2xi32> from memref<5x4x3x2xi32> +/// Ex.2 contiguous slice, the leading dim does not match (2 != 4) +/// vector<2x3x2xi32> from memref<5x4x3x2xi32> +/// Ex.3 non-contiguous slice, 2 != 3 +/// vector<2x2x2xi32> from memref<5x4x3x2xi32> +/// Ex.4 contiguous slice, leading unit dimension of the vector ignored, +///2 != 3 (allowed) +/// vector<1x2x2xi32> from memref<5x4x3x2xi32> +/// Ex.5. contiguous slice, leasing two unit dims of the vector ignored, +/// 2 != 3 (allowed) +/// vector<1x1x2x2xi32> from memref<5x4x3x2xi32> +/// Ex.6. non-contiguous slice, 2 != 3, no leading sequence of unit dims +/// vector<2x1x2x2xi32> from memref<5x4x3x2xi32>) +/// Ex.7 contiguous slice, memref needs to be contiguous only on the last +///dimension +/// vector<1x1x2xi32> from memref<2x2x2xi32, strided<[8, 4, 1]>> +/// Ex.8 non-contiguous slice, memref needs to be contiguous one the last newling wrote: ```suggestion /// Ex.8 non-contiguous slice, memref needs to be contiguous in the last ``` https://github.com/llvm/llvm-project/pull/142422 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)
@@ -630,7 +639,10 @@ class FlattenContiguousRowMajorTransferReadPattern if (transferReadOp.getMask()) return failure(); -int64_t firstDimToCollapse = sourceType.getRank() - vectorType.getRank(); newling wrote: Why does this need to change? If memref is rank n+2 and vector is rank n, isn't it always fine to flatten the memref from index 2? So that memref becomes rank 3 and vector becomes rank 1. Isn't having a rank-1 vector the goal here? https://github.com/llvm/llvm-project/pull/142422 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)
@@ -49,35 +49,37 @@ FailureOr> isTranspose2DSlice(vector::TransposeOp op); /// Return true if `vectorType` is a contiguous slice of `memrefType`. /// -/// Only the N = vectorType.getRank() trailing dims of `memrefType` are -/// checked (the other dims are not relevant). Note that for `vectorType` to be -/// a contiguous slice of `memrefType`, the trailing dims of the latter have -/// to be contiguous - this is checked by looking at the corresponding strides. +/// The leading unit dimensions of the vector type are ignored as they +/// are not relevant to the result. Let N be the number of the vector +/// dimensions after ignoring a leading sequence of unit ones. /// -/// There might be some restriction on the leading dim of `VectorType`: +/// For `vectorType` to be a contiguous slice of `memrefType` +/// a) the N trailing dimensions of the latter must be contiguous, and newling wrote: ```suggestion /// a) the N trailing dimensions of `memrefType` must be contiguous, and ``` https://github.com/llvm/llvm-project/pull/142422 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)
@@ -49,35 +49,37 @@ FailureOr> isTranspose2DSlice(vector::TransposeOp op); /// Return true if `vectorType` is a contiguous slice of `memrefType`. /// -/// Only the N = vectorType.getRank() trailing dims of `memrefType` are -/// checked (the other dims are not relevant). Note that for `vectorType` to be -/// a contiguous slice of `memrefType`, the trailing dims of the latter have -/// to be contiguous - this is checked by looking at the corresponding strides. +/// The leading unit dimensions of the vector type are ignored as they +/// are not relevant to the result. Let N be the number of the vector +/// dimensions after ignoring a leading sequence of unit ones. /// -/// There might be some restriction on the leading dim of `VectorType`: +/// For `vectorType` to be a contiguous slice of `memrefType` +/// a) the N trailing dimensions of the latter must be contiguous, and +/// b) the trailing N dimensions of `vectorType` and `memrefType`, +/// except the first of them, must match. /// -/// Case 1. If all the trailing dims of `vectorType` match the trailing dims -/// of `memrefType` then the leading dim of `vectorType` can be -/// arbitrary. -/// -///Ex. 1.1 contiguous slice, perfect match -/// vector<4x3x2xi32> from memref<5x4x3x2xi32> -///Ex. 1.2 contiguous slice, the leading dim does not match (2 != 4) -/// vector<2x3x2xi32> from memref<5x4x3x2xi32> -/// -/// Case 2. If an "internal" dim of `vectorType` does not match the -/// corresponding trailing dim in `memrefType` then the remaining -/// leading dims of `vectorType` have to be 1 (the first non-matching -/// dim can be arbitrary). +/// Examples: /// -///Ex. 2.1 non-contiguous slice, 2 != 3 and the leading dim != <1> -/// vector<2x2x2xi32> from memref<5x4x3x2xi32> -///Ex. 2.2 contiguous slice, 2 != 3 and the leading dim == <1> -/// vector<1x2x2xi32> from memref<5x4x3x2xi32> -///Ex. 2.3. contiguous slice, 2 != 3 and the leading dims == <1x1> -/// vector<1x1x2x2xi32> from memref<5x4x3x2xi32> -///Ex. 2.4. 
non-contiguous slice, 2 != 3 and the leading dims != <1x1> -/// vector<2x1x2x2xi32> from memref<5x4x3x2xi32>) +/// Ex.1 contiguous slice, perfect match +/// vector<4x3x2xi32> from memref<5x4x3x2xi32> +/// Ex.2 contiguous slice, the leading dim does not match (2 != 4) +/// vector<2x3x2xi32> from memref<5x4x3x2xi32> +/// Ex.3 non-contiguous slice, 2 != 3 +/// vector<2x2x2xi32> from memref<5x4x3x2xi32> +/// Ex.4 contiguous slice, leading unit dimension of the vector ignored, +///2 != 3 (allowed) +/// vector<1x2x2xi32> from memref<5x4x3x2xi32> +/// Ex.5. contiguous slice, leasing two unit dims of the vector ignored, +/// 2 != 3 (allowed) +/// vector<1x1x2x2xi32> from memref<5x4x3x2xi32> +/// Ex.6. non-contiguous slice, 2 != 3, no leading sequence of unit dims +/// vector<2x1x2x2xi32> from memref<5x4x3x2xi32>) +/// Ex.7 contiguous slice, memref needs to be contiguous only on the last +///dimension +/// vector<1x1x2xi32> from memref<2x2x2xi32, strided<[8, 4, 1]>> +/// Ex.8 non-contiguous slice, memref needs to be contiguous one the last +///two dimensions, and it isn't +/// vector<1x2x2xi32> from memref<2x2x2xi32, strided<[8, 4, 1]>> newling wrote: These 8 examples cover all the situations I can think of, other than where memref has a dynamic size. Can you please confirm that they're all tested? https://github.com/llvm/llvm-project/pull/142422 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)
@@ -49,35 +49,37 @@ FailureOr> isTranspose2DSlice(vector::TransposeOp op); /// Return true if `vectorType` is a contiguous slice of `memrefType`. /// -/// Only the N = vectorType.getRank() trailing dims of `memrefType` are -/// checked (the other dims are not relevant). Note that for `vectorType` to be -/// a contiguous slice of `memrefType`, the trailing dims of the latter have -/// to be contiguous - this is checked by looking at the corresponding strides. +/// The leading unit dimensions of the vector type are ignored as they +/// are not relevant to the result. Let N be the number of the vector +/// dimensions after ignoring a leading sequence of unit ones. /// -/// There might be some restriction on the leading dim of `VectorType`: +/// For `vectorType` to be a contiguous slice of `memrefType` +/// a) the N trailing dimensions of the latter must be contiguous, and +/// b) the trailing N dimensions of `vectorType` and `memrefType`, newling wrote: ```suggestion /// b) the trailing N-1 dimensions of `vectorType` and `memrefType` must match. ``` https://github.com/llvm/llvm-project/pull/142422 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)
https://github.com/newling commented: Thanks! Other than my question about the change to first dimension of the memref that gets collapsed, my comments are all quite minor. https://github.com/llvm/llvm-project/pull/142422 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [BOLT] Expose external entry count for functions (PR #141674)
https://github.com/aaupov updated https://github.com/llvm/llvm-project/pull/141674 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Expose external entry count for functions (PR #141674)
https://github.com/aaupov edited https://github.com/llvm/llvm-project/pull/141674 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] LowerTypeTests: Shrink check size by 1 instruction on x86. (PR #142887)
https://github.com/fmayer commented: Could we have a test that demonstrates the new better instruction sequence (by precommiting to show the diff here)? https://github.com/llvm/llvm-project/pull/142887 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] WebAssembly: Stop directly using RuntimeLibcalls.def (PR #143054)
llvmbot wrote: @llvm/pr-subscribers-backend-webassembly Author: Matt Arsenault (arsenm) Changes Construct RuntimeLibcallsInfo instead of manually creating a map. This was repeating the setting of the RETURN_ADDRESS. This removes an obstacle to generating libcall information with tablegen. This is also not great, since it's setting a static map which would be broken if there were ever a triple with a different libcall configuration. --- Full diff: https://github.com/llvm/llvm-project/pull/143054.diff 1 Files Affected: - (modified) llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp (+12-15) ``diff diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp index ce795d3dedc6a..9622b5a54dc62 100644 --- a/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp +++ b/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp @@ -528,23 +528,20 @@ RuntimeLibcallSignatureTable &getRuntimeLibcallSignatures() { // constructor for use with a static variable struct StaticLibcallNameMap { StringMap Map; - StaticLibcallNameMap() { -static const std::pair NameLibcalls[] = { -#define HANDLE_LIBCALL(code, name) {(const char *)name, RTLIB::code}, -#include "llvm/IR/RuntimeLibcalls.def" -#undef HANDLE_LIBCALL -}; -for (const auto &NameLibcall : NameLibcalls) { - if (NameLibcall.first != nullptr && - getRuntimeLibcallSignatures().Table[NameLibcall.second] != - unsupported) { -assert(!Map.contains(NameLibcall.first) && + StaticLibcallNameMap(const Triple &TT) { +// FIXME: This is broken if there are ever different triples compiled with +// different libcalls. 
+RTLIB::RuntimeLibcallsInfo RTCI(TT); +for (int I = 0; I < RTLIB::UNKNOWN_LIBCALL; ++I) { + RTLIB::Libcall LC = static_cast(I); + const char *NameLibcall = RTCI.getLibcallName(LC); + if (NameLibcall != nullptr && + getRuntimeLibcallSignatures().Table[LC] != unsupported) { +assert(!Map.contains(NameLibcall) && "duplicate libcall names in name map"); -Map[NameLibcall.first] = NameLibcall.second; +Map[NameLibcall] = LC; } } - -Map["emscripten_return_address"] = RTLIB::RETURN_ADDRESS; } }; @@ -940,7 +937,7 @@ void WebAssembly::getLibcallSignature(const WebAssemblySubtarget &Subtarget, StringRef Name, SmallVectorImpl &Rets, SmallVectorImpl &Params) { - static StaticLibcallNameMap LibcallNameMap; + static StaticLibcallNameMap LibcallNameMap(Subtarget.getTargetTriple()); auto &Map = LibcallNameMap.Map; auto Val = Map.find(Name); #ifndef NDEBUG `` https://github.com/llvm/llvm-project/pull/143054 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] WebAssembly: Stop directly using RuntimeLibcalls.def (PR #143054)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/143054 Construct RuntimeLibcallsInfo instead of manually creating a map. This was repeating the setting of the RETURN_ADDRESS. This removes an obstacle to generating libcall information with tablegen. This is also not great, since it's setting a static map which would be broken if there were ever a triple with a different libcall configuration. >From 9405d81822edcfc0071c8de5c1d09dcb8ea22910 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Fri, 6 Jun 2025 10:01:59 +0900 Subject: [PATCH] WebAssembly: Stop directly using RuntimeLibcalls.def Construct RuntimeLibcallsInfo instead of manually creating a map. This was repeating the setting of the RETURN_ADDRESS. This removes an obstacle to generating libcall information with tablegen. This is also not great, since it's setting a static map which would be broken if there were ever a triple with a different libcall configuration. --- .../WebAssemblyRuntimeLibcallSignatures.cpp | 27 +-- 1 file changed, 12 insertions(+), 15 deletions(-) diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp index ce795d3dedc6a..9622b5a54dc62 100644 --- a/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp +++ b/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp @@ -528,23 +528,20 @@ RuntimeLibcallSignatureTable &getRuntimeLibcallSignatures() { // constructor for use with a static variable struct StaticLibcallNameMap { StringMap Map; - StaticLibcallNameMap() { -static const std::pair NameLibcalls[] = { -#define HANDLE_LIBCALL(code, name) {(const char *)name, RTLIB::code}, -#include "llvm/IR/RuntimeLibcalls.def" -#undef HANDLE_LIBCALL -}; -for (const auto &NameLibcall : NameLibcalls) { - if (NameLibcall.first != nullptr && - getRuntimeLibcallSignatures().Table[NameLibcall.second] != - unsupported) { -assert(!Map.contains(NameLibcall.first) 
&& + StaticLibcallNameMap(const Triple &TT) { +// FIXME: This is broken if there are ever different triples compiled with +// different libcalls. +RTLIB::RuntimeLibcallsInfo RTCI(TT); +for (int I = 0; I < RTLIB::UNKNOWN_LIBCALL; ++I) { + RTLIB::Libcall LC = static_cast(I); + const char *NameLibcall = RTCI.getLibcallName(LC); + if (NameLibcall != nullptr && + getRuntimeLibcallSignatures().Table[LC] != unsupported) { +assert(!Map.contains(NameLibcall) && "duplicate libcall names in name map"); -Map[NameLibcall.first] = NameLibcall.second; +Map[NameLibcall] = LC; } } - -Map["emscripten_return_address"] = RTLIB::RETURN_ADDRESS; } }; @@ -940,7 +937,7 @@ void WebAssembly::getLibcallSignature(const WebAssemblySubtarget &Subtarget, StringRef Name, SmallVectorImpl &Rets, SmallVectorImpl &Params) { - static StaticLibcallNameMap LibcallNameMap; + static StaticLibcallNameMap LibcallNameMap(Subtarget.getTargetTriple()); auto &Map = LibcallNameMap.Map; auto Val = Map.find(Name); #ifndef NDEBUG ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] WebAssembly: Stop directly using RuntimeLibcalls.def (PR #143054)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/143054 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] WebAssembly: Stop directly using RuntimeLibcalls.def (PR #143054)
arsenm wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. * **#143054** 👈 (View in Graphite) * **#142624** * `main` This stack of pull requests is managed by Graphite. Learn more about stacking. https://github.com/llvm/llvm-project/pull/143054 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Add SimplifyTypeTests pass. (PR #141327)
https://github.com/teresajohnson approved this pull request. lgtm but I think there is a code formatting error reported that should be fixed before merging. https://github.com/llvm/llvm-project/pull/141327 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)
https://github.com/momchil-velikov updated https://github.com/llvm/llvm-project/pull/142422 >From 8f9a4002820dcd3de2a5986d53749386a2507eab Mon Sep 17 00:00:00 2001 From: Momchil Velikov Date: Mon, 2 Jun 2025 15:13:13 + Subject: [PATCH 1/4] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` Previously, slices were sometimes marked as non-contiguous when they were actually contiguous. This occurred when the vector type had leading unit dimensions, e.g., `vector<1x1x...x1xd0xd1x...xdn-1xT>`. In such cases, only the trailing n dimensions of the memref need to be contiguous, not the entire vector rank. This affects how `FlattenContiguousRowMajorTransfer{Read,Write}Pattern` flattens `transfer_read` and `transfer_write` ops. The pattern used to collapse a number of dimensions equal to the vector rank, which may be incorrect when leading dimensions are unit-sized. This patch fixes the issue by collapsing only as many trailing memref dimensions as are actually contiguous. --- .../mlir/Dialect/Vector/Utils/VectorUtils.h | 54 - .../Transforms/VectorTransferOpTransforms.cpp | 8 +- mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp | 25 ++-- .../Vector/vector-transfer-flatten.mlir | 108 +- 4 files changed, 120 insertions(+), 75 deletions(-) diff --git a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h index 6609b28d77b6c..ed06d7a029494 100644 --- a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h +++ b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h @@ -49,35 +49,37 @@ FailureOr> isTranspose2DSlice(vector::TransposeOp op); /// Return true if `vectorType` is a contiguous slice of `memrefType`. /// -/// Only the N = vectorType.getRank() trailing dims of `memrefType` are -/// checked (the other dims are not relevant). 
Note that for `vectorType` to be -/// a contiguous slice of `memrefType`, the trailing dims of the latter have -/// to be contiguous - this is checked by looking at the corresponding strides. +/// The leading unit dimensions of the vector type are ignored as they +/// are not relevant to the result. Let N be the number of the vector +/// dimensions after ignoring a leading sequence of unit ones. /// -/// There might be some restriction on the leading dim of `VectorType`: +/// For `vectorType` to be a contiguous slice of `memrefType` +/// a) the N trailing dimensions of the latter must be contiguous, and +/// b) the trailing N dimensions of `vectorType` and `memrefType`, +/// except the first of them, must match. /// -/// Case 1. If all the trailing dims of `vectorType` match the trailing dims -/// of `memrefType` then the leading dim of `vectorType` can be -/// arbitrary. -/// -///Ex. 1.1 contiguous slice, perfect match -/// vector<4x3x2xi32> from memref<5x4x3x2xi32> -///Ex. 1.2 contiguous slice, the leading dim does not match (2 != 4) -/// vector<2x3x2xi32> from memref<5x4x3x2xi32> -/// -/// Case 2. If an "internal" dim of `vectorType` does not match the -/// corresponding trailing dim in `memrefType` then the remaining -/// leading dims of `vectorType` have to be 1 (the first non-matching -/// dim can be arbitrary). +/// Examples: /// -///Ex. 2.1 non-contiguous slice, 2 != 3 and the leading dim != <1> -/// vector<2x2x2xi32> from memref<5x4x3x2xi32> -///Ex. 2.2 contiguous slice, 2 != 3 and the leading dim == <1> -/// vector<1x2x2xi32> from memref<5x4x3x2xi32> -///Ex. 2.3. contiguous slice, 2 != 3 and the leading dims == <1x1> -/// vector<1x1x2x2xi32> from memref<5x4x3x2xi32> -///Ex. 2.4. 
non-contiguous slice, 2 != 3 and the leading dims != <1x1> -/// vector<2x1x2x2xi32> from memref<5x4x3x2xi32>) +/// Ex.1 contiguous slice, perfect match +/// vector<4x3x2xi32> from memref<5x4x3x2xi32> +/// Ex.2 contiguous slice, the leading dim does not match (2 != 4) +/// vector<2x3x2xi32> from memref<5x4x3x2xi32> +/// Ex.3 non-contiguous slice, 2 != 3 +/// vector<2x2x2xi32> from memref<5x4x3x2xi32> +/// Ex.4 contiguous slice, leading unit dimension of the vector ignored, +///2 != 3 (allowed) +/// vector<1x2x2xi32> from memref<5x4x3x2xi32> +/// Ex.5. contiguous slice, leading two unit dims of the vector ignored, +/// 2 != 3 (allowed) +/// vector<1x1x2x2xi32> from memref<5x4x3x2xi32> +/// Ex.6. non-contiguous slice, 2 != 3, no leading sequence of unit dims +/// vector<2x1x2x2xi32> from memref<5x4x3x2xi32>) +/// Ex.7 contiguous slice, memref needs to be contiguous only on the last +///dimension +/// vector<1x1x2xi32> from memref<2x2x2xi32, strided<[8, 4, 1]>> +/// Ex.8 non-contiguous slice, memref needs to be contiguous on the last +///two dimensions, and it isn't +/// vector<1x2x2xi32> from memref<2x2x2xi32, strided<[8, 4, 1]>> bool isContiguo
[llvm-branch-commits] [flang] [Flang][OpenMP] - When mapping a `fir.boxchar`, map the underlying data pointer as a member (PR #141715)
https://github.com/bhandarkar-pranav updated https://github.com/llvm/llvm-project/pull/141715 >From 2d411fc5d24c7e3e933447307fc958b7e544490b Mon Sep 17 00:00:00 2001 From: Pranav Bhandarkar Date: Fri, 23 May 2025 10:26:14 -0500 Subject: [PATCH 1/5] Fix boxchar with firstprivate --- .../Optimizer/Builder/DirectivesCommon.h | 85 +- flang/lib/Optimizer/Dialect/FIRType.cpp | 3 + .../Optimizer/OpenMP/MapInfoFinalization.cpp | 88 ++- .../OpenMP/MapsForPrivatizedSymbols.cpp | 67 -- .../Fir/convert-to-llvm-openmp-and-fir.fir| 27 ++ flang/test/Lower/OpenMP/map-character.f90 | 23 +++-- .../Lower/OpenMP/optional-argument-map-2.f90 | 63 +++-- 7 files changed, 297 insertions(+), 59 deletions(-) diff --git a/flang/include/flang/Optimizer/Builder/DirectivesCommon.h b/flang/include/flang/Optimizer/Builder/DirectivesCommon.h index 3f30c761acb4e..be11b9b5ede7c 100644 --- a/flang/include/flang/Optimizer/Builder/DirectivesCommon.h +++ b/flang/include/flang/Optimizer/Builder/DirectivesCommon.h @@ -91,6 +91,16 @@ inline AddrAndBoundsInfo getDataOperandBaseAddr(fir::FirOpBuilder &builder, return AddrAndBoundsInfo(symAddr, rawInput, isPresent, boxTy); } + // For boxchar references, do the same as what is done above for box + // references - Load the boxchar so that it is easier to retrieve the length + // of the underlying character and the data pointer. + if (auto boxCharType = mlir::dyn_cast( + fir::unwrapRefType((symAddr.getType() { +if (!isOptional && mlir::isa(symAddr.getType())) { + mlir::Value boxChar = builder.create(loc, symAddr); + return AddrAndBoundsInfo(boxChar, rawInput, isPresent); +} + } return AddrAndBoundsInfo(symAddr, rawInput, isPresent); } @@ -137,26 +147,61 @@ template mlir::Value genBoundsOpFromBoxChar(fir::FirOpBuilder &builder, mlir::Location loc, fir::ExtendedValue dataExv, AddrAndBoundsInfo &info) { - // TODO: Handle info.isPresent. 
- if (auto boxCharType = - mlir::dyn_cast(info.addr.getType())) { -mlir::Type idxTy = builder.getIndexType(); -mlir::Type lenType = builder.getCharacterLengthType(); + + if (!mlir::isa(fir::unwrapRefType(info.addr.getType( +return mlir::Value{}; + + mlir::Type idxTy = builder.getIndexType(); + mlir::Type lenType = builder.getCharacterLengthType(); + mlir::Value zero = builder.createIntegerConstant(loc, idxTy, 0); + mlir::Value one = builder.createIntegerConstant(loc, idxTy, 1); + using ExtentAndStride = std::tuple; + auto [extent, stride] = [&]() -> ExtentAndStride { +if (info.isPresent) { + llvm::SmallVector resTypes = {idxTy, idxTy}; + mlir::Operation::result_range ifRes = + builder.genIfOp(loc, resTypes, info.isPresent, /*withElseRegion=*/true) + .genThen([&]() { +mlir::Value boxChar = +fir::isa_ref_type(info.addr.getType()) +? builder.create(loc, info.addr) +: info.addr; +fir::BoxCharType boxCharType = +mlir::cast(boxChar.getType()); +mlir::Type refType = builder.getRefType(boxCharType.getEleTy()); +auto unboxed = builder.create( +loc, refType, lenType, boxChar); +mlir::SmallVector results = {unboxed.getResult(1), one }; +builder.create(loc, results); + }) + .genElse([&]() { +mlir::SmallVector results = {zero, zero }; +builder.create(loc, results); }) + .getResults(); + return {ifRes[0], ifRes[1]}; +} +// We have already established that info.addr.getType() is a boxchar +// or a boxchar address. If an address, load the boxchar. +mlir::Value boxChar = fir::isa_ref_type(info.addr.getType()) + ? 
builder.create(loc, info.addr) + : info.addr; +fir::BoxCharType boxCharType = +mlir::cast(boxChar.getType()); mlir::Type refType = builder.getRefType(boxCharType.getEleTy()); auto unboxed = -builder.create(loc, refType, lenType, info.addr); -mlir::Value zero = builder.createIntegerConstant(loc, idxTy, 0); -mlir::Value one = builder.createIntegerConstant(loc, idxTy, 1); -mlir::Value extent = unboxed.getResult(1); -mlir::Value stride = one; -mlir::Value ub = builder.create(loc, extent, one); -mlir::Type boundTy = builder.getType(); -return builder.create( -loc, boundTy, /*lower_bound=*/zero, -/*upper_bound=*/ub, /*extent=*/extent, /*stride=*/stride, -/*stride_in_bytes=*/true, /*start_idx=*/zero); - } - return mlir::Value{}; +builder.create(loc, refType, lenType, boxChar); +return {unboxed.getResult(1), one}; + }(); + + mlir::Value ub = builder.create(loc, extent, one); + mlir::Type boundTy = builder.getType()
[llvm-branch-commits] [llvm] [AMDGPU] Patterns for <2 x bfloat> fneg (fabs) (PR #142911)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/142911 >From c8524591999f495dd86261daecc44071737a227b Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Wed, 4 Jun 2025 23:49:43 -0700 Subject: [PATCH] [AMDGPU] Patterns for <2 x bfloat> fneg (fabs) --- llvm/lib/Target/AMDGPU/SIInstructions.td | 11 +++ llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll | 38 +- 2 files changed, 21 insertions(+), 28 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIInstructions.td b/llvm/lib/Target/AMDGPU/SIInstructions.td index a0285e3512a08..360fd05cb3d96 100644 --- a/llvm/lib/Target/AMDGPU/SIInstructions.td +++ b/llvm/lib/Target/AMDGPU/SIInstructions.td @@ -1840,22 +1840,21 @@ def : GCNPat < (UniformUnaryFrag (v2fp16vt SReg_32:$src)), (S_AND_B32 SReg_32:$src, (S_MOV_B32 (i32 0x7fff7fff))) >; -} // This is really (fneg (fabs v2f16:$src)) // // fabs is not reported as free because there is modifier for it in // VOP3P instructions, so it is turned into the bit op. def : GCNPat < - (UniformUnaryFrag (v2f16 (bitconvert (and_oneuse (i32 SReg_32:$src), 0x7fff7fff, + (UniformUnaryFrag (v2fp16vt (bitconvert (and_oneuse (i32 SReg_32:$src), 0x7fff7fff, (S_OR_B32 SReg_32:$src, (S_MOV_B32 (i32 0x80008000))) // Set sign bit >; def : GCNPat < - (UniformUnaryFrag (v2f16 (fabs SReg_32:$src))), + (UniformUnaryFrag (v2fp16vt (fabs SReg_32:$src))), (S_OR_B32 SReg_32:$src, (S_MOV_B32 (i32 0x80008000))) // Set sign bit >; - +} // COPY_TO_REGCLASS is needed to avoid using SCC from S_XOR_B32 instead // of the real value. 
@@ -1986,12 +1985,12 @@ def : GCNPat < (fabs (v2fp16vt VGPR_32:$src)), (V_AND_B32_e64 (S_MOV_B32 (i32 0x7fff7fff)), VGPR_32:$src) >; -} def : GCNPat < - (fneg (v2f16 (fabs VGPR_32:$src))), + (fneg (v2fp16vt (fabs VGPR_32:$src))), (V_OR_B32_e64 (S_MOV_B32 (i32 0x80008000)), VGPR_32:$src) >; +} def : GCNPat < (fabs (f64 VReg_64:$src)), diff --git a/llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll b/llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll index 243469d39cc11..d189b6d4c1e83 100644 --- a/llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll +++ b/llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll @@ -523,8 +523,7 @@ define amdgpu_kernel void @s_fneg_fabs_v2bf16_non_bc_src(ptr addrspace(1) %out, ; VI-NEXT:v_cndmask_b32_e32 v1, v2, v3, vcc ; VI-NEXT:v_lshrrev_b32_e32 v1, 16, v1 ; VI-NEXT:v_alignbit_b32 v0, v1, v0, 16 -; VI-NEXT:v_and_b32_e32 v0, 0x7fff7fff, v0 -; VI-NEXT:v_xor_b32_e32 v2, 0x80008000, v0 +; VI-NEXT:v_or_b32_e32 v2, 0x80008000, v0 ; VI-NEXT:v_mov_b32_e32 v0, s0 ; VI-NEXT:v_mov_b32_e32 v1, s1 ; VI-NEXT:flat_store_dword v[0:1], v2 @@ -556,8 +555,7 @@ define amdgpu_kernel void @s_fneg_fabs_v2bf16_non_bc_src(ptr addrspace(1) %out, ; GFX9-NEXT:v_lshrrev_b32_e32 v1, 16, v1 ; GFX9-NEXT:v_and_b32_sdwa v2, v3, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1 ; GFX9-NEXT:v_lshl_or_b32 v1, v1, 16, v2 -; GFX9-NEXT:v_and_b32_e32 v1, 0x7fff7fff, v1 -; GFX9-NEXT:v_xor_b32_e32 v1, 0x80008000, v1 +; GFX9-NEXT:v_or_b32_e32 v1, 0x80008000, v1 ; GFX9-NEXT:global_store_dword v0, v1, s[0:1] ; GFX9-NEXT:s_endpgm ; @@ -590,9 +588,9 @@ define amdgpu_kernel void @s_fneg_fabs_v2bf16_non_bc_src(ptr addrspace(1) %out, ; GFX11-NEXT:s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1) ; GFX11-NEXT:v_lshrrev_b32_e32 v1, 16, v1 ; GFX11-NEXT:v_lshl_or_b32 v0, v1, 16, v0 -; GFX11-NEXT:s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1) -; GFX11-NEXT:v_dual_mov_b32 v1, 0 :: v_dual_and_b32 v0, 0x7fff7fff, v0 -; GFX11-NEXT:v_xor_b32_e32 v0, 0x80008000, v0 +; 
GFX11-NEXT:v_mov_b32_e32 v1, 0 +; GFX11-NEXT:s_delay_alu instid0(VALU_DEP_2) +; GFX11-NEXT:v_or_b32_e32 v0, 0x80008000, v0 ; GFX11-NEXT:s_waitcnt lgkmcnt(0) ; GFX11-NEXT:global_store_b32 v1, v0, s[0:1] ; GFX11-NEXT:s_endpgm @@ -634,8 +632,7 @@ define amdgpu_kernel void @s_fneg_fabs_v2bf16_bc_src(ptr addrspace(1) %out, <2 x ; VI-NEXT:s_mov_b32 flat_scratch_lo, s13 ; VI-NEXT:s_lshr_b32 flat_scratch_hi, s12, 8 ; VI-NEXT:s_waitcnt lgkmcnt(0) -; VI-NEXT:s_and_b32 s2, s2, 0x7fff7fff -; VI-NEXT:s_xor_b32 s2, s2, 0x80008000 +; VI-NEXT:s_or_b32 s2, s2, 0x80008000 ; VI-NEXT:v_mov_b32_e32 v0, s0 ; VI-NEXT:v_mov_b32_e32 v1, s1 ; VI-NEXT:v_mov_b32_e32 v2, s2 @@ -648,8 +645,7 @@ define amdgpu_kernel void @s_fneg_fabs_v2bf16_bc_src(ptr addrspace(1) %out, <2 x ; GFX9-NEXT:s_load_dwordx2 s[0:1], s[8:9], 0x0 ; GFX9-NEXT:v_mov_b32_e32 v0, 0 ; GFX9-NEXT:s_waitcnt lgkmcnt(0) -; GFX9-NEXT:s_and_b32 s2, s2, 0x7fff7fff -; GFX9-NEXT:s_xor_b32 s2, s2, 0x80008000 +; GFX9-NEXT:s_or_b32 s2, s2, 0x80008000 ; GFX9-NEXT:v_mov_b32_e32 v1, s2 ; GFX9-NEXT:global_store_dword v0, v1, s[0:1] ; GFX9-NEXT:s_endpgm @@ -660,9 +656,8 @@ define amdgpu_kernel voi
[llvm-branch-commits] [llvm] [AMDGPU] Baseline fneg-fabs.bf16.ll tests. NFC. (PR #142910)
[llvm-branch-commits] [llvm] [AMDGPU] Baseline fneg-fabs.bf16.ll tests. NFC. (PR #142910)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/142910 >From 641fb5022daeca9b71527e18ea2df7982856a105 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Wed, 4 Jun 2025 23:46:28 -0700 Subject: [PATCH] [AMDGPU] Baseline fneg-fabs.bf16.ll tests. NFC. --- llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll | 1223 1 file changed, 1223 insertions(+) create mode 100644 llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll diff --git a/llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll b/llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll new file mode 100644 index 0..243469d39cc11 --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll @@ -0,0 +1,1223 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2 +; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=kaveri < %s | FileCheck --check-prefixes=CIVI,CI %s +; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=tonga < %s | FileCheck --check-prefixes=CIVI,VI %s +; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx900 < %s | FileCheck --check-prefixes=GFX9 %s +; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx1100 -mattr=+real-true16 < %s | FileCheck --check-prefixes=GFX11,GFX11-TRUE16 %s +; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx1100 -mattr=-real-true16 < %s | FileCheck --check-prefixes=GFX11,GFX11-FAKE16 %s + +define amdgpu_kernel void @fneg_fabs_fadd_bf16(ptr addrspace(1) %out, bfloat %x, bfloat %y) { +; CI-LABEL: fneg_fabs_fadd_bf16: +; CI: ; %bb.0: +; CI-NEXT:s_load_dword s2, s[8:9], 0x2 +; CI-NEXT:s_load_dwordx2 s[0:1], s[8:9], 0x0 +; CI-NEXT:s_add_i32 s12, s12, s17 +; CI-NEXT:s_mov_b32 flat_scratch_lo, s13 +; CI-NEXT:s_lshr_b32 flat_scratch_hi, s12, 8 +; CI-NEXT:s_waitcnt lgkmcnt(0) +; CI-NEXT:s_and_b32 s3, s2, 0x7fff +; CI-NEXT:s_lshl_b32 s3, s3, 16 +; CI-NEXT:s_and_b32 s2, s2, 0x +; CI-NEXT:v_mov_b32_e32 v0, s3 +; CI-NEXT:v_sub_f32_e32 v0, s2, v0 +; CI-NEXT:v_lshrrev_b32_e32 v2, 16, v0 +; CI-NEXT:v_mov_b32_e32 v0, s0 +; CI-NEXT:v_mov_b32_e32 v1, s1 +; CI-NEXT:flat_store_short v[0:1], v2 +; 
CI-NEXT:s_endpgm +; +; VI-LABEL: fneg_fabs_fadd_bf16: +; VI: ; %bb.0: +; VI-NEXT:s_load_dword s2, s[8:9], 0x8 +; VI-NEXT:s_load_dwordx2 s[0:1], s[8:9], 0x0 +; VI-NEXT:s_add_i32 s12, s12, s17 +; VI-NEXT:s_mov_b32 flat_scratch_lo, s13 +; VI-NEXT:s_lshr_b32 flat_scratch_hi, s12, 8 +; VI-NEXT:s_waitcnt lgkmcnt(0) +; VI-NEXT:s_and_b32 s3, s2, 0x7fff +; VI-NEXT:s_lshl_b32 s3, s3, 16 +; VI-NEXT:s_and_b32 s2, s2, 0x +; VI-NEXT:v_mov_b32_e32 v0, s3 +; VI-NEXT:v_sub_f32_e32 v0, s2, v0 +; VI-NEXT:v_bfe_u32 v1, v0, 16, 1 +; VI-NEXT:v_add_u32_e32 v1, vcc, v1, v0 +; VI-NEXT:v_add_u32_e32 v1, vcc, 0x7fff, v1 +; VI-NEXT:v_or_b32_e32 v2, 0x40, v0 +; VI-NEXT:v_cmp_u_f32_e32 vcc, v0, v0 +; VI-NEXT:v_cndmask_b32_e32 v0, v1, v2, vcc +; VI-NEXT:v_lshrrev_b32_e32 v2, 16, v0 +; VI-NEXT:v_mov_b32_e32 v0, s0 +; VI-NEXT:v_mov_b32_e32 v1, s1 +; VI-NEXT:flat_store_short v[0:1], v2 +; VI-NEXT:s_endpgm +; +; GFX9-LABEL: fneg_fabs_fadd_bf16: +; GFX9: ; %bb.0: +; GFX9-NEXT:s_load_dword s2, s[8:9], 0x8 +; GFX9-NEXT:s_load_dwordx2 s[0:1], s[8:9], 0x0 +; GFX9-NEXT:v_mov_b32_e32 v0, 0 +; GFX9-NEXT:s_waitcnt lgkmcnt(0) +; GFX9-NEXT:s_and_b32 s3, s2, 0x7fff +; GFX9-NEXT:s_lshl_b32 s3, s3, 16 +; GFX9-NEXT:s_and_b32 s2, s2, 0x +; GFX9-NEXT:v_mov_b32_e32 v1, s3 +; GFX9-NEXT:v_sub_f32_e32 v1, s2, v1 +; GFX9-NEXT:v_bfe_u32 v2, v1, 16, 1 +; GFX9-NEXT:v_add_u32_e32 v2, v2, v1 +; GFX9-NEXT:v_or_b32_e32 v3, 0x40, v1 +; GFX9-NEXT:v_add_u32_e32 v2, 0x7fff, v2 +; GFX9-NEXT:v_cmp_u_f32_e32 vcc, v1, v1 +; GFX9-NEXT:v_cndmask_b32_e32 v1, v2, v3, vcc +; GFX9-NEXT:global_store_short_d16_hi v0, v1, s[0:1] +; GFX9-NEXT:s_endpgm +; +; GFX11-TRUE16-LABEL: fneg_fabs_fadd_bf16: +; GFX11-TRUE16: ; %bb.0: +; GFX11-TRUE16-NEXT:s_load_b32 s0, s[4:5], 0x8 +; GFX11-TRUE16-NEXT:s_waitcnt lgkmcnt(0) +; GFX11-TRUE16-NEXT:s_mov_b32 s1, s0 +; GFX11-TRUE16-NEXT:s_and_b32 s0, s0, 0x +; GFX11-TRUE16-NEXT:s_and_b32 s1, s1, 0x7fff +; GFX11-TRUE16-NEXT:s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1) +; 
GFX11-TRUE16-NEXT:s_lshl_b32 s1, s1, 16 +; GFX11-TRUE16-NEXT:v_sub_f32_e64 v0, s0, s1 +; GFX11-TRUE16-NEXT:s_load_b64 s[0:1], s[4:5], 0x0 +; GFX11-TRUE16-NEXT:s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_3) +; GFX11-TRUE16-NEXT:v_bfe_u32 v1, v0, 16, 1 +; GFX11-TRUE16-NEXT:v_or_b32_e32 v2, 0x40, v0 +; GFX11-TRUE16-NEXT:v_cmp_u_f32_e32 vcc_lo, v0, v0 +; GFX11-TRUE16-NEXT:v_add_nc_u32_e32 v1, v1, v0 +; GFX11-TRUE16-NEXT:s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1) +; GFX11-TRUE16-NEXT:v_add_nc_u32_e32 v1, 0x7fff, v1 +; GFX11-TRUE16-NEXT:v_dual_mov_b32
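For readers unfamiliar with the trick the checks above rely on (not part of the patch): bfloat16 is the upper half of an IEEE-754 binary32, so the generated code takes |y| with a 0x7fff mask and promotes it to f32 with a 16-bit left shift before the f32 subtract. A minimal Python sketch of that bit manipulation:

```python
import struct

def bf16_to_f32(bits: int) -> float:
    # bfloat16 is the upper half of an IEEE-754 binary32, so promotion
    # is a plain 16-bit left shift (the s_lshl_b32 s3, s3, 16 above).
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]

def bf16_fabs(bits: int) -> int:
    # fabs just clears the sign bit (the s_and_b32 s3, s2, 0x7fff above).
    return bits & 0x7FFF

assert bf16_to_f32(0x3F80) == 1.0             # 0x3F80 is bf16 1.0
assert bf16_to_f32(bf16_fabs(0xC000)) == 2.0  # |-2.0| == 2.0
```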
[llvm-branch-commits] [llvm] [AMDGPU] Patterns for <2 x bfloat> fneg (fabs) (PR #142911)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/142911 >From c8524591999f495dd86261daecc44071737a227b Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Wed, 4 Jun 2025 23:49:43 -0700 Subject: [PATCH] [AMDGPU] Patterns for <2 x bfloat> fneg (fabs) --- llvm/lib/Target/AMDGPU/SIInstructions.td | 11 +++ llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll | 38 +- 2 files changed, 21 insertions(+), 28 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIInstructions.td b/llvm/lib/Target/AMDGPU/SIInstructions.td index a0285e3512a08..360fd05cb3d96 100644 --- a/llvm/lib/Target/AMDGPU/SIInstructions.td +++ b/llvm/lib/Target/AMDGPU/SIInstructions.td @@ -1840,22 +1840,21 @@ def : GCNPat < (UniformUnaryFrag (v2fp16vt SReg_32:$src)), (S_AND_B32 SReg_32:$src, (S_MOV_B32 (i32 0x7fff7fff))) >; -} // This is really (fneg (fabs v2f16:$src)) // // fabs is not reported as free because there is modifier for it in // VOP3P instructions, so it is turned into the bit op. def : GCNPat < - (UniformUnaryFrag (v2f16 (bitconvert (and_oneuse (i32 SReg_32:$src), 0x7fff7fff, + (UniformUnaryFrag (v2fp16vt (bitconvert (and_oneuse (i32 SReg_32:$src), 0x7fff7fff, (S_OR_B32 SReg_32:$src, (S_MOV_B32 (i32 0x80008000))) // Set sign bit >; def : GCNPat < - (UniformUnaryFrag (v2f16 (fabs SReg_32:$src))), + (UniformUnaryFrag (v2fp16vt (fabs SReg_32:$src))), (S_OR_B32 SReg_32:$src, (S_MOV_B32 (i32 0x80008000))) // Set sign bit >; - +} // COPY_TO_REGCLASS is needed to avoid using SCC from S_XOR_B32 instead // of the real value. 
@@ -1986,12 +1985,12 @@ def : GCNPat < (fabs (v2fp16vt VGPR_32:$src)), (V_AND_B32_e64 (S_MOV_B32 (i32 0x7fff7fff)), VGPR_32:$src) >; -} def : GCNPat < - (fneg (v2f16 (fabs VGPR_32:$src))), + (fneg (v2fp16vt (fabs VGPR_32:$src))), (V_OR_B32_e64 (S_MOV_B32 (i32 0x80008000)), VGPR_32:$src) >; +} def : GCNPat < (fabs (f64 VReg_64:$src)), diff --git a/llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll b/llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll index 243469d39cc11..d189b6d4c1e83 100644 --- a/llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll +++ b/llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll @@ -523,8 +523,7 @@ define amdgpu_kernel void @s_fneg_fabs_v2bf16_non_bc_src(ptr addrspace(1) %out, ; VI-NEXT:v_cndmask_b32_e32 v1, v2, v3, vcc ; VI-NEXT:v_lshrrev_b32_e32 v1, 16, v1 ; VI-NEXT:v_alignbit_b32 v0, v1, v0, 16 -; VI-NEXT:v_and_b32_e32 v0, 0x7fff7fff, v0 -; VI-NEXT:v_xor_b32_e32 v2, 0x80008000, v0 +; VI-NEXT:v_or_b32_e32 v2, 0x80008000, v0 ; VI-NEXT:v_mov_b32_e32 v0, s0 ; VI-NEXT:v_mov_b32_e32 v1, s1 ; VI-NEXT:flat_store_dword v[0:1], v2 @@ -556,8 +555,7 @@ define amdgpu_kernel void @s_fneg_fabs_v2bf16_non_bc_src(ptr addrspace(1) %out, ; GFX9-NEXT:v_lshrrev_b32_e32 v1, 16, v1 ; GFX9-NEXT:v_and_b32_sdwa v2, v3, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1 ; GFX9-NEXT:v_lshl_or_b32 v1, v1, 16, v2 -; GFX9-NEXT:v_and_b32_e32 v1, 0x7fff7fff, v1 -; GFX9-NEXT:v_xor_b32_e32 v1, 0x80008000, v1 +; GFX9-NEXT:v_or_b32_e32 v1, 0x80008000, v1 ; GFX9-NEXT:global_store_dword v0, v1, s[0:1] ; GFX9-NEXT:s_endpgm ; @@ -590,9 +588,9 @@ define amdgpu_kernel void @s_fneg_fabs_v2bf16_non_bc_src(ptr addrspace(1) %out, ; GFX11-NEXT:s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1) ; GFX11-NEXT:v_lshrrev_b32_e32 v1, 16, v1 ; GFX11-NEXT:v_lshl_or_b32 v0, v1, 16, v0 -; GFX11-NEXT:s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1) -; GFX11-NEXT:v_dual_mov_b32 v1, 0 :: v_dual_and_b32 v0, 0x7fff7fff, v0 -; GFX11-NEXT:v_xor_b32_e32 v0, 0x80008000, v0 +; 
GFX11-NEXT:v_mov_b32_e32 v1, 0 +; GFX11-NEXT:s_delay_alu instid0(VALU_DEP_2) +; GFX11-NEXT:v_or_b32_e32 v0, 0x80008000, v0 ; GFX11-NEXT:s_waitcnt lgkmcnt(0) ; GFX11-NEXT:global_store_b32 v1, v0, s[0:1] ; GFX11-NEXT:s_endpgm @@ -634,8 +632,7 @@ define amdgpu_kernel void @s_fneg_fabs_v2bf16_bc_src(ptr addrspace(1) %out, <2 x ; VI-NEXT:s_mov_b32 flat_scratch_lo, s13 ; VI-NEXT:s_lshr_b32 flat_scratch_hi, s12, 8 ; VI-NEXT:s_waitcnt lgkmcnt(0) -; VI-NEXT:s_and_b32 s2, s2, 0x7fff7fff -; VI-NEXT:s_xor_b32 s2, s2, 0x80008000 +; VI-NEXT:s_or_b32 s2, s2, 0x80008000 ; VI-NEXT:v_mov_b32_e32 v0, s0 ; VI-NEXT:v_mov_b32_e32 v1, s1 ; VI-NEXT:v_mov_b32_e32 v2, s2 @@ -648,8 +645,7 @@ define amdgpu_kernel void @s_fneg_fabs_v2bf16_bc_src(ptr addrspace(1) %out, <2 x ; GFX9-NEXT:s_load_dwordx2 s[0:1], s[8:9], 0x0 ; GFX9-NEXT:v_mov_b32_e32 v0, 0 ; GFX9-NEXT:s_waitcnt lgkmcnt(0) -; GFX9-NEXT:s_and_b32 s2, s2, 0x7fff7fff -; GFX9-NEXT:s_xor_b32 s2, s2, 0x80008000 +; GFX9-NEXT:s_or_b32 s2, s2, 0x80008000 ; GFX9-NEXT:v_mov_b32_e32 v1, s2 ; GFX9-NEXT:global_store_dword v0, v1, s[0:1] ; GFX9-NEXT:s_endpgm @@ -660,9 +656,8 @@ define amdgpu_kernel voi
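The point of the new patterns, sketched outside the patch: for a packed pair of bf16 lanes, fneg(fabs(x)) simply forces both sign bits on, so the old two-instruction sequence (mask the sign bits, then flip them) collapses to a single OR, exactly the `s_and`+`s_xor` → `s_or` changes visible in the test diffs above. A quick Python check of the equivalence:

```python
def fneg_fabs_old(x: int) -> int:
    # Old codegen: clear both bf16 sign bits, then flip them
    # (v_and_b32 0x7fff7fff followed by v_xor_b32 0x80008000).
    return (x & 0x7FFF7FFF) ^ 0x80008000

def fneg_fabs_new(x: int) -> int:
    # New pattern: -|x| simply forces both sign bits on
    # (a single s_or_b32 / v_or_b32 with 0x80008000).
    return x | 0x80008000

# Exhaustive over one 16-bit lane (other lane held at zero), plus a
# packed sample: the two sequences agree bit-for-bit.
assert all(fneg_fabs_old(v) == fneg_fabs_new(v) for v in range(0x10000))
assert fneg_fabs_old(0x3F80C000) == fneg_fabs_new(0x3F80C000) == 0xBF80C000
```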
[llvm-branch-commits] [llvm] [CI] Migrate to runtimes build (PR #142696)
Endilll wrote:

> > It doesn't relate to multilib, I understand that, but does it mean we're going to test more than one runtime or that we'll test the same runtime multiple ways?
>
> It's runtimes that we test in multiple ways (`-std=c++26` and `enable_modules=clang` currently). I felt multiconfig covered that and couldn't really think of a better name. If anyone else has better ideas I'd be happy to change it up.

Multiconfig in this context has some strong associations with CMake's Ninja Multi-Config generator for me. My suggestion is `needs_reconfig`. https://github.com/llvm/llvm-project/pull/142696 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #142789)
@@ -137,7 +138,123 @@ class AMDGPURegBankLegalizeCombiner { return {MatchMI, MatchMI->getOperand(1).getReg()}; } + std::tuple<MachineInstr *, int> tryMatchRALFromUnmerge(Register Src) { +auto *ReadAnyLane = MRI.getVRegDef(Src); Pierre-vh wrote:

```suggestion
MachineInstr *ReadAnyLane = MRI.getVRegDef(Src);
```

I think we generally use `auto` only if the type is already in the RHS. https://github.com/llvm/llvm-project/pull/142789 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #142789)
@@ -137,7 +138,123 @@ class AMDGPURegBankLegalizeCombiner { return {MatchMI, MatchMI->getOperand(1).getReg()}; } + std::tuple<MachineInstr *, int> tryMatchRALFromUnmerge(Register Src) { Pierre-vh wrote:

```suggestion
std::pair<MachineInstr *, int> tryMatchRALFromUnmerge(Register Src) {
```

https://github.com/llvm/llvm-project/pull/142789 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #142789)
@@ -137,7 +138,123 @@ class AMDGPURegBankLegalizeCombiner { return {MatchMI, MatchMI->getOperand(1).getReg()}; } + std::tuple tryMatchRALFromUnmerge(Register Src) { +auto *ReadAnyLane = MRI.getVRegDef(Src); +if (ReadAnyLane->getOpcode() == AMDGPU::G_AMDGPU_READANYLANE) { + Register RALSrc = ReadAnyLane->getOperand(1).getReg(); + auto *UnMerge = getOpcodeDef(RALSrc, MRI); + if (UnMerge) +return {UnMerge, UnMerge->findRegisterDefOperandIdx(RALSrc, nullptr)}; +} +return {nullptr, -1}; + } + + Register getReadAnyLaneSrc(Register Src) { +// Src = G_AMDGPU_READANYLANE RALSrc +auto [RAL, RALSrc] = tryMatch(Src, AMDGPU::G_AMDGPU_READANYLANE); +if (RAL) + return RALSrc; + +// LoVgpr, HiVgpr = G_UNMERGE_VALUES UnmergeSrc +// LoSgpr = G_AMDGPU_READANYLANE LoVgpr +// HiSgpr = G_AMDGPU_READANYLANE HiVgpr +// Src G_MERGE_VALUES LoSgpr, HiSgpr +auto *Merge = getOpcodeDef(Src, MRI); +if (Merge) { + unsigned NumElts = Merge->getNumSources(); + auto [Unmerge, Idx] = tryMatchRALFromUnmerge(Merge->getSourceReg(0)); + if (!Unmerge || Unmerge->getNumDefs() != NumElts || Idx != 0) +return {}; + + // check if all elements are from same unmerge and there is no shuffling + for (unsigned i = 1; i < NumElts; ++i) { +auto [UnmergeI, IdxI] = tryMatchRALFromUnmerge(Merge->getSourceReg(i)); +if (UnmergeI != Unmerge || (unsigned)IdxI != i) + return {}; + } + return Unmerge->getSourceReg(); +} + +// ..., VgprI, ... = G_UNMERGE_VALUES VgprLarge +// SgprI = G_AMDGPU_READANYLANE VgprI +// SgprLarge G_MERGE_VALUES ..., SgprI, ... +// ..., Src, ... 
= G_UNMERGE_VALUES SgprLarge +auto *UnMerge = getOpcodeDef(Src, MRI); +if (UnMerge) { + int Idx = UnMerge->findRegisterDefOperandIdx(Src, nullptr); + auto *Merge = getOpcodeDef(UnMerge->getSourceReg(), MRI); + if (Merge) { +auto [RAL, RALSrc] = +tryMatch(Merge->getSourceReg(Idx), AMDGPU::G_AMDGPU_READANYLANE); +if (RAL) + return RALSrc; + } +} + +return {}; + } + + bool tryEliminateReadAnyLane(MachineInstr &Copy) { +Register Dst = Copy.getOperand(0).getReg(); +Register Src = Copy.getOperand(1).getReg(); +if (!Src.isVirtual()) + return false; + +Register RALDst = Src; +MachineInstr &SrcMI = *MRI.getVRegDef(Src); +if (SrcMI.getOpcode() == AMDGPU::G_BITCAST) { + RALDst = SrcMI.getOperand(1).getReg(); +} Pierre-vh wrote: ```suggestion if (SrcMI.getOpcode() == AMDGPU::G_BITCAST) RALDst = SrcMI.getOperand(1).getReg(); ``` nit: can we have other opcodes than bitcast and that'd matter, like inreg extensions, assert exts ? It feels like we should have a helper for this somewhere https://github.com/llvm/llvm-project/pull/142789 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
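A toy Python model of the "all elements come from the same unmerge, in order" check that `getReadAnyLaneSrc` performs in the diff above. All names are hypothetical, and unmerge identity is reduced to simple value equality (the real code compares the defining `G_UNMERGE_VALUES` instruction and also verifies the element count):

```python
# Toy SSA values: ("ral", src), ("unmerge", wide_src, idx),
# ("merge", part0, part1, ...). Strings stand in for registers.
def try_match_ral_from_unmerge(v):
    if v[0] == "ral" and isinstance(v[1], tuple) and v[1][0] == "unmerge":
        return v[1][1], v[1][2]  # (unmerge source, element index)
    return None, -1

def get_readanylane_src(v):
    if v[0] == "ral":            # Src = G_AMDGPU_READANYLANE RALSrc
        return v[1]
    if v[0] == "merge":          # merge of per-element readanylanes
        parts = v[1:]
        src0, idx0 = try_match_ral_from_unmerge(parts[0])
        if src0 is None or idx0 != 0:
            return None
        for i, part in enumerate(parts[1:], start=1):
            src, idx = try_match_ral_from_unmerge(part)
            if src != src0 or idx != i:   # other unmerge, or shuffled
                return None
        return src0
    return None

x = "vgpr_wide"
in_order = ("merge", ("ral", ("unmerge", x, 0)), ("ral", ("unmerge", x, 1)))
shuffled = ("merge", ("ral", ("unmerge", x, 1)), ("ral", ("unmerge", x, 0)))
assert get_readanylane_src(in_order) == x   # whole value recovered
assert get_readanylane_src(shuffled) is None  # shuffling blocks the fold
```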
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #142789)
@@ -137,7 +138,123 @@ class AMDGPURegBankLegalizeCombiner { return {MatchMI, MatchMI->getOperand(1).getReg()}; } + std::tuple<MachineInstr *, int> tryMatchRALFromUnmerge(Register Src) { +auto *ReadAnyLane = MRI.getVRegDef(Src); +if (ReadAnyLane->getOpcode() == AMDGPU::G_AMDGPU_READANYLANE) { + Register RALSrc = ReadAnyLane->getOperand(1).getReg(); + auto *UnMerge = getOpcodeDef(RALSrc, MRI); + if (UnMerge) Pierre-vh wrote:

```suggestion
if (auto *UnMerge = getOpcodeDef(RALSrc, MRI))
```

https://github.com/llvm/llvm-project/pull/142789 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)
@@ -117,45 +117,73 @@ static LLT getReadAnyLaneSplitTy(LLT Ty) { return LLT::scalar(32); } -static Register buildReadAnyLane(MachineIRBuilder &B, Register VgprSrc, - const RegisterBankInfo &RBI); +typedef std::function
https://github.com/llvm/llvm-project/pull/142790 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)
@@ -117,45 +117,73 @@ static LLT getReadAnyLaneSplitTy(LLT Ty) { return LLT::scalar(32); } -static Register buildReadAnyLane(MachineIRBuilder &B, Register VgprSrc, - const RegisterBankInfo &RBI); +typedef std::function +ReadLaneFnTy; + +static Register buildReadLane(MachineIRBuilder &, Register, + const RegisterBankInfo &, ReadLaneFnTy); static void unmergeReadAnyLane(MachineIRBuilder &B, SmallVectorImpl &SgprDstParts, LLT UnmergeTy, Register VgprSrc, - const RegisterBankInfo &RBI) { + const RegisterBankInfo &RBI, + ReadLaneFnTy BuildRL) { const RegisterBank *VgprRB = &RBI.getRegBank(AMDGPU::VGPRRegBankID); auto Unmerge = B.buildUnmerge({VgprRB, UnmergeTy}, VgprSrc); for (unsigned i = 0; i < Unmerge->getNumOperands() - 1; ++i) { -SgprDstParts.push_back(buildReadAnyLane(B, Unmerge.getReg(i), RBI)); +SgprDstParts.push_back(buildReadLane(B, Unmerge.getReg(i), RBI, BuildRL)); } } -static Register buildReadAnyLane(MachineIRBuilder &B, Register VgprSrc, - const RegisterBankInfo &RBI) { +static Register buildReadLane(MachineIRBuilder &B, Register VgprSrc, + const RegisterBankInfo &RBI, + ReadLaneFnTy BuildRL) { LLT Ty = B.getMRI()->getType(VgprSrc); const RegisterBank *SgprRB = &RBI.getRegBank(AMDGPU::SGPRRegBankID); if (Ty.getSizeInBits() == 32) { -return B.buildInstr(AMDGPU::G_AMDGPU_READANYLANE, {{SgprRB, Ty}}, {VgprSrc}) -.getReg(0); +Register SgprDst = B.getMRI()->createVirtualRegister({SgprRB, Ty}); +return BuildRL(B, SgprDst, VgprSrc).getReg(0); } SmallVector SgprDstParts; - unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI); + unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI, + BuildRL); return B.buildMergeLikeInstr({SgprRB, Ty}, SgprDstParts).getReg(0); } -void AMDGPU::buildReadAnyLane(MachineIRBuilder &B, Register SgprDst, - Register VgprSrc, const RegisterBankInfo &RBI) { +static void buildReadLane(MachineIRBuilder &B, Register SgprDst, + Register VgprSrc, const RegisterBankInfo &RBI, + ReadLaneFnTy 
BuildReadLane) { LLT Ty = B.getMRI()->getType(VgprSrc); if (Ty.getSizeInBits() == 32) { -B.buildInstr(AMDGPU::G_AMDGPU_READANYLANE, {SgprDst}, {VgprSrc}); +BuildReadLane(B, SgprDst, VgprSrc); return; } SmallVector SgprDstParts; - unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI); + unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI, + BuildReadLane); B.buildMergeLikeInstr(SgprDst, SgprDstParts).getReg(0); } + +void AMDGPU::buildReadAnyLane(MachineIRBuilder &B, Register SgprDst, + Register VgprSrc, const RegisterBankInfo &RBI) { + return buildReadLane( + B, SgprDst, VgprSrc, RBI, + [](MachineIRBuilder &B, Register SgprDst, Register VgprSrc) { +return B.buildInstr(AMDGPU::G_AMDGPU_READANYLANE, {SgprDst}, {VgprSrc}); + }); +} + +void AMDGPU::buildReadFirstLane(MachineIRBuilder &B, Register SgprDst, +Register VgprSrc, const RegisterBankInfo &RBI) { + return buildReadLane( + B, SgprDst, VgprSrc, RBI, + [](MachineIRBuilder &B, Register SgprDst, Register VgprSrc) { +return B.buildIntrinsic(Intrinsic::amdgcn_readfirstlane, SgprDst) Pierre-vh wrote: Not for this PR, but we should really have an opcode for this too instead of having one being an intrinsic and one being a generic opcode https://github.com/llvm/llvm-project/pull/142790 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
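A rough Python sketch (hypothetical names) of the shape of the refactor quoted above: the split-and-merge traversal over 32-bit pieces stays shared, and a callback decides whether each piece gets a readanylane or a readfirstlane:

```python
def build_read_lane(chunks, build_rl):
    # Shared traversal: split the (possibly wide) value into 32-bit
    # pieces, apply the per-piece builder, merge the results. The
    # callback decides which read-lane instruction gets emitted.
    if len(chunks) == 1:
        return build_rl(chunks[0])
    return ("merge", [build_rl(c) for c in chunks])

read_any_lane = lambda c: ("G_AMDGPU_READANYLANE", c)
read_first_lane = lambda c: ("amdgcn_readfirstlane", c)

wide = ["lo32", "hi32"]
assert build_read_lane(wide, read_any_lane) == \
    ("merge", [("G_AMDGPU_READANYLANE", "lo32"),
               ("G_AMDGPU_READANYLANE", "hi32")])
assert build_read_lane(["s32"], read_first_lane) == \
    ("amdgcn_readfirstlane", "s32")
```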
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)
@@ -57,6 +57,226 @@ void RegBankLegalizeHelper::findRuleAndApplyMapping(MachineInstr &MI) { lower(MI, Mapping, WaterfallSgprs); } +bool RegBankLegalizeHelper::executeInWaterfallLoop( +MachineIRBuilder &B, iterator_range Range, +SmallSet &SGPROperandRegs) { + // Track use registers which have already been expanded with a readfirstlane + // sequence. This may have multiple uses if moving a sequence. + DenseMap WaterfalledRegMap; + + MachineBasicBlock &MBB = B.getMBB(); + MachineFunction &MF = B.getMF(); + + const SIRegisterInfo *TRI = ST.getRegisterInfo(); + const TargetRegisterClass *WaveRC = TRI->getWaveMaskRegClass(); + unsigned MovExecOpc, MovExecTermOpc, XorTermOpc, AndSaveExecOpc, ExecReg; + if (ST.isWave32()) { +MovExecOpc = AMDGPU::S_MOV_B32; +MovExecTermOpc = AMDGPU::S_MOV_B32_term; +XorTermOpc = AMDGPU::S_XOR_B32_term; +AndSaveExecOpc = AMDGPU::S_AND_SAVEEXEC_B32; +ExecReg = AMDGPU::EXEC_LO; + } else { +MovExecOpc = AMDGPU::S_MOV_B64; +MovExecTermOpc = AMDGPU::S_MOV_B64_term; +XorTermOpc = AMDGPU::S_XOR_B64_term; +AndSaveExecOpc = AMDGPU::S_AND_SAVEEXEC_B64; +ExecReg = AMDGPU::EXEC; + } + +#ifndef NDEBUG + const int OrigRangeSize = std::distance(Range.begin(), Range.end()); +#endif + + MachineRegisterInfo &MRI = *B.getMRI(); + Register SaveExecReg = MRI.createVirtualRegister(WaveRC); + Register InitSaveExecReg = MRI.createVirtualRegister(WaveRC); + + // Don't bother using generic instructions/registers for the exec mask. + B.buildInstr(TargetOpcode::IMPLICIT_DEF).addDef(InitSaveExecReg); + + Register SavedExec = MRI.createVirtualRegister(WaveRC); + + // To insert the loop we need to split the block. Move everything before + // this point to a new block, and insert a new empty block before this + // instruction. 
+ MachineBasicBlock *LoopBB = MF.CreateMachineBasicBlock(); + MachineBasicBlock *BodyBB = MF.CreateMachineBasicBlock(); + MachineBasicBlock *RestoreExecBB = MF.CreateMachineBasicBlock(); + MachineBasicBlock *RemainderBB = MF.CreateMachineBasicBlock(); + MachineFunction::iterator MBBI(MBB); + ++MBBI; + MF.insert(MBBI, LoopBB); + MF.insert(MBBI, BodyBB); + MF.insert(MBBI, RestoreExecBB); + MF.insert(MBBI, RemainderBB); + + LoopBB->addSuccessor(BodyBB); + BodyBB->addSuccessor(RestoreExecBB); + BodyBB->addSuccessor(LoopBB); + + // Move the rest of the block into a new block. + RemainderBB->transferSuccessorsAndUpdatePHIs(&MBB); + RemainderBB->splice(RemainderBB->begin(), &MBB, Range.end(), MBB.end()); + + MBB.addSuccessor(LoopBB); + RestoreExecBB->addSuccessor(RemainderBB); + + B.setInsertPt(*LoopBB, LoopBB->end()); + + // +-MBB:+ + // | ... | + // | %0 = G_INST_1 | + // | %Dst = MI %Vgpr | + // | %1 = G_INST_2 | + // | ... | + // +-+ + // -> + // +-MBB---+ + // | ... | + // | %0 = G_INST_1 | + // | %SaveExecReg = S_MOV_B32 $exec_lo | + // +|--+ + // | /--| + // VV | + // +-LoopBB---+ | + // | %CurrentLaneReg:sgpr(s32) = READFIRSTLANE %Vgpr | | + // | instead of executing for each lane, see if other lanes had | | + // | same value for %Vgpr and execute for them also.| | + // | %CondReg:vcc(s1) = G_ICMP eq %CurrentLaneReg, %Vgpr | | + // | %CondRegLM:sreg_32 = ballot %CondReg // copy vcc to sreg32 lane mask | | + // | %SavedExec = S_AND_SAVEEXEC_B32 %CondRegLM | | + // | exec is active for lanes with the same "CurrentLane value" in Vgpr | | + // +|-+ | + // V | + // +-BodyBB+ | + // | %Dst = MI %CurrentLaneReg:sgpr(s32) | | + // | executed only for active lanes and written to Dst | | + // | $exec = S_XOR_B32 $exec, %SavedExec | | + // | set active lanes to 0 in SavedExec, lanes that did not write to | | + // | Dst yet, and set this as new exec (for READFIRSTLANE and ICMP) | | + // | SI_WATERFALL_LOOP LoopBB |-| + // +|--+ + // V + // +-RestoreExecBB--+ + // | $exec_lo = 
S_MOV_B32_term %SaveExecReg | + // +|---+ + // V + // +-RemainderBB:--+ + //
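The control flow in the diagram above can be simulated per lane. This is a hedged sketch of the loop's semantics (not the actual lowering); it shows why the waterfall body executes once per distinct value in the VGPR operand rather than once per lane:

```python
def waterfall(lane_values, op):
    # Per-lane simulation of the loop: readfirstlane picks a value,
    # ballot(icmp eq) gathers all active lanes with that value, the body
    # runs once with exec restricted to them, then they are xor-ed out
    # of exec and the loop repeats until exec is empty.
    results = [None] * len(lane_values)
    exec_mask = set(range(len(lane_values)))  # all lanes active on entry
    trips = 0
    while exec_mask:
        trips += 1
        current = lane_values[min(exec_mask)]             # READFIRSTLANE
        same = {l for l in exec_mask if lane_values[l] == current}
        for l in same:                                    # BodyBB
            results[l] = op(current)
        exec_mask -= same                                 # S_XOR_B32_term
    return results, trips

res, trips = waterfall([3, 7, 3, 7], lambda v: v * 10)
assert res == [30, 70, 30, 70] and trips == 2  # two distinct values
_, trips = waterfall([5, 5, 5, 5], lambda v: v)
assert trips == 1                              # uniform input: one trip
```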
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)
@@ -1,6 +1,6 @@ ; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py -; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -stop-after=regbankselect -regbankselect-fast -o - %s | FileCheck %s -; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -stop-after=regbankselect -regbankselect-greedy -o - %s | FileCheck %s +; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn-mesa-mesa3d -stop-after=amdgpu-regbanklegalize -regbankselect-fast -o - %s | FileCheck %s Pierre-vh wrote: @arsenm Is it fine to move tests entirely to this new RBSelect, or should we keep coverage for both until the old RB is removed? https://github.com/llvm/llvm-project/pull/142790 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)
@@ -165,6 +165,8 @@ enum RegBankLLTMappingApplyID { Sgpr32Trunc, // Src only modifiers: waterfalls, extends + Sgpr32_W, + SgprV4S32_W, Pierre-vh wrote: Can you add a trailing comment or rename this? The `_W` suffix is not immediately clear to me. https://github.com/llvm/llvm-project/pull/142790 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)
@@ -57,6 +57,226 @@ void RegBankLegalizeHelper::findRuleAndApplyMapping(MachineInstr &MI) { lower(MI, Mapping, WaterfallSgprs); } +bool RegBankLegalizeHelper::executeInWaterfallLoop( +MachineIRBuilder &B, iterator_range Range, +SmallSet &SGPROperandRegs) { + // Track use registers which have already been expanded with a readfirstlane + // sequence. This may have multiple uses if moving a sequence. + DenseMap WaterfalledRegMap; + + MachineBasicBlock &MBB = B.getMBB(); + MachineFunction &MF = B.getMF(); + + const SIRegisterInfo *TRI = ST.getRegisterInfo(); + const TargetRegisterClass *WaveRC = TRI->getWaveMaskRegClass(); + unsigned MovExecOpc, MovExecTermOpc, XorTermOpc, AndSaveExecOpc, ExecReg; + if (ST.isWave32()) { Pierre-vh wrote: nit: I think those could go in the class directly so this isn't repeated everytime no ? The class is instantiated per function anyway https://github.com/llvm/llvm-project/pull/142790 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)
@@ -894,6 +1121,15 @@ void RegBankLegalizeHelper::applyMappingSrc( } break; } +// sgpr waterfall, scalars and vectors +case Sgpr32_W: +case SgprV4S32_W: { + assert(Ty == getTyFromID(MethodIDs[i])); + if (RB != SgprRB) { +SgprWaterfallOperandRegs.insert(Reg); + } Pierre-vh wrote: ```suggestion if (RB != SgprRB) SgprWaterfallOperandRegs.insert(Reg); ``` https://github.com/llvm/llvm-project/pull/142790 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [CI] Migrate to runtimes build (PR #142696)
https://github.com/boomanaiden154 updated https://github.com/llvm/llvm-project/pull/142696 >From 360e723b51ee201603f72b56859cd7c6d6faec24 Mon Sep 17 00:00:00 2001 From: Aiden Grossman Date: Thu, 5 Jun 2025 06:51:37 + Subject: [PATCH 1/2] feedback Created using spr 1.3.4 --- .ci/compute_projects.py | 17 + 1 file changed, 5 insertions(+), 12 deletions(-) diff --git a/.ci/compute_projects.py b/.ci/compute_projects.py index b12b729eadd3f..8134e1e2c29fb 100644 --- a/.ci/compute_projects.py +++ b/.ci/compute_projects.py @@ -145,22 +145,15 @@ def _add_dependencies(projects: Set[str], runtimes: Set[str]) -> Set[str]: def _exclude_projects(current_projects: Set[str], platform: str) -> Set[str]: -new_project_set = set(current_projects) if platform == "Linux": -for to_exclude in EXCLUDE_LINUX: -if to_exclude in new_project_set: -new_project_set.remove(to_exclude) +to_exclude = EXCLUDE_LINUX elif platform == "Windows": -for to_exclude in EXCLUDE_WINDOWS: -if to_exclude in new_project_set: -new_project_set.remove(to_exclude) +to_exclude = EXCLUDE_WINDOWS elif platform == "Darwin": -for to_exclude in EXCLUDE_MAC: -if to_exclude in new_project_set: -new_project_set.remove(to_exclude) +to_exclude = EXCLUDE_MAC else: -raise ValueError("Unexpected platform.") -return new_project_set +raise ValueError(f"Unexpected platform: {platform}") +return current_projects.difference(to_exclude) def _compute_projects_to_test(modified_projects: Set[str], platform: str) -> Set[str]: >From 26a48b3ba70c829862788335f4b5b610dfd5dd3a Mon Sep 17 00:00:00 2001 From: Aiden Grossman Date: Thu, 5 Jun 2025 08:55:00 + Subject: [PATCH 2/2] feedback Created using spr 1.3.4 --- .ci/compute_projects.py | 20 ++-- .ci/compute_projects_test.py| 32 .ci/monolithic-linux.sh | 8 .github/workflows/premerge.yaml | 4 ++-- 4 files changed, 32 insertions(+), 32 deletions(-) diff --git a/.ci/compute_projects.py b/.ci/compute_projects.py index 8134e1e2c29fb..50a64cb15a937 100644 --- a/.ci/compute_projects.py +++ 
b/.ci/compute_projects.py @@ -66,7 +66,7 @@ DEPENDENT_RUNTIMES_TO_TEST = { "clang": {"compiler-rt"}, } -DEPENDENT_RUNTIMES_TO_TEST_MULTICONFIG = { +DEPENDENT_RUNTIMES_TO_TEST_NEEDS_RECONFIG = { "llvm": {"libcxx", "libcxxabi", "libunwind"}, "clang": {"libcxx", "libcxxabi", "libunwind"}, ".ci": {"libcxx", "libcxxabi", "libunwind"}, @@ -201,15 +201,15 @@ def _compute_runtimes_to_test(modified_projects: Set[str], platform: str) -> Set return _exclude_projects(runtimes_to_test, platform) -def _compute_runtimes_to_test_multiconfig( +def _compute_runtimes_to_test_needs_reconfig( modified_projects: Set[str], platform: str ) -> Set[str]: runtimes_to_test = set() for modified_project in modified_projects: -if modified_project not in DEPENDENT_RUNTIMES_TO_TEST_MULTICONFIG: +if modified_project not in DEPENDENT_RUNTIMES_TO_TEST_NEEDS_RECONFIG: continue runtimes_to_test.update( -DEPENDENT_RUNTIMES_TO_TEST_MULTICONFIG[modified_project] +DEPENDENT_RUNTIMES_TO_TEST_NEEDS_RECONFIG[modified_project] ) return _exclude_projects(runtimes_to_test, platform) @@ -246,17 +246,17 @@ def get_env_variables(modified_files: list[str], platform: str) -> Set[str]: modified_projects = _get_modified_projects(modified_files) projects_to_test = _compute_projects_to_test(modified_projects, platform) runtimes_to_test = _compute_runtimes_to_test(modified_projects, platform) -runtimes_to_test_multiconfig = _compute_runtimes_to_test_multiconfig( +runtimes_to_test_needs_reconfig = _compute_runtimes_to_test_needs_reconfig( modified_projects, platform ) runtimes_to_build = _compute_runtimes_to_build( -runtimes_to_test | runtimes_to_test_multiconfig, modified_projects, platform +runtimes_to_test | runtimes_to_test_needs_reconfig, modified_projects, platform ) projects_to_build = _compute_projects_to_build(projects_to_test, runtimes_to_build) projects_check_targets = _compute_project_check_targets(projects_to_test) runtimes_check_targets = _compute_project_check_targets(runtimes_to_test) 
-runtimes_check_targets_multiconfig = _compute_project_check_targets( -runtimes_to_test_multiconfig +runtimes_check_targets_needs_reconfig = _compute_project_check_targets( +runtimes_to_test_needs_reconfig ) # We use a semicolon to separate the projects/runtimes as they get passed # to the CMake invocation and thus we need to use the CMake list separator @@ -267,8 +267,8 @@ def get_env_variables(modified_files: list[str], platform: str) -> Set[str]: "project_chec
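For context on the `_exclude_projects` refactor in the diff above, here is a standalone sketch (the exclusion sets are placeholders, not the real `EXCLUDE_*` lists): a single `set.difference` call replaces the three per-platform remove loops and returns a new set instead of mutating the caller's argument:

```python
from typing import Set

# Placeholder exclusion sets -- stand-ins for the real EXCLUDE_LINUX /
# EXCLUDE_WINDOWS / EXCLUDE_MAC lists in .ci/compute_projects.py.
EXCLUDE = {
    "Linux": {"libc"},
    "Windows": {"libc", "libcxx"},
    "Darwin": {"libc"},
}

def exclude_projects(current_projects: Set[str], platform: str) -> Set[str]:
    if platform not in EXCLUDE:
        raise ValueError(f"Unexpected platform: {platform}")
    # One set.difference call replaces the per-platform remove loops,
    # and returns a new set rather than mutating the argument.
    return current_projects.difference(EXCLUDE[platform])

projects = {"llvm", "clang", "libc", "libcxx"}
assert exclude_projects(projects, "Windows") == {"llvm", "clang"}
assert projects == {"llvm", "clang", "libc", "libcxx"}  # caller unchanged
```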
[llvm-branch-commits] [llvm] [CI] Migrate to runtimes build (PR #142696)
boomanaiden154 wrote:

> Multiconfig in this context has some strong associations with CMake's Ninja Multi-Config generator for me. My suggestion is `needs_reconfig`.

Agree with `needs_reconfig`. Updated. Thanks for the suggestion! https://github.com/llvm/llvm-project/pull/142696 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [CI] Use LLVM_ENABLE_RUNTIMES for runtimes builds on Linux (PR #142694)
boomanaiden154 wrote: Branch seems to be cleaned up now. https://github.com/llvm/llvm-project/pull/142694
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)
@@ -57,6 +57,226 @@ void RegBankLegalizeHelper::findRuleAndApplyMapping(MachineInstr &MI) { lower(MI, Mapping, WaterfallSgprs); } +bool RegBankLegalizeHelper::executeInWaterfallLoop( +MachineIRBuilder &B, iterator_range Range, +SmallSet &SGPROperandRegs) { + // Track use registers which have already been expanded with a readfirstlane + // sequence. This may have multiple uses if moving a sequence. + DenseMap WaterfalledRegMap; + + MachineBasicBlock &MBB = B.getMBB(); + MachineFunction &MF = B.getMF(); + + const SIRegisterInfo *TRI = ST.getRegisterInfo(); + const TargetRegisterClass *WaveRC = TRI->getWaveMaskRegClass(); + unsigned MovExecOpc, MovExecTermOpc, XorTermOpc, AndSaveExecOpc, ExecReg; + if (ST.isWave32()) { +MovExecOpc = AMDGPU::S_MOV_B32; +MovExecTermOpc = AMDGPU::S_MOV_B32_term; +XorTermOpc = AMDGPU::S_XOR_B32_term; +AndSaveExecOpc = AMDGPU::S_AND_SAVEEXEC_B32; +ExecReg = AMDGPU::EXEC_LO; + } else { +MovExecOpc = AMDGPU::S_MOV_B64; +MovExecTermOpc = AMDGPU::S_MOV_B64_term; +XorTermOpc = AMDGPU::S_XOR_B64_term; +AndSaveExecOpc = AMDGPU::S_AND_SAVEEXEC_B64; +ExecReg = AMDGPU::EXEC; + } + +#ifndef NDEBUG + const int OrigRangeSize = std::distance(Range.begin(), Range.end()); +#endif + + MachineRegisterInfo &MRI = *B.getMRI(); + Register SaveExecReg = MRI.createVirtualRegister(WaveRC); + Register InitSaveExecReg = MRI.createVirtualRegister(WaveRC); + + // Don't bother using generic instructions/registers for the exec mask. + B.buildInstr(TargetOpcode::IMPLICIT_DEF).addDef(InitSaveExecReg); + + Register SavedExec = MRI.createVirtualRegister(WaveRC); + + // To insert the loop we need to split the block. Move everything before + // this point to a new block, and insert a new empty block before this + // instruction. 
+ MachineBasicBlock *LoopBB = MF.CreateMachineBasicBlock(); + MachineBasicBlock *BodyBB = MF.CreateMachineBasicBlock(); + MachineBasicBlock *RestoreExecBB = MF.CreateMachineBasicBlock(); + MachineBasicBlock *RemainderBB = MF.CreateMachineBasicBlock(); + MachineFunction::iterator MBBI(MBB); + ++MBBI; + MF.insert(MBBI, LoopBB); + MF.insert(MBBI, BodyBB); + MF.insert(MBBI, RestoreExecBB); + MF.insert(MBBI, RemainderBB); + + LoopBB->addSuccessor(BodyBB); + BodyBB->addSuccessor(RestoreExecBB); + BodyBB->addSuccessor(LoopBB); + + // Move the rest of the block into a new block. + RemainderBB->transferSuccessorsAndUpdatePHIs(&MBB); + RemainderBB->splice(RemainderBB->begin(), &MBB, Range.end(), MBB.end()); + + MBB.addSuccessor(LoopBB); + RestoreExecBB->addSuccessor(RemainderBB); + + B.setInsertPt(*LoopBB, LoopBB->end()); + + // +-MBB:+ + // | ... | + // | %0 = G_INST_1 | + // | %Dst = MI %Vgpr | + // | %1 = G_INST_2 | + // | ... | + // +-+ + // -> + // +-MBB---+ + // | ... | + // | %0 = G_INST_1 | + // | %SaveExecReg = S_MOV_B32 $exec_lo | + // +|--+ + // | /--| + // VV | + // +-LoopBB---+ | + // | %CurrentLaneReg:sgpr(s32) = READFIRSTLANE %Vgpr | | + // | instead of executing for each lane, see if other lanes had | | + // | same value for %Vgpr and execute for them also.| | + // | %CondReg:vcc(s1) = G_ICMP eq %CurrentLaneReg, %Vgpr | | + // | %CondRegLM:sreg_32 = ballot %CondReg // copy vcc to sreg32 lane mask | | + // | %SavedExec = S_AND_SAVEEXEC_B32 %CondRegLM | | + // | exec is active for lanes with the same "CurrentLane value" in Vgpr | | + // +|-+ | + // V | + // +-BodyBB+ | + // | %Dst = MI %CurrentLaneReg:sgpr(s32) | | + // | executed only for active lanes and written to Dst | | + // | $exec = S_XOR_B32 $exec, %SavedExec | | + // | set active lanes to 0 in SavedExec, lanes that did not write to | | + // | Dst yet, and set this as new exec (for READFIRSTLANE and ICMP) | | + // | SI_WATERFALL_LOOP LoopBB |-| + // +|--+ + // V + // +-RestoreExecBB--+ + // | $exec_lo = 
S_MOV_B32_term %SaveExecReg | + // +|---+ + // V + // +-RemainderBB:--+ + //
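The control flow drawn in the diagram above can be modeled in a few lines. The following Python sketch is an illustration only, not the MIR lowering: each iteration reads the first active lane's value (READFIRSTLANE), runs the instruction once for every active lane holding that same value (ICMP + ballot + S_AND_SAVEEXEC), then removes those lanes from exec (the S_XOR with the saved exec) until exec is empty:

```python
def waterfall_loop(lane_values, exec_mask):
    """Return the (value, lanes) pairs the loop body executes for, in order."""
    exec_mask = set(exec_mask)
    executed = []
    while exec_mask:
        # READFIRSTLANE: take the value of the first still-active lane.
        current = lane_values[min(exec_mask)]
        # ICMP + ballot: every active lane that shares this value.
        same = frozenset(l for l in exec_mask if lane_values[l] == current)
        executed.append((current, same))  # the body runs once for these lanes
        # XOR with the saved exec: deactivate the lanes just handled.
        exec_mask -= same
    return executed

# Four lanes, two distinct values -> the loop body runs exactly twice.
print(waterfall_loop([7, 7, 3, 7], {0, 1, 2, 3}))
```

The number of iterations equals the number of distinct values among active lanes, which is why a fully uniform operand makes the loop run once.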
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Add ISD::PTRADD DAG combines (PR #142739)
@@ -2627,6 +2629,93 @@ SDValue DAGCombiner::foldSubToAvg(SDNode *N, const SDLoc &DL) { return SDValue(); } +/// Try to fold a pointer arithmetic node. +/// This needs to be done separately from normal addition, because pointer +/// addition is not commutative. +SDValue DAGCombiner::visitPTRADD(SDNode *N) { + SDValue N0 = N->getOperand(0); + SDValue N1 = N->getOperand(1); + EVT PtrVT = N0.getValueType(); + EVT IntVT = N1.getValueType(); + SDLoc DL(N); + + // This is already ensured by an assert in SelectionDAG::getNode(). Several + // combines here depend on this assumption. + assert(PtrVT == IntVT && + "PTRADD with different operand types is not supported"); + + // fold (ptradd undef, y) -> undef + if (N0.isUndef()) +return N0; + + // fold (ptradd x, undef) -> undef + if (N1.isUndef()) +return DAG.getUNDEF(PtrVT); + + // fold (ptradd x, 0) -> x + if (isNullConstant(N1)) +return N0; + + // fold (ptradd 0, x) -> x + if (isNullConstant(N0)) +return N1; + + if (N0.getOpcode() == ISD::PTRADD && ritter-x2a wrote: Indeed, I'll do that if we don't land on moving the target-specific combine below this one in [the other thread](https://app.graphite.dev/github/pr/llvm/llvm-project/142739/%5BAMDGPU%5D%5BSDAG%5D-Add-ISD-PTRADD-DAG-combines#comment-PRRC_kwDOBITxeM5-x5pQ). https://github.com/llvm/llvm-project/pull/142739 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
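As a hedged illustration of the folds quoted above (not the DAGCombiner code itself), the constant-related cases of `visitPTRADD` can be sketched on a toy expression tree: tuples stand in for PTRADD nodes, Python ints for constant operands, and the one-use and `reassociationCanBreakAddressingModePattern` conditions are deliberately simplified to the both-constants case:

```python
def visit_ptradd(n0, n1):
    """Toy model: ('ptradd', base, offset) nodes, ints are constants."""
    if isinstance(n1, int) and n1 == 0:   # (ptradd x, 0) -> x
        return n0
    if isinstance(n0, int) and n0 == 0:   # (ptradd 0, x) -> x
        return n1
    if isinstance(n0, tuple) and n0[0] == "ptradd":
        x, y = n0[1], n0[2]
        if isinstance(y, int) and isinstance(n1, int):
            # (ptradd (ptradd x, y), z) -> (ptradd x, (add y, z)):
            # y + z folds to one constant, a candidate immediate offset.
            return visit_ptradd(x, y + n1)
    return ("ptradd", n0, n1)

# Nested constant offsets collapse into a single immediate.
print(visit_ptradd(("ptradd", "p", 8), 4))  # → ('ptradd', 'p', 12)
```

This is the payoff ritter-x2a describes: after reassociation, the single constant `12` can be absorbed as an immediate offset by a memory instruction, instead of two chained adds.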
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #142789)
@@ -137,7 +138,123 @@ class AMDGPURegBankLegalizeCombiner { return {MatchMI, MatchMI->getOperand(1).getReg()}; } + std::tuple tryMatchRALFromUnmerge(Register Src) { +auto *ReadAnyLane = MRI.getVRegDef(Src); +if (ReadAnyLane->getOpcode() == AMDGPU::G_AMDGPU_READANYLANE) { + Register RALSrc = ReadAnyLane->getOperand(1).getReg(); + auto *UnMerge = getOpcodeDef(RALSrc, MRI); + if (UnMerge) +return {UnMerge, UnMerge->findRegisterDefOperandIdx(RALSrc, nullptr)}; +} +return {nullptr, -1}; + } + + Register getReadAnyLaneSrc(Register Src) { +// Src = G_AMDGPU_READANYLANE RALSrc +auto [RAL, RALSrc] = tryMatch(Src, AMDGPU::G_AMDGPU_READANYLANE); +if (RAL) + return RALSrc; + +// LoVgpr, HiVgpr = G_UNMERGE_VALUES UnmergeSrc +// LoSgpr = G_AMDGPU_READANYLANE LoVgpr +// HiSgpr = G_AMDGPU_READANYLANE HiVgpr +// Src G_MERGE_VALUES LoSgpr, HiSgpr +auto *Merge = getOpcodeDef(Src, MRI); +if (Merge) { + unsigned NumElts = Merge->getNumSources(); + auto [Unmerge, Idx] = tryMatchRALFromUnmerge(Merge->getSourceReg(0)); + if (!Unmerge || Unmerge->getNumDefs() != NumElts || Idx != 0) +return {}; + + // check if all elements are from same unmerge and there is no shuffling + for (unsigned i = 1; i < NumElts; ++i) { +auto [UnmergeI, IdxI] = tryMatchRALFromUnmerge(Merge->getSourceReg(i)); +if (UnmergeI != Unmerge || (unsigned)IdxI != i) + return {}; + } + return Unmerge->getSourceReg(); +} + +// ..., VgprI, ... = G_UNMERGE_VALUES VgprLarge +// SgprI = G_AMDGPU_READANYLANE VgprI +// SgprLarge G_MERGE_VALUES ..., SgprI, ... +// ..., Src, ... 
= G_UNMERGE_VALUES SgprLarge +auto *UnMerge = getOpcodeDef(Src, MRI); +if (UnMerge) { + int Idx = UnMerge->findRegisterDefOperandIdx(Src, nullptr); + auto *Merge = getOpcodeDef(UnMerge->getSourceReg(), MRI); + if (Merge) { +auto [RAL, RALSrc] = +tryMatch(Merge->getSourceReg(Idx), AMDGPU::G_AMDGPU_READANYLANE); +if (RAL) + return RALSrc; + } +} + +return {}; + } + + bool tryEliminateReadAnyLane(MachineInstr &Copy) { +Register Dst = Copy.getOperand(0).getReg(); +Register Src = Copy.getOperand(1).getReg(); +if (!Src.isVirtual()) + return false; + +Register RALDst = Src; +MachineInstr &SrcMI = *MRI.getVRegDef(Src); +if (SrcMI.getOpcode() == AMDGPU::G_BITCAST) { + RALDst = SrcMI.getOperand(1).getReg(); +} + +Register RALSrc = getReadAnyLaneSrc(RALDst); +if (!RALSrc) + return false; + +if (Dst.isVirtual()) { + if (SrcMI.getOpcode() != AMDGPU::G_BITCAST) { +// Src = READANYLANE RALSrc +// Dst = Copy Src +// -> +// Dst = RALSrc +MRI.replaceRegWith(Dst, RALSrc); Pierre-vh wrote: Just wondering, can we just emit a COPY instead and let another combine take care of the folding? The two branches are very similar, it'd be nice to make this more terse. Maybe we could use a helper like `copyOrReplace` for `Dst` that does the right thing depending on whether `Dst` is virtual or not? https://github.com/llvm/llvm-project/pull/142789 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
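The merge-of-unmerge case that `getReadAnyLaneSrc` matches can be illustrated with a small Python sketch. The tuple encoding and register names here are invented for illustration, and the real code additionally checks that the unmerge's def count equals the merge's source count:

```python
def match_merge_of_ral_unmerge(merge_srcs):
    """merge_srcs[i] = ('readanylane', ('unmerge', src_reg, def_idx)).
    If every element is a readanylane of the i-th def of one and the same
    unmerge, with no shuffling, the merged value is simply src_reg."""
    src_reg = None
    for i, op in enumerate(merge_srcs):
        if not (isinstance(op, tuple) and op[0] == "readanylane"):
            return None
        inner = op[1]
        if not (isinstance(inner, tuple) and inner[0] == "unmerge"):
            return None
        if inner[2] != i:                  # shuffled element order
            return None
        if src_reg is None:
            src_reg = inner[1]
        elif inner[1] != src_reg:          # elements from different unmerges
            return None
    return src_reg

ok = [("readanylane", ("unmerge", "%vgpr64", 0)),
      ("readanylane", ("unmerge", "%vgpr64", 1))]
print(match_merge_of_ral_unmerge(ok))  # → %vgpr64
```

When the match succeeds, the whole merge is just the unmerge's source again, so the readanylanes can be eliminated.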
[llvm-branch-commits] [clang] release/20.x: [clang-repl] Ensure clang-repl accepts all C keywords supported in all language models (#142749) (PR #142909)
anutosh491 wrote: See https://github.com/llvm/llvm-project/pull/142933#issuecomment-2943354247 :( https://github.com/llvm/llvm-project/pull/142909
[llvm-branch-commits] LowerTypeTests: Avoid zext of ptrtoint ConstantExpr. (PR #142886)
https://github.com/nikic approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/142886
[llvm-branch-commits] [clang] release/20.x: [clang-repl] Ensure clang-repl accepts all C keywords supported in all language models (#142749) (PR #142909)
https://github.com/anutosh491 converted_to_draft https://github.com/llvm/llvm-project/pull/142909
[llvm-branch-commits] [CI] Use LLVM_ENABLE_RUNTIMES for runtimes builds on Linux (PR #142694)
@@ -102,51 +102,25 @@ if [[ "${runtimes}" != "" ]]; then exit 1 fi - echo "--- ninja install-clang" - - ninja -C ${BUILD_DIR} install-clang install-clang-resource-headers - - RUNTIMES_BUILD_DIR="${MONOREPO_ROOT}/build-runtimes" - INSTALL_DIR="${BUILD_DIR}/install" - mkdir -p ${RUNTIMES_BUILD_DIR} - echo "--- cmake runtimes C++26" - rm -rf "${RUNTIMES_BUILD_DIR}" - cmake -S "${MONOREPO_ROOT}/runtimes" -B "${RUNTIMES_BUILD_DIR}" -GNinja \ - -D CMAKE_C_COMPILER="${INSTALL_DIR}/bin/clang" \ - -D CMAKE_CXX_COMPILER="${INSTALL_DIR}/bin/clang++" \ - -D LLVM_ENABLE_RUNTIMES="${runtimes}" \ - -D LIBCXX_CXX_ABI=libcxxabi \ - -D CMAKE_BUILD_TYPE=RelWithDebInfo \ - -D CMAKE_INSTALL_PREFIX="${INSTALL_DIR}" \ - -D LIBCXX_TEST_PARAMS="std=c++26" \ - -D LIBCXXABI_TEST_PARAMS="std=c++26" \ - -D LLVM_LIT_ARGS="${lit_args}" + cmake \ ldionne wrote: I think I don't quite understand what this change does. You're basically just re-generating the CMake cache and re-running `ninja` every time? https://github.com/llvm/llvm-project/pull/142694 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [mlir] [OpenMP] Add directive spellings introduced in spec v6.0 (PR #141772)
https://github.com/kparzysz closed https://github.com/llvm/llvm-project/pull/141772
[llvm-branch-commits] [llvm] Add SimplifyTypeTests pass. (PR #141327)
https://github.com/pcc updated https://github.com/llvm/llvm-project/pull/141327 >From b36c74c344ed47b99e9bfdc28f9081c3c704d8c7 Mon Sep 17 00:00:00 2001 From: Peter Collingbourne Date: Tue, 27 May 2025 23:08:59 -0700 Subject: [PATCH] Format Created using spr 1.3.6-beta.1 --- llvm/lib/Transforms/IPO/LowerTypeTests.cpp | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/llvm/lib/Transforms/IPO/LowerTypeTests.cpp b/llvm/lib/Transforms/IPO/LowerTypeTests.cpp index 907a664b0f936..26238acbb3f4d 100644 --- a/llvm/lib/Transforms/IPO/LowerTypeTests.cpp +++ b/llvm/lib/Transforms/IPO/LowerTypeTests.cpp @@ -2508,8 +2508,8 @@ PreservedAnalyses SimplifyTypeTestsPass::run(Module &M, }; for (User *U : make_early_inc_range(GV.users())) { if (auto *CI = dyn_cast(U)) { -if (CI->getPredicate() == CmpInst::ICMP_EQ && -MaySimplifyPtr(CI->getOperand(0))) { +if (CI->getPredicate() == CmpInst::ICMP_EQ && +MaySimplifyPtr(CI->getOperand(0))) { // This is an equality comparison (TypeTestResolution::Single case in // lowerTypeTestCall). In this case we just replace the comparison // with true. @@ -2538,8 +2538,8 @@ PreservedAnalyses SimplifyTypeTestsPass::run(Module &M, if (U.getOperandNo() == 1 && CI && CI->getPredicate() == CmpInst::ICMP_EQ && MaySimplifyInt(CI->getOperand(0))) { - // This is an equality comparison. Unlike in the case above it remained - // as an integer compare. + // This is an equality comparison. Unlike in the case above it + // remained as an integer compare. CI->replaceAllUsesWith(ConstantInt::getTrue(M.getContext())); CI->eraseFromParent(); Changed = true; ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] DAG: Move soft float predicate management into RuntimeLibcalls (PR #142905)
@@ -11,12 +11,48 @@ using namespace llvm; using namespace RTLIB; +void RuntimeLibcallsInfo::initSoftFloatCmpLibcallPredicates() { + std::fill(SoftFloatCompareLibcallPredicates, topperc wrote: Should we be using `std::begin(SoftFloatCompareLibcallPredicates)` and `std::end(SoftFloatCompareLibcallPredicates)` rather than repeating the size? https://github.com/llvm/llvm-project/pull/142905
[llvm-branch-commits] LowerTypeTests: Avoid zext of ptrtoint ConstantExpr. (PR #142886)
https://github.com/pcc updated https://github.com/llvm/llvm-project/pull/142886
[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)
@@ -1302,6 +1302,24 @@ static void addRange(SmallVectorImpl &EndPoints, EndPoints.push_back(High); } +MDNode *MDNode::getMergedCalleeTypeMetadata(LLVMContext &Ctx, MDNode *A, +MDNode *B) { + SmallVector AB; + SmallSet MergedCallees; nikic wrote: ```suggestion SmallPtrSet MergedCallees; ``` https://github.com/llvm/llvm-project/pull/87573
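Independent of the container choice, the merge that `getMergedCalleeTypeMetadata` performs is a concatenate-and-deduplicate over the two operand lists. A Python sketch of that logic (order preservation here is an assumption for illustration; the set plays the same dedup role as `MergedCallees` in the C++):

```python
def merge_callee_type_lists(a, b):
    """Concatenate two callee-type operand lists, keeping each entry once."""
    merged, seen = [], set()
    for entry in list(a) + list(b):
        if entry in seen:      # already merged from the other list
            continue
        seen.add(entry)
        merged.append(entry)
    return merged

# Entries common to both lists survive only once.
print(merge_callee_type_lists(
    ["_ZTSFicE.generalized", "_ZTSFivE.generalized"],
    ["_ZTSFivE.generalized", "_ZTSFiPvE.generalized"]))
```

nikic's point is orthogonal to this logic: since the entries are pointers (`Metadata *`), `SmallPtrSet` is the cheaper container for the `seen` role.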
[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type metadata (PR #87573)
@@ -0,0 +1,24 @@ +;; Test if the callee_type metadata attached to indirect call sites adhere to the expected format. + +; RUN: llvm-as < %s | llvm-dis | FileCheck %s +define i32 @_Z13call_indirectPFicEc(ptr %func, i8 signext %x) !type !0 { +entry: + %func.addr = alloca ptr, align 8 + %x.addr = alloca i8, align 1 + store ptr %func, ptr %func.addr, align 8 + store i8 %x, ptr %x.addr, align 1 + %fptr = load ptr, ptr %func.addr, align 8 + %x_val = load i8, ptr %x.addr, align 1 + ; CHECK: %call = call i32 %fptr(i8 signext %x_val), !callee_type !1 + %call = call i32 %fptr(i8 signext %x_val), !callee_type !1 + ret i32 %call +} + +declare !type !2 i32 @_Z3barc(i8 signext) + +!0 = !{i64 0, !"_ZTSFiPvcE.generalized"} +!1 = !{!2} +!2 = !{i64 0, !"_ZTSFicE.generalized"} +!3 = !{i64 0, !"_ZTSFicE"} +!4 = !{!3} +!8 = !{i64 0, !"_ZTSFicE.generalized"} nikic wrote: Looks like there's a bunch of unused metadata here? https://github.com/llvm/llvm-project/pull/87573
[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)
https://github.com/momchil-velikov updated https://github.com/llvm/llvm-project/pull/142422 >From 2eb6c95955dc22b6b59eb4e5ba269e4744bbdd2a Mon Sep 17 00:00:00 2001 From: Momchil Velikov Date: Mon, 2 Jun 2025 15:13:13 + Subject: [PATCH 1/3] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` Previously, slices were sometimes marked as non-contiguous when they were actually contiguous. This occurred when the vector type had leading unit dimensions, e.g., `vector<1x1x...x1xd0xd1x...xdn-1xT>`. In such cases, only the trailing n dimensions of the memref need to be contiguous, not the entire vector rank. This affects how `FlattenContiguousRowMajorTransfer{Read,Write}Pattern` flattens `transfer_read` and `transfer_write` ops. The pattern used to collapse a number of dimensions equal to the vector rank, which may be incorrect when leading dimensions are unit-sized. This patch fixes the issue by collapsing only as many trailing memref dimensions as are actually contiguous. --- .../mlir/Dialect/Vector/Utils/VectorUtils.h | 54 - .../Transforms/VectorTransferOpTransforms.cpp | 8 +- mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp | 25 ++-- .../Vector/vector-transfer-flatten.mlir | 108 +- 4 files changed, 120 insertions(+), 75 deletions(-) diff --git a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h index 6609b28d77b6c..ed06d7a029494 100644 --- a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h +++ b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h @@ -49,35 +49,37 @@ FailureOr> isTranspose2DSlice(vector::TransposeOp op); /// Return true if `vectorType` is a contiguous slice of `memrefType`. /// -/// Only the N = vectorType.getRank() trailing dims of `memrefType` are -/// checked (the other dims are not relevant).
Note that for `vectorType` to be -/// a contiguous slice of `memrefType`, the trailing dims of the latter have -/// to be contiguous - this is checked by looking at the corresponding strides. +/// The leading unit dimensions of the vector type are ignored as they +/// are not relevant to the result. Let N be the number of the vector +/// dimensions after ignoring a leading sequence of unit ones. /// -/// There might be some restriction on the leading dim of `VectorType`: +/// For `vectorType` to be a contiguous slice of `memrefType` +/// a) the N trailing dimensions of the latter must be contiguous, and +/// b) the trailing N dimensions of `vectorType` and `memrefType`, +/// except the first of them, must match. /// -/// Case 1. If all the trailing dims of `vectorType` match the trailing dims -/// of `memrefType` then the leading dim of `vectorType` can be -/// arbitrary. -/// -///Ex. 1.1 contiguous slice, perfect match -/// vector<4x3x2xi32> from memref<5x4x3x2xi32> -///Ex. 1.2 contiguous slice, the leading dim does not match (2 != 4) -/// vector<2x3x2xi32> from memref<5x4x3x2xi32> -/// -/// Case 2. If an "internal" dim of `vectorType` does not match the -/// corresponding trailing dim in `memrefType` then the remaining -/// leading dims of `vectorType` have to be 1 (the first non-matching -/// dim can be arbitrary). +/// Examples: /// -///Ex. 2.1 non-contiguous slice, 2 != 3 and the leading dim != <1> -/// vector<2x2x2xi32> from memref<5x4x3x2xi32> -///Ex. 2.2 contiguous slice, 2 != 3 and the leading dim == <1> -/// vector<1x2x2xi32> from memref<5x4x3x2xi32> -///Ex. 2.3. contiguous slice, 2 != 3 and the leading dims == <1x1> -/// vector<1x1x2x2xi32> from memref<5x4x3x2xi32> -///Ex. 2.4. 
non-contiguous slice, 2 != 3 and the leading dims != <1x1> -/// vector<2x1x2x2xi32> from memref<5x4x3x2xi32>) +/// Ex.1 contiguous slice, perfect match +/// vector<4x3x2xi32> from memref<5x4x3x2xi32> +/// Ex.2 contiguous slice, the leading dim does not match (2 != 4) +/// vector<2x3x2xi32> from memref<5x4x3x2xi32> +/// Ex.3 non-contiguous slice, 2 != 3 +/// vector<2x2x2xi32> from memref<5x4x3x2xi32> +/// Ex.4 contiguous slice, leading unit dimension of the vector ignored, +///2 != 3 (allowed) +/// vector<1x2x2xi32> from memref<5x4x3x2xi32> +/// Ex.5. contiguous slice, leading two unit dims of the vector ignored, +/// 2 != 3 (allowed) +/// vector<1x1x2x2xi32> from memref<5x4x3x2xi32> +/// Ex.6. non-contiguous slice, 2 != 3, no leading sequence of unit dims +/// vector<2x1x2x2xi32> from memref<5x4x3x2xi32>) +/// Ex.7 contiguous slice, memref needs to be contiguous only on the last +///dimension +/// vector<1x1x2xi32> from memref<2x2x2xi32, strided<[8, 4, 1]>> +/// Ex.8 non-contiguous slice, memref needs to be contiguous on the last +///two dimensions, and it isn't +/// vector<1x2x2xi32> from memref<2x2x2xi32, strided<[8, 4, 1]>> bool isContiguo
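The rules in the rewritten comment are compact enough to check executably. The following Python sketch is an illustration of the documented rules, not the MLIR implementation: it drops the vector's leading unit dims, then verifies (a) contiguity of the trailing N memref dims via the strides and (b) the shape-match condition, and it reproduces examples Ex.1-Ex.8 above:

```python
def is_contiguous_slice(vec_shape, mem_shape, strides):
    """True iff the vector shape is a contiguous slice of the memref."""
    shape = list(vec_shape)
    while len(shape) > 1 and shape[0] == 1:   # ignore leading unit dims
        shape.pop(0)
    n = len(shape)
    if n > len(mem_shape):
        return False
    # (a) the trailing n memref dims must be contiguous: innermost stride 1,
    # each outer stride equal to inner stride times inner size.
    expected = 1
    for dim, stride in zip(reversed(mem_shape[-n:]), reversed(strides[-n:])):
        if stride != expected:
            return False
        expected *= dim
    # (b) the trailing n dims must match, except the outermost of them.
    return shape[1:] == list(mem_shape[len(mem_shape) - n + 1:])

# Ex.4: vector<1x2x2> of a row-major memref<5x4x3x2> -> contiguous.
print(is_contiguous_slice([1, 2, 2], [5, 4, 3, 2], [24, 6, 2, 1]))  # → True
```

Note how Ex.7 and Ex.8 differ only in the number of non-unit vector dims: with one non-unit dim only the innermost memref stride matters, while with two the stride `4` between the last two dims breaks contiguity.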