[clang] [llvm] [NVPTX] Support inline asm with 128-bit operand in NVPTX backend (PR #97113)
https://github.com/AlexMaclean closed https://github.com/llvm/llvm-project/pull/97113 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/97762 >From c2913d1074c5bfa771379d68e9ba728a3d1d1ce5 Mon Sep 17 00:00:00 2001 From: Alex MacLean Date: Mon, 1 Jul 2024 17:06:56 + Subject: [PATCH 1/4] [ValueTracking] use KnownBits to compute fpclass from bitcast --- llvm/lib/Analysis/ValueTracking.cpp | 30 ++ llvm/test/Transforms/Attributor/nofpclass.ll | 104 +++ 2 files changed, 134 insertions(+) diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp index 85abf00774a02..a16c8e3d48403 100644 --- a/llvm/lib/Analysis/ValueTracking.cpp +++ b/llvm/lib/Analysis/ValueTracking.cpp @@ -5805,6 +5805,36 @@ void computeKnownFPClass(const Value *V, const APInt &DemandedElts, break; } + case Instruction::BitCast: { +const Type *Ty = Op->getType(); +const Value *Casted = Op->getOperand(0); +if (Ty->isVectorTy() || !Casted->getType()->isIntOrIntVectorTy()) + break; + +KnownBits Bits(Ty->getScalarSizeInBits()); +computeKnownBits(Casted, Bits, Depth + 1, Q); + +// Transfer information from the sign bit. +if (Bits.Zero.isSignBitSet()) + Known.signBitMustBeZero(); +else if (Bits.One.isSignBitSet()) + Known.signBitMustBeOne(); + +if (Ty->isIEEE()) { + // IEEE floats are NaN when all bits of the exponent plus at least one of + // the fraction bits are 1. This means: + // - If we assume unknown bits are 0 and the value is NaN, it will + // always be NaN + // - If we assume unknown bits are 1 and the value is not NaN, it can + // never be NaN + if (APFloat(Ty->getFltSemantics(), Bits.One).isNaN()) +Known.KnownFPClasses = fcNan; + else if (!APFloat(Ty->getFltSemantics(), ~Bits.Zero).isNaN()) +Known.knownNot(fcNan); +} + +break; + } default: break; } diff --git a/llvm/test/Transforms/Attributor/nofpclass.ll b/llvm/test/Transforms/Attributor/nofpclass.ll index 781ba636c3ab3..c5d562a436b33 100644 --- a/llvm/test/Transforms/Attributor/nofpclass.ll +++ b/llvm/test/Transforms/Attributor/nofpclass.ll @@ -2690,6 +2690,110 @@ entry: ret double %abs } +define float @bitcast_to_float_sign_0(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) float @bitcast_to_float_sign_0 +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = lshr i32 [[ARG]], 1 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float +; CHECK-NEXT:ret float [[TMP2]] +; + %1 = lshr i32 %arg, 1 + %2 = bitcast i32 %1 to float + ret float %2 +} + +define float @bitcast_to_float_nnan(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(nan ninf nzero nsub nnorm) float @bitcast_to_float_nnan +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = lshr i32 [[ARG]], 2 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float +; CHECK-NEXT:ret float [[TMP2]] +; + %1 = lshr i32 %arg, 2 + %2 = bitcast i32 %1 to float + ret float %2 +} + +define float @bitcast_to_float_sign_1(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) float @bitcast_to_float_sign_1 +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = or i32 [[ARG]], -2147483648 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float +; CHECK-NEXT:ret float [[TMP2]] +; + %1 = or i32 %arg, -2147483648 + %2 = bitcast i32 %1 to float + ret float %2 +} + +define float @bitcast_to_float_nan(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(inf zero sub norm) float @bitcast_to_float_nan +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = or i32 [[ARG]], 2139095041 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float +; CHECK-NEXT:ret float [[TMP2]] +; + %1 = or i32 %arg, 2139095041 + %2 = bitcast i32 %1 to float + ret float %2 +} + +define double @bitcast_to_double_sign_0(i64 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) double @bitcast_to_double_sign_0 +; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = lshr i64 [[ARG]], 1 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i64 [[TMP1]] to double +; CHECK-NEXT:ret double [[TMP2]] +; + %1 = lshr i64 %arg, 1 + %2 = bitcast i64 %1 to double + ret double %2 +} + +define double @bitcast_to_double_nnan(i64 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define no
[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)
AlexMaclean wrote: > Can you add some tests to demonstrate that this patch will enable more > optimizations in some real-world applications? I can extend the existing test cases to make them more elaborate/real-looking, but I'm guessing that would not qualify as "real-world". This patch is motivated by an internal benchmark where there were some cases where this helped, though even that case is in some sense artificial. Is this a necessary criteria for landing this change? I believe we already handle float to int in KnownBits and adding the inverse in KnownFPClass seems like a correct and reasonable extension of the logic, even if there are not many cases where it is used. https://github.com/llvm/llvm-project/pull/97762 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Remove nvvm.bitcast.* intrinsics (PR #107936)
https://github.com/AlexMaclean created https://github.com/llvm/llvm-project/pull/107936 Remove the following intrinsics which correspond directly to a bitcast: - llvm.nvvm.bitcast.f2i - llvm.nvvm.bitcast.i2f - llvm.nvvm.bitcast.d2ll - llvm.nvvm.bitcast.ll2d >From ff978f81e0eedbc5e7547acabe414f2f1b0fd31a Mon Sep 17 00:00:00 2001 From: Alex MacLean Date: Fri, 6 Sep 2024 18:35:20 + Subject: [PATCH] [NVPTX] Remove nvvm.bitcast.* intrinsics --- clang/include/clang/Basic/BuiltinsNVPTX.def | 8 llvm/include/llvm/IR/IntrinsicsNVVM.td| 18 - llvm/lib/IR/AutoUpgrade.cpp | 8 llvm/lib/Target/NVPTX/NVPTXIntrinsics.td | 14 - .../Assembler/auto_upgrade_nvvm_intrinsics.ll | 20 +++ 5 files changed, 32 insertions(+), 36 deletions(-) diff --git a/clang/include/clang/Basic/BuiltinsNVPTX.def b/clang/include/clang/Basic/BuiltinsNVPTX.def index 20f038a0a9bbde..6fff562165080a 100644 --- a/clang/include/clang/Basic/BuiltinsNVPTX.def +++ b/clang/include/clang/Basic/BuiltinsNVPTX.def @@ -599,14 +599,6 @@ TARGET_BUILTIN(__nvvm_e4m3x2_to_f16x2_rn_relu, "V2hs", "", AND(SM_89,PTX81)) TARGET_BUILTIN(__nvvm_e5m2x2_to_f16x2_rn, "V2hs", "", AND(SM_89,PTX81)) TARGET_BUILTIN(__nvvm_e5m2x2_to_f16x2_rn_relu, "V2hs", "", AND(SM_89,PTX81)) -// Bitcast - -BUILTIN(__nvvm_bitcast_f2i, "if", "") -BUILTIN(__nvvm_bitcast_i2f, "fi", "") - -BUILTIN(__nvvm_bitcast_ll2d, "dLLi", "") -BUILTIN(__nvvm_bitcast_d2ll, "LLid", "") - // FNS TARGET_BUILTIN(__nvvm_fns, "UiUiUii", "n", PTX60) diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td index 39685c920d948d..737dd6092e2183 100644 --- a/llvm/include/llvm/IR/IntrinsicsNVVM.td +++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td @@ -30,6 +30,10 @@ // * llvm.nvvm.max.ui --> select(x ule y, x, y) // * llvm.nvvm.max.ull --> ibid. // * llvm.nvvm.h2f --> llvm.convert.to.fp16.f32 +// * llvm.nvvm.bitcast.f2i --> bitcast +// * llvm.nvvm.bitcast.i2f --> ibid. +// * llvm.nvvm.bitcast.d2ll --> ibid. +// * llvm.nvvm.bitcast.ll2d --> ibid. def llvm_global_ptr_ty : LLVMQualPointerType<1>; // (global)ptr def llvm_shared_ptr_ty : LLVMQualPointerType<3>; // (shared)ptr @@ -1339,20 +1343,6 @@ let TargetPrefix = "nvvm" in { def int_nvvm_e5m2x2_to_f16x2_rn_relu : ClangBuiltin<"__nvvm_e5m2x2_to_f16x2_rn_relu">, Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrNoCallback]>; -// -// Bitcast -// - - def int_nvvm_bitcast_f2i : ClangBuiltin<"__nvvm_bitcast_f2i">, - DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_float_ty], [IntrNoMem, IntrSpeculatable]>; - def int_nvvm_bitcast_i2f : ClangBuiltin<"__nvvm_bitcast_i2f">, - DefaultAttrsIntrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrNoMem, IntrSpeculatable]>; - - def int_nvvm_bitcast_ll2d : ClangBuiltin<"__nvvm_bitcast_ll2d">, - DefaultAttrsIntrinsic<[llvm_double_ty], [llvm_i64_ty], [IntrNoMem, IntrSpeculatable]>; - def int_nvvm_bitcast_d2ll : ClangBuiltin<"__nvvm_bitcast_d2ll">, - DefaultAttrsIntrinsic<[llvm_i64_ty], [llvm_double_ty], [IntrNoMem, IntrSpeculatable]>; - // FNS def int_nvvm_fns : ClangBuiltin<"__nvvm_fns">, diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp index 69dae5e32dbbe8..02d1d9d9f78984 100644 --- a/llvm/lib/IR/AutoUpgrade.cpp +++ b/llvm/lib/IR/AutoUpgrade.cpp @@ -1268,6 +1268,10 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn, else if (Name.consume_front("atomic.load.add.")) // nvvm.atomic.load.add.{f32.p,f64.p} Expand = Name.starts_with("f32.p") || Name.starts_with("f64.p"); + else if (Name.consume_front("bitcast.")) +// nvvm.bitcast.{f2i,i2f,ll2d,d2ll} +Expand = +Name == "f2i" || Name == "i2f" || Name == "ll2d" || Name == "d2ll"; else Expand = false; @@ -4258,6 +4262,10 @@ void llvm::UpgradeIntrinsicCall(CallBase *CI, Function *NewFn) { F->getParent(), Intrinsic::convert_from_fp16, {Builder.getFloatTy()}), CI->getArgOperand(0), "h2f"); + } else if (Name.consume_front("bitcast.") && + (Name == "f2i" || Name == "i2f" || Name == "ll2d" || + Name == "d2ll")) { +Rep = Builder.CreateBitCast(CI->getArgOperand(0), CI->getType()); } else { Intrinsic::ID IID = shouldUpgradeNVPTXBF16Intrinsic(Name); if (IID != Intrinsic::not_intrinsic && diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td index 0c883093dd0a54..5c2ef4fa417ac1 100644 --- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td +++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td @@ -1561,20 +1561,6 @@ def : Pat<(int_nvvm_e5m2x2_to_f16x2_rn Int16Regs:$a), def : Pat<(int_nvvm_e5m2x2_to_f16x2_rn_relu Int16Regs:$a), (CVT_f16x2_e5m2x2 Int16Regs:$a, CvtRN_RE
[clang] [llvm] [NVPTX] Remove nvvm.bitcast.* intrinsics (PR #107936)
AlexMaclean wrote: > It may be worth adding a note about this in the release notes. I'm not familiar with these, can you point me to an analogous change I could use as an example? https://github.com/llvm/llvm-project/pull/107936 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Revamp NVVMIntrRange pass (PR #94422)
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/94422 >From 708374e03f1bf70006f2472f19edad1bd621e2d6 Mon Sep 17 00:00:00 2001 From: Alex MacLean Date: Mon, 3 Jun 2024 16:46:36 + Subject: [PATCH] [NVPTX] Revamp NVVMIntrRange pass --- clang/test/CodeGenCUDA/cuda-builtin-vars.cu | 24 +-- llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp| 32 ++-- llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp | 6 +- llvm/lib/Target/NVPTX/NVPTXUtilities.cpp | 58 -- llvm/lib/Target/NVPTX/NVPTXUtilities.h | 16 +- llvm/lib/Target/NVPTX/NVVMIntrRange.cpp | 177 ++- llvm/test/CodeGen/NVPTX/intr-range.ll| 60 +++ llvm/test/CodeGen/NVPTX/intrinsic-old.ll | 43 ++--- 8 files changed, 249 insertions(+), 167 deletions(-) create mode 100644 llvm/test/CodeGen/NVPTX/intr-range.ll diff --git a/clang/test/CodeGenCUDA/cuda-builtin-vars.cu b/clang/test/CodeGenCUDA/cuda-builtin-vars.cu index ba5e5f13ebe70..dba0a76af21dd 100644 --- a/clang/test/CodeGenCUDA/cuda-builtin-vars.cu +++ b/clang/test/CodeGenCUDA/cuda-builtin-vars.cu @@ -6,21 +6,21 @@ __attribute__((global)) void kernel(int *out) { int i = 0; - out[i++] = threadIdx.x; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.x() - out[i++] = threadIdx.y; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.y() - out[i++] = threadIdx.z; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.z() + out[i++] = threadIdx.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.x() + out[i++] = threadIdx.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.y() + out[i++] = threadIdx.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.z() - out[i++] = blockIdx.x; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() - out[i++] = blockIdx.y; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ctaid.y() - out[i++] = blockIdx.z; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ctaid.z() + out[i++] = blockIdx.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() + out[i++] = blockIdx.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ctaid.y() + out[i++] = blockIdx.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ctaid.z() - out[i++] = blockDim.x; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ntid.x() - out[i++] = blockDim.y; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ntid.y() - out[i++] = blockDim.z; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ntid.z() + out[i++] = blockDim.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ntid.x() + out[i++] = blockDim.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ntid.y() + out[i++] = blockDim.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ntid.z() - out[i++] = gridDim.x; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.nctaid.x() - out[i++] = gridDim.y; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.nctaid.y() - out[i++] = gridDim.z; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.nctaid.z() + out[i++] = gridDim.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.nctaid.x() + out[i++] = gridDim.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.nctaid.y() + out[i++] = gridDim.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.nctaid.z() out[i++] = warpSize; // CHECK: store i32 32, diff --git a/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp b/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp index f63697916d902..82770f8660850 100644 --- a/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp +++ b/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp @@ -542,30 +542,24 @@ void NVPTXAsmPrinter::emitKernelFunctionDirectives(const Function &F, // If the NVVM IR has some of reqntid* specified, then output // the reqntid directive, and set the unspecified ones to 1. // If none of Reqntid* is specified, don't output reqntid directive. - unsigned Reqntidx, Reqntidy, Reqntidz; - Reqntidx = Reqntidy = Reqntidz = 1; - bool ReqSpecified = false; - ReqSpecified |= getReqNTIDx(F, Reqntidx); - ReqSpecified |= getReqNTIDy(F, Reqntidy); - ReqSpecified |= getReqNTIDz(F, Reqntidz); + std::optional Reqntidx = getReqNTIDx(F); + std::optional Reqntidy = getReqNTIDy(F); + std::optional Reqntidz = getReqNTIDz(F); - if (ReqSpecified) -O << ".reqntid " << Reqntidx << ", " << Reqntidy << ", " << Reqntidz - << "\n"; + if (Reqntidx || Reqntidy || Reqntidz) +O << ".reqntid " << Reqntidx.value_or(1) << ", " << Reqntidy.value_or(1) + << ", " << Reqntidz.value_or(1) << "\n"; // If the NVVM IR has some of maxntid* specified, then output // the maxntid directive, and set the unspecified ones to 1. // If none of maxntid* is specified, don't output maxntid directive. - unsigned Maxntidx, Maxntidy, Maxntidz; - Maxntidx = Maxntidy = Maxntidz = 1; - bool MaxSpecified = false; - MaxSpecified |= getMaxNTIDx(F, Maxntid
[clang] [llvm] [NVPTX] Revamp NVVMIntrRange pass (PR #94422)
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/94422 >From 708374e03f1bf70006f2472f19edad1bd621e2d6 Mon Sep 17 00:00:00 2001 From: Alex MacLean Date: Mon, 3 Jun 2024 16:46:36 + Subject: [PATCH 1/2] [NVPTX] Revamp NVVMIntrRange pass --- clang/test/CodeGenCUDA/cuda-builtin-vars.cu | 24 +-- llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp| 32 ++-- llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp | 6 +- llvm/lib/Target/NVPTX/NVPTXUtilities.cpp | 58 -- llvm/lib/Target/NVPTX/NVPTXUtilities.h | 16 +- llvm/lib/Target/NVPTX/NVVMIntrRange.cpp | 177 ++- llvm/test/CodeGen/NVPTX/intr-range.ll| 60 +++ llvm/test/CodeGen/NVPTX/intrinsic-old.ll | 43 ++--- 8 files changed, 249 insertions(+), 167 deletions(-) create mode 100644 llvm/test/CodeGen/NVPTX/intr-range.ll diff --git a/clang/test/CodeGenCUDA/cuda-builtin-vars.cu b/clang/test/CodeGenCUDA/cuda-builtin-vars.cu index ba5e5f13ebe70..dba0a76af21dd 100644 --- a/clang/test/CodeGenCUDA/cuda-builtin-vars.cu +++ b/clang/test/CodeGenCUDA/cuda-builtin-vars.cu @@ -6,21 +6,21 @@ __attribute__((global)) void kernel(int *out) { int i = 0; - out[i++] = threadIdx.x; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.x() - out[i++] = threadIdx.y; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.y() - out[i++] = threadIdx.z; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.z() + out[i++] = threadIdx.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.x() + out[i++] = threadIdx.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.y() + out[i++] = threadIdx.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.z() - out[i++] = blockIdx.x; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() - out[i++] = blockIdx.y; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ctaid.y() - out[i++] = blockIdx.z; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ctaid.z() + out[i++] = blockIdx.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() + out[i++] = blockIdx.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ctaid.y() + out[i++] = blockIdx.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ctaid.z() - out[i++] = blockDim.x; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ntid.x() - out[i++] = blockDim.y; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ntid.y() - out[i++] = blockDim.z; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ntid.z() + out[i++] = blockDim.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ntid.x() + out[i++] = blockDim.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ntid.y() + out[i++] = blockDim.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ntid.z() - out[i++] = gridDim.x; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.nctaid.x() - out[i++] = gridDim.y; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.nctaid.y() - out[i++] = gridDim.z; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.nctaid.z() + out[i++] = gridDim.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.nctaid.x() + out[i++] = gridDim.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.nctaid.y() + out[i++] = gridDim.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.nctaid.z() out[i++] = warpSize; // CHECK: store i32 32, diff --git a/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp b/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp index f63697916d902..82770f8660850 100644 --- a/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp +++ b/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp @@ -542,30 +542,24 @@ void NVPTXAsmPrinter::emitKernelFunctionDirectives(const Function &F, // If the NVVM IR has some of reqntid* specified, then output // the reqntid directive, and set the unspecified ones to 1. // If none of Reqntid* is specified, don't output reqntid directive. - unsigned Reqntidx, Reqntidy, Reqntidz; - Reqntidx = Reqntidy = Reqntidz = 1; - bool ReqSpecified = false; - ReqSpecified |= getReqNTIDx(F, Reqntidx); - ReqSpecified |= getReqNTIDy(F, Reqntidy); - ReqSpecified |= getReqNTIDz(F, Reqntidz); + std::optional Reqntidx = getReqNTIDx(F); + std::optional Reqntidy = getReqNTIDy(F); + std::optional Reqntidz = getReqNTIDz(F); - if (ReqSpecified) -O << ".reqntid " << Reqntidx << ", " << Reqntidy << ", " << Reqntidz - << "\n"; + if (Reqntidx || Reqntidy || Reqntidz) +O << ".reqntid " << Reqntidx.value_or(1) << ", " << Reqntidy.value_or(1) + << ", " << Reqntidz.value_or(1) << "\n"; // If the NVVM IR has some of maxntid* specified, then output // the maxntid directive, and set the unspecified ones to 1. // If none of maxntid* is specified, don't output maxntid directive. - unsigned Maxntidx, Maxntidy, Maxntidz; - Maxntidx = Maxntidy = Maxntidz = 1; - bool MaxSpecified = false; - MaxSpecified |= getMaxNTIDx(F, Max
[clang] [llvm] [NVPTX] Revamp NVVMIntrRange pass (PR #94422)
@@ -1,50 +1,51 @@ -//===- NVVMIntrRange.cpp - Set !range metadata for NVVM intrinsics ===// +//===- NVVMIntrRange.cpp - Set range attributes for NVVM intrinsics ---===// // // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. // See https://llvm.org/LICENSE.txt for license information. // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception // //===--===// // -// This pass adds appropriate !range metadata for calls to NVVM +// This pass adds appropriate range attributes for calls to NVVM // intrinsics that return a limited range of values. // //===--===// #include "NVPTX.h" -#include "llvm/IR/Constants.h" +#include "NVPTXUtilities.h" #include "llvm/IR/InstIterator.h" #include "llvm/IR/Instructions.h" +#include "llvm/IR/IntrinsicInst.h" #include "llvm/IR/Intrinsics.h" #include "llvm/IR/IntrinsicsNVPTX.h" #include "llvm/IR/PassManager.h" #include "llvm/Support/CommandLine.h" +#include using namespace llvm; #define DEBUG_TYPE "nvvm-intr-range" namespace llvm { void initializeNVVMIntrRangePass(PassRegistry &); } -// Add !range metadata based on limits of given SM variant. +// Add range attributes based on limits of given SM variant. static cl::opt NVVMIntrRangeSM("nvvm-intr-range-sm", cl::init(20), AlexMaclean wrote: I just went ahead and removed the SM logic from this pass altogether, all it is doing is reducing a single range for `sm_20`. I think it is fine to give up some small chance of improving perf on this architecture. https://github.com/llvm/llvm-project/pull/94422 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Revamp NVVMIntrRange pass (PR #94422)
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/94422 >From 708374e03f1bf70006f2472f19edad1bd621e2d6 Mon Sep 17 00:00:00 2001 From: Alex MacLean Date: Mon, 3 Jun 2024 16:46:36 + Subject: [PATCH 1/3] [NVPTX] Revamp NVVMIntrRange pass --- clang/test/CodeGenCUDA/cuda-builtin-vars.cu | 24 +-- llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp| 32 ++-- llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp | 6 +- llvm/lib/Target/NVPTX/NVPTXUtilities.cpp | 58 -- llvm/lib/Target/NVPTX/NVPTXUtilities.h | 16 +- llvm/lib/Target/NVPTX/NVVMIntrRange.cpp | 177 ++- llvm/test/CodeGen/NVPTX/intr-range.ll| 60 +++ llvm/test/CodeGen/NVPTX/intrinsic-old.ll | 43 ++--- 8 files changed, 249 insertions(+), 167 deletions(-) create mode 100644 llvm/test/CodeGen/NVPTX/intr-range.ll diff --git a/clang/test/CodeGenCUDA/cuda-builtin-vars.cu b/clang/test/CodeGenCUDA/cuda-builtin-vars.cu index ba5e5f13ebe70..dba0a76af21dd 100644 --- a/clang/test/CodeGenCUDA/cuda-builtin-vars.cu +++ b/clang/test/CodeGenCUDA/cuda-builtin-vars.cu @@ -6,21 +6,21 @@ __attribute__((global)) void kernel(int *out) { int i = 0; - out[i++] = threadIdx.x; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.x() - out[i++] = threadIdx.y; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.y() - out[i++] = threadIdx.z; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.z() + out[i++] = threadIdx.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.x() + out[i++] = threadIdx.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.y() + out[i++] = threadIdx.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.z() - out[i++] = blockIdx.x; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() - out[i++] = blockIdx.y; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ctaid.y() - out[i++] = blockIdx.z; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ctaid.z() + out[i++] = blockIdx.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() + out[i++] = blockIdx.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ctaid.y() + out[i++] = blockIdx.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ctaid.z() - out[i++] = blockDim.x; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ntid.x() - out[i++] = blockDim.y; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ntid.y() - out[i++] = blockDim.z; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ntid.z() + out[i++] = blockDim.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ntid.x() + out[i++] = blockDim.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ntid.y() + out[i++] = blockDim.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ntid.z() - out[i++] = gridDim.x; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.nctaid.x() - out[i++] = gridDim.y; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.nctaid.y() - out[i++] = gridDim.z; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.nctaid.z() + out[i++] = gridDim.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.nctaid.x() + out[i++] = gridDim.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.nctaid.y() + out[i++] = gridDim.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.nctaid.z() out[i++] = warpSize; // CHECK: store i32 32, diff --git a/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp b/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp index f63697916d902..82770f8660850 100644 --- a/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp +++ b/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp @@ -542,30 +542,24 @@ void NVPTXAsmPrinter::emitKernelFunctionDirectives(const Function &F, // If the NVVM IR has some of reqntid* specified, then output // the reqntid directive, and set the unspecified ones to 1. // If none of Reqntid* is specified, don't output reqntid directive. - unsigned Reqntidx, Reqntidy, Reqntidz; - Reqntidx = Reqntidy = Reqntidz = 1; - bool ReqSpecified = false; - ReqSpecified |= getReqNTIDx(F, Reqntidx); - ReqSpecified |= getReqNTIDy(F, Reqntidy); - ReqSpecified |= getReqNTIDz(F, Reqntidz); + std::optional Reqntidx = getReqNTIDx(F); + std::optional Reqntidy = getReqNTIDy(F); + std::optional Reqntidz = getReqNTIDz(F); - if (ReqSpecified) -O << ".reqntid " << Reqntidx << ", " << Reqntidy << ", " << Reqntidz - << "\n"; + if (Reqntidx || Reqntidy || Reqntidz) +O << ".reqntid " << Reqntidx.value_or(1) << ", " << Reqntidy.value_or(1) + << ", " << Reqntidz.value_or(1) << "\n"; // If the NVVM IR has some of maxntid* specified, then output // the maxntid directive, and set the unspecified ones to 1. // If none of maxntid* is specified, don't output maxntid directive. - unsigned Maxntidx, Maxntidy, Maxntidz; - Maxntidx = Maxntidy = Maxntidz = 1; - bool MaxSpecified = false; - MaxSpecified |= getMaxNTIDx(F, Max
[clang] [llvm] [NVPTX] Revamp NVVMIntrRange pass (PR #94422)
@@ -128,6 +128,15 @@ bool findOneNVVMAnnotation(const GlobalValue *gv, const std::string &prop, return true; } +static std::optional +findOneNVVMAnnotation(const GlobalValue &GV, const std::string &PropName) { + unsigned RetVal; + bool Found = findOneNVVMAnnotation(&GV, PropName, RetVal); + if (Found) +return RetVal; AlexMaclean wrote: Done https://github.com/llvm/llvm-project/pull/94422 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Revamp NVVMIntrRange pass (PR #94422)
@@ -0,0 +1,60 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-attributes --version 5 +; RUN: opt < %s -S -mtriple=nvptx-nvidia-cuda -mcpu=sm_20 -passes=nvvm-intr-range | FileCheck %s + +define i32 @test_maxntid() { +; CHECK-LABEL: define i32 @test_maxntid( +; CHECK-SAME: ) #[[ATTR0:[0-9]+]] { +; CHECK-NEXT:[[TMP1:%.*]] = call range(i32 0, 96) i32 @llvm.nvvm.read.ptx.sreg.tid.x() +; CHECK-NEXT:[[TMP2:%.*]] = call range(i32 0, 64) i32 @llvm.nvvm.read.ptx.sreg.tid.z() +; CHECK-NEXT:[[TMP4:%.*]] = call range(i32 1, 97) i32 @llvm.nvvm.read.ptx.sreg.ntid.y() +; CHECK-NEXT:[[TMP3:%.*]] = add i32 [[TMP1]], [[TMP2]] +; CHECK-NEXT:[[TMP5:%.*]] = add i32 [[TMP3]], [[TMP4]] +; CHECK-NEXT:ret i32 [[TMP5]] +; + %1 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() + %2 = call i32 @llvm.nvvm.read.ptx.sreg.tid.z() + %3 = call i32 @llvm.nvvm.read.ptx.sreg.ntid.y() AlexMaclean wrote: Added all the variants. I've removed SM logic so I'm not sure if there is anything else you'd like me to change? https://github.com/llvm/llvm-project/pull/94422 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Revamp NVVMIntrRange pass (PR #94422)
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/94422 >From 708374e03f1bf70006f2472f19edad1bd621e2d6 Mon Sep 17 00:00:00 2001 From: Alex MacLean Date: Mon, 3 Jun 2024 16:46:36 + Subject: [PATCH 1/4] [NVPTX] Revamp NVVMIntrRange pass --- clang/test/CodeGenCUDA/cuda-builtin-vars.cu | 24 +-- llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp| 32 ++-- llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp | 6 +- llvm/lib/Target/NVPTX/NVPTXUtilities.cpp | 58 -- llvm/lib/Target/NVPTX/NVPTXUtilities.h | 16 +- llvm/lib/Target/NVPTX/NVVMIntrRange.cpp | 177 ++- llvm/test/CodeGen/NVPTX/intr-range.ll| 60 +++ llvm/test/CodeGen/NVPTX/intrinsic-old.ll | 43 ++--- 8 files changed, 249 insertions(+), 167 deletions(-) create mode 100644 llvm/test/CodeGen/NVPTX/intr-range.ll diff --git a/clang/test/CodeGenCUDA/cuda-builtin-vars.cu b/clang/test/CodeGenCUDA/cuda-builtin-vars.cu index ba5e5f13ebe70..dba0a76af21dd 100644 --- a/clang/test/CodeGenCUDA/cuda-builtin-vars.cu +++ b/clang/test/CodeGenCUDA/cuda-builtin-vars.cu @@ -6,21 +6,21 @@ __attribute__((global)) void kernel(int *out) { int i = 0; - out[i++] = threadIdx.x; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.x() - out[i++] = threadIdx.y; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.y() - out[i++] = threadIdx.z; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.z() + out[i++] = threadIdx.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.x() + out[i++] = threadIdx.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.y() + out[i++] = threadIdx.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.z() - out[i++] = blockIdx.x; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() - out[i++] = blockIdx.y; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ctaid.y() - out[i++] = blockIdx.z; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ctaid.z() + out[i++] = blockIdx.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() + out[i++] = blockIdx.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ctaid.y() + out[i++] = blockIdx.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ctaid.z() - out[i++] = blockDim.x; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ntid.x() - out[i++] = blockDim.y; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ntid.y() - out[i++] = blockDim.z; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ntid.z() + out[i++] = blockDim.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ntid.x() + out[i++] = blockDim.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ntid.y() + out[i++] = blockDim.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ntid.z() - out[i++] = gridDim.x; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.nctaid.x() - out[i++] = gridDim.y; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.nctaid.y() - out[i++] = gridDim.z; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.nctaid.z() + out[i++] = gridDim.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.nctaid.x() + out[i++] = gridDim.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.nctaid.y() + out[i++] = gridDim.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.nctaid.z() out[i++] = warpSize; // CHECK: store i32 32, diff --git a/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp b/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp index f63697916d902..82770f8660850 100644 --- a/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp +++ b/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp @@ -542,30 +542,24 @@ void NVPTXAsmPrinter::emitKernelFunctionDirectives(const Function &F, // If the NVVM IR has some of reqntid* specified, then output // the reqntid directive, and set the unspecified ones to 1. // If none of Reqntid* is specified, don't output reqntid directive. - unsigned Reqntidx, Reqntidy, Reqntidz; - Reqntidx = Reqntidy = Reqntidz = 1; - bool ReqSpecified = false; - ReqSpecified |= getReqNTIDx(F, Reqntidx); - ReqSpecified |= getReqNTIDy(F, Reqntidy); - ReqSpecified |= getReqNTIDz(F, Reqntidz); + std::optional Reqntidx = getReqNTIDx(F); + std::optional Reqntidy = getReqNTIDy(F); + std::optional Reqntidz = getReqNTIDz(F); - if (ReqSpecified) -O << ".reqntid " << Reqntidx << ", " << Reqntidy << ", " << Reqntidz - << "\n"; + if (Reqntidx || Reqntidy || Reqntidz) +O << ".reqntid " << Reqntidx.value_or(1) << ", " << Reqntidy.value_or(1) + << ", " << Reqntidz.value_or(1) << "\n"; // If the NVVM IR has some of maxntid* specified, then output // the maxntid directive, and set the unspecified ones to 1. // If none of maxntid* is specified, don't output maxntid directive. - unsigned Maxntidx, Maxntidy, Maxntidz; - Maxntidx = Maxntidy = Maxntidz = 1; - bool MaxSpecified = false; - MaxSpecified |= getMaxNTIDx(F, Max
[clang] [llvm] [NVPTX] Revamp NVVMIntrRange pass (PR #94422)
@@ -139,24 +138,23 @@ define ptx_device i32 @test_ctaid_w() { define ptx_device i32 @test_nctaid_y() { ; CHECK: mov.u32 %r{{[0-9]+}}, %nctaid.y; -; RANGE: call i32 @llvm.nvvm.read.ptx.sreg.nctaid.y(), !range ![[GRID_SIZE_YZ:[0-9]+]] +; RANGE: call range(i32 1, 65536) i32 @llvm.nvvm.read.ptx.sreg.nctaid.y() ; CHECK: ret; %x = call i32 @llvm.nvvm.read.ptx.sreg.nctaid.y() ret i32 %x } define ptx_device i32 @test_nctaid_z() { ; CHECK: mov.u32 %r{{[0-9]+}}, %nctaid.z; -; RANGE: call i32 @llvm.nvvm.read.ptx.sreg.nctaid.z(), !range ![[GRID_SIZE_YZ]] +; RANGE: call range(i32 1, 65536) i32 @llvm.nvvm.read.ptx.sreg.nctaid.z() ; CHECK: ret; %x = call i32 @llvm.nvvm.read.ptx.sreg.nctaid.z() ret i32 %x } define ptx_device i32 @test_nctaid_x() { ; CHECK: mov.u32 %r{{[0-9]+}}, %nctaid.x; -; RANGE_30: call i32 @llvm.nvvm.read.ptx.sreg.nctaid.x(), !range ![[GRID_SIZE_X:[0-9]+]] -; RANGE_20: call i32 @llvm.nvvm.read.ptx.sreg.nctaid.x(), !range ![[GRID_SIZE_YZ]] +; RANGE: call range(i32 1, -2147483648) i32 @llvm.nvvm.read.ptx.sreg.nctaid.x() AlexMaclean wrote: I agree it looks weird but my understanding as well is that it is fine, is there anyone else you think we should check with? https://github.com/llvm/llvm-project/pull/94422 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] fixup cuda-builtin-vars.cu broken in IntrRange change (PR #94639)
https://github.com/AlexMaclean created https://github.com/llvm/llvm-project/pull/94639 None >From 227c36f7261854a1b6f8fb12fd902ffa7380be0d Mon Sep 17 00:00:00 2001 From: Alex MacLean Date: Thu, 6 Jun 2024 16:36:19 + Subject: [PATCH] fixup cuda-builtin-vars.cu broken in IntrRange change --- clang/test/CodeGenCUDA/cuda-builtin-vars.cu | 24 ++--- 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/clang/test/CodeGenCUDA/cuda-builtin-vars.cu b/clang/test/CodeGenCUDA/cuda-builtin-vars.cu index dba0a76af21dd..7880a8036f8cd 100644 --- a/clang/test/CodeGenCUDA/cuda-builtin-vars.cu +++ b/clang/test/CodeGenCUDA/cuda-builtin-vars.cu @@ -6,21 +6,21 @@ __attribute__((global)) void kernel(int *out) { int i = 0; - out[i++] = threadIdx.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.x() - out[i++] = threadIdx.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.y() - out[i++] = threadIdx.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.z() + out[i++] = threadIdx.x; // CHECK: call noundef{{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.x() + out[i++] = threadIdx.y; // CHECK: call noundef{{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.y() + out[i++] = threadIdx.z; // CHECK: call noundef{{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.z() - out[i++] = blockIdx.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() - out[i++] = blockIdx.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ctaid.y() - out[i++] = blockIdx.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ctaid.z() + out[i++] = blockIdx.x; // CHECK: call noundef{{.*}} i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() + out[i++] = blockIdx.y; // CHECK: call noundef{{.*}} i32 @llvm.nvvm.read.ptx.sreg.ctaid.y() + out[i++] = blockIdx.z; // CHECK: call noundef{{.*}} i32 @llvm.nvvm.read.ptx.sreg.ctaid.z() - out[i++] = blockDim.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ntid.x() - out[i++] = blockDim.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ntid.y() - out[i++] = blockDim.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ntid.z() + out[i++] = blockDim.x; // CHECK: call noundef{{.*}} i32 @llvm.nvvm.read.ptx.sreg.ntid.x() + out[i++] = blockDim.y; // CHECK: call noundef{{.*}} i32 @llvm.nvvm.read.ptx.sreg.ntid.y() + out[i++] = blockDim.z; // CHECK: call noundef{{.*}} i32 @llvm.nvvm.read.ptx.sreg.ntid.z() - out[i++] = gridDim.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.nctaid.x() - out[i++] = gridDim.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.nctaid.y() - out[i++] = gridDim.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.nctaid.z() + out[i++] = gridDim.x; // CHECK: call noundef{{.*}} i32 @llvm.nvvm.read.ptx.sreg.nctaid.x() + out[i++] = gridDim.y; // CHECK: call noundef{{.*}} i32 @llvm.nvvm.read.ptx.sreg.nctaid.y() + out[i++] = gridDim.z; // CHECK: call noundef{{.*}} i32 @llvm.nvvm.read.ptx.sreg.nctaid.z() out[i++] = warpSize; // CHECK: store i32 32, ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Revamp NVVMIntrRange pass (PR #94422)
@@ -6,21 +6,21 @@ __attribute__((global)) void kernel(int *out) { int i = 0; - out[i++] = threadIdx.x; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.x() - out[i++] = threadIdx.y; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.y() - out[i++] = threadIdx.z; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.z() + out[i++] = threadIdx.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.x() AlexMaclean wrote: https://github.com/llvm/llvm-project/pull/94639 https://github.com/llvm/llvm-project/pull/94422 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] fixup cuda-builtin-vars.cu broken in IntrRange change (PR #94639)
https://github.com/AlexMaclean closed https://github.com/llvm/llvm-project/pull/94639 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)
@@ -1549,30 +1549,10 @@ define amdgpu_kernel void @multiple_uses_fneg_select_f64(double %x, double %y, i define amdgpu_kernel void @fnge_select_f32_multi_use_regression(float %.i2369) { ; GCN-LABEL: fnge_select_f32_multi_use_regression: ; GCN: ; %bb.0: ; %.entry -; GCN-NEXT:s_load_dword s0, s[4:5], 0x0 -; GCN-NEXT:s_waitcnt lgkmcnt(0) -; GCN-NEXT:v_cmp_nlt_f32_e64 s[0:1], s0, 0 -; GCN-NEXT:v_cndmask_b32_e64 v0, 0, 1, s[0:1] -; GCN-NEXT:v_cmp_ngt_f32_e32 vcc, 0, v0 -; GCN-NEXT:v_cndmask_b32_e32 v1, 0, v0, vcc -; GCN-NEXT:v_mul_f32_e64 v0, -v0, v1 -; GCN-NEXT:v_cmp_lt_f32_e32 vcc, 0, v0 -; GCN-NEXT:s_and_b64 vcc, exec, vcc AlexMaclean wrote: https://github.com/llvm/llvm-project/pull/106268 slightly adjusts this test to ensure it doesn't get DCE'd away after this change. https://github.com/llvm/llvm-project/pull/97762 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/97762 >From ce146e18f74e8e984ef83d152f3a5fe88e56f287 Mon Sep 17 00:00:00 2001 From: Alex MacLean Date: Mon, 1 Jul 2024 17:06:56 + Subject: [PATCH 1/6] [ValueTracking] use KnownBits to compute fpclass from bitcast --- llvm/lib/Analysis/ValueTracking.cpp | 30 ++ llvm/test/Transforms/Attributor/nofpclass.ll | 104 +++ 2 files changed, 134 insertions(+) diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp index 173faa32a3878d..023303aa09e362 100644 --- a/llvm/lib/Analysis/ValueTracking.cpp +++ b/llvm/lib/Analysis/ValueTracking.cpp @@ -5921,6 +5921,36 @@ void computeKnownFPClass(const Value *V, const APInt &DemandedElts, break; } + case Instruction::BitCast: { +const Type *Ty = Op->getType(); +const Value *Casted = Op->getOperand(0); +if (Ty->isVectorTy() || !Casted->getType()->isIntOrIntVectorTy()) + break; + +KnownBits Bits(Ty->getScalarSizeInBits()); +computeKnownBits(Casted, Bits, Depth + 1, Q); + +// Transfer information from the sign bit. +if (Bits.Zero.isSignBitSet()) + Known.signBitMustBeZero(); +else if (Bits.One.isSignBitSet()) + Known.signBitMustBeOne(); + +if (Ty->isIEEE()) { + // IEEE floats are NaN when all bits of the exponent plus at least one of + // the fraction bits are 1. This means: + // - If we assume unknown bits are 0 and the value is NaN, it will + // always be NaN + // - If we assume unknown bits are 1 and the value is not NaN, it can + // never be NaN + if (APFloat(Ty->getFltSemantics(), Bits.One).isNaN()) +Known.KnownFPClasses = fcNan; + else if (!APFloat(Ty->getFltSemantics(), ~Bits.Zero).isNaN()) +Known.knownNot(fcNan); +} + +break; + } default: break; } diff --git a/llvm/test/Transforms/Attributor/nofpclass.ll b/llvm/test/Transforms/Attributor/nofpclass.ll index 781ba636c3ab3c..c5d562a436b337 100644 --- a/llvm/test/Transforms/Attributor/nofpclass.ll +++ b/llvm/test/Transforms/Attributor/nofpclass.ll @@ -2690,6 +2690,110 @@ entry: ret double %abs } +define float @bitcast_to_float_sign_0(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) float @bitcast_to_float_sign_0 +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = lshr i32 [[ARG]], 1 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float +; CHECK-NEXT:ret float [[TMP2]] +; + %1 = lshr i32 %arg, 1 + %2 = bitcast i32 %1 to float + ret float %2 +} + +define float @bitcast_to_float_nnan(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(nan ninf nzero nsub nnorm) float @bitcast_to_float_nnan +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = lshr i32 [[ARG]], 2 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float +; CHECK-NEXT:ret float [[TMP2]] +; + %1 = lshr i32 %arg, 2 + %2 = bitcast i32 %1 to float + ret float %2 +} + +define float @bitcast_to_float_sign_1(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) float @bitcast_to_float_sign_1 +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = or i32 [[ARG]], -2147483648 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float +; CHECK-NEXT:ret float [[TMP2]] +; + %1 = or i32 %arg, -2147483648 + %2 = bitcast i32 %1 to float + ret float %2 +} + +define float @bitcast_to_float_nan(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(inf zero sub norm) float @bitcast_to_float_nan +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = or i32 [[ARG]], 2139095041 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float +; CHECK-NEXT:ret float [[TMP2]] +; + %1 = or i32 %arg, 2139095041 + %2 = bitcast i32 %1 to float + ret float %2 +} + +define double @bitcast_to_double_sign_0(i64 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) double @bitcast_to_double_sign_0 +; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = lshr i64 [[ARG]], 1 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i64 [[TMP1]] to double +; CHECK-NEXT:ret double [[TMP2]] +; + %1 = lshr i64 %arg, 1 + %2 = bitcast i64 %1 to double + ret double %2 +} + +define double @bitcast_to_double_nnan(i64 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: defin
[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/97762 >From ddb38bd6c86e36ab8b46a4fb5f97390d140f4aa1 Mon Sep 17 00:00:00 2001 From: Alex MacLean Date: Mon, 1 Jul 2024 17:06:56 + Subject: [PATCH 1/6] [ValueTracking] use KnownBits to compute fpclass from bitcast --- llvm/lib/Analysis/ValueTracking.cpp | 30 ++ llvm/test/Transforms/Attributor/nofpclass.ll | 104 +++ 2 files changed, 134 insertions(+) diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp index 173faa32a3878d..023303aa09e362 100644 --- a/llvm/lib/Analysis/ValueTracking.cpp +++ b/llvm/lib/Analysis/ValueTracking.cpp @@ -5921,6 +5921,36 @@ void computeKnownFPClass(const Value *V, const APInt &DemandedElts, break; } + case Instruction::BitCast: { +const Type *Ty = Op->getType(); +const Value *Casted = Op->getOperand(0); +if (Ty->isVectorTy() || !Casted->getType()->isIntOrIntVectorTy()) + break; + +KnownBits Bits(Ty->getScalarSizeInBits()); +computeKnownBits(Casted, Bits, Depth + 1, Q); + +// Transfer information from the sign bit. +if (Bits.Zero.isSignBitSet()) + Known.signBitMustBeZero(); +else if (Bits.One.isSignBitSet()) + Known.signBitMustBeOne(); + +if (Ty->isIEEE()) { + // IEEE floats are NaN when all bits of the exponent plus at least one of + // the fraction bits are 1. This means: + // - If we assume unknown bits are 0 and the value is NaN, it will + // always be NaN + // - If we assume unknown bits are 1 and the value is not NaN, it can + // never be NaN + if (APFloat(Ty->getFltSemantics(), Bits.One).isNaN()) +Known.KnownFPClasses = fcNan; + else if (!APFloat(Ty->getFltSemantics(), ~Bits.Zero).isNaN()) +Known.knownNot(fcNan); +} + +break; + } default: break; } diff --git a/llvm/test/Transforms/Attributor/nofpclass.ll b/llvm/test/Transforms/Attributor/nofpclass.ll index 781ba636c3ab3c..c5d562a436b337 100644 --- a/llvm/test/Transforms/Attributor/nofpclass.ll +++ b/llvm/test/Transforms/Attributor/nofpclass.ll @@ -2690,6 +2690,110 @@ entry: ret double %abs } +define float @bitcast_to_float_sign_0(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) float @bitcast_to_float_sign_0 +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = lshr i32 [[ARG]], 1 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float +; CHECK-NEXT:ret float [[TMP2]] +; + %1 = lshr i32 %arg, 1 + %2 = bitcast i32 %1 to float + ret float %2 +} + +define float @bitcast_to_float_nnan(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(nan ninf nzero nsub nnorm) float @bitcast_to_float_nnan +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = lshr i32 [[ARG]], 2 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float +; CHECK-NEXT:ret float [[TMP2]] +; + %1 = lshr i32 %arg, 2 + %2 = bitcast i32 %1 to float + ret float %2 +} + +define float @bitcast_to_float_sign_1(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) float @bitcast_to_float_sign_1 +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = or i32 [[ARG]], -2147483648 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float +; CHECK-NEXT:ret float [[TMP2]] +; + %1 = or i32 %arg, -2147483648 + %2 = bitcast i32 %1 to float + ret float %2 +} + +define float @bitcast_to_float_nan(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(inf zero sub norm) float @bitcast_to_float_nan +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = or i32 [[ARG]], 2139095041 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float +; CHECK-NEXT:ret float [[TMP2]] +; + %1 = or i32 %arg, 2139095041 + %2 = bitcast i32 %1 to float + ret float %2 +} + +define double @bitcast_to_double_sign_0(i64 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) double @bitcast_to_double_sign_0 +; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = lshr i64 [[ARG]], 1 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i64 [[TMP1]] to double +; CHECK-NEXT:ret double [[TMP2]] +; + %1 = lshr i64 %arg, 1 + %2 = bitcast i64 %1 to double + ret double %2 +} + +define double @bitcast_to_double_nnan(i64 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: defin
[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)
AlexMaclean wrote: @arsenm, @goldsteinn when you have a minute could you take another look at this? I think I've addressed all the issues you've raised. https://github.com/llvm/llvm-project/pull/97762 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)
@@ -5921,6 +5921,61 @@ void computeKnownFPClass(const Value *V, const APInt &DemandedElts, break; } + case Instruction::BitCast: { +const Value *Src; +if (!match(Op, m_ElementWiseBitCast(m_Value(Src))) || +!Src->getType()->isIntOrIntVectorTy()) + break; + +const Type *Ty = Op->getType()->getScalarType(); +KnownBits Bits(Ty->getScalarSizeInBits()); +computeKnownBits(Src, DemandedElts, Bits, Depth + 1, Q); + +// Transfer information from the sign bit. +if (Bits.isNonNegative()) + Known.signBitMustBeZero(); +else if (Bits.isNegative()) + Known.signBitMustBeOne(); + +if (Ty->isIEEE()) { + // IEEE floats are NaN when all bits of the exponent plus at least one of + // the fraction bits are 1. This means: + // - If we assume unknown bits are 0 and the value is NaN, it will + // always be NaN + // - If we assume unknown bits are 1 and the value is not NaN, it can + // never be NaN + if (APFloat(Ty->getFltSemantics(), Bits.One).isNaN()) +Known.KnownFPClasses = fcNan; + else if (!APFloat(Ty->getFltSemantics(), ~Bits.Zero).isNaN()) +Known.knownNot(fcNan); + + // Build KnownBits representing Inf and check if it must be equal or + // unequal to this value. + auto InfKB = KnownBits::makeConstant( + APFloat::getInf(Ty->getFltSemantics()).bitcastToAPInt()); + InfKB.Zero.clearSignBit(); + if (const auto InfResult = KnownBits::eq(Bits, InfKB)) { AlexMaclean wrote: I don't think so. `KnownBits::eq` will return `false` if the inputs cannot be equal and `std::nullopt` if the may or may not be equal (in this case it cannot return `true` because `InfKB` is not fully known). Clearing the sign bit of `Bits` won't change the result either way. https://github.com/llvm/llvm-project/pull/97762 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)
@@ -5921,6 +5921,61 @@ void computeKnownFPClass(const Value *V, const APInt &DemandedElts, break; } + case Instruction::BitCast: { +const Value *Src; +if (!match(Op, m_ElementWiseBitCast(m_Value(Src))) || +!Src->getType()->isIntOrIntVectorTy()) + break; + +const Type *Ty = Op->getType()->getScalarType(); +KnownBits Bits(Ty->getScalarSizeInBits()); +computeKnownBits(Src, DemandedElts, Bits, Depth + 1, Q); + +// Transfer information from the sign bit. +if (Bits.isNonNegative()) + Known.signBitMustBeZero(); +else if (Bits.isNegative()) + Known.signBitMustBeOne(); + +if (Ty->isIEEE()) { + // IEEE floats are NaN when all bits of the exponent plus at least one of + // the fraction bits are 1. This means: + // - If we assume unknown bits are 0 and the value is NaN, it will + // always be NaN + // - If we assume unknown bits are 1 and the value is not NaN, it can + // never be NaN + if (APFloat(Ty->getFltSemantics(), Bits.One).isNaN()) +Known.KnownFPClasses = fcNan; + else if (!APFloat(Ty->getFltSemantics(), ~Bits.Zero).isNaN()) +Known.knownNot(fcNan); + + // Build KnownBits representing Inf and check if it must be equal or + // unequal to this value. + auto InfKB = KnownBits::makeConstant( + APFloat::getInf(Ty->getFltSemantics()).bitcastToAPInt()); + InfKB.Zero.clearSignBit(); + if (const auto InfResult = KnownBits::eq(Bits, InfKB)) { AlexMaclean wrote: If that is the case the values will be: ``` Bits = 1 000 InfKB = ? 000 ``` These may or may not be equal so `std::nullopt` will be returned and no information will be added to the fpclass. I suppose we could handle this case but it will be constant folded anyway so I don't think it is really necessary. https://github.com/llvm/llvm-project/pull/97762 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/97762 >From 0477447f29b2889f92abf44cacd5e0f2c4e7f387 Mon Sep 17 00:00:00 2001 From: Alex MacLean Date: Mon, 1 Jul 2024 17:06:56 + Subject: [PATCH 1/6] [ValueTracking] use KnownBits to compute fpclass from bitcast --- llvm/lib/Analysis/ValueTracking.cpp | 30 ++ llvm/test/Transforms/Attributor/nofpclass.ll | 104 +++ 2 files changed, 134 insertions(+) diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp index 173faa32a3878d..023303aa09e362 100644 --- a/llvm/lib/Analysis/ValueTracking.cpp +++ b/llvm/lib/Analysis/ValueTracking.cpp @@ -5921,6 +5921,36 @@ void computeKnownFPClass(const Value *V, const APInt &DemandedElts, break; } + case Instruction::BitCast: { +const Type *Ty = Op->getType(); +const Value *Casted = Op->getOperand(0); +if (Ty->isVectorTy() || !Casted->getType()->isIntOrIntVectorTy()) + break; + +KnownBits Bits(Ty->getScalarSizeInBits()); +computeKnownBits(Casted, Bits, Depth + 1, Q); + +// Transfer information from the sign bit. +if (Bits.Zero.isSignBitSet()) + Known.signBitMustBeZero(); +else if (Bits.One.isSignBitSet()) + Known.signBitMustBeOne(); + +if (Ty->isIEEE()) { + // IEEE floats are NaN when all bits of the exponent plus at least one of + // the fraction bits are 1. This means: + // - If we assume unknown bits are 0 and the value is NaN, it will + // always be NaN + // - If we assume unknown bits are 1 and the value is not NaN, it can + // never be NaN + if (APFloat(Ty->getFltSemantics(), Bits.One).isNaN()) +Known.KnownFPClasses = fcNan; + else if (!APFloat(Ty->getFltSemantics(), ~Bits.Zero).isNaN()) +Known.knownNot(fcNan); +} + +break; + } default: break; } diff --git a/llvm/test/Transforms/Attributor/nofpclass.ll b/llvm/test/Transforms/Attributor/nofpclass.ll index 781ba636c3ab3c..c5d562a436b337 100644 --- a/llvm/test/Transforms/Attributor/nofpclass.ll +++ b/llvm/test/Transforms/Attributor/nofpclass.ll @@ -2690,6 +2690,110 @@ entry: ret double %abs } +define float @bitcast_to_float_sign_0(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) float @bitcast_to_float_sign_0 +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = lshr i32 [[ARG]], 1 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float +; CHECK-NEXT:ret float [[TMP2]] +; + %1 = lshr i32 %arg, 1 + %2 = bitcast i32 %1 to float + ret float %2 +} + +define float @bitcast_to_float_nnan(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(nan ninf nzero nsub nnorm) float @bitcast_to_float_nnan +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = lshr i32 [[ARG]], 2 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float +; CHECK-NEXT:ret float [[TMP2]] +; + %1 = lshr i32 %arg, 2 + %2 = bitcast i32 %1 to float + ret float %2 +} + +define float @bitcast_to_float_sign_1(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) float @bitcast_to_float_sign_1 +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = or i32 [[ARG]], -2147483648 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float +; CHECK-NEXT:ret float [[TMP2]] +; + %1 = or i32 %arg, -2147483648 + %2 = bitcast i32 %1 to float + ret float %2 +} + +define float @bitcast_to_float_nan(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(inf zero sub norm) float @bitcast_to_float_nan +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = or i32 [[ARG]], 2139095041 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float +; CHECK-NEXT:ret float [[TMP2]] +; + %1 = or i32 %arg, 2139095041 + %2 = bitcast i32 %1 to float + ret float %2 +} + +define double @bitcast_to_double_sign_0(i64 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) double @bitcast_to_double_sign_0 +; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = lshr i64 [[ARG]], 1 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i64 [[TMP1]] to double +; CHECK-NEXT:ret double [[TMP2]] +; + %1 = lshr i64 %arg, 1 + %2 = bitcast i64 %1 to double + ret double %2 +} + +define double @bitcast_to_double_nnan(i64 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: defin
[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)
https://github.com/AlexMaclean closed https://github.com/llvm/llvm-project/pull/97762 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/97762 >From 2dc91ada9078e5c7344e74d5b549e896056f89ad Mon Sep 17 00:00:00 2001 From: Alex MacLean Date: Mon, 1 Jul 2024 17:06:56 + Subject: [PATCH 1/5] [ValueTracking] use KnownBits to compute fpclass from bitcast --- llvm/lib/Analysis/ValueTracking.cpp | 30 ++ llvm/test/Transforms/Attributor/nofpclass.ll | 104 +++ 2 files changed, 134 insertions(+) diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp index 173faa32a3878d..023303aa09e362 100644 --- a/llvm/lib/Analysis/ValueTracking.cpp +++ b/llvm/lib/Analysis/ValueTracking.cpp @@ -5921,6 +5921,36 @@ void computeKnownFPClass(const Value *V, const APInt &DemandedElts, break; } + case Instruction::BitCast: { +const Type *Ty = Op->getType(); +const Value *Casted = Op->getOperand(0); +if (Ty->isVectorTy() || !Casted->getType()->isIntOrIntVectorTy()) + break; + +KnownBits Bits(Ty->getScalarSizeInBits()); +computeKnownBits(Casted, Bits, Depth + 1, Q); + +// Transfer information from the sign bit. +if (Bits.Zero.isSignBitSet()) + Known.signBitMustBeZero(); +else if (Bits.One.isSignBitSet()) + Known.signBitMustBeOne(); + +if (Ty->isIEEE()) { + // IEEE floats are NaN when all bits of the exponent plus at least one of + // the fraction bits are 1. This means: + // - If we assume unknown bits are 0 and the value is NaN, it will + // always be NaN + // - If we assume unknown bits are 1 and the value is not NaN, it can + // never be NaN + if (APFloat(Ty->getFltSemantics(), Bits.One).isNaN()) +Known.KnownFPClasses = fcNan; + else if (!APFloat(Ty->getFltSemantics(), ~Bits.Zero).isNaN()) +Known.knownNot(fcNan); +} + +break; + } default: break; } diff --git a/llvm/test/Transforms/Attributor/nofpclass.ll b/llvm/test/Transforms/Attributor/nofpclass.ll index 781ba636c3ab3c..c5d562a436b337 100644 --- a/llvm/test/Transforms/Attributor/nofpclass.ll +++ b/llvm/test/Transforms/Attributor/nofpclass.ll @@ -2690,6 +2690,110 @@ entry: ret double %abs } +define float @bitcast_to_float_sign_0(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) float @bitcast_to_float_sign_0 +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = lshr i32 [[ARG]], 1 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float +; CHECK-NEXT:ret float [[TMP2]] +; + %1 = lshr i32 %arg, 1 + %2 = bitcast i32 %1 to float + ret float %2 +} + +define float @bitcast_to_float_nnan(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(nan ninf nzero nsub nnorm) float @bitcast_to_float_nnan +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = lshr i32 [[ARG]], 2 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float +; CHECK-NEXT:ret float [[TMP2]] +; + %1 = lshr i32 %arg, 2 + %2 = bitcast i32 %1 to float + ret float %2 +} + +define float @bitcast_to_float_sign_1(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) float @bitcast_to_float_sign_1 +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = or i32 [[ARG]], -2147483648 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float +; CHECK-NEXT:ret float [[TMP2]] +; + %1 = or i32 %arg, -2147483648 + %2 = bitcast i32 %1 to float + ret float %2 +} + +define float @bitcast_to_float_nan(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(inf zero sub norm) float @bitcast_to_float_nan +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = or i32 [[ARG]], 2139095041 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float +; CHECK-NEXT:ret float [[TMP2]] +; + %1 = or i32 %arg, 2139095041 + %2 = bitcast i32 %1 to float + ret float %2 +} + +define double @bitcast_to_double_sign_0(i64 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) double @bitcast_to_double_sign_0 +; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[TMP1:%.*]] = lshr i64 [[ARG]], 1 +; CHECK-NEXT:[[TMP2:%.*]] = bitcast i64 [[TMP1]] to double +; CHECK-NEXT:ret double [[TMP2]] +; + %1 = lshr i64 %arg, 1 + %2 = bitcast i64 %1 to double + ret double %2 +} + +define double @bitcast_to_double_nnan(i64 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: defin
[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)
@@ -5805,6 +5805,37 @@ void computeKnownFPClass(const Value *V, const APInt &DemandedElts, break; } + case Instruction::BitCast: { +const Value *Src; +if (!match(Op, m_ElementWiseBitCast(m_Value(Src))) || +!Src->getType()->isIntOrIntVectorTy()) + break; + +const Type *Ty = Op->getType()->getScalarType(); +KnownBits Bits(Ty->getScalarSizeInBits()); +computeKnownBits(Src, DemandedElts, Bits, Depth + 1, Q); + +// Transfer information from the sign bit. +if (Bits.isNonNegative()) + Known.signBitMustBeZero(); +else if (Bits.isNegative()) + Known.signBitMustBeOne(); + +if (Ty->isIEEE()) { + // IEEE floats are NaN when all bits of the exponent plus at least one of + // the fraction bits are 1. This means: + // - If we assume unknown bits are 0 and the value is NaN, it will + // always be NaN + // - If we assume unknown bits are 1 and the value is not NaN, it can + // never be NaN + if (APFloat(Ty->getFltSemantics(), Bits.One).isNaN()) +Known.KnownFPClasses = fcNan; + else if (!APFloat(Ty->getFltSemantics(), ~Bits.Zero).isNaN()) +Known.knownNot(fcNan); AlexMaclean wrote: Okay, I've added `inf` and also `zero` since those are both relatively simple, but I've left normal / subnormal to the side for now. https://github.com/llvm/llvm-project/pull/97762 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)
@@ -2690,6 +2690,163 @@ entry: ret double %abs } +define float @bitcast_to_float_sign_0(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) float @bitcast_to_float_sign_0 +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[SHR:%.*]] = lshr i32 [[ARG]], 1 +; CHECK-NEXT:[[CAST:%.*]] = bitcast i32 [[SHR]] to float +; CHECK-NEXT:ret float [[CAST]] +; + %shr = lshr i32 %arg, 1 + %cast = bitcast i32 %shr to float + ret float %cast +} + +define float @bitcast_to_float_nnan(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(nan ninf nzero nsub nnorm) float @bitcast_to_float_nnan +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[SHR:%.*]] = lshr i32 [[ARG]], 2 +; CHECK-NEXT:[[CAST:%.*]] = bitcast i32 [[SHR]] to float +; CHECK-NEXT:ret float [[CAST]] +; + %shr = lshr i32 %arg, 2 + %cast = bitcast i32 %shr to float + ret float %cast +} + +define float @bitcast_to_float_sign_1(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) float @bitcast_to_float_sign_1 +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[OR:%.*]] = or i32 [[ARG]], -2147483648 +; CHECK-NEXT:[[CAST:%.*]] = bitcast i32 [[OR]] to float +; CHECK-NEXT:ret float [[CAST]] +; + %or = or i32 %arg, -2147483648 + %cast = bitcast i32 %or to float + ret float %cast +} + +define float @bitcast_to_float_nan(i32 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(inf zero sub norm) float @bitcast_to_float_nan +; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[OR:%.*]] = or i32 [[ARG]], 2139095041 +; CHECK-NEXT:[[CAST:%.*]] = bitcast i32 [[OR]] to float +; CHECK-NEXT:ret float [[CAST]] +; + %or = or i32 %arg, 2139095041 + %cast = bitcast i32 %or to float + ret float %cast +} + +define double @bitcast_to_double_sign_0(i64 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) double @bitcast_to_double_sign_0 +; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[SHR:%.*]] = lshr i64 [[ARG]], 1 +; CHECK-NEXT:[[CAST:%.*]] = bitcast i64 [[SHR]] to double +; CHECK-NEXT:ret double [[CAST]] +; + %shr = lshr i64 %arg, 1 + %cast = bitcast i64 %shr to double + ret double %cast +} + +define double @bitcast_to_double_nnan(i64 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(nan ninf nzero nsub nnorm) double @bitcast_to_double_nnan +; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[SHR:%.*]] = lshr i64 [[ARG]], 2 +; CHECK-NEXT:[[CAST:%.*]] = bitcast i64 [[SHR]] to double +; CHECK-NEXT:ret double [[CAST]] +; + %shr = lshr i64 %arg, 2 + %cast = bitcast i64 %shr to double + ret double %cast +} + +define double @bitcast_to_double_sign_1(i64 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) double @bitcast_to_double_sign_1 +; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[OR:%.*]] = or i64 [[ARG]], -9223372036854775808 +; CHECK-NEXT:[[CAST:%.*]] = bitcast i64 [[OR]] to double +; CHECK-NEXT:ret double [[CAST]] +; + %or = or i64 %arg, -9223372036854775808 + %cast = bitcast i64 %or to double + ret double %cast +} + +define double @bitcast_to_double_nan(i64 %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(inf zero sub norm) double @bitcast_to_double_nan +; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[OR:%.*]] = or i64 [[ARG]], -4503599627370495 +; CHECK-NEXT:[[CAST:%.*]] = bitcast i64 [[OR]] to double +; CHECK-NEXT:ret double [[CAST]] +; + %or = or i64 %arg, -4503599627370495 + %cast = bitcast i64 %or to double + ret double %cast +} + + +define <2 x float> @bitcast_to_float_vect_sign_0(<2 x i32> %arg) { +; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) +; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) <2 x float> @bitcast_to_float_vect_sign_0 +; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR3]] { +; CHECK-NEXT:[[SHR:%.*]] = lshr <2 x i32> [[ARG]], +; CHECK-NEXT:[[CAST:%.*]] = bitcast <2 x i32> [[SHR]] to <2 x float> +; CHECK-NEXT:ret <2 x float> [[CAST]] +; + %shr = lshr <2 x i32> %arg, + %cast = bitcast <2 x i32> %shr to <2 x float> + ret <2 x float> %cast +} + +define <2 x float> @bitcast_to_float_vec
[clang] [llvm] [NVPTX] Remove nvvm.bitcast.* intrinsics (PR #107936)
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/107936 >From ff978f81e0eedbc5e7547acabe414f2f1b0fd31a Mon Sep 17 00:00:00 2001 From: Alex MacLean Date: Fri, 6 Sep 2024 18:35:20 + Subject: [PATCH 1/2] [NVPTX] Remove nvvm.bitcast.* intrinsics --- clang/include/clang/Basic/BuiltinsNVPTX.def | 8 llvm/include/llvm/IR/IntrinsicsNVVM.td| 18 - llvm/lib/IR/AutoUpgrade.cpp | 8 llvm/lib/Target/NVPTX/NVPTXIntrinsics.td | 14 - .../Assembler/auto_upgrade_nvvm_intrinsics.ll | 20 +++ 5 files changed, 32 insertions(+), 36 deletions(-) diff --git a/clang/include/clang/Basic/BuiltinsNVPTX.def b/clang/include/clang/Basic/BuiltinsNVPTX.def index 20f038a0a9bbde..6fff562165080a 100644 --- a/clang/include/clang/Basic/BuiltinsNVPTX.def +++ b/clang/include/clang/Basic/BuiltinsNVPTX.def @@ -599,14 +599,6 @@ TARGET_BUILTIN(__nvvm_e4m3x2_to_f16x2_rn_relu, "V2hs", "", AND(SM_89,PTX81)) TARGET_BUILTIN(__nvvm_e5m2x2_to_f16x2_rn, "V2hs", "", AND(SM_89,PTX81)) TARGET_BUILTIN(__nvvm_e5m2x2_to_f16x2_rn_relu, "V2hs", "", AND(SM_89,PTX81)) -// Bitcast - -BUILTIN(__nvvm_bitcast_f2i, "if", "") -BUILTIN(__nvvm_bitcast_i2f, "fi", "") - -BUILTIN(__nvvm_bitcast_ll2d, "dLLi", "") -BUILTIN(__nvvm_bitcast_d2ll, "LLid", "") - // FNS TARGET_BUILTIN(__nvvm_fns, "UiUiUii", "n", PTX60) diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td index 39685c920d948d..737dd6092e2183 100644 --- a/llvm/include/llvm/IR/IntrinsicsNVVM.td +++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td @@ -30,6 +30,10 @@ // * llvm.nvvm.max.ui --> select(x ule y, x, y) // * llvm.nvvm.max.ull --> ibid. // * llvm.nvvm.h2f --> llvm.convert.to.fp16.f32 +// * llvm.nvvm.bitcast.f2i --> bitcast +// * llvm.nvvm.bitcast.i2f --> ibid. +// * llvm.nvvm.bitcast.d2ll --> ibid. +// * llvm.nvvm.bitcast.ll2d --> ibid. def llvm_global_ptr_ty : LLVMQualPointerType<1>; // (global)ptr def llvm_shared_ptr_ty : LLVMQualPointerType<3>; // (shared)ptr @@ -1339,20 +1343,6 @@ let TargetPrefix = "nvvm" in { def int_nvvm_e5m2x2_to_f16x2_rn_relu : ClangBuiltin<"__nvvm_e5m2x2_to_f16x2_rn_relu">, Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrNoCallback]>; -// -// Bitcast -// - - def int_nvvm_bitcast_f2i : ClangBuiltin<"__nvvm_bitcast_f2i">, - DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_float_ty], [IntrNoMem, IntrSpeculatable]>; - def int_nvvm_bitcast_i2f : ClangBuiltin<"__nvvm_bitcast_i2f">, - DefaultAttrsIntrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrNoMem, IntrSpeculatable]>; - - def int_nvvm_bitcast_ll2d : ClangBuiltin<"__nvvm_bitcast_ll2d">, - DefaultAttrsIntrinsic<[llvm_double_ty], [llvm_i64_ty], [IntrNoMem, IntrSpeculatable]>; - def int_nvvm_bitcast_d2ll : ClangBuiltin<"__nvvm_bitcast_d2ll">, - DefaultAttrsIntrinsic<[llvm_i64_ty], [llvm_double_ty], [IntrNoMem, IntrSpeculatable]>; - // FNS def int_nvvm_fns : ClangBuiltin<"__nvvm_fns">, diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp index 69dae5e32dbbe8..02d1d9d9f78984 100644 --- a/llvm/lib/IR/AutoUpgrade.cpp +++ b/llvm/lib/IR/AutoUpgrade.cpp @@ -1268,6 +1268,10 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn, else if (Name.consume_front("atomic.load.add.")) // nvvm.atomic.load.add.{f32.p,f64.p} Expand = Name.starts_with("f32.p") || Name.starts_with("f64.p"); + else if (Name.consume_front("bitcast.")) +// nvvm.bitcast.{f2i,i2f,ll2d,d2ll} +Expand = +Name == "f2i" || Name == "i2f" || Name == "ll2d" || Name == "d2ll"; else Expand = false; @@ -4258,6 +4262,10 @@ void llvm::UpgradeIntrinsicCall(CallBase *CI, Function *NewFn) { F->getParent(), Intrinsic::convert_from_fp16, {Builder.getFloatTy()}), CI->getArgOperand(0), "h2f"); + } else if (Name.consume_front("bitcast.") && + (Name == "f2i" || Name == "i2f" || Name == "ll2d" || + Name == "d2ll")) { +Rep = Builder.CreateBitCast(CI->getArgOperand(0), CI->getType()); } else { Intrinsic::ID IID = shouldUpgradeNVPTXBF16Intrinsic(Name); if (IID != Intrinsic::not_intrinsic && diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td index 0c883093dd0a54..5c2ef4fa417ac1 100644 --- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td +++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td @@ -1561,20 +1561,6 @@ def : Pat<(int_nvvm_e5m2x2_to_f16x2_rn Int16Regs:$a), def : Pat<(int_nvvm_e5m2x2_to_f16x2_rn_relu Int16Regs:$a), (CVT_f16x2_e5m2x2 Int16Regs:$a, CvtRN_RELU)>; -// -// Bitcast -// - -def INT_NVVM_BITCAST_F2I : F_MATH_1<"mov.b32 \t$dst, $src0;", Int32Regs, - Float32Regs, int_nvvm_bitcast_f2i>; -def INT_NVVM_BITCAST_I2
[clang] [llvm] [NVPTX] Remove nvvm.bitcast.* intrinsics (PR #107936)
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/107936 >From ff978f81e0eedbc5e7547acabe414f2f1b0fd31a Mon Sep 17 00:00:00 2001 From: Alex MacLean Date: Fri, 6 Sep 2024 18:35:20 + Subject: [PATCH 1/2] [NVPTX] Remove nvvm.bitcast.* intrinsics --- clang/include/clang/Basic/BuiltinsNVPTX.def | 8 llvm/include/llvm/IR/IntrinsicsNVVM.td| 18 - llvm/lib/IR/AutoUpgrade.cpp | 8 llvm/lib/Target/NVPTX/NVPTXIntrinsics.td | 14 - .../Assembler/auto_upgrade_nvvm_intrinsics.ll | 20 +++ 5 files changed, 32 insertions(+), 36 deletions(-) diff --git a/clang/include/clang/Basic/BuiltinsNVPTX.def b/clang/include/clang/Basic/BuiltinsNVPTX.def index 20f038a0a9bbde..6fff562165080a 100644 --- a/clang/include/clang/Basic/BuiltinsNVPTX.def +++ b/clang/include/clang/Basic/BuiltinsNVPTX.def @@ -599,14 +599,6 @@ TARGET_BUILTIN(__nvvm_e4m3x2_to_f16x2_rn_relu, "V2hs", "", AND(SM_89,PTX81)) TARGET_BUILTIN(__nvvm_e5m2x2_to_f16x2_rn, "V2hs", "", AND(SM_89,PTX81)) TARGET_BUILTIN(__nvvm_e5m2x2_to_f16x2_rn_relu, "V2hs", "", AND(SM_89,PTX81)) -// Bitcast - -BUILTIN(__nvvm_bitcast_f2i, "if", "") -BUILTIN(__nvvm_bitcast_i2f, "fi", "") - -BUILTIN(__nvvm_bitcast_ll2d, "dLLi", "") -BUILTIN(__nvvm_bitcast_d2ll, "LLid", "") - // FNS TARGET_BUILTIN(__nvvm_fns, "UiUiUii", "n", PTX60) diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td index 39685c920d948d..737dd6092e2183 100644 --- a/llvm/include/llvm/IR/IntrinsicsNVVM.td +++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td @@ -30,6 +30,10 @@ // * llvm.nvvm.max.ui --> select(x ule y, x, y) // * llvm.nvvm.max.ull --> ibid. // * llvm.nvvm.h2f --> llvm.convert.to.fp16.f32 +// * llvm.nvvm.bitcast.f2i --> bitcast +// * llvm.nvvm.bitcast.i2f --> ibid. +// * llvm.nvvm.bitcast.d2ll --> ibid. +// * llvm.nvvm.bitcast.ll2d --> ibid. def llvm_global_ptr_ty : LLVMQualPointerType<1>; // (global)ptr def llvm_shared_ptr_ty : LLVMQualPointerType<3>; // (shared)ptr @@ -1339,20 +1343,6 @@ let TargetPrefix = "nvvm" in { def int_nvvm_e5m2x2_to_f16x2_rn_relu : ClangBuiltin<"__nvvm_e5m2x2_to_f16x2_rn_relu">, Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrNoCallback]>; -// -// Bitcast -// - - def int_nvvm_bitcast_f2i : ClangBuiltin<"__nvvm_bitcast_f2i">, - DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_float_ty], [IntrNoMem, IntrSpeculatable]>; - def int_nvvm_bitcast_i2f : ClangBuiltin<"__nvvm_bitcast_i2f">, - DefaultAttrsIntrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrNoMem, IntrSpeculatable]>; - - def int_nvvm_bitcast_ll2d : ClangBuiltin<"__nvvm_bitcast_ll2d">, - DefaultAttrsIntrinsic<[llvm_double_ty], [llvm_i64_ty], [IntrNoMem, IntrSpeculatable]>; - def int_nvvm_bitcast_d2ll : ClangBuiltin<"__nvvm_bitcast_d2ll">, - DefaultAttrsIntrinsic<[llvm_i64_ty], [llvm_double_ty], [IntrNoMem, IntrSpeculatable]>; - // FNS def int_nvvm_fns : ClangBuiltin<"__nvvm_fns">, diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp index 69dae5e32dbbe8..02d1d9d9f78984 100644 --- a/llvm/lib/IR/AutoUpgrade.cpp +++ b/llvm/lib/IR/AutoUpgrade.cpp @@ -1268,6 +1268,10 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn, else if (Name.consume_front("atomic.load.add.")) // nvvm.atomic.load.add.{f32.p,f64.p} Expand = Name.starts_with("f32.p") || Name.starts_with("f64.p"); + else if (Name.consume_front("bitcast.")) +// nvvm.bitcast.{f2i,i2f,ll2d,d2ll} +Expand = +Name == "f2i" || Name == "i2f" || Name == "ll2d" || Name == "d2ll"; else Expand = false; @@ -4258,6 +4262,10 @@ void llvm::UpgradeIntrinsicCall(CallBase *CI, Function *NewFn) { F->getParent(), Intrinsic::convert_from_fp16, {Builder.getFloatTy()}), CI->getArgOperand(0), "h2f"); + } else if (Name.consume_front("bitcast.") && + (Name == "f2i" || Name == "i2f" || Name == "ll2d" || + Name == "d2ll")) { +Rep = Builder.CreateBitCast(CI->getArgOperand(0), CI->getType()); } else { Intrinsic::ID IID = shouldUpgradeNVPTXBF16Intrinsic(Name); if (IID != Intrinsic::not_intrinsic && diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td index 0c883093dd0a54..5c2ef4fa417ac1 100644 --- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td +++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td @@ -1561,20 +1561,6 @@ def : Pat<(int_nvvm_e5m2x2_to_f16x2_rn Int16Regs:$a), def : Pat<(int_nvvm_e5m2x2_to_f16x2_rn_relu Int16Regs:$a), (CVT_f16x2_e5m2x2 Int16Regs:$a, CvtRN_RELU)>; -// -// Bitcast -// - -def INT_NVVM_BITCAST_F2I : F_MATH_1<"mov.b32 \t$dst, $src0;", Int32Regs, - Float32Regs, int_nvvm_bitcast_f2i>; -def INT_NVVM_BITCAST_I2
[clang] [llvm] [NVPTX] Remove nvvm.ldg.global.* intrinsics (PR #112834)
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/112834 >From 0b43fa7364bf45515905d98cd0731c5509de5196 Mon Sep 17 00:00:00 2001 From: Alex Maclean Date: Thu, 17 Oct 2024 16:49:24 + Subject: [PATCH 1/2] [NVPTX] Remove nvvm.ldg.global.* intrinsics --- clang/lib/CodeGen/CGBuiltin.cpp | 45 +++-- .../builtins-nvptx-native-half-type-native.c | 4 +- .../CodeGen/builtins-nvptx-native-half-type.c | 4 +- clang/test/CodeGen/builtins-nvptx.c | 72 +++ llvm/include/llvm/IR/IntrinsicsNVVM.td| 18 +- llvm/lib/IR/AutoUpgrade.cpp | 14 ++ llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp | 189 +++--- llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp | 55 + llvm/lib/Target/NVPTX/NVPTXISelLowering.h | 2 - .../Assembler/auto_upgrade_nvvm_intrinsics.ll | 31 +++ 10 files changed, 188 insertions(+), 246 deletions(-) diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index f6d7db2c204c12..3b42977b578e15 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -20473,7 +20473,7 @@ static NVPTXMmaInfo getNVPTXMmaInfo(unsigned BuiltinID) { #undef MMA_VARIANTS_B1_XOR } -static Value *MakeLdgLdu(unsigned IntrinsicID, CodeGenFunction &CGF, +static Value *MakeLdu(unsigned IntrinsicID, CodeGenFunction &CGF, const CallExpr *E) { Value *Ptr = CGF.EmitScalarExpr(E->getArg(0)); QualType ArgType = E->getArg(0)->getType(); @@ -20484,6 +20484,21 @@ static Value *MakeLdgLdu(unsigned IntrinsicID, CodeGenFunction &CGF, {Ptr, ConstantInt::get(CGF.Builder.getInt32Ty(), Align.getQuantity())}); } +static Value *MakeLdg(CodeGenFunction &CGF, const CallExpr *E) { + Value *Ptr = CGF.EmitScalarExpr(E->getArg(0)); + QualType ArgType = E->getArg(0)->getType(); + clang::CharUnits AlignV = CGF.CGM.getNaturalPointeeTypeAlignment(ArgType); + llvm::Type *ElemTy = CGF.ConvertTypeForMem(ArgType->getPointeeType()); + + // Use addrspace(1) for NVPTX ADDRESS_SPACE_GLOBAL + auto *ASC = CGF.Builder.CreateAddrSpaceCast(Ptr, CGF.Builder.getPtrTy(1)); + auto *LD = CGF.Builder.CreateAlignedLoad(ElemTy, ASC, AlignV.getAsAlign()); + MDNode *MD = MDNode::get(CGF.Builder.getContext(), {}); + LD->setMetadata(LLVMContext::MD_invariant_load, MD); + + return LD; +} + static Value *MakeScopedAtomic(unsigned IntrinsicID, CodeGenFunction &CGF, const CallExpr *E) { Value *Ptr = CGF.EmitScalarExpr(E->getArg(0)); @@ -20517,9 +20532,11 @@ static Value *MakeHalfType(unsigned IntrinsicID, unsigned BuiltinID, return nullptr; } - if (IntrinsicID == Intrinsic::nvvm_ldg_global_f || - IntrinsicID == Intrinsic::nvvm_ldu_global_f) -return MakeLdgLdu(IntrinsicID, CGF, E); + if (BuiltinID == NVPTX::BI__nvvm_ldg_h || BuiltinID == NVPTX::BI__nvvm_ldg_h2) +return MakeLdg(CGF, E); + + if (IntrinsicID == Intrinsic::nvvm_ldu_global_f) +return MakeLdu(IntrinsicID, CGF, E); SmallVector Args; auto *F = CGF.CGM.getIntrinsic(IntrinsicID); @@ -20656,16 +20673,15 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned BuiltinID, case NVPTX::BI__nvvm_ldg_ul2: case NVPTX::BI__nvvm_ldg_ull: case NVPTX::BI__nvvm_ldg_ull2: -// PTX Interoperability section 2.2: "For a vector with an even number of -// elements, its alignment is set to number of elements times the alignment -// of its member: n*alignof(t)." -return MakeLdgLdu(Intrinsic::nvvm_ldg_global_i, *this, E); case NVPTX::BI__nvvm_ldg_f: case NVPTX::BI__nvvm_ldg_f2: case NVPTX::BI__nvvm_ldg_f4: case NVPTX::BI__nvvm_ldg_d: case NVPTX::BI__nvvm_ldg_d2: -return MakeLdgLdu(Intrinsic::nvvm_ldg_global_f, *this, E); +// PTX Interoperability section 2.2: "For a vector with an even number of +// elements, its alignment is set to number of elements times the alignment +// of its member: n*alignof(t)." +return MakeLdg(*this, E); case NVPTX::BI__nvvm_ldu_c: case NVPTX::BI__nvvm_ldu_sc: @@ -20696,13 +20712,13 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned BuiltinID, case NVPTX::BI__nvvm_ldu_ul2: case NVPTX::BI__nvvm_ldu_ull: case NVPTX::BI__nvvm_ldu_ull2: -return MakeLdgLdu(Intrinsic::nvvm_ldu_global_i, *this, E); +return MakeLdu(Intrinsic::nvvm_ldu_global_i, *this, E); case NVPTX::BI__nvvm_ldu_f: case NVPTX::BI__nvvm_ldu_f2: case NVPTX::BI__nvvm_ldu_f4: case NVPTX::BI__nvvm_ldu_d: case NVPTX::BI__nvvm_ldu_d2: -return MakeLdgLdu(Intrinsic::nvvm_ldu_global_f, *this, E); +return MakeLdu(Intrinsic::nvvm_ldu_global_f, *this, E); case NVPTX::BI__nvvm_atom_cta_add_gen_i: case NVPTX::BI__nvvm_atom_cta_add_gen_l: @@ -21176,14 +21192,11 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned BuiltinID, return MakeHalfType(Intrinsic::nvvm_fmin_xorsign_abs_f16x2, BuiltinID, E, *this); case NVPTX::BI__nvvm_ldg_h
[clang] [llvm] [NVPTX] Remove nvvm.ldg.global.* intrinsics (PR #112834)
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/112834 >From 3c21269ad0b7be617b06cde5debe405f99ef17ef Mon Sep 17 00:00:00 2001 From: Alex Maclean Date: Thu, 17 Oct 2024 16:49:24 + Subject: [PATCH 1/2] [NVPTX] Remove nvvm.ldg.global.* intrinsics --- clang/lib/CodeGen/CGBuiltin.cpp | 45 +++-- .../builtins-nvptx-native-half-type-native.c | 4 +- .../CodeGen/builtins-nvptx-native-half-type.c | 4 +- clang/test/CodeGen/builtins-nvptx.c | 72 +++ llvm/include/llvm/IR/IntrinsicsNVVM.td| 18 +- llvm/lib/IR/AutoUpgrade.cpp | 14 ++ llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp | 189 +++--- llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp | 55 + llvm/lib/Target/NVPTX/NVPTXISelLowering.h | 2 - .../Assembler/auto_upgrade_nvvm_intrinsics.ll | 31 +++ 10 files changed, 188 insertions(+), 246 deletions(-) diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index 1ad950798c2118..40a875ab29c900 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -20485,7 +20485,7 @@ static NVPTXMmaInfo getNVPTXMmaInfo(unsigned BuiltinID) { #undef MMA_VARIANTS_B1_XOR } -static Value *MakeLdgLdu(unsigned IntrinsicID, CodeGenFunction &CGF, +static Value *MakeLdu(unsigned IntrinsicID, CodeGenFunction &CGF, const CallExpr *E) { Value *Ptr = CGF.EmitScalarExpr(E->getArg(0)); QualType ArgType = E->getArg(0)->getType(); @@ -20496,6 +20496,21 @@ static Value *MakeLdgLdu(unsigned IntrinsicID, CodeGenFunction &CGF, {Ptr, ConstantInt::get(CGF.Builder.getInt32Ty(), Align.getQuantity())}); } +static Value *MakeLdg(CodeGenFunction &CGF, const CallExpr *E) { + Value *Ptr = CGF.EmitScalarExpr(E->getArg(0)); + QualType ArgType = E->getArg(0)->getType(); + clang::CharUnits AlignV = CGF.CGM.getNaturalPointeeTypeAlignment(ArgType); + llvm::Type *ElemTy = CGF.ConvertTypeForMem(ArgType->getPointeeType()); + + // Use addrspace(1) for NVPTX ADDRESS_SPACE_GLOBAL + auto *ASC = CGF.Builder.CreateAddrSpaceCast(Ptr, CGF.Builder.getPtrTy(1)); + auto *LD = CGF.Builder.CreateAlignedLoad(ElemTy, ASC, AlignV.getAsAlign()); + MDNode *MD = MDNode::get(CGF.Builder.getContext(), {}); + LD->setMetadata(LLVMContext::MD_invariant_load, MD); + + return LD; +} + static Value *MakeScopedAtomic(unsigned IntrinsicID, CodeGenFunction &CGF, const CallExpr *E) { Value *Ptr = CGF.EmitScalarExpr(E->getArg(0)); @@ -20529,9 +20544,11 @@ static Value *MakeHalfType(unsigned IntrinsicID, unsigned BuiltinID, return nullptr; } - if (IntrinsicID == Intrinsic::nvvm_ldg_global_f || - IntrinsicID == Intrinsic::nvvm_ldu_global_f) -return MakeLdgLdu(IntrinsicID, CGF, E); + if (BuiltinID == NVPTX::BI__nvvm_ldg_h || BuiltinID == NVPTX::BI__nvvm_ldg_h2) +return MakeLdg(CGF, E); + + if (IntrinsicID == Intrinsic::nvvm_ldu_global_f) +return MakeLdu(IntrinsicID, CGF, E); SmallVector Args; auto *F = CGF.CGM.getIntrinsic(IntrinsicID); @@ -20668,16 +20685,15 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned BuiltinID, case NVPTX::BI__nvvm_ldg_ul2: case NVPTX::BI__nvvm_ldg_ull: case NVPTX::BI__nvvm_ldg_ull2: -// PTX Interoperability section 2.2: "For a vector with an even number of -// elements, its alignment is set to number of elements times the alignment -// of its member: n*alignof(t)." -return MakeLdgLdu(Intrinsic::nvvm_ldg_global_i, *this, E); case NVPTX::BI__nvvm_ldg_f: case NVPTX::BI__nvvm_ldg_f2: case NVPTX::BI__nvvm_ldg_f4: case NVPTX::BI__nvvm_ldg_d: case NVPTX::BI__nvvm_ldg_d2: -return MakeLdgLdu(Intrinsic::nvvm_ldg_global_f, *this, E); +// PTX Interoperability section 2.2: "For a vector with an even number of +// elements, its alignment is set to number of elements times the alignment +// of its member: n*alignof(t)." +return MakeLdg(*this, E); case NVPTX::BI__nvvm_ldu_c: case NVPTX::BI__nvvm_ldu_sc: @@ -20708,13 +20724,13 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned BuiltinID, case NVPTX::BI__nvvm_ldu_ul2: case NVPTX::BI__nvvm_ldu_ull: case NVPTX::BI__nvvm_ldu_ull2: -return MakeLdgLdu(Intrinsic::nvvm_ldu_global_i, *this, E); +return MakeLdu(Intrinsic::nvvm_ldu_global_i, *this, E); case NVPTX::BI__nvvm_ldu_f: case NVPTX::BI__nvvm_ldu_f2: case NVPTX::BI__nvvm_ldu_f4: case NVPTX::BI__nvvm_ldu_d: case NVPTX::BI__nvvm_ldu_d2: -return MakeLdgLdu(Intrinsic::nvvm_ldu_global_f, *this, E); +return MakeLdu(Intrinsic::nvvm_ldu_global_f, *this, E); case NVPTX::BI__nvvm_atom_cta_add_gen_i: case NVPTX::BI__nvvm_atom_cta_add_gen_l: @@ -21188,14 +21204,11 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned BuiltinID, return MakeHalfType(Intrinsic::nvvm_fmin_xorsign_abs_f16x2, BuiltinID, E, *this); case NVPTX::BI__nvvm_ldg_h
[clang] [llvm] [NVPTX] Remove nvvm.ldg.global.* intrinsics (PR #112834)
https://github.com/AlexMaclean created https://github.com/llvm/llvm-project/pull/112834 Remove these intrinsics which can be better represented by load instructions with `!invariant.load` metadata: - llvm.nvvm.ldg.global.i - llvm.nvvm.ldg.global.f - llvm.nvvm.ldg.global.p >From 0b43fa7364bf45515905d98cd0731c5509de5196 Mon Sep 17 00:00:00 2001 From: Alex Maclean Date: Thu, 17 Oct 2024 16:49:24 + Subject: [PATCH] [NVPTX] Remove nvvm.ldg.global.* intrinsics --- clang/lib/CodeGen/CGBuiltin.cpp | 45 +++-- .../builtins-nvptx-native-half-type-native.c | 4 +- .../CodeGen/builtins-nvptx-native-half-type.c | 4 +- clang/test/CodeGen/builtins-nvptx.c | 72 +++ llvm/include/llvm/IR/IntrinsicsNVVM.td| 18 +- llvm/lib/IR/AutoUpgrade.cpp | 14 ++ llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp | 189 +++--- llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp | 55 + llvm/lib/Target/NVPTX/NVPTXISelLowering.h | 2 - .../Assembler/auto_upgrade_nvvm_intrinsics.ll | 31 +++ 10 files changed, 188 insertions(+), 246 deletions(-) diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index f6d7db2c204c12..3b42977b578e15 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -20473,7 +20473,7 @@ static NVPTXMmaInfo getNVPTXMmaInfo(unsigned BuiltinID) { #undef MMA_VARIANTS_B1_XOR } -static Value *MakeLdgLdu(unsigned IntrinsicID, CodeGenFunction &CGF, +static Value *MakeLdu(unsigned IntrinsicID, CodeGenFunction &CGF, const CallExpr *E) { Value *Ptr = CGF.EmitScalarExpr(E->getArg(0)); QualType ArgType = E->getArg(0)->getType(); @@ -20484,6 +20484,21 @@ static Value *MakeLdgLdu(unsigned IntrinsicID, CodeGenFunction &CGF, {Ptr, ConstantInt::get(CGF.Builder.getInt32Ty(), Align.getQuantity())}); } +static Value *MakeLdg(CodeGenFunction &CGF, const CallExpr *E) { + Value *Ptr = CGF.EmitScalarExpr(E->getArg(0)); + QualType ArgType = E->getArg(0)->getType(); + clang::CharUnits AlignV = CGF.CGM.getNaturalPointeeTypeAlignment(ArgType); + llvm::Type *ElemTy = CGF.ConvertTypeForMem(ArgType->getPointeeType()); + + // Use addrspace(1) for NVPTX ADDRESS_SPACE_GLOBAL + auto *ASC = CGF.Builder.CreateAddrSpaceCast(Ptr, CGF.Builder.getPtrTy(1)); + auto *LD = CGF.Builder.CreateAlignedLoad(ElemTy, ASC, AlignV.getAsAlign()); + MDNode *MD = MDNode::get(CGF.Builder.getContext(), {}); + LD->setMetadata(LLVMContext::MD_invariant_load, MD); + + return LD; +} + static Value *MakeScopedAtomic(unsigned IntrinsicID, CodeGenFunction &CGF, const CallExpr *E) { Value *Ptr = CGF.EmitScalarExpr(E->getArg(0)); @@ -20517,9 +20532,11 @@ static Value *MakeHalfType(unsigned IntrinsicID, unsigned BuiltinID, return nullptr; } - if (IntrinsicID == Intrinsic::nvvm_ldg_global_f || - IntrinsicID == Intrinsic::nvvm_ldu_global_f) -return MakeLdgLdu(IntrinsicID, CGF, E); + if (BuiltinID == NVPTX::BI__nvvm_ldg_h || BuiltinID == NVPTX::BI__nvvm_ldg_h2) +return MakeLdg(CGF, E); + + if (IntrinsicID == Intrinsic::nvvm_ldu_global_f) +return MakeLdu(IntrinsicID, CGF, E); SmallVector Args; auto *F = CGF.CGM.getIntrinsic(IntrinsicID); @@ -20656,16 +20673,15 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned BuiltinID, case NVPTX::BI__nvvm_ldg_ul2: case NVPTX::BI__nvvm_ldg_ull: case NVPTX::BI__nvvm_ldg_ull2: -// PTX Interoperability section 2.2: "For a vector with an even number of -// elements, its alignment is set to number of elements times the alignment -// of its member: n*alignof(t)." -return MakeLdgLdu(Intrinsic::nvvm_ldg_global_i, *this, E); case NVPTX::BI__nvvm_ldg_f: case NVPTX::BI__nvvm_ldg_f2: case NVPTX::BI__nvvm_ldg_f4: case NVPTX::BI__nvvm_ldg_d: case NVPTX::BI__nvvm_ldg_d2: -return MakeLdgLdu(Intrinsic::nvvm_ldg_global_f, *this, E); +// PTX Interoperability section 2.2: "For a vector with an even number of +// elements, its alignment is set to number of elements times the alignment +// of its member: n*alignof(t)." +return MakeLdg(*this, E); case NVPTX::BI__nvvm_ldu_c: case NVPTX::BI__nvvm_ldu_sc: @@ -20696,13 +20712,13 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned BuiltinID, case NVPTX::BI__nvvm_ldu_ul2: case NVPTX::BI__nvvm_ldu_ull: case NVPTX::BI__nvvm_ldu_ull2: -return MakeLdgLdu(Intrinsic::nvvm_ldu_global_i, *this, E); +return MakeLdu(Intrinsic::nvvm_ldu_global_i, *this, E); case NVPTX::BI__nvvm_ldu_f: case NVPTX::BI__nvvm_ldu_f2: case NVPTX::BI__nvvm_ldu_f4: case NVPTX::BI__nvvm_ldu_d: case NVPTX::BI__nvvm_ldu_d2: -return MakeLdgLdu(Intrinsic::nvvm_ldu_global_f, *this, E); +return MakeLdu(Intrinsic::nvvm_ldu_global_f, *this, E); case NVPTX::BI__nvvm_atom_cta_add_gen_i: case NVPTX::BI__nvvm_atom_cta_add_gen_l: @@ -21176,14 +21192,11 @@ Value *CodeGenFunction:
[clang] [llvm] [NVPTX] Remove nvvm.ldg.global.* intrinsics (PR #112834)
https://github.com/AlexMaclean closed https://github.com/llvm/llvm-project/pull/112834 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Remove nvvm.bitcast.* intrinsics (PR #107936)
@@ -599,14 +599,6 @@ TARGET_BUILTIN(__nvvm_e4m3x2_to_f16x2_rn_relu, "V2hs", "", AND(SM_89,PTX81)) TARGET_BUILTIN(__nvvm_e5m2x2_to_f16x2_rn, "V2hs", "", AND(SM_89,PTX81)) TARGET_BUILTIN(__nvvm_e5m2x2_to_f16x2_rn_relu, "V2hs", "", AND(SM_89,PTX81)) -// Bitcast AlexMaclean wrote: Thanks! I've just confirmed these do not work in nvcc. https://github.com/llvm/llvm-project/pull/107936 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Remove nvvm.bitcast.* intrinsics (PR #107936)
https://github.com/AlexMaclean closed https://github.com/llvm/llvm-project/pull/107936 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Remove nvvm.bitcast.* intrinsics (PR #107936)
@@ -599,14 +599,6 @@ TARGET_BUILTIN(__nvvm_e4m3x2_to_f16x2_rn_relu, "V2hs", "", AND(SM_89,PTX81)) TARGET_BUILTIN(__nvvm_e5m2x2_to_f16x2_rn, "V2hs", "", AND(SM_89,PTX81)) TARGET_BUILTIN(__nvvm_e5m2x2_to_f16x2_rn_relu, "V2hs", "", AND(SM_89,PTX81)) -// Bitcast AlexMaclean wrote: @jlebar can you confirm it is okay to remove builtins like this? I'm doing this based on your commit 46624a822d3a3df4a4b6dff0d231acb45d269853. Just want to make sure I'm not missing something. https://github.com/llvm/llvm-project/pull/107936 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Switch front-ends and tests to ptx_kernel cc (PR #120806)
AlexMaclean wrote: @Artem-B ping for review https://github.com/llvm/llvm-project/pull/120806 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Switch front-ends and tests to ptx_kernel cc (PR #120806)
@@ -10,8 +10,14 @@ // CHECK-NEXT:[[TMP0:%.*]] = load ptr, ptr [[RET_ADDR]], align 8 // CHECK-NEXT:store i32 1, ptr [[TMP0]], align 4 // CHECK-NEXT:ret void +// __attribute__((nvptx_kernel)) void foo(int *ret) { *ret = 1; } -// CHECK: !0 = !{ptr @foo, !"kernel", i32 1} +//. +// CHECK: attributes #[[ATTR0]] = { convergent noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="sm_61" "target-features"="+ptx32,+sm_61" } +//. AlexMaclean wrote: Yep https://github.com/llvm/llvm-project/pull/120806 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Switch front-ends and tests to ptx_kernel cc (PR #120806)
https://github.com/AlexMaclean closed https://github.com/llvm/llvm-project/pull/120806 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Convert scalar function nvvm.annotations to attributes (PR #125908)
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/125908 >From 12bdf8bfa72b10d1e8ccc305cd57c337f2799e52 Mon Sep 17 00:00:00 2001 From: Alex Maclean Date: Wed, 5 Feb 2025 18:46:03 + Subject: [PATCH 1/2] [NVPTX] Convert scalar function nvvm.annotations to attributes --- clang/lib/CodeGen/Targets/NVPTX.cpp | 15 ++--- clang/test/CodeGenCUDA/launch-bounds.cu | 32 ++ llvm/docs/NVPTXUsage.rst | 37 +++ llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 2 +- llvm/lib/IR/AutoUpgrade.cpp | 16 + .../Target/NVPTX/NVPTXCtorDtorLowering.cpp| 9 +-- llvm/lib/Target/NVPTX/NVPTXUtilities.cpp | 13 +++- .../KernelInfo/launch-bounds/nvptx.ll | 4 +- llvm/test/CodeGen/NVPTX/annotations.ll| 12 +--- llvm/test/CodeGen/NVPTX/lower-ctor-dtor.ll| 16 +++-- llvm/test/CodeGen/NVPTX/maxclusterrank.ll | 8 +-- .../CodeGen/NVPTX/upgrade-nvvm-annotations.ll | 64 +++ .../Dialect/NVVM/NVVMToLLVMIRTranslation.cpp | 7 +- .../LLVMIR/external-func-dialect-attr.mlir| 4 +- mlir/test/Target/LLVMIR/nvvmir.mlir | 21 +++--- 15 files changed, 160 insertions(+), 100 deletions(-) diff --git a/clang/lib/CodeGen/Targets/NVPTX.cpp b/clang/lib/CodeGen/Targets/NVPTX.cpp index b82e4ddb9f3f2b..f89d32d4e13fe9 100644 --- a/clang/lib/CodeGen/Targets/NVPTX.cpp +++ b/clang/lib/CodeGen/Targets/NVPTX.cpp @@ -375,11 +375,8 @@ void CodeGenModule::handleCUDALaunchBoundsAttr(llvm::Function *F, if (MinBlocks > 0) { if (MinBlocksVal) *MinBlocksVal = MinBlocks.getExtValue(); - if (F) { -// Create !{, metadata !"minctasm", i32 } node -NVPTXTargetCodeGenInfo::addNVVMMetadata(F, "minctasm", -MinBlocks.getExtValue()); - } + if (F) +F->addFnAttr("nvvm.minctasm", llvm::utostr(MinBlocks.getExtValue())); } } if (Attr->getMaxBlocks()) { @@ -388,11 +385,9 @@ void CodeGenModule::handleCUDALaunchBoundsAttr(llvm::Function *F, if (MaxBlocks > 0) { if (MaxClusterRankVal) *MaxClusterRankVal = MaxBlocks.getExtValue(); - if (F) { -// Create !{, metadata !"maxclusterrank", i32 } node -NVPTXTargetCodeGenInfo::addNVVMMetadata(F, "maxclusterrank", -MaxBlocks.getExtValue()); - } + if (F) +F->addFnAttr("nvvm.maxclusterrank", + llvm::utostr(MaxBlocks.getExtValue())); } } } diff --git a/clang/test/CodeGenCUDA/launch-bounds.cu b/clang/test/CodeGenCUDA/launch-bounds.cu index 31ca9216b413e9..72f7857264f8cf 100644 --- a/clang/test/CodeGenCUDA/launch-bounds.cu +++ b/clang/test/CodeGenCUDA/launch-bounds.cu @@ -9,6 +9,25 @@ #define MAX_BLOCKS_PER_MP 4 #endif +// CHECK: @Kernel1() #[[ATTR0:[0-9]+]] +// CHECK: @{{.*}}Kernel4{{.*}}() #[[ATTR0]] +// CHECK: @{{.*}}Kernel5{{.*}}() #[[ATTR1:[0-9]+]] +// CHECK: @{{.*}}Kernel6{{.*}}() #[[ATTR0]] +// CHECK: @{{.*}}Kernel8{{.*}}() #[[ATTR3:[0-9]+]] + +// CHECK: attributes #[[ATTR0]] = {{{.*}} "nvvm.minctasm"="2" {{.*}}} +// CHECK: attributes #[[ATTR1]] = {{{.*}} "nvvm.minctasm"="258" {{.*}}} +// CHECK: attributes #[[ATTR3]] = {{{.*}} "nvvm.minctasm"="12" {{.*}}} + +// CHECK_MAX_BLOCKS: @Kernel1_sm_90() #[[ATTR4:[0-9]+]] +// CHECK_MAX_BLOCKS: @{{.*}}Kernel4_sm_90{{.*}} #[[ATTR4]] +// CHECK_MAX_BLOCKS: @{{.*}}Kernel5_sm_90{{.*}} #[[ATTR5:[0-9]+]] +// CHECK_MAX_BLOCKS: @{{.*}}Kernel8_sm_90{{.*}} #[[ATTR6:[0-9]+]] + +// CHECK_MAX_BLOCKS: attributes #[[ATTR4]] = {{{.*}} "nvvm.maxclusterrank"="4" "nvvm.minctasm"="2" {{.*}}} +// CHECK_MAX_BLOCKS: attributes #[[ATTR5]] = {{{.*}} "nvvm.maxclusterrank"="260" "nvvm.minctasm"="258" {{.*}}} +// CHECK_MAX_BLOCKS: attributes #[[ATTR6]] = {{{.*}} "nvvm.maxclusterrank"="14" "nvvm.minctasm"="12" {{.*}}} + // Test both max threads per block and Min cta per sm. extern "C" { __global__ void @@ -19,7 +38,6 @@ Kernel1() } // CHECK: !{{[0-9]+}} = !{ptr @Kernel1, !"maxntidx", i32 256} -// CHECK: !{{[0-9]+}} = !{ptr @Kernel1, !"minctasm", i32 2} #ifdef USE_MAX_BLOCKS // Test max threads per block and min/max cta per sm. @@ -32,8 +50,6 @@ Kernel1_sm_90() } // CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"maxntidx", i32 256} -// CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"minctasm", i32 2} -// CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"maxclusterrank", i32 4} #endif // USE_MAX_BLOCKS // Test only max threads per block. Min cta per sm defaults to 0, and @@ -67,7 +83,6 @@ Kernel4() template __global__ void Kernel4(); // CHECK: !{{[0-9]+}} = !{ptr @{{.*}}Kernel4{{.*}}, !"maxntidx", i32 256} -// CHECK: !{{[0-9]+}} = !{ptr @{{.*}}Kernel4{{.*}}, !"minctasm", i32 2} #ifdef USE_MAX_BLOCKS template @@ -79,8 +94,6 @@ Kernel4_sm_90() template __global__ void Kernel4_sm_90(); // CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @{{.*}}Kernel4_s
[clang] [llvm] [mlir] [NVPTX] Convert scalar function nvvm.annotations to attributes (PR #125908)
@@ -179,6 +179,13 @@ static bool argHasNVVMAnnotation(const Value &Val, return false; } +static std::optional getFnAttrParsedIntOrNull(const Function &F, +StringRef Attr) { + if (F.hasFnAttribute(Attr)) +return F.getFnAttributeAsParsedInteger(Attr); + return std::nullopt; AlexMaclean wrote: Had to be a little more explicit to make the compiler happy but I've switched to a ternary as requested. https://github.com/llvm/llvm-project/pull/125908 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Convert scalar function nvvm.annotations to attributes (PR #125908)
@@ -179,6 +179,13 @@ static bool argHasNVVMAnnotation(const Value &Val, return false; } +static std::optional getFnAttrParsedIntOrNull(const Function &F, AlexMaclean wrote: Removed https://github.com/llvm/llvm-project/pull/125908 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Convert scalar function nvvm.annotations to attributes (PR #125908)
https://github.com/AlexMaclean created https://github.com/llvm/llvm-project/pull/125908 Replace some more nvvm.annotations with function attributes, auto-upgrading the annotations as needed. These new attributes will be more idiomatic and compile-time efficient than the annotations. - !"maxclusterrank" / !"cluster_max_blocks" -> "nvvm.maxclusterrank" - !"minctasm" -> "nvvm.minctasm" - !"maxnreg" -> "nvvm.maxnreg" >From 8dd9f3bbd91678ca8a56c5c62d65008faf5ff21f Mon Sep 17 00:00:00 2001 From: Alex Maclean Date: Wed, 5 Feb 2025 18:46:03 + Subject: [PATCH] [NVPTX] Convert scalar function nvvm.annotations to attributes --- clang/lib/CodeGen/Targets/NVPTX.cpp | 15 ++--- clang/test/CodeGenCUDA/launch-bounds.cu | 32 ++ llvm/docs/NVPTXUsage.rst | 37 +++ llvm/lib/IR/AutoUpgrade.cpp | 16 + .../Target/NVPTX/NVPTXCtorDtorLowering.cpp| 9 +-- llvm/lib/Target/NVPTX/NVPTXUtilities.cpp | 13 +++- .../KernelInfo/launch-bounds/nvptx.ll | 4 +- llvm/test/CodeGen/NVPTX/annotations.ll| 12 +--- llvm/test/CodeGen/NVPTX/lower-ctor-dtor.ll| 16 +++-- llvm/test/CodeGen/NVPTX/maxclusterrank.ll | 8 +-- .../CodeGen/NVPTX/upgrade-nvvm-annotations.ll | 64 +++ .../Dialect/NVVM/NVVMToLLVMIRTranslation.cpp | 7 +- mlir/test/Target/LLVMIR/nvvmir.mlir | 21 +++--- 13 files changed, 157 insertions(+), 97 deletions(-) diff --git a/clang/lib/CodeGen/Targets/NVPTX.cpp b/clang/lib/CodeGen/Targets/NVPTX.cpp index b82e4ddb9f3f2b..f89d32d4e13fe9 100644 --- a/clang/lib/CodeGen/Targets/NVPTX.cpp +++ b/clang/lib/CodeGen/Targets/NVPTX.cpp @@ -375,11 +375,8 @@ void CodeGenModule::handleCUDALaunchBoundsAttr(llvm::Function *F, if (MinBlocks > 0) { if (MinBlocksVal) *MinBlocksVal = MinBlocks.getExtValue(); - if (F) { -// Create !{, metadata !"minctasm", i32 } node -NVPTXTargetCodeGenInfo::addNVVMMetadata(F, "minctasm", -MinBlocks.getExtValue()); - } + if (F) +F->addFnAttr("nvvm.minctasm", llvm::utostr(MinBlocks.getExtValue())); } } if (Attr->getMaxBlocks()) { @@ -388,11 +385,9 @@ void CodeGenModule::handleCUDALaunchBoundsAttr(llvm::Function *F, if (MaxBlocks > 0) { if (MaxClusterRankVal) *MaxClusterRankVal = MaxBlocks.getExtValue(); - if (F) { -// Create !{, metadata !"maxclusterrank", i32 } node -NVPTXTargetCodeGenInfo::addNVVMMetadata(F, "maxclusterrank", -MaxBlocks.getExtValue()); - } + if (F) +F->addFnAttr("nvvm.maxclusterrank", + llvm::utostr(MaxBlocks.getExtValue())); } } } diff --git a/clang/test/CodeGenCUDA/launch-bounds.cu b/clang/test/CodeGenCUDA/launch-bounds.cu index 31ca9216b413e9..72f7857264f8cf 100644 --- a/clang/test/CodeGenCUDA/launch-bounds.cu +++ b/clang/test/CodeGenCUDA/launch-bounds.cu @@ -9,6 +9,25 @@ #define MAX_BLOCKS_PER_MP 4 #endif +// CHECK: @Kernel1() #[[ATTR0:[0-9]+]] +// CHECK: @{{.*}}Kernel4{{.*}}() #[[ATTR0]] +// CHECK: @{{.*}}Kernel5{{.*}}() #[[ATTR1:[0-9]+]] +// CHECK: @{{.*}}Kernel6{{.*}}() #[[ATTR0]] +// CHECK: @{{.*}}Kernel8{{.*}}() #[[ATTR3:[0-9]+]] + +// CHECK: attributes #[[ATTR0]] = {{{.*}} "nvvm.minctasm"="2" {{.*}}} +// CHECK: attributes #[[ATTR1]] = {{{.*}} "nvvm.minctasm"="258" {{.*}}} +// CHECK: attributes #[[ATTR3]] = {{{.*}} "nvvm.minctasm"="12" {{.*}}} + +// CHECK_MAX_BLOCKS: @Kernel1_sm_90() #[[ATTR4:[0-9]+]] +// CHECK_MAX_BLOCKS: @{{.*}}Kernel4_sm_90{{.*}} #[[ATTR4]] +// CHECK_MAX_BLOCKS: @{{.*}}Kernel5_sm_90{{.*}} #[[ATTR5:[0-9]+]] +// CHECK_MAX_BLOCKS: @{{.*}}Kernel8_sm_90{{.*}} #[[ATTR6:[0-9]+]] + +// CHECK_MAX_BLOCKS: attributes #[[ATTR4]] = {{{.*}} "nvvm.maxclusterrank"="4" "nvvm.minctasm"="2" {{.*}}} +// CHECK_MAX_BLOCKS: attributes #[[ATTR5]] = {{{.*}} "nvvm.maxclusterrank"="260" "nvvm.minctasm"="258" {{.*}}} +// CHECK_MAX_BLOCKS: attributes #[[ATTR6]] = {{{.*}} "nvvm.maxclusterrank"="14" "nvvm.minctasm"="12" {{.*}}} + // Test both max threads per block and Min cta per sm. extern "C" { __global__ void @@ -19,7 +38,6 @@ Kernel1() } // CHECK: !{{[0-9]+}} = !{ptr @Kernel1, !"maxntidx", i32 256} -// CHECK: !{{[0-9]+}} = !{ptr @Kernel1, !"minctasm", i32 2} #ifdef USE_MAX_BLOCKS // Test max threads per block and min/max cta per sm. @@ -32,8 +50,6 @@ Kernel1_sm_90() } // CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"maxntidx", i32 256} -// CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"minctasm", i32 2} -// CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"maxclusterrank", i32 4} #endif // USE_MAX_BLOCKS // Test only max threads per block. Min cta per sm defaults to 0, and @@ -67,7 +83,6 @@ Kernel4() template __global__ void Kernel4(); // CHECK: !{{[0-9]+}} = !{ptr @{{.*}}Kernel4{{.*}}, !"maxntidx", i32 256} -// CHECK: !{{[0-9]+}} = !{
[clang] [llvm] [mlir] [NVPTX] Convert scalar function nvvm.annotations to attributes (PR #125908)
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/125908 >From d66d8adac5cf32f7f9f5878799c0167d39f41df7 Mon Sep 17 00:00:00 2001 From: Alex Maclean Date: Wed, 5 Feb 2025 18:46:03 + Subject: [PATCH] [NVPTX] Convert scalar function nvvm.annotations to attributes --- clang/lib/CodeGen/Targets/NVPTX.cpp | 15 ++--- clang/test/CodeGenCUDA/launch-bounds.cu | 32 ++ llvm/docs/NVPTXUsage.rst | 37 +++ llvm/lib/IR/AutoUpgrade.cpp | 16 + .../Target/NVPTX/NVPTXCtorDtorLowering.cpp| 9 +-- llvm/lib/Target/NVPTX/NVPTXUtilities.cpp | 13 +++- .../KernelInfo/launch-bounds/nvptx.ll | 4 +- llvm/test/CodeGen/NVPTX/annotations.ll| 12 +--- llvm/test/CodeGen/NVPTX/lower-ctor-dtor.ll| 16 +++-- llvm/test/CodeGen/NVPTX/maxclusterrank.ll | 8 +-- .../CodeGen/NVPTX/upgrade-nvvm-annotations.ll | 64 +++ .../Dialect/NVVM/NVVMToLLVMIRTranslation.cpp | 7 +- mlir/test/Target/LLVMIR/nvvmir.mlir | 21 +++--- 13 files changed, 157 insertions(+), 97 deletions(-) diff --git a/clang/lib/CodeGen/Targets/NVPTX.cpp b/clang/lib/CodeGen/Targets/NVPTX.cpp index b82e4ddb9f3f2b..f89d32d4e13fe9 100644 --- a/clang/lib/CodeGen/Targets/NVPTX.cpp +++ b/clang/lib/CodeGen/Targets/NVPTX.cpp @@ -375,11 +375,8 @@ void CodeGenModule::handleCUDALaunchBoundsAttr(llvm::Function *F, if (MinBlocks > 0) { if (MinBlocksVal) *MinBlocksVal = MinBlocks.getExtValue(); - if (F) { -// Create !{, metadata !"minctasm", i32 } node -NVPTXTargetCodeGenInfo::addNVVMMetadata(F, "minctasm", -MinBlocks.getExtValue()); - } + if (F) +F->addFnAttr("nvvm.minctasm", llvm::utostr(MinBlocks.getExtValue())); } } if (Attr->getMaxBlocks()) { @@ -388,11 +385,9 @@ void CodeGenModule::handleCUDALaunchBoundsAttr(llvm::Function *F, if (MaxBlocks > 0) { if (MaxClusterRankVal) *MaxClusterRankVal = MaxBlocks.getExtValue(); - if (F) { -// Create !{, metadata !"maxclusterrank", i32 } node -NVPTXTargetCodeGenInfo::addNVVMMetadata(F, "maxclusterrank", -MaxBlocks.getExtValue()); - } + if (F) +F->addFnAttr("nvvm.maxclusterrank", + llvm::utostr(MaxBlocks.getExtValue())); } } } diff --git a/clang/test/CodeGenCUDA/launch-bounds.cu b/clang/test/CodeGenCUDA/launch-bounds.cu index 31ca9216b413e9..72f7857264f8cf 100644 --- a/clang/test/CodeGenCUDA/launch-bounds.cu +++ b/clang/test/CodeGenCUDA/launch-bounds.cu @@ -9,6 +9,25 @@ #define MAX_BLOCKS_PER_MP 4 #endif +// CHECK: @Kernel1() #[[ATTR0:[0-9]+]] +// CHECK: @{{.*}}Kernel4{{.*}}() #[[ATTR0]] +// CHECK: @{{.*}}Kernel5{{.*}}() #[[ATTR1:[0-9]+]] +// CHECK: @{{.*}}Kernel6{{.*}}() #[[ATTR0]] +// CHECK: @{{.*}}Kernel8{{.*}}() #[[ATTR3:[0-9]+]] + +// CHECK: attributes #[[ATTR0]] = {{{.*}} "nvvm.minctasm"="2" {{.*}}} +// CHECK: attributes #[[ATTR1]] = {{{.*}} "nvvm.minctasm"="258" {{.*}}} +// CHECK: attributes #[[ATTR3]] = {{{.*}} "nvvm.minctasm"="12" {{.*}}} + +// CHECK_MAX_BLOCKS: @Kernel1_sm_90() #[[ATTR4:[0-9]+]] +// CHECK_MAX_BLOCKS: @{{.*}}Kernel4_sm_90{{.*}} #[[ATTR4]] +// CHECK_MAX_BLOCKS: @{{.*}}Kernel5_sm_90{{.*}} #[[ATTR5:[0-9]+]] +// CHECK_MAX_BLOCKS: @{{.*}}Kernel8_sm_90{{.*}} #[[ATTR6:[0-9]+]] + +// CHECK_MAX_BLOCKS: attributes #[[ATTR4]] = {{{.*}} "nvvm.maxclusterrank"="4" "nvvm.minctasm"="2" {{.*}}} +// CHECK_MAX_BLOCKS: attributes #[[ATTR5]] = {{{.*}} "nvvm.maxclusterrank"="260" "nvvm.minctasm"="258" {{.*}}} +// CHECK_MAX_BLOCKS: attributes #[[ATTR6]] = {{{.*}} "nvvm.maxclusterrank"="14" "nvvm.minctasm"="12" {{.*}}} + // Test both max threads per block and Min cta per sm. extern "C" { __global__ void @@ -19,7 +38,6 @@ Kernel1() } // CHECK: !{{[0-9]+}} = !{ptr @Kernel1, !"maxntidx", i32 256} -// CHECK: !{{[0-9]+}} = !{ptr @Kernel1, !"minctasm", i32 2} #ifdef USE_MAX_BLOCKS // Test max threads per block and min/max cta per sm. @@ -32,8 +50,6 @@ Kernel1_sm_90() } // CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"maxntidx", i32 256} -// CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"minctasm", i32 2} -// CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"maxclusterrank", i32 4} #endif // USE_MAX_BLOCKS // Test only max threads per block. Min cta per sm defaults to 0, and @@ -67,7 +83,6 @@ Kernel4() template __global__ void Kernel4(); // CHECK: !{{[0-9]+}} = !{ptr @{{.*}}Kernel4{{.*}}, !"maxntidx", i32 256} -// CHECK: !{{[0-9]+}} = !{ptr @{{.*}}Kernel4{{.*}}, !"minctasm", i32 2} #ifdef USE_MAX_BLOCKS template @@ -79,8 +94,6 @@ Kernel4_sm_90() template __global__ void Kernel4_sm_90(); // CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @{{.*}}Kernel4_sm_90{{.*}}, !"maxntidx", i32 256} -// CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @{{.*}}Kernel4_sm_90{{.*}}, !"minctas
[clang] [llvm] [mlir] [NVPTX] Convert scalar function nvvm.annotations to attributes (PR #125908)
@@ -179,6 +179,13 @@ static bool argHasNVVMAnnotation(const Value &Val, return false; } +static std::optional getFnAttrParsedInt(const Function &F, + StringRef Attr) { + return F.hasFnAttribute(Attr) + ? std::optional(F.getFnAttributeAsParsedInteger(Attr)) + : std::nullopt; AlexMaclean wrote: No worries! I agree it is basically a wash and will leave it as it currently is. https://github.com/llvm/llvm-project/pull/125908 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Convert scalar function nvvm.annotations to attributes (PR #125908)
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/125908 >From cb6ac07e72cc1361343470842793cf9bc4995a19 Mon Sep 17 00:00:00 2001 From: Alex Maclean Date: Wed, 5 Feb 2025 18:46:03 + Subject: [PATCH 1/2] [NVPTX] Convert scalar function nvvm.annotations to attributes --- clang/lib/CodeGen/Targets/NVPTX.cpp | 15 ++--- clang/test/CodeGenCUDA/launch-bounds.cu | 32 ++ llvm/docs/NVPTXUsage.rst | 37 +++ llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 2 +- llvm/lib/IR/AutoUpgrade.cpp | 16 + .../Target/NVPTX/NVPTXCtorDtorLowering.cpp| 9 +-- llvm/lib/Target/NVPTX/NVPTXUtilities.cpp | 13 +++- .../KernelInfo/launch-bounds/nvptx.ll | 4 +- llvm/test/CodeGen/NVPTX/annotations.ll| 12 +--- llvm/test/CodeGen/NVPTX/lower-ctor-dtor.ll| 16 +++-- llvm/test/CodeGen/NVPTX/maxclusterrank.ll | 8 +-- .../CodeGen/NVPTX/upgrade-nvvm-annotations.ll | 64 +++ .../Dialect/NVVM/NVVMToLLVMIRTranslation.cpp | 7 +- .../LLVMIR/external-func-dialect-attr.mlir| 4 +- mlir/test/Target/LLVMIR/nvvmir.mlir | 21 +++--- 15 files changed, 160 insertions(+), 100 deletions(-) diff --git a/clang/lib/CodeGen/Targets/NVPTX.cpp b/clang/lib/CodeGen/Targets/NVPTX.cpp index b82e4ddb9f3f2..f89d32d4e13fe 100644 --- a/clang/lib/CodeGen/Targets/NVPTX.cpp +++ b/clang/lib/CodeGen/Targets/NVPTX.cpp @@ -375,11 +375,8 @@ void CodeGenModule::handleCUDALaunchBoundsAttr(llvm::Function *F, if (MinBlocks > 0) { if (MinBlocksVal) *MinBlocksVal = MinBlocks.getExtValue(); - if (F) { -// Create !{, metadata !"minctasm", i32 } node -NVPTXTargetCodeGenInfo::addNVVMMetadata(F, "minctasm", -MinBlocks.getExtValue()); - } + if (F) +F->addFnAttr("nvvm.minctasm", llvm::utostr(MinBlocks.getExtValue())); } } if (Attr->getMaxBlocks()) { @@ -388,11 +385,9 @@ void CodeGenModule::handleCUDALaunchBoundsAttr(llvm::Function *F, if (MaxBlocks > 0) { if (MaxClusterRankVal) *MaxClusterRankVal = MaxBlocks.getExtValue(); - if (F) { -// Create !{, metadata !"maxclusterrank", i32 } node -NVPTXTargetCodeGenInfo::addNVVMMetadata(F, "maxclusterrank", -MaxBlocks.getExtValue()); - } + if (F) +F->addFnAttr("nvvm.maxclusterrank", + llvm::utostr(MaxBlocks.getExtValue())); } } } diff --git a/clang/test/CodeGenCUDA/launch-bounds.cu b/clang/test/CodeGenCUDA/launch-bounds.cu index 31ca9216b413e..72f7857264f8c 100644 --- a/clang/test/CodeGenCUDA/launch-bounds.cu +++ b/clang/test/CodeGenCUDA/launch-bounds.cu @@ -9,6 +9,25 @@ #define MAX_BLOCKS_PER_MP 4 #endif +// CHECK: @Kernel1() #[[ATTR0:[0-9]+]] +// CHECK: @{{.*}}Kernel4{{.*}}() #[[ATTR0]] +// CHECK: @{{.*}}Kernel5{{.*}}() #[[ATTR1:[0-9]+]] +// CHECK: @{{.*}}Kernel6{{.*}}() #[[ATTR0]] +// CHECK: @{{.*}}Kernel8{{.*}}() #[[ATTR3:[0-9]+]] + +// CHECK: attributes #[[ATTR0]] = {{{.*}} "nvvm.minctasm"="2" {{.*}}} +// CHECK: attributes #[[ATTR1]] = {{{.*}} "nvvm.minctasm"="258" {{.*}}} +// CHECK: attributes #[[ATTR3]] = {{{.*}} "nvvm.minctasm"="12" {{.*}}} + +// CHECK_MAX_BLOCKS: @Kernel1_sm_90() #[[ATTR4:[0-9]+]] +// CHECK_MAX_BLOCKS: @{{.*}}Kernel4_sm_90{{.*}} #[[ATTR4]] +// CHECK_MAX_BLOCKS: @{{.*}}Kernel5_sm_90{{.*}} #[[ATTR5:[0-9]+]] +// CHECK_MAX_BLOCKS: @{{.*}}Kernel8_sm_90{{.*}} #[[ATTR6:[0-9]+]] + +// CHECK_MAX_BLOCKS: attributes #[[ATTR4]] = {{{.*}} "nvvm.maxclusterrank"="4" "nvvm.minctasm"="2" {{.*}}} +// CHECK_MAX_BLOCKS: attributes #[[ATTR5]] = {{{.*}} "nvvm.maxclusterrank"="260" "nvvm.minctasm"="258" {{.*}}} +// CHECK_MAX_BLOCKS: attributes #[[ATTR6]] = {{{.*}} "nvvm.maxclusterrank"="14" "nvvm.minctasm"="12" {{.*}}} + // Test both max threads per block and Min cta per sm. extern "C" { __global__ void @@ -19,7 +38,6 @@ Kernel1() } // CHECK: !{{[0-9]+}} = !{ptr @Kernel1, !"maxntidx", i32 256} -// CHECK: !{{[0-9]+}} = !{ptr @Kernel1, !"minctasm", i32 2} #ifdef USE_MAX_BLOCKS // Test max threads per block and min/max cta per sm. @@ -32,8 +50,6 @@ Kernel1_sm_90() } // CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"maxntidx", i32 256} -// CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"minctasm", i32 2} -// CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"maxclusterrank", i32 4} #endif // USE_MAX_BLOCKS // Test only max threads per block. Min cta per sm defaults to 0, and @@ -67,7 +83,6 @@ Kernel4() template __global__ void Kernel4(); // CHECK: !{{[0-9]+}} = !{ptr @{{.*}}Kernel4{{.*}}, !"maxntidx", i32 256} -// CHECK: !{{[0-9]+}} = !{ptr @{{.*}}Kernel4{{.*}}, !"minctasm", i32 2} #ifdef USE_MAX_BLOCKS template @@ -79,8 +94,6 @@ Kernel4_sm_90() template __global__ void Kernel4_sm_90(); // CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @{{.*}}Kernel4_sm_90
[clang] [llvm] [mlir] [NVPTX] Convert scalar function nvvm.annotations to attributes (PR #125908)
https://github.com/AlexMaclean closed https://github.com/llvm/llvm-project/pull/125908 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)
@@ -1270,77 +1270,21 @@ exit: ; MODULE: attributes #[[ATTR1:[0-9]+]] = { convergent nocallback nounwind } ; MODULE: attributes #[[ATTR2:[0-9]+]] = { convergent nocallback nofree nounwind willreturn } ; MODULE: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) } -; MODULE: attributes #[[ATTR4]] = { "kernel" } -; MODULE: attributes #[[ATTR5]] = { nosync memory(none) } +; MODULE: attributes #[[ATTR4]] = { "kernel" "nvvm.kernel" } +; MODULE: attributes #[[ATTR5]] = { "kernel" } +; MODULE: attributes #[[ATTR6]] = { nosync memory(none) } ;. ; CGSCC: attributes #[[ATTR0]] = { "llvm.assume"="ompx_aligned_barrier" } ; CGSCC: attributes #[[ATTR1:[0-9]+]] = { convergent nocallback nounwind } ; CGSCC: attributes #[[ATTR2:[0-9]+]] = { convergent nocallback nofree nounwind willreturn } ; CGSCC: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) } -; CGSCC: attributes #[[ATTR4]] = { "kernel" } -; CGSCC: attributes #[[ATTR5]] = { nosync memory(none) } +; CGSCC: attributes #[[ATTR4]] = { "kernel" "nvvm.kernel" } AlexMaclean wrote: The problem is that OpenMP seems to need to be able to draw a distinction between OpenMP kernels and nvvm kernels. For example here it seems like OpenMP only wants to look at "kernel" not "nvvm.kernel". As a result it seems like these attributes cannot be easily unified. https://github.com/llvm/llvm-project/blob/c835b48a4d72227b174bcd86f071238a1583803a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp#L5932-L5938 https://github.com/llvm/llvm-project/pull/119261 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)
@@ -324,14 +326,15 @@ MaybeAlign getAlign(const Function &F, unsigned Index) { F.getAttributes().getAttributes(Index).getStackAlignment()) return StackAlign; - // If that is missing, check the legacy nvvm metadata - std::vector Vs; - bool retval = findAllNVVMAnnotation(&F, "align", Vs); - if (!retval) -return std::nullopt; - for (unsigned V : Vs) -if ((V >> 16) == Index) - return Align(V & 0x); + // check the legacy nvvm metadata only for the return value since llvm does + // not support stackalign attribute for this. + if (Index == 0) { +std::vector Vs; +if (findAllNVVMAnnotation(&F, "align", Vs)) AlexMaclean wrote: Yea, I agree the NVVM annotation APIs could be cleaned up significantly, hopefully this work will remove the need for them altogether though. https://github.com/llvm/llvm-project/pull/119261 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)
@@ -5022,6 +5022,69 @@ bool llvm::UpgradeDebugInfo(Module &M) { return Modified; } +bool static upgradeSingleNVVMAnnotation(GlobalValue *GV, StringRef K, +const Metadata *V) { + if (K == "kernel") { +assert(mdconst::extract(V)->getZExtValue() == 1); +cast(GV)->addFnAttr("nvvm.kernel"); +return true; + } + if (K == "align") { +const uint64_t AlignBits = mdconst::extract(V)->getZExtValue(); +const unsigned Idx = (AlignBits >> 16); +const Align StackAlign = Align(AlignBits & 0x); AlexMaclean wrote: Fixed https://github.com/llvm/llvm-project/pull/119261 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/119261 >From f9f30a77f5e7232f968a3063c34338c9dfc7bac5 Mon Sep 17 00:00:00 2001 From: Alex Maclean Date: Fri, 8 Nov 2024 22:39:34 + Subject: [PATCH 1/3] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations --- llvm/lib/Target/NVPTX/CMakeLists.txt | 1 + llvm/lib/Target/NVPTX/NVPTX.h | 5 + llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp | 4 + llvm/lib/Target/NVPTX/NVPTXUtilities.cpp | 9 +- .../Target/NVPTX/NVVMUpgradeAnnotations.cpp | 130 ++ .../CodeGen/NVPTX/upgrade-nvvm-annotations.ll | 30 6 files changed, 177 insertions(+), 2 deletions(-) create mode 100644 llvm/lib/Target/NVPTX/NVVMUpgradeAnnotations.cpp create mode 100644 llvm/test/CodeGen/NVPTX/upgrade-nvvm-annotations.ll diff --git a/llvm/lib/Target/NVPTX/CMakeLists.txt b/llvm/lib/Target/NVPTX/CMakeLists.txt index 693365161330f5..bb2e4ad48b51d8 100644 --- a/llvm/lib/Target/NVPTX/CMakeLists.txt +++ b/llvm/lib/Target/NVPTX/CMakeLists.txt @@ -39,6 +39,7 @@ set(NVPTXCodeGen_sources NVVMReflect.cpp NVPTXProxyRegErasure.cpp NVPTXCtorDtorLowering.cpp + NVVMUpgradeAnnotations.cpp ) add_llvm_target(NVPTXCodeGen diff --git a/llvm/lib/Target/NVPTX/NVPTX.h b/llvm/lib/Target/NVPTX/NVPTX.h index ca915cd3f3732f..53418148be3615 100644 --- a/llvm/lib/Target/NVPTX/NVPTX.h +++ b/llvm/lib/Target/NVPTX/NVPTX.h @@ -52,6 +52,7 @@ FunctionPass *createNVPTXLowerUnreachablePass(bool TrapUnreachable, bool NoTrapAfterNoreturn); MachineFunctionPass *createNVPTXPeephole(); MachineFunctionPass *createNVPTXProxyRegErasurePass(); +ModulePass *createNVVMUpgradeAnnotationsPass(); struct NVVMIntrRangePass : PassInfoMixin { PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM); @@ -74,6 +75,10 @@ struct NVPTXCopyByValArgsPass : PassInfoMixin { PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM); }; +struct NVVMUpgradeAnnotationsPass : PassInfoMixin { + PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM); +}; + namespace NVPTX { enum DrvInterface { NVCL, diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp index a5c5e9420ee737..b4fd36625adc9c 100644 --- a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp +++ b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp @@ -254,6 +254,8 @@ void NVPTXTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) { PB.registerPipelineStartEPCallback( [this](ModulePassManager &PM, OptimizationLevel Level) { +PM.addPass(NVVMUpgradeAnnotationsPass()); + FunctionPassManager FPM; FPM.addPass(NVVMReflectPass(Subtarget.getSmVersion())); // Note: NVVMIntrRangePass was causing numerical discrepancies at one @@ -349,6 +351,8 @@ void NVPTXPassConfig::addIRPasses() { AAR.addAAResult(WrapperPass->getResult()); })); + addPass(createNVVMUpgradeAnnotationsPass()); + // NVVMReflectPass is added in addEarlyAsPossiblePasses, so hopefully running // it here does nothing. But since we need it for correctness when lowering // to NVPTX, run it here too, in case whoever built our pass pipeline didn't diff --git a/llvm/lib/Target/NVPTX/NVPTXUtilities.cpp b/llvm/lib/Target/NVPTX/NVPTXUtilities.cpp index 98bffd92a087b6..04e83576cbf958 100644 --- a/llvm/lib/Target/NVPTX/NVPTXUtilities.cpp +++ b/llvm/lib/Target/NVPTX/NVPTXUtilities.cpp @@ -311,11 +311,16 @@ std::optional getMaxNReg(const Function &F) { } bool isKernelFunction(const Function &F) { + if (F.getCallingConv() == CallingConv::PTX_Kernel) +return true; + + if (F.hasFnAttribute("nvvm.kernel")) +return true; + if (const auto X = findOneNVVMAnnotation(&F, "kernel")) return (*X == 1); - // There is no NVVM metadata, check the calling convention - return F.getCallingConv() == CallingConv::PTX_Kernel; + return false; } MaybeAlign getAlign(const Function &F, unsigned Index) { diff --git a/llvm/lib/Target/NVPTX/NVVMUpgradeAnnotations.cpp b/llvm/lib/Target/NVPTX/NVVMUpgradeAnnotations.cpp new file mode 100644 index 00..ca550434835a2c --- /dev/null +++ b/llvm/lib/Target/NVPTX/NVVMUpgradeAnnotations.cpp @@ -0,0 +1,130 @@ +//===- NVVMUpgradeAnnotations.cpp - Upgrade NVVM Annotations --===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This pass replaces deprecated metadata in nvvm.annotation with a more modern +// IR representation. +// +//===--===// + +#include "NVPTX.h" +#include "llvm/ADT/SmallSet.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/ADT/
[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)
@@ -302,6 +299,19 @@ void NVPTXTargetCodeGenInfo::addNVVMMetadata( llvm::ConstantAsMetadata::get(GV), llvm::MDString::get(Ctx, Name), llvm::ConstantAsMetadata::get( llvm::ConstantInt::get(llvm::Type::getInt32Ty(Ctx), Operand))}; + // Append metadata to nvvm.annotations + MD->addOperand(llvm::MDNode::get(Ctx, MDVals)); +} + +void NVPTXTargetCodeGenInfo::addNVVMGridConstantMetadata( +llvm::GlobalValue *GV, const SmallVectorImpl &GridConstantArgs) { + llvm::Module *M = GV->getParent(); + llvm::LLVMContext &Ctx = M->getContext(); + + // Get "nvvm.annotations" metadata node + llvm::NamedMDNode *MD = M->getOrInsertNamedMetadata("nvvm.annotations"); AlexMaclean wrote: Yea, I completely agree, I think almost all the other nvvm.annotations can be converted as well. For this MR I want to lay down the framework for a couple and once that is setup it should be fairly trivial to convert all the others. Specifically for grid_constant can we just upgrade to the existing `readonly` parameter attribute? https://github.com/llvm/llvm-project/pull/119261 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)
@@ -1270,77 +1270,21 @@ exit: ; MODULE: attributes #[[ATTR1:[0-9]+]] = { convergent nocallback nounwind } ; MODULE: attributes #[[ATTR2:[0-9]+]] = { convergent nocallback nofree nounwind willreturn } ; MODULE: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) } -; MODULE: attributes #[[ATTR4]] = { "kernel" } -; MODULE: attributes #[[ATTR5]] = { nosync memory(none) } +; MODULE: attributes #[[ATTR4]] = { "kernel" "nvvm.kernel" } +; MODULE: attributes #[[ATTR5]] = { "kernel" } +; MODULE: attributes #[[ATTR6]] = { nosync memory(none) } ;. ; CGSCC: attributes #[[ATTR0]] = { "llvm.assume"="ompx_aligned_barrier" } ; CGSCC: attributes #[[ATTR1:[0-9]+]] = { convergent nocallback nounwind } ; CGSCC: attributes #[[ATTR2:[0-9]+]] = { convergent nocallback nofree nounwind willreturn } ; CGSCC: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) } -; CGSCC: attributes #[[ATTR4]] = { "kernel" } -; CGSCC: attributes #[[ATTR5]] = { nosync memory(none) } +; CGSCC: attributes #[[ATTR4]] = { "kernel" "nvvm.kernel" } AlexMaclean wrote: Unfortunately, I think we do. "kernel" is really more like "OpenMP kernel" and the semantics for this do not seem to be a perfect match for "nvvm.kernel". For example, `@multiple_blocks_functions_non_kernel_effects_2` in this test has "kernel" but is not an nvvm kernel. I'm vary unfamiliar with the OpenMP semantics so I thought keeping it separate would be the safest approach, it also may be clearest to have a common "nvvm.*" prefix for all attributes currently represented as nvvm.annotations. https://github.com/llvm/llvm-project/pull/119261 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)
@@ -5022,6 +5022,69 @@ bool llvm::UpgradeDebugInfo(Module &M) { return Modified; } +bool static upgradeSingleNVVMAnnotation(GlobalValue *GV, StringRef K, +const Metadata *V) { + if (K == "kernel") { +assert(mdconst::extract(V)->getZExtValue() == 1); +cast(GV)->addFnAttr("nvvm.kernel"); AlexMaclean wrote: Some annotations (such as "texture") are applied to global variables, not functions. I cannot unconditionally cast to a Function until confirming the annotation kind. https://github.com/llvm/llvm-project/pull/119261 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)
https://github.com/AlexMaclean edited https://github.com/llvm/llvm-project/pull/119261 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)
@@ -10,7 +10,7 @@ extern "C" __device__ void device_function() {} -// CHECK-LABEL: define{{.*}} void @global_function +// CHECK: define{{.*}} void @global_function{{.*}} #[[ATTR0:[0-9]+]] AlexMaclean wrote: Fixed https://github.com/llvm/llvm-project/pull/119261 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)
@@ -5022,6 +5022,69 @@ bool llvm::UpgradeDebugInfo(Module &M) { return Modified; } +bool static upgradeSingleNVVMAnnotation(GlobalValue *GV, StringRef K, +const Metadata *V) { + if (K == "kernel") { +assert(mdconst::extract(V)->getZExtValue() == 1); +cast(GV)->addFnAttr("nvvm.kernel"); +return true; + } + if (K == "align") { +const uint64_t AlignBits = mdconst::extract(V)->getZExtValue(); +const unsigned Idx = (AlignBits >> 16); +const Align StackAlign = Align(AlignBits & 0x); +// TODO: Skip adding the stackalign attribute for returns, for now. +if (!Idx) + return false; +cast(GV)->addAttributeAtIndex( +Idx, Attribute::getWithStackAlignment(GV->getContext(), StackAlign)); +return true; + } + + return false; +} + +void llvm::UpgradeNVVMAnnotations(Module &M) { + NamedMDNode *NamedMD = M.getNamedMetadata("nvvm.annotations"); + if (!NamedMD) +return; + + SmallVector NewNodes; + SmallSet SeenNodes; + for (MDNode *MD : NamedMD->operands()) { +if (SeenNodes.contains(MD)) + continue; +SeenNodes.insert(MD); + +auto *F = mdconst::dyn_extract_or_null(MD->getOperand(0)); +if (!F) + continue; + +assert(MD && "Invalid MDNode for annotation"); +assert((MD->getNumOperands() % 2) == 1 && "Invalid number of operands"); + +SmallVector NewOperands; +// start index = 1, to skip the global variable key +// increment = 2, to skip the value for each property-value pairs AlexMaclean wrote: Fixed https://github.com/llvm/llvm-project/pull/119261 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)
@@ -5022,6 +5022,69 @@ bool llvm::UpgradeDebugInfo(Module &M) { return Modified; } +bool static upgradeSingleNVVMAnnotation(GlobalValue *GV, StringRef K, +const Metadata *V) { + if (K == "kernel") { +assert(mdconst::extract(V)->getZExtValue() == 1); +cast(GV)->addFnAttr("nvvm.kernel"); +return true; + } + if (K == "align") { +const uint64_t AlignBits = mdconst::extract(V)->getZExtValue(); +const unsigned Idx = (AlignBits >> 16); +const Align StackAlign = Align(AlignBits & 0x); +// TODO: Skip adding the stackalign attribute for returns, for now. +if (!Idx) + return false; +cast(GV)->addAttributeAtIndex( +Idx, Attribute::getWithStackAlignment(GV->getContext(), StackAlign)); +return true; + } + + return false; +} + +void llvm::UpgradeNVVMAnnotations(Module &M) { + NamedMDNode *NamedMD = M.getNamedMetadata("nvvm.annotations"); + if (!NamedMD) +return; + + SmallVector NewNodes; + SmallSet SeenNodes; + for (MDNode *MD : NamedMD->operands()) { +if (SeenNodes.contains(MD)) + continue; +SeenNodes.insert(MD); + +auto *F = mdconst::dyn_extract_or_null(MD->getOperand(0)); +if (!F) + continue; + +assert(MD && "Invalid MDNode for annotation"); +assert((MD->getNumOperands() % 2) == 1 && "Invalid number of operands"); + +SmallVector NewOperands; +// start index = 1, to skip the global variable key +// increment = 2, to skip the value for each property-value pairs +for (unsigned j = 1, je = MD->getNumOperands(); j < je; j += 2) { + MDString *K = cast(MD->getOperand(j)); + const MDOperand &V = MD->getOperand(j + 1); + bool Upgraded = upgradeSingleNVVMAnnotation(F, K->getString(), V); + if (!Upgraded) +NewOperands.append({K, V}); +} + +if (!NewOperands.empty()) { + NewOperands.insert(NewOperands.begin(), MD->getOperand(0)); AlexMaclean wrote: Fixed https://github.com/llvm/llvm-project/pull/119261 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)
@@ -5911,31 +5911,21 @@ bool llvm::omp::isOpenMPKernel(Function &Fn) { KernelSet llvm::omp::getDeviceKernels(Module &M) { // TODO: Create a more cross-platform way of determining device kernels. - NamedMDNode *MD = M.getNamedMetadata("nvvm.annotations"); KernelSet Kernels; - if (!MD) -return Kernels; - - for (auto *Op : MD->operands()) { -if (Op->getNumOperands() < 2) - continue; -MDString *KindID = dyn_cast(Op->getOperand(1)); -if (!KindID || KindID->getString() != "kernel") - continue; - -Function *KernelFn = -mdconst::dyn_extract_or_null(Op->getOperand(0)); -if (!KernelFn) - continue; - -// We are only interested in OpenMP target regions. Others, such as kernels -// generated by CUDA but linked together, are not interesting to this pass. -if (isOpenMPKernel(*KernelFn)) { - ++NumOpenMPTargetRegionKernels; - Kernels.insert(KernelFn); -} else - ++NumNonOpenMPTargetRegionKernels; + for (auto &F : M) { +// TODO: unify this check with isKernelFunction in NVPTXUtilities. +if (F.hasFnAttribute("nvvm.kernel")) { + AlexMaclean wrote: Fixed https://github.com/llvm/llvm-project/pull/119261 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)
@@ -324,14 +326,17 @@ MaybeAlign getAlign(const Function &F, unsigned Index) { F.getAttributes().getAttributes(Index).getStackAlignment()) return StackAlign; - // If that is missing, check the legacy nvvm metadata - std::vector Vs; - bool retval = findAllNVVMAnnotation(&F, "align", Vs); - if (!retval) -return std::nullopt; - for (unsigned V : Vs) -if ((V >> 16) == Index) - return Align(V & 0x); + // check the legacy nvvm metadata only for the return value since llvm does + // not support stackalign attribute for this. + if (Index == 0) { +std::vector Vs; +bool retval = findAllNVVMAnnotation(&F, "align", Vs); +if (!retval) AlexMaclean wrote: Fixed https://github.com/llvm/llvm-project/pull/119261 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)
@@ -5022,6 +5022,69 @@ bool llvm::UpgradeDebugInfo(Module &M) { return Modified; } +bool static upgradeSingleNVVMAnnotation(GlobalValue *GV, StringRef K, +const Metadata *V) { + if (K == "kernel") { +assert(mdconst::extract(V)->getZExtValue() == 1); AlexMaclean wrote: Sounds good, fixed https://github.com/llvm/llvm-project/pull/119261 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Auto-Upgrade some nvvm.annotations to attributes (PR #119261)
https://github.com/AlexMaclean edited https://github.com/llvm/llvm-project/pull/119261 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Auto-Upgrade some nvvm.annotations to attributes (PR #119261)
https://github.com/AlexMaclean edited https://github.com/llvm/llvm-project/pull/119261 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Auto-Upgrade some nvvm.annotations to attributes (PR #119261)
@@ -1270,77 +1270,21 @@ exit: ; MODULE: attributes #[[ATTR1:[0-9]+]] = { convergent nocallback nounwind } ; MODULE: attributes #[[ATTR2:[0-9]+]] = { convergent nocallback nofree nounwind willreturn } ; MODULE: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) } -; MODULE: attributes #[[ATTR4]] = { "kernel" } -; MODULE: attributes #[[ATTR5]] = { nosync memory(none) } +; MODULE: attributes #[[ATTR4]] = { "kernel" "nvvm.kernel" } +; MODULE: attributes #[[ATTR5]] = { "kernel" } +; MODULE: attributes #[[ATTR6]] = { nosync memory(none) } ;. ; CGSCC: attributes #[[ATTR0]] = { "llvm.assume"="ompx_aligned_barrier" } ; CGSCC: attributes #[[ATTR1:[0-9]+]] = { convergent nocallback nounwind } ; CGSCC: attributes #[[ATTR2:[0-9]+]] = { convergent nocallback nofree nounwind willreturn } ; CGSCC: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) } -; CGSCC: attributes #[[ATTR4]] = { "kernel" } -; CGSCC: attributes #[[ATTR5]] = { nosync memory(none) } +; CGSCC: attributes #[[ATTR4]] = { "kernel" "nvvm.kernel" } AlexMaclean wrote: There is a `ptx_kernel` calling convention which is an alternative to `nvvm.annoations` `!"kernel"` already. However, I don't think we can safely auto-upgrade to this in all cases, in the openMP example @jhuber6 provided above the function has both `amdgpu_kernel` and `"nvvm.kernel"` which would not be possible with `ptx_kernel` CC. Is there any way around this? if not an attribute seems like the only option. > The metadata use useful if we have cases where we really want fast lookup of > all the kernels in the TU. I don't think there are any cases where we do this, there isn't even a function to traverse the metadata and find all the kernels (that I know of). It's far more important to be able to quickly check if a function is a kernel, which the metadata solution is fairly slow for (there is a cache hacked on to try to mitigate this but that has other issues). In addition metadata should not be used to carry semantic information like this. https://github.com/llvm/llvm-project/pull/119261 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Auto-Upgrade some nvvm.annotations to attributes (PR #119261)
@@ -1270,77 +1270,21 @@ exit: ; MODULE: attributes #[[ATTR1:[0-9]+]] = { convergent nocallback nounwind } ; MODULE: attributes #[[ATTR2:[0-9]+]] = { convergent nocallback nofree nounwind willreturn } ; MODULE: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) } -; MODULE: attributes #[[ATTR4]] = { "kernel" } -; MODULE: attributes #[[ATTR5]] = { nosync memory(none) } +; MODULE: attributes #[[ATTR4]] = { "kernel" "nvvm.kernel" } +; MODULE: attributes #[[ATTR5]] = { "kernel" } +; MODULE: attributes #[[ATTR6]] = { nosync memory(none) } ;. ; CGSCC: attributes #[[ATTR0]] = { "llvm.assume"="ompx_aligned_barrier" } ; CGSCC: attributes #[[ATTR1:[0-9]+]] = { convergent nocallback nounwind } ; CGSCC: attributes #[[ATTR2:[0-9]+]] = { convergent nocallback nofree nounwind willreturn } ; CGSCC: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) } -; CGSCC: attributes #[[ATTR4]] = { "kernel" } -; CGSCC: attributes #[[ATTR5]] = { nosync memory(none) } +; CGSCC: attributes #[[ATTR4]] = { "kernel" "nvvm.kernel" } AlexMaclean wrote: I agree that `"omp_kernel"` seems like a much better name for the meaning we're currently signifying with the `"kernel"` attribute. > Realistically this should be a calling convention and not an attribute, but > there's a lot of historical cruft around it. @jhuber6 are you saying that the `"kernel"` attribute should be a calling convention? or that `"nvvm.kernel"` should be (similar to `amdgpu_kernel`)? https://github.com/llvm/llvm-project/pull/119261 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Switch front-ends and tests to ptx_kernel cc (PR #120806)
https://github.com/AlexMaclean edited https://github.com/llvm/llvm-project/pull/120806 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Switch front-ends and tests to ptx_kernel cc (PR #120806)
https://github.com/AlexMaclean ready_for_review https://github.com/llvm/llvm-project/pull/120806 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Switch front-ends and tests to ptx_kernel cc (PR #120806)
https://github.com/AlexMaclean edited https://github.com/llvm/llvm-project/pull/120806 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Auto-Upgrade some nvvm.annotations to attributes (PR #119261)
@@ -1270,77 +1270,21 @@ exit: ; MODULE: attributes #[[ATTR1:[0-9]+]] = { convergent nocallback nounwind } ; MODULE: attributes #[[ATTR2:[0-9]+]] = { convergent nocallback nofree nounwind willreturn } ; MODULE: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) } -; MODULE: attributes #[[ATTR4]] = { "kernel" } -; MODULE: attributes #[[ATTR5]] = { nosync memory(none) } +; MODULE: attributes #[[ATTR4]] = { "kernel" "nvvm.kernel" } +; MODULE: attributes #[[ATTR5]] = { "kernel" } +; MODULE: attributes #[[ATTR6]] = { nosync memory(none) } ;. ; CGSCC: attributes #[[ATTR0]] = { "llvm.assume"="ompx_aligned_barrier" } ; CGSCC: attributes #[[ATTR1:[0-9]+]] = { convergent nocallback nounwind } ; CGSCC: attributes #[[ATTR2:[0-9]+]] = { convergent nocallback nofree nounwind willreturn } ; CGSCC: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) } -; CGSCC: attributes #[[ATTR4]] = { "kernel" } -; CGSCC: attributes #[[ATTR5]] = { nosync memory(none) } +; CGSCC: attributes #[[ATTR4]] = { "kernel" "nvvm.kernel" } AlexMaclean wrote: Okay, fair enough. I'll start switch us over to a calling convention in https://github.com/llvm/llvm-project/pull/120806 https://github.com/llvm/llvm-project/pull/119261 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Switch front-ends and tests to ptx_kernel cc (PR #120806)
https://github.com/AlexMaclean edited https://github.com/llvm/llvm-project/pull/120806 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [OpenMP] Replace nvvm.annotation usage with kernel calling conventions (PR #122320)
https://github.com/AlexMaclean closed https://github.com/llvm/llvm-project/pull/122320 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Switch front-ends and tests to ptx_kernel cc (PR #120806)
@@ -556,19 +556,16 @@ llvm.func @kernel_func() attributes {nvvm.kernel} { llvm.return } -// CHECK: !nvvm.annotations = -// CHECK-NOT: {ptr @nvvm_special_regs, !"kernel", i32 1} -// CHECK: {ptr @kernel_func, !"kernel", i32 1} +// CHECK: ptx_kernel void @kernel_func AlexMaclean wrote: This change does not remove support for specifying a kernel via the metadata. It simply updates frontends and tests to use a different one of the two already supported methods for marking kernels. Long term I hope to remove the support for metadata, so downstream users should move the calling-convention, but this change does not yet force that. https://github.com/llvm/llvm-project/pull/120806 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Switch front-ends and tests to ptx_kernel cc (PR #120806)
AlexMaclean wrote: > In MLIR, we also have other NVVM metadata such as `reqntid` and `maxntid`, > among others. What is the plan for these? Will they remain as metadata, or > will they be expressed differently? Eventually, I hope to migrate all !nvvm.annotations, including `reqntid` and `maxntid`, to a more modern mechanism such as attributes, or at least metadata attached directly to the function/GV. !nvvm.annotations was added around llvm 3 when target-specific attributes were not yet present. > Could you please elaborate on the compile-time improvements? Auto-upgrading kernel metadata and no longer traversing !nvvm.annotations lead to around a 2% improvement in compile time for several cases in nvcc. This change alone won't have the same impact, since we still traverse the metadata for functions that do not have the `ptx_kernel` cc but it at least lets up bail out early some of the time and lays the foundation for bigger improvements. https://github.com/llvm/llvm-project/pull/120806 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [OpenMP] Replace nvvm.annotation usage with kernel calling conventions (PR #122320)
https://github.com/AlexMaclean edited https://github.com/llvm/llvm-project/pull/122320 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [OpenMP] Replace nvvm.annotation usage with kernel calling conventions (PR #122320)
AlexMaclean wrote: @jdoerfert / @arsenm ping for review when you have a moment https://github.com/llvm/llvm-project/pull/122320 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Convert scalar function nvvm.annotations to attributes (PR #125908)
@@ -227,14 +228,14 @@ class NVVMDialectLLVMIRTranslationInterface } else if (attribute.getName() == AlexMaclean wrote: Yes, I plan to replace all !nvvm.annotations with attributes. This change is already fairly large and I would prefer to avoid a single monolithic PR to make debugging any issues easier and to prevent unnecessary churn if it needs to be reverted. Would it be alright to address these now and the others in separate follow ups? https://github.com/llvm/llvm-project/pull/125908 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)
https://github.com/AlexMaclean approved this pull request. LGTM, please wait for @Artem-B's approval before landing. https://github.com/llvm/llvm-project/pull/134416 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Auto-Upgrade llvm.nvvm.atomic.load.{inc,dec}.32 (PR #134111)
https://github.com/AlexMaclean closed https://github.com/llvm/llvm-project/pull/134111 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)
@@ -1548,6 +1548,45 @@ let TargetPrefix = "nvvm" in { Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrNoCallback]>; def int_nvvm_e5m2x2_to_f16x2_rn_relu : ClangBuiltin<"__nvvm_e5m2x2_to_f16x2_rn_relu">, Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrNoCallback]>; + + def int_nvvm_ff_to_e2m3x2_rn : ClangBuiltin<"__nvvm_ff_to_e2m3x2_rn">, + Intrinsic<[llvm_i16_ty], [llvm_float_ty, llvm_float_ty], [IntrNoMem, IntrNoCallback]>; + def int_nvvm_ff_to_e2m3x2_rn_relu : ClangBuiltin<"__nvvm_ff_to_e2m3x2_rn_relu">, AlexMaclean wrote: It seems like `f32x2` would be a clearer name than `ff` for these. This would also be more consistent with the affix used for f16. https://github.com/llvm/llvm-project/pull/134345 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)
@@ -1548,6 +1548,45 @@ let TargetPrefix = "nvvm" in { Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrNoCallback]>; def int_nvvm_e5m2x2_to_f16x2_rn_relu : ClangBuiltin<"__nvvm_e5m2x2_to_f16x2_rn_relu">, Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrNoCallback]>; + + def int_nvvm_ff_to_e2m3x2_rn : ClangBuiltin<"__nvvm_ff_to_e2m3x2_rn">, AlexMaclean wrote: It looks like there is a lot of copy/paste boilerplate here that can be folded away with a few foreach loops or multi-classes. https://github.com/llvm/llvm-project/pull/134345 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)
@@ -1944,6 +1944,62 @@ def : Pat<(int_nvvm_e5m2x2_to_f16x2_rn Int16Regs:$a), def : Pat<(int_nvvm_e5m2x2_to_f16x2_rn_relu Int16Regs:$a), (CVT_f16x2_e5m2x2 $a, CvtRN_RELU)>; +def : Pat<(int_nvvm_ff_to_e2m3x2_rn f32:$a, f32:$b), + (CVT_e2m3x2_f32 $a, $b, CvtRN)>, + Requires<[hasPTX<86>, hasSM<100>, hasArchAccelFeatures]>; +def : Pat<(int_nvvm_ff_to_e2m3x2_rn_relu f32:$a, f32:$b), + (CVT_e2m3x2_f32 $a, $b, CvtRN_RELU)>, + Requires<[hasPTX<86>, hasSM<100>, hasArchAccelFeatures]>; +def : Pat<(int_nvvm_ff_to_e3m2x2_rn f32:$a, f32:$b), + (CVT_e3m2x2_f32 $a, $b, CvtRN)>, + Requires<[hasPTX<86>, hasSM<100>, hasArchAccelFeatures]>; +def : Pat<(int_nvvm_ff_to_e3m2x2_rn_relu f32:$a, f32:$b), + (CVT_e3m2x2_f32 $a, $b, CvtRN_RELU)>, + Requires<[hasPTX<86>, hasSM<100>, hasArchAccelFeatures]>; + +def : Pat<(int_nvvm_e2m3x2_to_f16x2_rn Int16Regs:$a), AlexMaclean wrote: Instead of using a Register class in the input pattern, use whatever type we expect this to be, in this case `i16` I assume. https://github.com/llvm/llvm-project/pull/134345 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)
@@ -703,6 +703,53 @@ let hasSideEffects = false in { defm CVT_to_tf32_rz_satf : CVT_TO_TF32<"rz.satfinite", [hasPTX<86>, hasSM<100>]>; defm CVT_to_tf32_rn_relu_satf : CVT_TO_TF32<"rn.relu.satfinite", [hasPTX<86>, hasSM<100>]>; defm CVT_to_tf32_rz_relu_satf : CVT_TO_TF32<"rz.relu.satfinite", [hasPTX<86>, hasSM<100>]>; + + // FP6 conversions. + multiclass CVT_TO_F6X2 { AlexMaclean wrote: Since this only has a single string parameter and is called twice, I think it would be a bit clearer to simply use a foreach loop here (and for the below cases as well). https://github.com/llvm/llvm-project/pull/134345 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Auto-Upgrade llvm.nvvm.atomic.load.{inc,dec}.32 (PR #134111)
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/134111 >From 46de785e801bf8ca87e01aee9ad0a13ac07a47d6 Mon Sep 17 00:00:00 2001 From: Alex Maclean Date: Tue, 1 Apr 2025 20:22:24 + Subject: [PATCH] [NVPTX] Auto-Upgrade llvm.nvvm.atomic.load.{inc,dec}.32 --- clang/lib/CodeGen/TargetBuiltins/NVPTX.cpp| 18 ++- clang/test/CodeGen/builtins-nvptx.c | 4 +- llvm/include/llvm/IR/IntrinsicsNVVM.td| 10 +--- .../include/llvm/Target/TargetSelectionDAG.td | 2 + llvm/lib/IR/AutoUpgrade.cpp | 9 llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp | 15 -- llvm/lib/Target/NVPTX/NVPTXIntrinsics.td | 4 +- .../Target/NVPTX/NVPTXTargetTransformInfo.cpp | 52 +-- .../Assembler/auto_upgrade_nvvm_intrinsics.ll | 16 +- llvm/test/CodeGen/NVPTX/atomics.ll| 36 - 10 files changed, 107 insertions(+), 59 deletions(-) diff --git a/clang/lib/CodeGen/TargetBuiltins/NVPTX.cpp b/clang/lib/CodeGen/TargetBuiltins/NVPTX.cpp index aaac19b229905..0f7ab9fd3b099 100644 --- a/clang/lib/CodeGen/TargetBuiltins/NVPTX.cpp +++ b/clang/lib/CodeGen/TargetBuiltins/NVPTX.cpp @@ -481,21 +481,11 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned BuiltinID, AtomicOrdering::SequentiallyConsistent); } - case NVPTX::BI__nvvm_atom_inc_gen_ui: { -Value *Ptr = EmitScalarExpr(E->getArg(0)); -Value *Val = EmitScalarExpr(E->getArg(1)); -Function *FnALI32 = -CGM.getIntrinsic(Intrinsic::nvvm_atomic_load_inc_32, Ptr->getType()); -return Builder.CreateCall(FnALI32, {Ptr, Val}); - } + case NVPTX::BI__nvvm_atom_inc_gen_ui: +return MakeBinaryAtomicValue(*this, llvm::AtomicRMWInst::UIncWrap, E); - case NVPTX::BI__nvvm_atom_dec_gen_ui: { -Value *Ptr = EmitScalarExpr(E->getArg(0)); -Value *Val = EmitScalarExpr(E->getArg(1)); -Function *FnALD32 = -CGM.getIntrinsic(Intrinsic::nvvm_atomic_load_dec_32, Ptr->getType()); -return Builder.CreateCall(FnALD32, {Ptr, Val}); - } + case NVPTX::BI__nvvm_atom_dec_gen_ui: +return MakeBinaryAtomicValue(*this, llvm::AtomicRMWInst::UDecWrap, E); case NVPTX::BI__nvvm_ldg_c: case NVPTX::BI__nvvm_ldg_sc: diff --git a/clang/test/CodeGen/builtins-nvptx.c b/clang/test/CodeGen/builtins-nvptx.c index ffa41c85c2734..71b29849618b6 100644 --- a/clang/test/CodeGen/builtins-nvptx.c +++ b/clang/test/CodeGen/builtins-nvptx.c @@ -333,10 +333,10 @@ __device__ void nvvm_atom(float *fp, float f, double *dfp, double df, // CHECK: atomicrmw fadd ptr {{.*}} seq_cst, align 4 __nvvm_atom_add_gen_f(fp, f); - // CHECK: call i32 @llvm.nvvm.atomic.load.inc.32.p0 + // CHECK: atomicrmw uinc_wrap ptr {{.*}} seq_cst, align 4 __nvvm_atom_inc_gen_ui(uip, ui); - // CHECK: call i32 @llvm.nvvm.atomic.load.dec.32.p0 + // CHECK: atomicrmw udec_wrap ptr {{.*}} seq_cst, align 4 __nvvm_atom_dec_gen_ui(uip, ui); diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td index 3e9588a515c9e..4aeb1d8a2779e 100644 --- a/llvm/include/llvm/IR/IntrinsicsNVVM.td +++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td @@ -124,6 +124,8 @@ // * llvm.nvvm.ldg.global.f--> ibid. // * llvm.nvvm.ldg.global.p--> ibid. // * llvm.nvvm.swap.lo.hi.b64 --> llvm.fshl(x, x, 32) +// * llvm.nvvm.atomic.load.inc.32 --> atomicrmw uinc_wrap +// * llvm.nvvm.atomic.load.dec.32 --> atomicrmw udec_wrap def llvm_global_ptr_ty : LLVMQualPointerType<1>; // (global)ptr def llvm_shared_ptr_ty : LLVMQualPointerType<3>; // (shared)ptr @@ -1633,14 +1635,6 @@ let TargetPrefix = "nvvm" in { DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty, llvm_i32_ty], [IntrNoMem]>; -// Atomics not available as llvm intrinsics. - def int_nvvm_atomic_load_inc_32 : Intrinsic<[llvm_i32_ty], - [llvm_anyptr_ty, llvm_i32_ty], - [IntrArgMemOnly, IntrNoCallback, NoCapture>]>; - def int_nvvm_atomic_load_dec_32 : Intrinsic<[llvm_i32_ty], - [llvm_anyptr_ty, llvm_i32_ty], - [IntrArgMemOnly, IntrNoCallback, NoCapture>]>; - class SCOPED_ATOMIC2_impl : Intrinsic<[elty], [llvm_anyptr_ty, LLVMMatchType<0>], diff --git a/llvm/include/llvm/Target/TargetSelectionDAG.td b/llvm/include/llvm/Target/TargetSelectionDAG.td index 42a5fbec95174..9c241b6c4df0f 100644 --- a/llvm/include/llvm/Target/TargetSelectionDAG.td +++ b/llvm/include/llvm/Target/TargetSelectionDAG.td @@ -1825,6 +1825,8 @@ defm atomic_load_min : binary_atomic_op; defm atomic_load_max : binary_atomic_op; defm atomic_load_umin : binary_atomic_op; defm atomic_load_umax : binary_atomic_op; +defm atomic_load_uinc_wrap : binary_atomic_op; +defm atomic_load_udec_wrap : binary_atomic_op; defm atomic_cmp_swap : ternary_atomic_op; /// Atomic load which zeroes the excess high bits. diff --git
[clang] [llvm] [NVPTX] Auto-Upgrade llvm.nvvm.atomic.load.{inc,dec}.32 (PR #134111)
@@ -2314,6 +2317,12 @@ static Value *upgradeNVVMIntrinsicCall(StringRef Name, CallBase *CI, Value *Val = CI->getArgOperand(1); Rep = Builder.CreateAtomicRMW(AtomicRMWInst::FAdd, Ptr, Val, MaybeAlign(), AtomicOrdering::SequentiallyConsistent); + } else if (Name.consume_front("atomic.load.") && Name.consume_back(".32")) { +Value *Ptr = CI->getArgOperand(0); +Value *Val = CI->getArgOperand(1); +auto Op = Name == "inc" ? AtomicRMWInst::UIncWrap : AtomicRMWInst::UDecWrap; +Rep = Builder.CreateAtomicRMW(Op, Ptr, Val, MaybeAlign(), + AtomicOrdering::SequentiallyConsistent); AlexMaclean wrote: Okay, sounds like there is a larger issue to address around the scope and semantics of atomics in NVPTX. This change maintains consistency with all other `atomicrmw` instructions and I think the larger bug can be addressed separately. https://github.com/llvm/llvm-project/pull/134111 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Clang][NVVM] Support `-f[no-]cuda-prec-sqrt` and propagate precision flag to `NVVMReflect` (PR #134244)
AlexMaclean wrote: It seems like we already have perhaps too many mechanisms to control how sqrt gets lowered. There is the `__nv_sqrtf` libdevice function which chooses between specific (1:1 to PTX) intrinsics based on NVVMReflect and then there is also `llvm.sqrt` and `nvvm.sqrt.f` which are lowered and optimized based on command-line options and function and instruction level flags, each in its own way. I think for more fine grained responsiveness to instruction and function level options it makes sense to use the existing intrinsics. While, it is consistent with the existing design to treat NVVMReflect as operating globally across the entire module. I'm not sure it makes sense to introduce a new module flag and clang cl opt though... I personally agree with @Artem-B that `__nv_sqrtf`+NVVMReflect may not be the way to go. Using one of the intrinsics seems like a better approach but I may be missing something. https://github.com/llvm/llvm-project/pull/134244 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Cleanup and document nvvm.fabs intrinsics, adding f16 support (PR #135644)
https://github.com/AlexMaclean closed https://github.com/llvm/llvm-project/pull/135644 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
https://github.com/AlexMaclean edited https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -0,0 +1,48 @@ +; RUN: llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_80 | FileCheck %s -check-prefixes=ALL,NOPTRCONV,CLS64 AlexMaclean wrote: Use update_llc_test_checks for this test. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -2381,29 +2387,41 @@ def INT_PTX_LDG_G_v4i32_ELE : VLDG_G_ELE_V4<"u32", Int32Regs>; def INT_PTX_LDG_G_v4f32_ELE : VLDG_G_ELE_V4<"f32", Float32Regs>; -multiclass NG_TO_G { +multiclass NG_TO_G Preds = []> { def "" : NVPTXInst<(outs Int32Regs:$result), (ins Int32Regs:$src), - "cvta." # Str # ".u32 \t$result, $src;", []>; + "cvta." # Str # ".u32 \t$result, $src;", []>, Requires; + def _64 : NVPTXInst<(outs Int64Regs:$result), (ins Int64Regs:$src), + "cvta." # Str # ".u64 \t$result, $src;", []>, Requires; +} + +multiclass NG_TO_G_64 Preds = []> { def _64 : NVPTXInst<(outs Int64Regs:$result), (ins Int64Regs:$src), - "cvta." # Str # ".u64 \t$result, $src;", []>; + "cvta." # Str # ".u64 \t$result, $src;", []>, Requires; } AlexMaclean wrote: I think it would be cleaner to just add a bit to the `NG_TO_G` class for `supports_32` https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -3019,8 +3019,42 @@ SDValue NVPTXTargetLowering::LowerADDRSPACECAST(SDValue Op, unsigned SrcAS = N->getSrcAddressSpace(); unsigned DestAS = N->getDestAddressSpace(); if (SrcAS != llvm::ADDRESS_SPACE_GENERIC && - DestAS != llvm::ADDRESS_SPACE_GENERIC) + DestAS != llvm::ADDRESS_SPACE_GENERIC) { +// Shared and SharedCluster can be converted to each other through generic +// space +if (SrcAS == llvm::ADDRESS_SPACE_SHARED && AlexMaclean wrote: This `if` and the one below look essentially duplicated. Can you fold them together? https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
https://github.com/AlexMaclean commented: Getting close to ready, a couple more places to update: - NVPTXTargetTransformInfo.cpp: evaluateIsSpace - NVPTXUsage.rst: Address Space section, add intrinsics you're modifying, such as `mapa`, to the spec https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -0,0 +1,329 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc < %s -o - -mcpu=sm_90 -mattr=+ptx78 | FileCheck %s +; RUN: %if ptxas-12.0 %{ llc < %s -mcpu=sm_90 -mattr=+ptx78| %ptxas-verify -arch=sm_90 %} + +target triple = "nvptx64-nvidia-cuda" + +@llvm.used = appending global [5 x ptr] [ + ptr @test_distributed_shared_cluster_common, AlexMaclean wrote: our lit tests generally don't use `@llvm.used`, can you remove this? https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
https://github.com/AlexMaclean approved this pull request. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
https://github.com/AlexMaclean commented: llvm changes LGTM, though I'm not too familiar with the MLIR portion of this change. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)
@@ -703,6 +703,46 @@ let hasSideEffects = false in { defm CVT_to_tf32_rz_satf : CVT_TO_TF32<"rz.satfinite", [hasPTX<86>, hasSM<100>]>; defm CVT_to_tf32_rn_relu_satf : CVT_TO_TF32<"rn.relu.satfinite", [hasPTX<86>, hasSM<100>]>; defm CVT_to_tf32_rz_relu_satf : CVT_TO_TF32<"rz.relu.satfinite", [hasPTX<86>, hasSM<100>]>; + + // FP6 conversions. + class CVT_f32_to_f6x2 +: NVPTXInst<(outs Int16Regs:$dst), +(ins Float32Regs:$src1, Float32Regs:$src2, CvtMode:$mode), +!strconcat("cvt${mode:base}.satfinite${mode:relu}.", AlexMaclean wrote: Nit: for simple cases like this, `#` is preferable over `!strconcat` https://github.com/llvm/llvm-project/pull/134345 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)
https://github.com/AlexMaclean edited https://github.com/llvm/llvm-project/pull/134345 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)
AlexMaclean wrote: Merging on behalf of @YonahGoldberg at his request offline. https://github.com/llvm/llvm-project/pull/134416 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)
https://github.com/AlexMaclean closed https://github.com/llvm/llvm-project/pull/134416 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)
@@ -1548,6 +1548,45 @@ let TargetPrefix = "nvvm" in { Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrNoCallback]>; def int_nvvm_e5m2x2_to_f16x2_rn_relu : ClangBuiltin<"__nvvm_e5m2x2_to_f16x2_rn_relu">, Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrNoCallback]>; + + def int_nvvm_ff_to_e2m3x2_rn : ClangBuiltin<"__nvvm_ff_to_e2m3x2_rn">, + Intrinsic<[llvm_i16_ty], [llvm_float_ty, llvm_float_ty], [IntrNoMem, IntrNoCallback]>; AlexMaclean wrote: Can all these intrinisics be made DefaultAttrsIntrinsics? https://github.com/llvm/llvm-project/pull/134345 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -982,8 +982,9 @@ void NVPTXDAGToDAGISel::SelectAddrSpaceCast(SDNode *N) { case ADDRESS_SPACE_SHARED: Opc = TM.is64Bit() ? NVPTX::cvta_shared_64 : NVPTX::cvta_shared; break; -case ADDRESS_SPACE_DSHARED: - Opc = TM.is64Bit() ? NVPTX::cvta_dshared_64 : NVPTX::cvta_dshared; +case ADDRESS_SPACE_SHARED_CLUSTER: + Opc = TM.is64Bit() ? NVPTX::cvta_shared_cluster_64 + : NVPTX::cvta_shared_cluster; AlexMaclean wrote: My understanding is that cluster is not supported until sm_90, and that sm_90+ do not support 32bit compilation. Is there something I'm missing? If not we should never select the 32-bit version here and instead check to ensure we're compiling for sm_90+. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits