[clang] [llvm] [NVPTX] Support inline asm with 128-bit operand in NVPTX backend (PR #97113)

2024-07-01 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean closed 
https://github.com/llvm/llvm-project/pull/97113
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)

2024-07-05 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean updated 
https://github.com/llvm/llvm-project/pull/97762

>From c2913d1074c5bfa771379d68e9ba728a3d1d1ce5 Mon Sep 17 00:00:00 2001
From: Alex MacLean 
Date: Mon, 1 Jul 2024 17:06:56 +
Subject: [PATCH 1/4] [ValueTracking] use KnownBits to compute fpclass from
 bitcast

---
 llvm/lib/Analysis/ValueTracking.cpp  |  30 ++
 llvm/test/Transforms/Attributor/nofpclass.ll | 104 +++
 2 files changed, 134 insertions(+)

diff --git a/llvm/lib/Analysis/ValueTracking.cpp 
b/llvm/lib/Analysis/ValueTracking.cpp
index 85abf00774a02..a16c8e3d48403 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -5805,6 +5805,36 @@ void computeKnownFPClass(const Value *V, const APInt 
&DemandedElts,
 
 break;
   }
+  case Instruction::BitCast: {
+const Type *Ty = Op->getType();
+const Value *Casted = Op->getOperand(0);
+if (Ty->isVectorTy() || !Casted->getType()->isIntOrIntVectorTy())
+  break;
+
+KnownBits Bits(Ty->getScalarSizeInBits());
+computeKnownBits(Casted, Bits, Depth + 1, Q);
+
+// Transfer information from the sign bit.
+if (Bits.Zero.isSignBitSet())
+  Known.signBitMustBeZero();
+else if (Bits.One.isSignBitSet())
+  Known.signBitMustBeOne();
+
+if (Ty->isIEEE()) {
+  // IEEE floats are NaN when all bits of the exponent plus at least one of
+  // the fraction bits are 1. This means:
+  //   - If we assume unknown bits are 0 and the value is NaN, it will
+  // always be NaN
+  //   - If we assume unknown bits are 1 and the value is not NaN, it can
+  // never be NaN
+  if (APFloat(Ty->getFltSemantics(), Bits.One).isNaN())
+Known.KnownFPClasses = fcNan;
+  else if (!APFloat(Ty->getFltSemantics(), ~Bits.Zero).isNaN())
+Known.knownNot(fcNan);
+}
+
+break;
+  }
   default:
 break;
   }
diff --git a/llvm/test/Transforms/Attributor/nofpclass.ll 
b/llvm/test/Transforms/Attributor/nofpclass.ll
index 781ba636c3ab3..c5d562a436b33 100644
--- a/llvm/test/Transforms/Attributor/nofpclass.ll
+++ b/llvm/test/Transforms/Attributor/nofpclass.ll
@@ -2690,6 +2690,110 @@ entry:
   ret double %abs
 }
 
+define float @bitcast_to_float_sign_0(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) float 
@bitcast_to_float_sign_0
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = lshr i32 [[ARG]], 1
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:ret float [[TMP2]]
+;
+  %1 = lshr i32 %arg, 1
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define float @bitcast_to_float_nnan(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(nan ninf nzero nsub nnorm) float 
@bitcast_to_float_nnan
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = lshr i32 [[ARG]], 2
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:ret float [[TMP2]]
+;
+  %1 = lshr i32 %arg, 2
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define float @bitcast_to_float_sign_1(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) float 
@bitcast_to_float_sign_1
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = or i32 [[ARG]], -2147483648
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:ret float [[TMP2]]
+;
+  %1 = or i32 %arg, -2147483648
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define float @bitcast_to_float_nan(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(inf zero sub norm) float @bitcast_to_float_nan
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = or i32 [[ARG]], 2139095041
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:ret float [[TMP2]]
+;
+  %1 = or i32 %arg, 2139095041
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define double @bitcast_to_double_sign_0(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) double 
@bitcast_to_double_sign_0
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = lshr i64 [[ARG]], 1
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i64 [[TMP1]] to double
+; CHECK-NEXT:ret double [[TMP2]]
+;
+  %1 = lshr i64 %arg, 1
+  %2 = bitcast i64 %1 to double
+  ret double %2
+}
+
+define double @bitcast_to_double_nnan(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define no

[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)

2024-07-06 Thread Alex MacLean via cfe-commits

AlexMaclean wrote:

> Can you add some tests to demonstrate that this patch will enable more 
> optimizations in some real-world applications?

I can extend the existing test cases to make them more elaborate/real-looking, 
but I'm guessing that would not qualify as "real-world". This patch is 
motivated by an internal benchmark where there were some cases where this 
helped, though even that case is in some sense artificial. Is this a necessary 
criteria for landing this change? I believe we already handle float to int in 
KnownBits and adding the inverse in KnownFPClass seems like a correct and 
reasonable extension of the logic, even if there are not many cases where it is 
used.

https://github.com/llvm/llvm-project/pull/97762
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Remove nvvm.bitcast.* intrinsics (PR #107936)

2024-09-09 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean created 
https://github.com/llvm/llvm-project/pull/107936

Remove the following intrinsics which correspond directly to a bitcast:
- llvm.nvvm.bitcast.f2i
- llvm.nvvm.bitcast.i2f
- llvm.nvvm.bitcast.d2ll
- llvm.nvvm.bitcast.ll2d

>From ff978f81e0eedbc5e7547acabe414f2f1b0fd31a Mon Sep 17 00:00:00 2001
From: Alex MacLean 
Date: Fri, 6 Sep 2024 18:35:20 +
Subject: [PATCH] [NVPTX] Remove nvvm.bitcast.* intrinsics

---
 clang/include/clang/Basic/BuiltinsNVPTX.def   |  8 
 llvm/include/llvm/IR/IntrinsicsNVVM.td| 18 -
 llvm/lib/IR/AutoUpgrade.cpp   |  8 
 llvm/lib/Target/NVPTX/NVPTXIntrinsics.td  | 14 -
 .../Assembler/auto_upgrade_nvvm_intrinsics.ll | 20 +++
 5 files changed, 32 insertions(+), 36 deletions(-)

diff --git a/clang/include/clang/Basic/BuiltinsNVPTX.def 
b/clang/include/clang/Basic/BuiltinsNVPTX.def
index 20f038a0a9bbde..6fff562165080a 100644
--- a/clang/include/clang/Basic/BuiltinsNVPTX.def
+++ b/clang/include/clang/Basic/BuiltinsNVPTX.def
@@ -599,14 +599,6 @@ TARGET_BUILTIN(__nvvm_e4m3x2_to_f16x2_rn_relu, "V2hs", "", 
AND(SM_89,PTX81))
 TARGET_BUILTIN(__nvvm_e5m2x2_to_f16x2_rn, "V2hs", "", AND(SM_89,PTX81))
 TARGET_BUILTIN(__nvvm_e5m2x2_to_f16x2_rn_relu, "V2hs", "", AND(SM_89,PTX81))
 
-// Bitcast
-
-BUILTIN(__nvvm_bitcast_f2i, "if", "")
-BUILTIN(__nvvm_bitcast_i2f, "fi", "")
-
-BUILTIN(__nvvm_bitcast_ll2d, "dLLi", "")
-BUILTIN(__nvvm_bitcast_d2ll, "LLid", "")
-
 // FNS
 TARGET_BUILTIN(__nvvm_fns, "UiUiUii", "n", PTX60)
 
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td 
b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index 39685c920d948d..737dd6092e2183 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -30,6 +30,10 @@
 //   * llvm.nvvm.max.ui  --> select(x ule y, x, y)
 //   * llvm.nvvm.max.ull --> ibid.
 //   * llvm.nvvm.h2f --> llvm.convert.to.fp16.f32
+//   * llvm.nvvm.bitcast.f2i  --> bitcast
+//   * llvm.nvvm.bitcast.i2f  --> ibid.
+//   * llvm.nvvm.bitcast.d2ll --> ibid.
+//   * llvm.nvvm.bitcast.ll2d --> ibid.
 
 def llvm_global_ptr_ty  : LLVMQualPointerType<1>;  // (global)ptr
 def llvm_shared_ptr_ty  : LLVMQualPointerType<3>;  // (shared)ptr
@@ -1339,20 +1343,6 @@ let TargetPrefix = "nvvm" in {
   def int_nvvm_e5m2x2_to_f16x2_rn_relu : 
ClangBuiltin<"__nvvm_e5m2x2_to_f16x2_rn_relu">,
   Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrNoCallback]>;
 
-//
-// Bitcast
-//
-
-  def int_nvvm_bitcast_f2i : ClangBuiltin<"__nvvm_bitcast_f2i">,
-  DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_float_ty], [IntrNoMem, 
IntrSpeculatable]>;
-  def int_nvvm_bitcast_i2f : ClangBuiltin<"__nvvm_bitcast_i2f">,
-  DefaultAttrsIntrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrNoMem, 
IntrSpeculatable]>;
-
-  def int_nvvm_bitcast_ll2d : ClangBuiltin<"__nvvm_bitcast_ll2d">,
-  DefaultAttrsIntrinsic<[llvm_double_ty], [llvm_i64_ty], [IntrNoMem, 
IntrSpeculatable]>;
-  def int_nvvm_bitcast_d2ll : ClangBuiltin<"__nvvm_bitcast_d2ll">,
-  DefaultAttrsIntrinsic<[llvm_i64_ty], [llvm_double_ty], [IntrNoMem, 
IntrSpeculatable]>;
-
 // FNS
 
   def int_nvvm_fns : ClangBuiltin<"__nvvm_fns">,
diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp
index 69dae5e32dbbe8..02d1d9d9f78984 100644
--- a/llvm/lib/IR/AutoUpgrade.cpp
+++ b/llvm/lib/IR/AutoUpgrade.cpp
@@ -1268,6 +1268,10 @@ static bool upgradeIntrinsicFunction1(Function *F, 
Function *&NewFn,
   else if (Name.consume_front("atomic.load.add."))
 // nvvm.atomic.load.add.{f32.p,f64.p}
 Expand = Name.starts_with("f32.p") || Name.starts_with("f64.p");
+  else if (Name.consume_front("bitcast."))
+// nvvm.bitcast.{f2i,i2f,ll2d,d2ll}
+Expand =
+Name == "f2i" || Name == "i2f" || Name == "ll2d" || Name == "d2ll";
   else
 Expand = false;
 
@@ -4258,6 +4262,10 @@ void llvm::UpgradeIntrinsicCall(CallBase *CI, Function 
*NewFn) {
F->getParent(), 
Intrinsic::convert_from_fp16,
{Builder.getFloatTy()}),
CI->getArgOperand(0), "h2f");
+  } else if (Name.consume_front("bitcast.") &&
+ (Name == "f2i" || Name == "i2f" || Name == "ll2d" ||
+  Name == "d2ll")) {
+Rep = Builder.CreateBitCast(CI->getArgOperand(0), CI->getType());
   } else {
 Intrinsic::ID IID = shouldUpgradeNVPTXBF16Intrinsic(Name);
 if (IID != Intrinsic::not_intrinsic &&
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td 
b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index 0c883093dd0a54..5c2ef4fa417ac1 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -1561,20 +1561,6 @@ def : Pat<(int_nvvm_e5m2x2_to_f16x2_rn Int16Regs:$a),
 def : Pat<(int_nvvm_e5m2x2_to_f16x2_rn_relu Int16Regs:$a),
   (CVT_f16x2_e5m2x2 Int16Regs:$a, CvtRN_RE

[clang] [llvm] [NVPTX] Remove nvvm.bitcast.* intrinsics (PR #107936)

2024-09-09 Thread Alex MacLean via cfe-commits

AlexMaclean wrote:

> It may be worth adding a note about this in the release notes.

I'm not familiar with these, can you point me to an analogous change I could 
use as an example?

https://github.com/llvm/llvm-project/pull/107936
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Revamp NVVMIntrRange pass (PR #94422)

2024-06-05 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean updated 
https://github.com/llvm/llvm-project/pull/94422

>From 708374e03f1bf70006f2472f19edad1bd621e2d6 Mon Sep 17 00:00:00 2001
From: Alex MacLean 
Date: Mon, 3 Jun 2024 16:46:36 +
Subject: [PATCH] [NVPTX] Revamp NVVMIntrRange pass

---
 clang/test/CodeGenCUDA/cuda-builtin-vars.cu  |  24 +--
 llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp|  32 ++--
 llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp |   6 +-
 llvm/lib/Target/NVPTX/NVPTXUtilities.cpp |  58 --
 llvm/lib/Target/NVPTX/NVPTXUtilities.h   |  16 +-
 llvm/lib/Target/NVPTX/NVVMIntrRange.cpp  | 177 ++-
 llvm/test/CodeGen/NVPTX/intr-range.ll|  60 +++
 llvm/test/CodeGen/NVPTX/intrinsic-old.ll |  43 ++---
 8 files changed, 249 insertions(+), 167 deletions(-)
 create mode 100644 llvm/test/CodeGen/NVPTX/intr-range.ll

diff --git a/clang/test/CodeGenCUDA/cuda-builtin-vars.cu 
b/clang/test/CodeGenCUDA/cuda-builtin-vars.cu
index ba5e5f13ebe70..dba0a76af21dd 100644
--- a/clang/test/CodeGenCUDA/cuda-builtin-vars.cu
+++ b/clang/test/CodeGenCUDA/cuda-builtin-vars.cu
@@ -6,21 +6,21 @@
 __attribute__((global))
 void kernel(int *out) {
   int i = 0;
-  out[i++] = threadIdx.x; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.tid.x()
-  out[i++] = threadIdx.y; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.tid.y()
-  out[i++] = threadIdx.z; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.tid.z()
+  out[i++] = threadIdx.x; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.tid.x()
+  out[i++] = threadIdx.y; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.tid.y()
+  out[i++] = threadIdx.z; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.tid.z()
 
-  out[i++] = blockIdx.x; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ctaid.x()
-  out[i++] = blockIdx.y; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ctaid.y()
-  out[i++] = blockIdx.z; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ctaid.z()
+  out[i++] = blockIdx.x; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ctaid.x()
+  out[i++] = blockIdx.y; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ctaid.y()
+  out[i++] = blockIdx.z; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ctaid.z()
 
-  out[i++] = blockDim.x; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ntid.x()
-  out[i++] = blockDim.y; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ntid.y()
-  out[i++] = blockDim.z; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ntid.z()
+  out[i++] = blockDim.x; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ntid.x()
+  out[i++] = blockDim.y; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ntid.y()
+  out[i++] = blockDim.z; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ntid.z()
 
-  out[i++] = gridDim.x; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.nctaid.x()
-  out[i++] = gridDim.y; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.nctaid.y()
-  out[i++] = gridDim.z; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.nctaid.z()
+  out[i++] = gridDim.x; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.nctaid.x()
+  out[i++] = gridDim.y; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.nctaid.y()
+  out[i++] = gridDim.z; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.nctaid.z()
 
   out[i++] = warpSize; // CHECK: store i32 32,
 
diff --git a/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp 
b/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
index f63697916d902..82770f8660850 100644
--- a/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
@@ -542,30 +542,24 @@ void NVPTXAsmPrinter::emitKernelFunctionDirectives(const 
Function &F,
   // If the NVVM IR has some of reqntid* specified, then output
   // the reqntid directive, and set the unspecified ones to 1.
   // If none of Reqntid* is specified, don't output reqntid directive.
-  unsigned Reqntidx, Reqntidy, Reqntidz;
-  Reqntidx = Reqntidy = Reqntidz = 1;
-  bool ReqSpecified = false;
-  ReqSpecified |= getReqNTIDx(F, Reqntidx);
-  ReqSpecified |= getReqNTIDy(F, Reqntidy);
-  ReqSpecified |= getReqNTIDz(F, Reqntidz);
+  std::optional Reqntidx = getReqNTIDx(F);
+  std::optional Reqntidy = getReqNTIDy(F);
+  std::optional Reqntidz = getReqNTIDz(F);
 
-  if (ReqSpecified)
-O << ".reqntid " << Reqntidx << ", " << Reqntidy << ", " << Reqntidz
-  << "\n";
+  if (Reqntidx || Reqntidy || Reqntidz)
+O << ".reqntid " << Reqntidx.value_or(1) << ", " << Reqntidy.value_or(1)
+  << ", " << Reqntidz.value_or(1) << "\n";
 
   // If the NVVM IR has some of maxntid* specified, then output
   // the maxntid directive, and set the unspecified ones to 1.
   // If none of maxntid* is specified, don't output maxntid directive.
-  unsigned Maxntidx, Maxntidy, Maxntidz;
-  Maxntidx = Maxntidy = Maxntidz = 1;
-  bool MaxSpecified = false;
-  MaxSpecified |= getMaxNTIDx(F, Maxntid

[clang] [llvm] [NVPTX] Revamp NVVMIntrRange pass (PR #94422)

2024-06-05 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean updated 
https://github.com/llvm/llvm-project/pull/94422

>From 708374e03f1bf70006f2472f19edad1bd621e2d6 Mon Sep 17 00:00:00 2001
From: Alex MacLean 
Date: Mon, 3 Jun 2024 16:46:36 +
Subject: [PATCH 1/2] [NVPTX] Revamp NVVMIntrRange pass

---
 clang/test/CodeGenCUDA/cuda-builtin-vars.cu  |  24 +--
 llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp|  32 ++--
 llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp |   6 +-
 llvm/lib/Target/NVPTX/NVPTXUtilities.cpp |  58 --
 llvm/lib/Target/NVPTX/NVPTXUtilities.h   |  16 +-
 llvm/lib/Target/NVPTX/NVVMIntrRange.cpp  | 177 ++-
 llvm/test/CodeGen/NVPTX/intr-range.ll|  60 +++
 llvm/test/CodeGen/NVPTX/intrinsic-old.ll |  43 ++---
 8 files changed, 249 insertions(+), 167 deletions(-)
 create mode 100644 llvm/test/CodeGen/NVPTX/intr-range.ll

diff --git a/clang/test/CodeGenCUDA/cuda-builtin-vars.cu 
b/clang/test/CodeGenCUDA/cuda-builtin-vars.cu
index ba5e5f13ebe70..dba0a76af21dd 100644
--- a/clang/test/CodeGenCUDA/cuda-builtin-vars.cu
+++ b/clang/test/CodeGenCUDA/cuda-builtin-vars.cu
@@ -6,21 +6,21 @@
 __attribute__((global))
 void kernel(int *out) {
   int i = 0;
-  out[i++] = threadIdx.x; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.tid.x()
-  out[i++] = threadIdx.y; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.tid.y()
-  out[i++] = threadIdx.z; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.tid.z()
+  out[i++] = threadIdx.x; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.tid.x()
+  out[i++] = threadIdx.y; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.tid.y()
+  out[i++] = threadIdx.z; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.tid.z()
 
-  out[i++] = blockIdx.x; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ctaid.x()
-  out[i++] = blockIdx.y; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ctaid.y()
-  out[i++] = blockIdx.z; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ctaid.z()
+  out[i++] = blockIdx.x; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ctaid.x()
+  out[i++] = blockIdx.y; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ctaid.y()
+  out[i++] = blockIdx.z; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ctaid.z()
 
-  out[i++] = blockDim.x; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ntid.x()
-  out[i++] = blockDim.y; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ntid.y()
-  out[i++] = blockDim.z; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ntid.z()
+  out[i++] = blockDim.x; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ntid.x()
+  out[i++] = blockDim.y; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ntid.y()
+  out[i++] = blockDim.z; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ntid.z()
 
-  out[i++] = gridDim.x; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.nctaid.x()
-  out[i++] = gridDim.y; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.nctaid.y()
-  out[i++] = gridDim.z; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.nctaid.z()
+  out[i++] = gridDim.x; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.nctaid.x()
+  out[i++] = gridDim.y; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.nctaid.y()
+  out[i++] = gridDim.z; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.nctaid.z()
 
   out[i++] = warpSize; // CHECK: store i32 32,
 
diff --git a/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp 
b/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
index f63697916d902..82770f8660850 100644
--- a/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
@@ -542,30 +542,24 @@ void NVPTXAsmPrinter::emitKernelFunctionDirectives(const 
Function &F,
   // If the NVVM IR has some of reqntid* specified, then output
   // the reqntid directive, and set the unspecified ones to 1.
   // If none of Reqntid* is specified, don't output reqntid directive.
-  unsigned Reqntidx, Reqntidy, Reqntidz;
-  Reqntidx = Reqntidy = Reqntidz = 1;
-  bool ReqSpecified = false;
-  ReqSpecified |= getReqNTIDx(F, Reqntidx);
-  ReqSpecified |= getReqNTIDy(F, Reqntidy);
-  ReqSpecified |= getReqNTIDz(F, Reqntidz);
+  std::optional Reqntidx = getReqNTIDx(F);
+  std::optional Reqntidy = getReqNTIDy(F);
+  std::optional Reqntidz = getReqNTIDz(F);
 
-  if (ReqSpecified)
-O << ".reqntid " << Reqntidx << ", " << Reqntidy << ", " << Reqntidz
-  << "\n";
+  if (Reqntidx || Reqntidy || Reqntidz)
+O << ".reqntid " << Reqntidx.value_or(1) << ", " << Reqntidy.value_or(1)
+  << ", " << Reqntidz.value_or(1) << "\n";
 
   // If the NVVM IR has some of maxntid* specified, then output
   // the maxntid directive, and set the unspecified ones to 1.
   // If none of maxntid* is specified, don't output maxntid directive.
-  unsigned Maxntidx, Maxntidy, Maxntidz;
-  Maxntidx = Maxntidy = Maxntidz = 1;
-  bool MaxSpecified = false;
-  MaxSpecified |= getMaxNTIDx(F, Max

[clang] [llvm] [NVPTX] Revamp NVVMIntrRange pass (PR #94422)

2024-06-05 Thread Alex MacLean via cfe-commits


@@ -1,50 +1,51 @@
-//===- NVVMIntrRange.cpp - Set !range metadata for NVVM intrinsics 
===//
+//===- NVVMIntrRange.cpp - Set range attributes for NVVM intrinsics 
---===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 
//===--===//
 //
-// This pass adds appropriate !range metadata for calls to NVVM
+// This pass adds appropriate range attributes for calls to NVVM
 // intrinsics that return a limited range of values.
 //
 
//===--===//
 
 #include "NVPTX.h"
-#include "llvm/IR/Constants.h"
+#include "NVPTXUtilities.h"
 #include "llvm/IR/InstIterator.h"
 #include "llvm/IR/Instructions.h"
+#include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/Intrinsics.h"
 #include "llvm/IR/IntrinsicsNVPTX.h"
 #include "llvm/IR/PassManager.h"
 #include "llvm/Support/CommandLine.h"
+#include 
 
 using namespace llvm;
 
 #define DEBUG_TYPE "nvvm-intr-range"
 
 namespace llvm { void initializeNVVMIntrRangePass(PassRegistry &); }
 
-// Add !range metadata based on limits of given SM variant.
+// Add range attributes based on limits of given SM variant.
 static cl::opt NVVMIntrRangeSM("nvvm-intr-range-sm", cl::init(20),

AlexMaclean wrote:

I just went ahead and removed the SM logic from this pass altogether, all it is 
doing is reducing a single range for `sm_20`. I think it is fine to give up 
some small chance of improving perf on this architecture. 

https://github.com/llvm/llvm-project/pull/94422
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Revamp NVVMIntrRange pass (PR #94422)

2024-06-05 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean updated 
https://github.com/llvm/llvm-project/pull/94422

>From 708374e03f1bf70006f2472f19edad1bd621e2d6 Mon Sep 17 00:00:00 2001
From: Alex MacLean 
Date: Mon, 3 Jun 2024 16:46:36 +
Subject: [PATCH 1/3] [NVPTX] Revamp NVVMIntrRange pass

---
 clang/test/CodeGenCUDA/cuda-builtin-vars.cu  |  24 +--
 llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp|  32 ++--
 llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp |   6 +-
 llvm/lib/Target/NVPTX/NVPTXUtilities.cpp |  58 --
 llvm/lib/Target/NVPTX/NVPTXUtilities.h   |  16 +-
 llvm/lib/Target/NVPTX/NVVMIntrRange.cpp  | 177 ++-
 llvm/test/CodeGen/NVPTX/intr-range.ll|  60 +++
 llvm/test/CodeGen/NVPTX/intrinsic-old.ll |  43 ++---
 8 files changed, 249 insertions(+), 167 deletions(-)
 create mode 100644 llvm/test/CodeGen/NVPTX/intr-range.ll

diff --git a/clang/test/CodeGenCUDA/cuda-builtin-vars.cu 
b/clang/test/CodeGenCUDA/cuda-builtin-vars.cu
index ba5e5f13ebe70..dba0a76af21dd 100644
--- a/clang/test/CodeGenCUDA/cuda-builtin-vars.cu
+++ b/clang/test/CodeGenCUDA/cuda-builtin-vars.cu
@@ -6,21 +6,21 @@
 __attribute__((global))
 void kernel(int *out) {
   int i = 0;
-  out[i++] = threadIdx.x; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.tid.x()
-  out[i++] = threadIdx.y; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.tid.y()
-  out[i++] = threadIdx.z; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.tid.z()
+  out[i++] = threadIdx.x; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.tid.x()
+  out[i++] = threadIdx.y; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.tid.y()
+  out[i++] = threadIdx.z; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.tid.z()
 
-  out[i++] = blockIdx.x; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ctaid.x()
-  out[i++] = blockIdx.y; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ctaid.y()
-  out[i++] = blockIdx.z; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ctaid.z()
+  out[i++] = blockIdx.x; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ctaid.x()
+  out[i++] = blockIdx.y; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ctaid.y()
+  out[i++] = blockIdx.z; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ctaid.z()
 
-  out[i++] = blockDim.x; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ntid.x()
-  out[i++] = blockDim.y; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ntid.y()
-  out[i++] = blockDim.z; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ntid.z()
+  out[i++] = blockDim.x; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ntid.x()
+  out[i++] = blockDim.y; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ntid.y()
+  out[i++] = blockDim.z; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ntid.z()
 
-  out[i++] = gridDim.x; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.nctaid.x()
-  out[i++] = gridDim.y; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.nctaid.y()
-  out[i++] = gridDim.z; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.nctaid.z()
+  out[i++] = gridDim.x; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.nctaid.x()
+  out[i++] = gridDim.y; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.nctaid.y()
+  out[i++] = gridDim.z; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.nctaid.z()
 
   out[i++] = warpSize; // CHECK: store i32 32,
 
diff --git a/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp 
b/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
index f63697916d902..82770f8660850 100644
--- a/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
@@ -542,30 +542,24 @@ void NVPTXAsmPrinter::emitKernelFunctionDirectives(const 
Function &F,
   // If the NVVM IR has some of reqntid* specified, then output
   // the reqntid directive, and set the unspecified ones to 1.
   // If none of Reqntid* is specified, don't output reqntid directive.
-  unsigned Reqntidx, Reqntidy, Reqntidz;
-  Reqntidx = Reqntidy = Reqntidz = 1;
-  bool ReqSpecified = false;
-  ReqSpecified |= getReqNTIDx(F, Reqntidx);
-  ReqSpecified |= getReqNTIDy(F, Reqntidy);
-  ReqSpecified |= getReqNTIDz(F, Reqntidz);
+  std::optional Reqntidx = getReqNTIDx(F);
+  std::optional Reqntidy = getReqNTIDy(F);
+  std::optional Reqntidz = getReqNTIDz(F);
 
-  if (ReqSpecified)
-O << ".reqntid " << Reqntidx << ", " << Reqntidy << ", " << Reqntidz
-  << "\n";
+  if (Reqntidx || Reqntidy || Reqntidz)
+O << ".reqntid " << Reqntidx.value_or(1) << ", " << Reqntidy.value_or(1)
+  << ", " << Reqntidz.value_or(1) << "\n";
 
   // If the NVVM IR has some of maxntid* specified, then output
   // the maxntid directive, and set the unspecified ones to 1.
   // If none of maxntid* is specified, don't output maxntid directive.
-  unsigned Maxntidx, Maxntidy, Maxntidz;
-  Maxntidx = Maxntidy = Maxntidz = 1;
-  bool MaxSpecified = false;
-  MaxSpecified |= getMaxNTIDx(F, Max

[clang] [llvm] [NVPTX] Revamp NVVMIntrRange pass (PR #94422)

2024-06-05 Thread Alex MacLean via cfe-commits


@@ -128,6 +128,15 @@ bool findOneNVVMAnnotation(const GlobalValue *gv, const 
std::string &prop,
   return true;
 }
 
+static std::optional
+findOneNVVMAnnotation(const GlobalValue &GV, const std::string &PropName) {
+  unsigned RetVal;
+  bool Found = findOneNVVMAnnotation(&GV, PropName, RetVal);
+  if (Found)
+return RetVal;

AlexMaclean wrote:

Done

https://github.com/llvm/llvm-project/pull/94422
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Revamp NVVMIntrRange pass (PR #94422)

2024-06-05 Thread Alex MacLean via cfe-commits


@@ -0,0 +1,60 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py 
UTC_ARGS: --check-attributes --version 5
+; RUN: opt < %s -S -mtriple=nvptx-nvidia-cuda -mcpu=sm_20 
-passes=nvvm-intr-range | FileCheck %s
+
+define i32 @test_maxntid() {
+; CHECK-LABEL: define i32 @test_maxntid(
+; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:[[TMP1:%.*]] = call range(i32 0, 96) i32 
@llvm.nvvm.read.ptx.sreg.tid.x()
+; CHECK-NEXT:[[TMP2:%.*]] = call range(i32 0, 64) i32 
@llvm.nvvm.read.ptx.sreg.tid.z()
+; CHECK-NEXT:[[TMP4:%.*]] = call range(i32 1, 97) i32 
@llvm.nvvm.read.ptx.sreg.ntid.y()
+; CHECK-NEXT:[[TMP3:%.*]] = add i32 [[TMP1]], [[TMP2]]
+; CHECK-NEXT:[[TMP5:%.*]] = add i32 [[TMP3]], [[TMP4]]
+; CHECK-NEXT:ret i32 [[TMP5]]
+;
+  %1 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
+  %2 = call i32 @llvm.nvvm.read.ptx.sreg.tid.z()
+  %3 = call i32 @llvm.nvvm.read.ptx.sreg.ntid.y()

AlexMaclean wrote:

Added all the variants. 

I've removed SM logic so I'm not sure if there is anything else you'd like me 
to change?

https://github.com/llvm/llvm-project/pull/94422
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Revamp NVVMIntrRange pass (PR #94422)

2024-06-05 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean updated 
https://github.com/llvm/llvm-project/pull/94422

>From 708374e03f1bf70006f2472f19edad1bd621e2d6 Mon Sep 17 00:00:00 2001
From: Alex MacLean 
Date: Mon, 3 Jun 2024 16:46:36 +
Subject: [PATCH 1/4] [NVPTX] Revamp NVVMIntrRange pass

---
 clang/test/CodeGenCUDA/cuda-builtin-vars.cu  |  24 +--
 llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp|  32 ++--
 llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp |   6 +-
 llvm/lib/Target/NVPTX/NVPTXUtilities.cpp |  58 --
 llvm/lib/Target/NVPTX/NVPTXUtilities.h   |  16 +-
 llvm/lib/Target/NVPTX/NVVMIntrRange.cpp  | 177 ++-
 llvm/test/CodeGen/NVPTX/intr-range.ll|  60 +++
 llvm/test/CodeGen/NVPTX/intrinsic-old.ll |  43 ++---
 8 files changed, 249 insertions(+), 167 deletions(-)
 create mode 100644 llvm/test/CodeGen/NVPTX/intr-range.ll

diff --git a/clang/test/CodeGenCUDA/cuda-builtin-vars.cu 
b/clang/test/CodeGenCUDA/cuda-builtin-vars.cu
index ba5e5f13ebe70..dba0a76af21dd 100644
--- a/clang/test/CodeGenCUDA/cuda-builtin-vars.cu
+++ b/clang/test/CodeGenCUDA/cuda-builtin-vars.cu
@@ -6,21 +6,21 @@
 __attribute__((global))
 void kernel(int *out) {
   int i = 0;
-  out[i++] = threadIdx.x; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.tid.x()
-  out[i++] = threadIdx.y; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.tid.y()
-  out[i++] = threadIdx.z; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.tid.z()
+  out[i++] = threadIdx.x; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.tid.x()
+  out[i++] = threadIdx.y; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.tid.y()
+  out[i++] = threadIdx.z; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.tid.z()
 
-  out[i++] = blockIdx.x; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ctaid.x()
-  out[i++] = blockIdx.y; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ctaid.y()
-  out[i++] = blockIdx.z; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ctaid.z()
+  out[i++] = blockIdx.x; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ctaid.x()
+  out[i++] = blockIdx.y; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ctaid.y()
+  out[i++] = blockIdx.z; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ctaid.z()
 
-  out[i++] = blockDim.x; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ntid.x()
-  out[i++] = blockDim.y; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ntid.y()
-  out[i++] = blockDim.z; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.ntid.z()
+  out[i++] = blockDim.x; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ntid.x()
+  out[i++] = blockDim.y; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ntid.y()
+  out[i++] = blockDim.z; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ntid.z()
 
-  out[i++] = gridDim.x; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.nctaid.x()
-  out[i++] = gridDim.y; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.nctaid.y()
-  out[i++] = gridDim.z; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.nctaid.z()
+  out[i++] = gridDim.x; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.nctaid.x()
+  out[i++] = gridDim.y; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.nctaid.y()
+  out[i++] = gridDim.z; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.nctaid.z()
 
   out[i++] = warpSize; // CHECK: store i32 32,
 
diff --git a/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp 
b/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
index f63697916d902..82770f8660850 100644
--- a/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
@@ -542,30 +542,24 @@ void NVPTXAsmPrinter::emitKernelFunctionDirectives(const 
Function &F,
   // If the NVVM IR has some of reqntid* specified, then output
   // the reqntid directive, and set the unspecified ones to 1.
   // If none of Reqntid* is specified, don't output reqntid directive.
-  unsigned Reqntidx, Reqntidy, Reqntidz;
-  Reqntidx = Reqntidy = Reqntidz = 1;
-  bool ReqSpecified = false;
-  ReqSpecified |= getReqNTIDx(F, Reqntidx);
-  ReqSpecified |= getReqNTIDy(F, Reqntidy);
-  ReqSpecified |= getReqNTIDz(F, Reqntidz);
+  std::optional Reqntidx = getReqNTIDx(F);
+  std::optional Reqntidy = getReqNTIDy(F);
+  std::optional Reqntidz = getReqNTIDz(F);
 
-  if (ReqSpecified)
-O << ".reqntid " << Reqntidx << ", " << Reqntidy << ", " << Reqntidz
-  << "\n";
+  if (Reqntidx || Reqntidy || Reqntidz)
+O << ".reqntid " << Reqntidx.value_or(1) << ", " << Reqntidy.value_or(1)
+  << ", " << Reqntidz.value_or(1) << "\n";
 
   // If the NVVM IR has some of maxntid* specified, then output
   // the maxntid directive, and set the unspecified ones to 1.
   // If none of maxntid* is specified, don't output maxntid directive.
-  unsigned Maxntidx, Maxntidy, Maxntidz;
-  Maxntidx = Maxntidy = Maxntidz = 1;
-  bool MaxSpecified = false;
-  MaxSpecified |= getMaxNTIDx(F, Max

[clang] [llvm] [NVPTX] Revamp NVVMIntrRange pass (PR #94422)

2024-06-05 Thread Alex MacLean via cfe-commits


@@ -139,24 +138,23 @@ define ptx_device i32 @test_ctaid_w() {
 
 define ptx_device i32 @test_nctaid_y() {
 ; CHECK: mov.u32 %r{{[0-9]+}}, %nctaid.y;
-; RANGE: call i32 @llvm.nvvm.read.ptx.sreg.nctaid.y(), !range 
![[GRID_SIZE_YZ:[0-9]+]]
+; RANGE: call range(i32 1, 65536) i32 @llvm.nvvm.read.ptx.sreg.nctaid.y()
 ; CHECK: ret;
%x = call i32 @llvm.nvvm.read.ptx.sreg.nctaid.y()
ret i32 %x
 }
 
 define ptx_device i32 @test_nctaid_z() {
 ; CHECK: mov.u32 %r{{[0-9]+}}, %nctaid.z;
-; RANGE: call i32 @llvm.nvvm.read.ptx.sreg.nctaid.z(), !range ![[GRID_SIZE_YZ]]
+; RANGE: call range(i32 1, 65536) i32 @llvm.nvvm.read.ptx.sreg.nctaid.z()
 ; CHECK: ret;
%x = call i32 @llvm.nvvm.read.ptx.sreg.nctaid.z()
ret i32 %x
 }
 
 define ptx_device i32 @test_nctaid_x() {
 ; CHECK: mov.u32 %r{{[0-9]+}}, %nctaid.x;
-; RANGE_30: call i32 @llvm.nvvm.read.ptx.sreg.nctaid.x(), !range 
![[GRID_SIZE_X:[0-9]+]]
-; RANGE_20: call i32 @llvm.nvvm.read.ptx.sreg.nctaid.x(), !range 
![[GRID_SIZE_YZ]]
+; RANGE: call range(i32 1, -2147483648) i32 @llvm.nvvm.read.ptx.sreg.nctaid.x()

AlexMaclean wrote:

I agree it looks weird but my understanding as well is that it is fine, is 
there anyone else you think we should check with?

https://github.com/llvm/llvm-project/pull/94422
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] fixup cuda-builtin-vars.cu broken in IntrRange change (PR #94639)

2024-06-06 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean created 
https://github.com/llvm/llvm-project/pull/94639

None

>From 227c36f7261854a1b6f8fb12fd902ffa7380be0d Mon Sep 17 00:00:00 2001
From: Alex MacLean 
Date: Thu, 6 Jun 2024 16:36:19 +
Subject: [PATCH] fixup cuda-builtin-vars.cu broken in IntrRange change

---
 clang/test/CodeGenCUDA/cuda-builtin-vars.cu | 24 ++---
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/clang/test/CodeGenCUDA/cuda-builtin-vars.cu 
b/clang/test/CodeGenCUDA/cuda-builtin-vars.cu
index dba0a76af21dd..7880a8036f8cd 100644
--- a/clang/test/CodeGenCUDA/cuda-builtin-vars.cu
+++ b/clang/test/CodeGenCUDA/cuda-builtin-vars.cu
@@ -6,21 +6,21 @@
 __attribute__((global))
 void kernel(int *out) {
   int i = 0;
-  out[i++] = threadIdx.x; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.tid.x()
-  out[i++] = threadIdx.y; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.tid.y()
-  out[i++] = threadIdx.z; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.tid.z()
+  out[i++] = threadIdx.x; // CHECK: call noundef{{.*}} i32 
@llvm.nvvm.read.ptx.sreg.tid.x()
+  out[i++] = threadIdx.y; // CHECK: call noundef{{.*}} i32 
@llvm.nvvm.read.ptx.sreg.tid.y()
+  out[i++] = threadIdx.z; // CHECK: call noundef{{.*}} i32 
@llvm.nvvm.read.ptx.sreg.tid.z()
 
-  out[i++] = blockIdx.x; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ctaid.x()
-  out[i++] = blockIdx.y; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ctaid.y()
-  out[i++] = blockIdx.z; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ctaid.z()
+  out[i++] = blockIdx.x; // CHECK: call noundef{{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ctaid.x()
+  out[i++] = blockIdx.y; // CHECK: call noundef{{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ctaid.y()
+  out[i++] = blockIdx.z; // CHECK: call noundef{{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ctaid.z()
 
-  out[i++] = blockDim.x; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ntid.x()
-  out[i++] = blockDim.y; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ntid.y()
-  out[i++] = blockDim.z; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ntid.z()
+  out[i++] = blockDim.x; // CHECK: call noundef{{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ntid.x()
+  out[i++] = blockDim.y; // CHECK: call noundef{{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ntid.y()
+  out[i++] = blockDim.z; // CHECK: call noundef{{.*}} i32 
@llvm.nvvm.read.ptx.sreg.ntid.z()
 
-  out[i++] = gridDim.x; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.nctaid.x()
-  out[i++] = gridDim.y; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.nctaid.y()
-  out[i++] = gridDim.z; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.nctaid.z()
+  out[i++] = gridDim.x; // CHECK: call noundef{{.*}} i32 
@llvm.nvvm.read.ptx.sreg.nctaid.x()
+  out[i++] = gridDim.y; // CHECK: call noundef{{.*}} i32 
@llvm.nvvm.read.ptx.sreg.nctaid.y()
+  out[i++] = gridDim.z; // CHECK: call noundef{{.*}} i32 
@llvm.nvvm.read.ptx.sreg.nctaid.z()
 
   out[i++] = warpSize; // CHECK: store i32 32,
 

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Revamp NVVMIntrRange pass (PR #94422)

2024-06-06 Thread Alex MacLean via cfe-commits


@@ -6,21 +6,21 @@
 __attribute__((global))
 void kernel(int *out) {
   int i = 0;
-  out[i++] = threadIdx.x; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.tid.x()
-  out[i++] = threadIdx.y; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.tid.y()
-  out[i++] = threadIdx.z; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.tid.z()
+  out[i++] = threadIdx.x; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.tid.x()

AlexMaclean wrote:

https://github.com/llvm/llvm-project/pull/94639

https://github.com/llvm/llvm-project/pull/94422
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] fixup cuda-builtin-vars.cu broken in IntrRange change (PR #94639)

2024-06-06 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean closed 
https://github.com/llvm/llvm-project/pull/94639
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)

2024-08-27 Thread Alex MacLean via cfe-commits


@@ -1549,30 +1549,10 @@ define amdgpu_kernel void 
@multiple_uses_fneg_select_f64(double %x, double %y, i
 define amdgpu_kernel void @fnge_select_f32_multi_use_regression(float %.i2369) 
{
 ; GCN-LABEL: fnge_select_f32_multi_use_regression:
 ; GCN:   ; %bb.0: ; %.entry
-; GCN-NEXT:s_load_dword s0, s[4:5], 0x0
-; GCN-NEXT:s_waitcnt lgkmcnt(0)
-; GCN-NEXT:v_cmp_nlt_f32_e64 s[0:1], s0, 0
-; GCN-NEXT:v_cndmask_b32_e64 v0, 0, 1, s[0:1]
-; GCN-NEXT:v_cmp_ngt_f32_e32 vcc, 0, v0
-; GCN-NEXT:v_cndmask_b32_e32 v1, 0, v0, vcc
-; GCN-NEXT:v_mul_f32_e64 v0, -v0, v1
-; GCN-NEXT:v_cmp_lt_f32_e32 vcc, 0, v0
-; GCN-NEXT:s_and_b64 vcc, exec, vcc

AlexMaclean wrote:

https://github.com/llvm/llvm-project/pull/106268 slightly adjusts this test to 
ensure it doesn't get DCE'd away after this change. 

https://github.com/llvm/llvm-project/pull/97762
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)

2024-08-27 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean updated 
https://github.com/llvm/llvm-project/pull/97762

>From ce146e18f74e8e984ef83d152f3a5fe88e56f287 Mon Sep 17 00:00:00 2001
From: Alex MacLean 
Date: Mon, 1 Jul 2024 17:06:56 +
Subject: [PATCH 1/6] [ValueTracking] use KnownBits to compute fpclass from
 bitcast

---
 llvm/lib/Analysis/ValueTracking.cpp  |  30 ++
 llvm/test/Transforms/Attributor/nofpclass.ll | 104 +++
 2 files changed, 134 insertions(+)

diff --git a/llvm/lib/Analysis/ValueTracking.cpp 
b/llvm/lib/Analysis/ValueTracking.cpp
index 173faa32a3878d..023303aa09e362 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -5921,6 +5921,36 @@ void computeKnownFPClass(const Value *V, const APInt 
&DemandedElts,
 
 break;
   }
+  case Instruction::BitCast: {
+const Type *Ty = Op->getType();
+const Value *Casted = Op->getOperand(0);
+if (Ty->isVectorTy() || !Casted->getType()->isIntOrIntVectorTy())
+  break;
+
+KnownBits Bits(Ty->getScalarSizeInBits());
+computeKnownBits(Casted, Bits, Depth + 1, Q);
+
+// Transfer information from the sign bit.
+if (Bits.Zero.isSignBitSet())
+  Known.signBitMustBeZero();
+else if (Bits.One.isSignBitSet())
+  Known.signBitMustBeOne();
+
+if (Ty->isIEEE()) {
+  // IEEE floats are NaN when all bits of the exponent plus at least one of
+  // the fraction bits are 1. This means:
+  //   - If we assume unknown bits are 0 and the value is NaN, it will
+  // always be NaN
+  //   - If we assume unknown bits are 1 and the value is not NaN, it can
+  // never be NaN
+  if (APFloat(Ty->getFltSemantics(), Bits.One).isNaN())
+Known.KnownFPClasses = fcNan;
+  else if (!APFloat(Ty->getFltSemantics(), ~Bits.Zero).isNaN())
+Known.knownNot(fcNan);
+}
+
+break;
+  }
   default:
 break;
   }
diff --git a/llvm/test/Transforms/Attributor/nofpclass.ll 
b/llvm/test/Transforms/Attributor/nofpclass.ll
index 781ba636c3ab3c..c5d562a436b337 100644
--- a/llvm/test/Transforms/Attributor/nofpclass.ll
+++ b/llvm/test/Transforms/Attributor/nofpclass.ll
@@ -2690,6 +2690,110 @@ entry:
   ret double %abs
 }
 
+define float @bitcast_to_float_sign_0(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) float 
@bitcast_to_float_sign_0
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = lshr i32 [[ARG]], 1
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:ret float [[TMP2]]
+;
+  %1 = lshr i32 %arg, 1
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define float @bitcast_to_float_nnan(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(nan ninf nzero nsub nnorm) float 
@bitcast_to_float_nnan
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = lshr i32 [[ARG]], 2
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:ret float [[TMP2]]
+;
+  %1 = lshr i32 %arg, 2
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define float @bitcast_to_float_sign_1(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) float 
@bitcast_to_float_sign_1
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = or i32 [[ARG]], -2147483648
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:ret float [[TMP2]]
+;
+  %1 = or i32 %arg, -2147483648
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define float @bitcast_to_float_nan(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(inf zero sub norm) float @bitcast_to_float_nan
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = or i32 [[ARG]], 2139095041
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:ret float [[TMP2]]
+;
+  %1 = or i32 %arg, 2139095041
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define double @bitcast_to_double_sign_0(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) double 
@bitcast_to_double_sign_0
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = lshr i64 [[ARG]], 1
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i64 [[TMP1]] to double
+; CHECK-NEXT:ret double [[TMP2]]
+;
+  %1 = lshr i64 %arg, 1
+  %2 = bitcast i64 %1 to double
+  ret double %2
+}
+
+define double @bitcast_to_double_nnan(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: defin

[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)

2024-08-27 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean updated 
https://github.com/llvm/llvm-project/pull/97762

>From ddb38bd6c86e36ab8b46a4fb5f97390d140f4aa1 Mon Sep 17 00:00:00 2001
From: Alex MacLean 
Date: Mon, 1 Jul 2024 17:06:56 +
Subject: [PATCH 1/6] [ValueTracking] use KnownBits to compute fpclass from
 bitcast

---
 llvm/lib/Analysis/ValueTracking.cpp  |  30 ++
 llvm/test/Transforms/Attributor/nofpclass.ll | 104 +++
 2 files changed, 134 insertions(+)

diff --git a/llvm/lib/Analysis/ValueTracking.cpp 
b/llvm/lib/Analysis/ValueTracking.cpp
index 173faa32a3878d..023303aa09e362 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -5921,6 +5921,36 @@ void computeKnownFPClass(const Value *V, const APInt 
&DemandedElts,
 
 break;
   }
+  case Instruction::BitCast: {
+const Type *Ty = Op->getType();
+const Value *Casted = Op->getOperand(0);
+if (Ty->isVectorTy() || !Casted->getType()->isIntOrIntVectorTy())
+  break;
+
+KnownBits Bits(Ty->getScalarSizeInBits());
+computeKnownBits(Casted, Bits, Depth + 1, Q);
+
+// Transfer information from the sign bit.
+if (Bits.Zero.isSignBitSet())
+  Known.signBitMustBeZero();
+else if (Bits.One.isSignBitSet())
+  Known.signBitMustBeOne();
+
+if (Ty->isIEEE()) {
+  // IEEE floats are NaN when all bits of the exponent plus at least one of
+  // the fraction bits are 1. This means:
+  //   - If we assume unknown bits are 0 and the value is NaN, it will
+  // always be NaN
+  //   - If we assume unknown bits are 1 and the value is not NaN, it can
+  // never be NaN
+  if (APFloat(Ty->getFltSemantics(), Bits.One).isNaN())
+Known.KnownFPClasses = fcNan;
+  else if (!APFloat(Ty->getFltSemantics(), ~Bits.Zero).isNaN())
+Known.knownNot(fcNan);
+}
+
+break;
+  }
   default:
 break;
   }
diff --git a/llvm/test/Transforms/Attributor/nofpclass.ll 
b/llvm/test/Transforms/Attributor/nofpclass.ll
index 781ba636c3ab3c..c5d562a436b337 100644
--- a/llvm/test/Transforms/Attributor/nofpclass.ll
+++ b/llvm/test/Transforms/Attributor/nofpclass.ll
@@ -2690,6 +2690,110 @@ entry:
   ret double %abs
 }
 
+define float @bitcast_to_float_sign_0(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) float 
@bitcast_to_float_sign_0
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = lshr i32 [[ARG]], 1
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:ret float [[TMP2]]
+;
+  %1 = lshr i32 %arg, 1
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define float @bitcast_to_float_nnan(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(nan ninf nzero nsub nnorm) float 
@bitcast_to_float_nnan
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = lshr i32 [[ARG]], 2
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:ret float [[TMP2]]
+;
+  %1 = lshr i32 %arg, 2
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define float @bitcast_to_float_sign_1(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) float 
@bitcast_to_float_sign_1
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = or i32 [[ARG]], -2147483648
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:ret float [[TMP2]]
+;
+  %1 = or i32 %arg, -2147483648
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define float @bitcast_to_float_nan(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(inf zero sub norm) float @bitcast_to_float_nan
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = or i32 [[ARG]], 2139095041
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:ret float [[TMP2]]
+;
+  %1 = or i32 %arg, 2139095041
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define double @bitcast_to_double_sign_0(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) double 
@bitcast_to_double_sign_0
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = lshr i64 [[ARG]], 1
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i64 [[TMP1]] to double
+; CHECK-NEXT:ret double [[TMP2]]
+;
+  %1 = lshr i64 %arg, 1
+  %2 = bitcast i64 %1 to double
+  ret double %2
+}
+
+define double @bitcast_to_double_nnan(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: defin

[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)

2024-08-27 Thread Alex MacLean via cfe-commits

AlexMaclean wrote:

@arsenm, @goldsteinn when you have a minute could you take another look at 
this? I think I've addressed all the issues you've raised. 

https://github.com/llvm/llvm-project/pull/97762
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)

2024-08-28 Thread Alex MacLean via cfe-commits


@@ -5921,6 +5921,61 @@ void computeKnownFPClass(const Value *V, const APInt 
&DemandedElts,
 
 break;
   }
+  case Instruction::BitCast: {
+const Value *Src;
+if (!match(Op, m_ElementWiseBitCast(m_Value(Src))) ||
+!Src->getType()->isIntOrIntVectorTy())
+  break;
+
+const Type *Ty = Op->getType()->getScalarType();
+KnownBits Bits(Ty->getScalarSizeInBits());
+computeKnownBits(Src, DemandedElts, Bits, Depth + 1, Q);
+
+// Transfer information from the sign bit.
+if (Bits.isNonNegative())
+  Known.signBitMustBeZero();
+else if (Bits.isNegative())
+  Known.signBitMustBeOne();
+
+if (Ty->isIEEE()) {
+  // IEEE floats are NaN when all bits of the exponent plus at least one of
+  // the fraction bits are 1. This means:
+  //   - If we assume unknown bits are 0 and the value is NaN, it will
+  // always be NaN
+  //   - If we assume unknown bits are 1 and the value is not NaN, it can
+  // never be NaN
+  if (APFloat(Ty->getFltSemantics(), Bits.One).isNaN())
+Known.KnownFPClasses = fcNan;
+  else if (!APFloat(Ty->getFltSemantics(), ~Bits.Zero).isNaN())
+Known.knownNot(fcNan);
+
+  // Build KnownBits representing Inf and check if it must be equal or
+  // unequal to this value.
+  auto InfKB = KnownBits::makeConstant(
+  APFloat::getInf(Ty->getFltSemantics()).bitcastToAPInt());
+  InfKB.Zero.clearSignBit();
+  if (const auto InfResult = KnownBits::eq(Bits, InfKB)) {

AlexMaclean wrote:

I don't think so. `KnownBits::eq` will return `false` if the inputs cannot be 
equal and `std::nullopt` if the may or may not be equal (in this case it cannot 
return `true` because `InfKB` is not fully known). Clearing the sign bit of 
`Bits` won't change the result either way. 

https://github.com/llvm/llvm-project/pull/97762
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)

2024-08-29 Thread Alex MacLean via cfe-commits


@@ -5921,6 +5921,61 @@ void computeKnownFPClass(const Value *V, const APInt 
&DemandedElts,
 
 break;
   }
+  case Instruction::BitCast: {
+const Value *Src;
+if (!match(Op, m_ElementWiseBitCast(m_Value(Src))) ||
+!Src->getType()->isIntOrIntVectorTy())
+  break;
+
+const Type *Ty = Op->getType()->getScalarType();
+KnownBits Bits(Ty->getScalarSizeInBits());
+computeKnownBits(Src, DemandedElts, Bits, Depth + 1, Q);
+
+// Transfer information from the sign bit.
+if (Bits.isNonNegative())
+  Known.signBitMustBeZero();
+else if (Bits.isNegative())
+  Known.signBitMustBeOne();
+
+if (Ty->isIEEE()) {
+  // IEEE floats are NaN when all bits of the exponent plus at least one of
+  // the fraction bits are 1. This means:
+  //   - If we assume unknown bits are 0 and the value is NaN, it will
+  // always be NaN
+  //   - If we assume unknown bits are 1 and the value is not NaN, it can
+  // never be NaN
+  if (APFloat(Ty->getFltSemantics(), Bits.One).isNaN())
+Known.KnownFPClasses = fcNan;
+  else if (!APFloat(Ty->getFltSemantics(), ~Bits.Zero).isNaN())
+Known.knownNot(fcNan);
+
+  // Build KnownBits representing Inf and check if it must be equal or
+  // unequal to this value.
+  auto InfKB = KnownBits::makeConstant(
+  APFloat::getInf(Ty->getFltSemantics()).bitcastToAPInt());
+  InfKB.Zero.clearSignBit();
+  if (const auto InfResult = KnownBits::eq(Bits, InfKB)) {

AlexMaclean wrote:

If that is the case the values will be:
```
Bits  = 1  000
InfKB = ?  000
```
These may or may not be equal so `std::nullopt` will be returned and no 
information will be added to the fpclass. I suppose we could handle this case 
but it will be constant folded anyway so I don't think it is really necessary. 


https://github.com/llvm/llvm-project/pull/97762
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)

2024-08-29 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean updated 
https://github.com/llvm/llvm-project/pull/97762

>From 0477447f29b2889f92abf44cacd5e0f2c4e7f387 Mon Sep 17 00:00:00 2001
From: Alex MacLean 
Date: Mon, 1 Jul 2024 17:06:56 +
Subject: [PATCH 1/6] [ValueTracking] use KnownBits to compute fpclass from
 bitcast

---
 llvm/lib/Analysis/ValueTracking.cpp  |  30 ++
 llvm/test/Transforms/Attributor/nofpclass.ll | 104 +++
 2 files changed, 134 insertions(+)

diff --git a/llvm/lib/Analysis/ValueTracking.cpp 
b/llvm/lib/Analysis/ValueTracking.cpp
index 173faa32a3878d..023303aa09e362 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -5921,6 +5921,36 @@ void computeKnownFPClass(const Value *V, const APInt 
&DemandedElts,
 
 break;
   }
+  case Instruction::BitCast: {
+const Type *Ty = Op->getType();
+const Value *Casted = Op->getOperand(0);
+if (Ty->isVectorTy() || !Casted->getType()->isIntOrIntVectorTy())
+  break;
+
+KnownBits Bits(Ty->getScalarSizeInBits());
+computeKnownBits(Casted, Bits, Depth + 1, Q);
+
+// Transfer information from the sign bit.
+if (Bits.Zero.isSignBitSet())
+  Known.signBitMustBeZero();
+else if (Bits.One.isSignBitSet())
+  Known.signBitMustBeOne();
+
+if (Ty->isIEEE()) {
+  // IEEE floats are NaN when all bits of the exponent plus at least one of
+  // the fraction bits are 1. This means:
+  //   - If we assume unknown bits are 0 and the value is NaN, it will
+  // always be NaN
+  //   - If we assume unknown bits are 1 and the value is not NaN, it can
+  // never be NaN
+  if (APFloat(Ty->getFltSemantics(), Bits.One).isNaN())
+Known.KnownFPClasses = fcNan;
+  else if (!APFloat(Ty->getFltSemantics(), ~Bits.Zero).isNaN())
+Known.knownNot(fcNan);
+}
+
+break;
+  }
   default:
 break;
   }
diff --git a/llvm/test/Transforms/Attributor/nofpclass.ll 
b/llvm/test/Transforms/Attributor/nofpclass.ll
index 781ba636c3ab3c..c5d562a436b337 100644
--- a/llvm/test/Transforms/Attributor/nofpclass.ll
+++ b/llvm/test/Transforms/Attributor/nofpclass.ll
@@ -2690,6 +2690,110 @@ entry:
   ret double %abs
 }
 
+define float @bitcast_to_float_sign_0(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) float 
@bitcast_to_float_sign_0
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = lshr i32 [[ARG]], 1
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:ret float [[TMP2]]
+;
+  %1 = lshr i32 %arg, 1
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define float @bitcast_to_float_nnan(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(nan ninf nzero nsub nnorm) float 
@bitcast_to_float_nnan
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = lshr i32 [[ARG]], 2
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:ret float [[TMP2]]
+;
+  %1 = lshr i32 %arg, 2
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define float @bitcast_to_float_sign_1(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) float 
@bitcast_to_float_sign_1
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = or i32 [[ARG]], -2147483648
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:ret float [[TMP2]]
+;
+  %1 = or i32 %arg, -2147483648
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define float @bitcast_to_float_nan(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(inf zero sub norm) float @bitcast_to_float_nan
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = or i32 [[ARG]], 2139095041
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:ret float [[TMP2]]
+;
+  %1 = or i32 %arg, 2139095041
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define double @bitcast_to_double_sign_0(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) double 
@bitcast_to_double_sign_0
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = lshr i64 [[ARG]], 1
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i64 [[TMP1]] to double
+; CHECK-NEXT:ret double [[TMP2]]
+;
+  %1 = lshr i64 %arg, 1
+  %2 = bitcast i64 %1 to double
+  ret double %2
+}
+
+define double @bitcast_to_double_nnan(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: defin

[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)

2024-08-30 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean closed 
https://github.com/llvm/llvm-project/pull/97762
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)

2024-08-26 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean updated 
https://github.com/llvm/llvm-project/pull/97762

>From 2dc91ada9078e5c7344e74d5b549e896056f89ad Mon Sep 17 00:00:00 2001
From: Alex MacLean 
Date: Mon, 1 Jul 2024 17:06:56 +
Subject: [PATCH 1/5] [ValueTracking] use KnownBits to compute fpclass from
 bitcast

---
 llvm/lib/Analysis/ValueTracking.cpp  |  30 ++
 llvm/test/Transforms/Attributor/nofpclass.ll | 104 +++
 2 files changed, 134 insertions(+)

diff --git a/llvm/lib/Analysis/ValueTracking.cpp 
b/llvm/lib/Analysis/ValueTracking.cpp
index 173faa32a3878d..023303aa09e362 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -5921,6 +5921,36 @@ void computeKnownFPClass(const Value *V, const APInt 
&DemandedElts,
 
 break;
   }
+  case Instruction::BitCast: {
+const Type *Ty = Op->getType();
+const Value *Casted = Op->getOperand(0);
+if (Ty->isVectorTy() || !Casted->getType()->isIntOrIntVectorTy())
+  break;
+
+KnownBits Bits(Ty->getScalarSizeInBits());
+computeKnownBits(Casted, Bits, Depth + 1, Q);
+
+// Transfer information from the sign bit.
+if (Bits.Zero.isSignBitSet())
+  Known.signBitMustBeZero();
+else if (Bits.One.isSignBitSet())
+  Known.signBitMustBeOne();
+
+if (Ty->isIEEE()) {
+  // IEEE floats are NaN when all bits of the exponent plus at least one of
+  // the fraction bits are 1. This means:
+  //   - If we assume unknown bits are 0 and the value is NaN, it will
+  // always be NaN
+  //   - If we assume unknown bits are 1 and the value is not NaN, it can
+  // never be NaN
+  if (APFloat(Ty->getFltSemantics(), Bits.One).isNaN())
+Known.KnownFPClasses = fcNan;
+  else if (!APFloat(Ty->getFltSemantics(), ~Bits.Zero).isNaN())
+Known.knownNot(fcNan);
+}
+
+break;
+  }
   default:
 break;
   }
diff --git a/llvm/test/Transforms/Attributor/nofpclass.ll 
b/llvm/test/Transforms/Attributor/nofpclass.ll
index 781ba636c3ab3c..c5d562a436b337 100644
--- a/llvm/test/Transforms/Attributor/nofpclass.ll
+++ b/llvm/test/Transforms/Attributor/nofpclass.ll
@@ -2690,6 +2690,110 @@ entry:
   ret double %abs
 }
 
+define float @bitcast_to_float_sign_0(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) float 
@bitcast_to_float_sign_0
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = lshr i32 [[ARG]], 1
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:ret float [[TMP2]]
+;
+  %1 = lshr i32 %arg, 1
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define float @bitcast_to_float_nnan(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(nan ninf nzero nsub nnorm) float 
@bitcast_to_float_nnan
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = lshr i32 [[ARG]], 2
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:ret float [[TMP2]]
+;
+  %1 = lshr i32 %arg, 2
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define float @bitcast_to_float_sign_1(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) float 
@bitcast_to_float_sign_1
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = or i32 [[ARG]], -2147483648
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:ret float [[TMP2]]
+;
+  %1 = or i32 %arg, -2147483648
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define float @bitcast_to_float_nan(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(inf zero sub norm) float @bitcast_to_float_nan
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = or i32 [[ARG]], 2139095041
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:ret float [[TMP2]]
+;
+  %1 = or i32 %arg, 2139095041
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define double @bitcast_to_double_sign_0(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) double 
@bitcast_to_double_sign_0
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[TMP1:%.*]] = lshr i64 [[ARG]], 1
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i64 [[TMP1]] to double
+; CHECK-NEXT:ret double [[TMP2]]
+;
+  %1 = lshr i64 %arg, 1
+  %2 = bitcast i64 %1 to double
+  ret double %2
+}
+
+define double @bitcast_to_double_nnan(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: defin

[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)

2024-08-26 Thread Alex MacLean via cfe-commits


@@ -5805,6 +5805,37 @@ void computeKnownFPClass(const Value *V, const APInt 
&DemandedElts,
 
 break;
   }
+  case Instruction::BitCast: {
+const Value *Src;
+if (!match(Op, m_ElementWiseBitCast(m_Value(Src))) ||
+!Src->getType()->isIntOrIntVectorTy())
+  break;
+
+const Type *Ty = Op->getType()->getScalarType();
+KnownBits Bits(Ty->getScalarSizeInBits());
+computeKnownBits(Src, DemandedElts, Bits, Depth + 1, Q);
+
+// Transfer information from the sign bit.
+if (Bits.isNonNegative())
+  Known.signBitMustBeZero();
+else if (Bits.isNegative())
+  Known.signBitMustBeOne();
+
+if (Ty->isIEEE()) {
+  // IEEE floats are NaN when all bits of the exponent plus at least one of
+  // the fraction bits are 1. This means:
+  //   - If we assume unknown bits are 0 and the value is NaN, it will
+  // always be NaN
+  //   - If we assume unknown bits are 1 and the value is not NaN, it can
+  // never be NaN
+  if (APFloat(Ty->getFltSemantics(), Bits.One).isNaN())
+Known.KnownFPClasses = fcNan;
+  else if (!APFloat(Ty->getFltSemantics(), ~Bits.Zero).isNaN())
+Known.knownNot(fcNan);

AlexMaclean wrote:

Okay, I've added `inf` and also `zero` since those are both relatively simple, 
but I've left normal / subnormal  to the side for now.

https://github.com/llvm/llvm-project/pull/97762
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [ValueTracking] use KnownBits to compute fpclass from bitcast (PR #97762)

2024-08-26 Thread Alex MacLean via cfe-commits


@@ -2690,6 +2690,163 @@ entry:
   ret double %abs
 }
 
+define float @bitcast_to_float_sign_0(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) float 
@bitcast_to_float_sign_0
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[SHR:%.*]] = lshr i32 [[ARG]], 1
+; CHECK-NEXT:[[CAST:%.*]] = bitcast i32 [[SHR]] to float
+; CHECK-NEXT:ret float [[CAST]]
+;
+  %shr = lshr i32 %arg, 1
+  %cast = bitcast i32 %shr to float
+  ret float %cast
+}
+
+define float @bitcast_to_float_nnan(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(nan ninf nzero nsub nnorm) float 
@bitcast_to_float_nnan
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[SHR:%.*]] = lshr i32 [[ARG]], 2
+; CHECK-NEXT:[[CAST:%.*]] = bitcast i32 [[SHR]] to float
+; CHECK-NEXT:ret float [[CAST]]
+;
+  %shr = lshr i32 %arg, 2
+  %cast = bitcast i32 %shr to float
+  ret float %cast
+}
+
+define float @bitcast_to_float_sign_1(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) float 
@bitcast_to_float_sign_1
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[OR:%.*]] = or i32 [[ARG]], -2147483648
+; CHECK-NEXT:[[CAST:%.*]] = bitcast i32 [[OR]] to float
+; CHECK-NEXT:ret float [[CAST]]
+;
+  %or = or i32 %arg, -2147483648
+  %cast = bitcast i32 %or to float
+  ret float %cast
+}
+
+define float @bitcast_to_float_nan(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(inf zero sub norm) float @bitcast_to_float_nan
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[OR:%.*]] = or i32 [[ARG]], 2139095041
+; CHECK-NEXT:[[CAST:%.*]] = bitcast i32 [[OR]] to float
+; CHECK-NEXT:ret float [[CAST]]
+;
+  %or = or i32 %arg, 2139095041
+  %cast = bitcast i32 %or to float
+  ret float %cast
+}
+
+define double @bitcast_to_double_sign_0(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) double 
@bitcast_to_double_sign_0
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[SHR:%.*]] = lshr i64 [[ARG]], 1
+; CHECK-NEXT:[[CAST:%.*]] = bitcast i64 [[SHR]] to double
+; CHECK-NEXT:ret double [[CAST]]
+;
+  %shr = lshr i64 %arg, 1
+  %cast = bitcast i64 %shr to double
+  ret double %cast
+}
+
+define double @bitcast_to_double_nnan(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(nan ninf nzero nsub nnorm) double 
@bitcast_to_double_nnan
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[SHR:%.*]] = lshr i64 [[ARG]], 2
+; CHECK-NEXT:[[CAST:%.*]] = bitcast i64 [[SHR]] to double
+; CHECK-NEXT:ret double [[CAST]]
+;
+  %shr = lshr i64 %arg, 2
+  %cast = bitcast i64 %shr to double
+  ret double %cast
+}
+
+define double @bitcast_to_double_sign_1(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) double 
@bitcast_to_double_sign_1
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[OR:%.*]] = or i64 [[ARG]], -9223372036854775808
+; CHECK-NEXT:[[CAST:%.*]] = bitcast i64 [[OR]] to double
+; CHECK-NEXT:ret double [[CAST]]
+;
+  %or = or i64 %arg, -9223372036854775808
+  %cast = bitcast i64 %or to double
+  ret double %cast
+}
+
+define double @bitcast_to_double_nan(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(inf zero sub norm) double 
@bitcast_to_double_nan
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[OR:%.*]] = or i64 [[ARG]], -4503599627370495
+; CHECK-NEXT:[[CAST:%.*]] = bitcast i64 [[OR]] to double
+; CHECK-NEXT:ret double [[CAST]]
+;
+  %or = or i64 %arg, -4503599627370495
+  %cast = bitcast i64 %or to double
+  ret double %cast
+}
+
+
+define <2 x float> @bitcast_to_float_vect_sign_0(<2 x i32> %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind 
willreturn memory(none)
+; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) <2 x float> 
@bitcast_to_float_vect_sign_0
+; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:[[SHR:%.*]] = lshr <2 x i32> [[ARG]], 
+; CHECK-NEXT:[[CAST:%.*]] = bitcast <2 x i32> [[SHR]] to <2 x float>
+; CHECK-NEXT:ret <2 x float> [[CAST]]
+;
+  %shr = lshr <2 x i32> %arg, 
+  %cast = bitcast <2 x i32> %shr to <2 x float>
+  ret <2 x float> %cast
+}
+
+define <2 x float> @bitcast_to_float_vec

[clang] [llvm] [NVPTX] Remove nvvm.bitcast.* intrinsics (PR #107936)

2024-09-11 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean updated 
https://github.com/llvm/llvm-project/pull/107936

>From ff978f81e0eedbc5e7547acabe414f2f1b0fd31a Mon Sep 17 00:00:00 2001
From: Alex MacLean 
Date: Fri, 6 Sep 2024 18:35:20 +
Subject: [PATCH 1/2] [NVPTX] Remove nvvm.bitcast.* intrinsics

---
 clang/include/clang/Basic/BuiltinsNVPTX.def   |  8 
 llvm/include/llvm/IR/IntrinsicsNVVM.td| 18 -
 llvm/lib/IR/AutoUpgrade.cpp   |  8 
 llvm/lib/Target/NVPTX/NVPTXIntrinsics.td  | 14 -
 .../Assembler/auto_upgrade_nvvm_intrinsics.ll | 20 +++
 5 files changed, 32 insertions(+), 36 deletions(-)

diff --git a/clang/include/clang/Basic/BuiltinsNVPTX.def 
b/clang/include/clang/Basic/BuiltinsNVPTX.def
index 20f038a0a9bbde..6fff562165080a 100644
--- a/clang/include/clang/Basic/BuiltinsNVPTX.def
+++ b/clang/include/clang/Basic/BuiltinsNVPTX.def
@@ -599,14 +599,6 @@ TARGET_BUILTIN(__nvvm_e4m3x2_to_f16x2_rn_relu, "V2hs", "", 
AND(SM_89,PTX81))
 TARGET_BUILTIN(__nvvm_e5m2x2_to_f16x2_rn, "V2hs", "", AND(SM_89,PTX81))
 TARGET_BUILTIN(__nvvm_e5m2x2_to_f16x2_rn_relu, "V2hs", "", AND(SM_89,PTX81))
 
-// Bitcast
-
-BUILTIN(__nvvm_bitcast_f2i, "if", "")
-BUILTIN(__nvvm_bitcast_i2f, "fi", "")
-
-BUILTIN(__nvvm_bitcast_ll2d, "dLLi", "")
-BUILTIN(__nvvm_bitcast_d2ll, "LLid", "")
-
 // FNS
 TARGET_BUILTIN(__nvvm_fns, "UiUiUii", "n", PTX60)
 
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td 
b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index 39685c920d948d..737dd6092e2183 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -30,6 +30,10 @@
 //   * llvm.nvvm.max.ui  --> select(x ule y, x, y)
 //   * llvm.nvvm.max.ull --> ibid.
 //   * llvm.nvvm.h2f --> llvm.convert.to.fp16.f32
+//   * llvm.nvvm.bitcast.f2i  --> bitcast
+//   * llvm.nvvm.bitcast.i2f  --> ibid.
+//   * llvm.nvvm.bitcast.d2ll --> ibid.
+//   * llvm.nvvm.bitcast.ll2d --> ibid.
 
 def llvm_global_ptr_ty  : LLVMQualPointerType<1>;  // (global)ptr
 def llvm_shared_ptr_ty  : LLVMQualPointerType<3>;  // (shared)ptr
@@ -1339,20 +1343,6 @@ let TargetPrefix = "nvvm" in {
   def int_nvvm_e5m2x2_to_f16x2_rn_relu : 
ClangBuiltin<"__nvvm_e5m2x2_to_f16x2_rn_relu">,
   Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrNoCallback]>;
 
-//
-// Bitcast
-//
-
-  def int_nvvm_bitcast_f2i : ClangBuiltin<"__nvvm_bitcast_f2i">,
-  DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_float_ty], [IntrNoMem, 
IntrSpeculatable]>;
-  def int_nvvm_bitcast_i2f : ClangBuiltin<"__nvvm_bitcast_i2f">,
-  DefaultAttrsIntrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrNoMem, 
IntrSpeculatable]>;
-
-  def int_nvvm_bitcast_ll2d : ClangBuiltin<"__nvvm_bitcast_ll2d">,
-  DefaultAttrsIntrinsic<[llvm_double_ty], [llvm_i64_ty], [IntrNoMem, 
IntrSpeculatable]>;
-  def int_nvvm_bitcast_d2ll : ClangBuiltin<"__nvvm_bitcast_d2ll">,
-  DefaultAttrsIntrinsic<[llvm_i64_ty], [llvm_double_ty], [IntrNoMem, 
IntrSpeculatable]>;
-
 // FNS
 
   def int_nvvm_fns : ClangBuiltin<"__nvvm_fns">,
diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp
index 69dae5e32dbbe8..02d1d9d9f78984 100644
--- a/llvm/lib/IR/AutoUpgrade.cpp
+++ b/llvm/lib/IR/AutoUpgrade.cpp
@@ -1268,6 +1268,10 @@ static bool upgradeIntrinsicFunction1(Function *F, 
Function *&NewFn,
   else if (Name.consume_front("atomic.load.add."))
 // nvvm.atomic.load.add.{f32.p,f64.p}
 Expand = Name.starts_with("f32.p") || Name.starts_with("f64.p");
+  else if (Name.consume_front("bitcast."))
+// nvvm.bitcast.{f2i,i2f,ll2d,d2ll}
+Expand =
+Name == "f2i" || Name == "i2f" || Name == "ll2d" || Name == "d2ll";
   else
 Expand = false;
 
@@ -4258,6 +4262,10 @@ void llvm::UpgradeIntrinsicCall(CallBase *CI, Function 
*NewFn) {
F->getParent(), 
Intrinsic::convert_from_fp16,
{Builder.getFloatTy()}),
CI->getArgOperand(0), "h2f");
+  } else if (Name.consume_front("bitcast.") &&
+ (Name == "f2i" || Name == "i2f" || Name == "ll2d" ||
+  Name == "d2ll")) {
+Rep = Builder.CreateBitCast(CI->getArgOperand(0), CI->getType());
   } else {
 Intrinsic::ID IID = shouldUpgradeNVPTXBF16Intrinsic(Name);
 if (IID != Intrinsic::not_intrinsic &&
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td 
b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index 0c883093dd0a54..5c2ef4fa417ac1 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -1561,20 +1561,6 @@ def : Pat<(int_nvvm_e5m2x2_to_f16x2_rn Int16Regs:$a),
 def : Pat<(int_nvvm_e5m2x2_to_f16x2_rn_relu Int16Regs:$a),
   (CVT_f16x2_e5m2x2 Int16Regs:$a, CvtRN_RELU)>;
 
-//
-// Bitcast
-//
-
-def INT_NVVM_BITCAST_F2I : F_MATH_1<"mov.b32 \t$dst, $src0;", Int32Regs,
-  Float32Regs, int_nvvm_bitcast_f2i>;
-def INT_NVVM_BITCAST_I2

[clang] [llvm] [NVPTX] Remove nvvm.bitcast.* intrinsics (PR #107936)

2024-09-11 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean updated 
https://github.com/llvm/llvm-project/pull/107936

>From ff978f81e0eedbc5e7547acabe414f2f1b0fd31a Mon Sep 17 00:00:00 2001
From: Alex MacLean 
Date: Fri, 6 Sep 2024 18:35:20 +
Subject: [PATCH 1/2] [NVPTX] Remove nvvm.bitcast.* intrinsics

---
 clang/include/clang/Basic/BuiltinsNVPTX.def   |  8 
 llvm/include/llvm/IR/IntrinsicsNVVM.td| 18 -
 llvm/lib/IR/AutoUpgrade.cpp   |  8 
 llvm/lib/Target/NVPTX/NVPTXIntrinsics.td  | 14 -
 .../Assembler/auto_upgrade_nvvm_intrinsics.ll | 20 +++
 5 files changed, 32 insertions(+), 36 deletions(-)

diff --git a/clang/include/clang/Basic/BuiltinsNVPTX.def 
b/clang/include/clang/Basic/BuiltinsNVPTX.def
index 20f038a0a9bbde..6fff562165080a 100644
--- a/clang/include/clang/Basic/BuiltinsNVPTX.def
+++ b/clang/include/clang/Basic/BuiltinsNVPTX.def
@@ -599,14 +599,6 @@ TARGET_BUILTIN(__nvvm_e4m3x2_to_f16x2_rn_relu, "V2hs", "", 
AND(SM_89,PTX81))
 TARGET_BUILTIN(__nvvm_e5m2x2_to_f16x2_rn, "V2hs", "", AND(SM_89,PTX81))
 TARGET_BUILTIN(__nvvm_e5m2x2_to_f16x2_rn_relu, "V2hs", "", AND(SM_89,PTX81))
 
-// Bitcast
-
-BUILTIN(__nvvm_bitcast_f2i, "if", "")
-BUILTIN(__nvvm_bitcast_i2f, "fi", "")
-
-BUILTIN(__nvvm_bitcast_ll2d, "dLLi", "")
-BUILTIN(__nvvm_bitcast_d2ll, "LLid", "")
-
 // FNS
 TARGET_BUILTIN(__nvvm_fns, "UiUiUii", "n", PTX60)
 
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td 
b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index 39685c920d948d..737dd6092e2183 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -30,6 +30,10 @@
 //   * llvm.nvvm.max.ui  --> select(x ule y, x, y)
 //   * llvm.nvvm.max.ull --> ibid.
 //   * llvm.nvvm.h2f --> llvm.convert.to.fp16.f32
+//   * llvm.nvvm.bitcast.f2i  --> bitcast
+//   * llvm.nvvm.bitcast.i2f  --> ibid.
+//   * llvm.nvvm.bitcast.d2ll --> ibid.
+//   * llvm.nvvm.bitcast.ll2d --> ibid.
 
 def llvm_global_ptr_ty  : LLVMQualPointerType<1>;  // (global)ptr
 def llvm_shared_ptr_ty  : LLVMQualPointerType<3>;  // (shared)ptr
@@ -1339,20 +1343,6 @@ let TargetPrefix = "nvvm" in {
   def int_nvvm_e5m2x2_to_f16x2_rn_relu : 
ClangBuiltin<"__nvvm_e5m2x2_to_f16x2_rn_relu">,
   Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrNoCallback]>;
 
-//
-// Bitcast
-//
-
-  def int_nvvm_bitcast_f2i : ClangBuiltin<"__nvvm_bitcast_f2i">,
-  DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_float_ty], [IntrNoMem, 
IntrSpeculatable]>;
-  def int_nvvm_bitcast_i2f : ClangBuiltin<"__nvvm_bitcast_i2f">,
-  DefaultAttrsIntrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrNoMem, 
IntrSpeculatable]>;
-
-  def int_nvvm_bitcast_ll2d : ClangBuiltin<"__nvvm_bitcast_ll2d">,
-  DefaultAttrsIntrinsic<[llvm_double_ty], [llvm_i64_ty], [IntrNoMem, 
IntrSpeculatable]>;
-  def int_nvvm_bitcast_d2ll : ClangBuiltin<"__nvvm_bitcast_d2ll">,
-  DefaultAttrsIntrinsic<[llvm_i64_ty], [llvm_double_ty], [IntrNoMem, 
IntrSpeculatable]>;
-
 // FNS
 
   def int_nvvm_fns : ClangBuiltin<"__nvvm_fns">,
diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp
index 69dae5e32dbbe8..02d1d9d9f78984 100644
--- a/llvm/lib/IR/AutoUpgrade.cpp
+++ b/llvm/lib/IR/AutoUpgrade.cpp
@@ -1268,6 +1268,10 @@ static bool upgradeIntrinsicFunction1(Function *F, 
Function *&NewFn,
   else if (Name.consume_front("atomic.load.add."))
 // nvvm.atomic.load.add.{f32.p,f64.p}
 Expand = Name.starts_with("f32.p") || Name.starts_with("f64.p");
+  else if (Name.consume_front("bitcast."))
+// nvvm.bitcast.{f2i,i2f,ll2d,d2ll}
+Expand =
+Name == "f2i" || Name == "i2f" || Name == "ll2d" || Name == "d2ll";
   else
 Expand = false;
 
@@ -4258,6 +4262,10 @@ void llvm::UpgradeIntrinsicCall(CallBase *CI, Function 
*NewFn) {
F->getParent(), 
Intrinsic::convert_from_fp16,
{Builder.getFloatTy()}),
CI->getArgOperand(0), "h2f");
+  } else if (Name.consume_front("bitcast.") &&
+ (Name == "f2i" || Name == "i2f" || Name == "ll2d" ||
+  Name == "d2ll")) {
+Rep = Builder.CreateBitCast(CI->getArgOperand(0), CI->getType());
   } else {
 Intrinsic::ID IID = shouldUpgradeNVPTXBF16Intrinsic(Name);
 if (IID != Intrinsic::not_intrinsic &&
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td 
b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index 0c883093dd0a54..5c2ef4fa417ac1 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -1561,20 +1561,6 @@ def : Pat<(int_nvvm_e5m2x2_to_f16x2_rn Int16Regs:$a),
 def : Pat<(int_nvvm_e5m2x2_to_f16x2_rn_relu Int16Regs:$a),
   (CVT_f16x2_e5m2x2 Int16Regs:$a, CvtRN_RELU)>;
 
-//
-// Bitcast
-//
-
-def INT_NVVM_BITCAST_F2I : F_MATH_1<"mov.b32 \t$dst, $src0;", Int32Regs,
-  Float32Regs, int_nvvm_bitcast_f2i>;
-def INT_NVVM_BITCAST_I2

[clang] [llvm] [NVPTX] Remove nvvm.ldg.global.* intrinsics (PR #112834)

2024-10-19 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean updated 
https://github.com/llvm/llvm-project/pull/112834

>From 0b43fa7364bf45515905d98cd0731c5509de5196 Mon Sep 17 00:00:00 2001
From: Alex Maclean 
Date: Thu, 17 Oct 2024 16:49:24 +
Subject: [PATCH 1/2] [NVPTX] Remove nvvm.ldg.global.* intrinsics

---
 clang/lib/CodeGen/CGBuiltin.cpp   |  45 +++--
 .../builtins-nvptx-native-half-type-native.c  |   4 +-
 .../CodeGen/builtins-nvptx-native-half-type.c |   4 +-
 clang/test/CodeGen/builtins-nvptx.c   |  72 +++
 llvm/include/llvm/IR/IntrinsicsNVVM.td|  18 +-
 llvm/lib/IR/AutoUpgrade.cpp   |  14 ++
 llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp   | 189 +++---
 llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp   |  55 +
 llvm/lib/Target/NVPTX/NVPTXISelLowering.h |   2 -
 .../Assembler/auto_upgrade_nvvm_intrinsics.ll |  31 +++
 10 files changed, 188 insertions(+), 246 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index f6d7db2c204c12..3b42977b578e15 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -20473,7 +20473,7 @@ static NVPTXMmaInfo getNVPTXMmaInfo(unsigned BuiltinID) 
{
 #undef MMA_VARIANTS_B1_XOR
 }
 
-static Value *MakeLdgLdu(unsigned IntrinsicID, CodeGenFunction &CGF,
+static Value *MakeLdu(unsigned IntrinsicID, CodeGenFunction &CGF,
  const CallExpr *E) {
   Value *Ptr = CGF.EmitScalarExpr(E->getArg(0));
   QualType ArgType = E->getArg(0)->getType();
@@ -20484,6 +20484,21 @@ static Value *MakeLdgLdu(unsigned IntrinsicID, 
CodeGenFunction &CGF,
   {Ptr, ConstantInt::get(CGF.Builder.getInt32Ty(), Align.getQuantity())});
 }
 
+static Value *MakeLdg(CodeGenFunction &CGF, const CallExpr *E) {
+  Value *Ptr = CGF.EmitScalarExpr(E->getArg(0));
+  QualType ArgType = E->getArg(0)->getType();
+  clang::CharUnits AlignV = CGF.CGM.getNaturalPointeeTypeAlignment(ArgType);
+  llvm::Type *ElemTy = CGF.ConvertTypeForMem(ArgType->getPointeeType());
+
+  // Use addrspace(1) for NVPTX ADDRESS_SPACE_GLOBAL
+  auto *ASC = CGF.Builder.CreateAddrSpaceCast(Ptr, CGF.Builder.getPtrTy(1));
+  auto *LD = CGF.Builder.CreateAlignedLoad(ElemTy, ASC, AlignV.getAsAlign());
+  MDNode *MD = MDNode::get(CGF.Builder.getContext(), {});
+  LD->setMetadata(LLVMContext::MD_invariant_load, MD);
+
+  return LD;
+}
+
 static Value *MakeScopedAtomic(unsigned IntrinsicID, CodeGenFunction &CGF,
const CallExpr *E) {
   Value *Ptr = CGF.EmitScalarExpr(E->getArg(0));
@@ -20517,9 +20532,11 @@ static Value *MakeHalfType(unsigned IntrinsicID, 
unsigned BuiltinID,
 return nullptr;
   }
 
-  if (IntrinsicID == Intrinsic::nvvm_ldg_global_f ||
-  IntrinsicID == Intrinsic::nvvm_ldu_global_f)
-return MakeLdgLdu(IntrinsicID, CGF, E);
+  if (BuiltinID == NVPTX::BI__nvvm_ldg_h || BuiltinID == 
NVPTX::BI__nvvm_ldg_h2)
+return MakeLdg(CGF, E);
+
+  if (IntrinsicID == Intrinsic::nvvm_ldu_global_f)
+return MakeLdu(IntrinsicID, CGF, E);
 
   SmallVector Args;
   auto *F = CGF.CGM.getIntrinsic(IntrinsicID);
@@ -20656,16 +20673,15 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned 
BuiltinID,
   case NVPTX::BI__nvvm_ldg_ul2:
   case NVPTX::BI__nvvm_ldg_ull:
   case NVPTX::BI__nvvm_ldg_ull2:
-// PTX Interoperability section 2.2: "For a vector with an even number of
-// elements, its alignment is set to number of elements times the alignment
-// of its member: n*alignof(t)."
-return MakeLdgLdu(Intrinsic::nvvm_ldg_global_i, *this, E);
   case NVPTX::BI__nvvm_ldg_f:
   case NVPTX::BI__nvvm_ldg_f2:
   case NVPTX::BI__nvvm_ldg_f4:
   case NVPTX::BI__nvvm_ldg_d:
   case NVPTX::BI__nvvm_ldg_d2:
-return MakeLdgLdu(Intrinsic::nvvm_ldg_global_f, *this, E);
+// PTX Interoperability section 2.2: "For a vector with an even number of
+// elements, its alignment is set to number of elements times the alignment
+// of its member: n*alignof(t)."
+return MakeLdg(*this, E);
 
   case NVPTX::BI__nvvm_ldu_c:
   case NVPTX::BI__nvvm_ldu_sc:
@@ -20696,13 +20712,13 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned 
BuiltinID,
   case NVPTX::BI__nvvm_ldu_ul2:
   case NVPTX::BI__nvvm_ldu_ull:
   case NVPTX::BI__nvvm_ldu_ull2:
-return MakeLdgLdu(Intrinsic::nvvm_ldu_global_i, *this, E);
+return MakeLdu(Intrinsic::nvvm_ldu_global_i, *this, E);
   case NVPTX::BI__nvvm_ldu_f:
   case NVPTX::BI__nvvm_ldu_f2:
   case NVPTX::BI__nvvm_ldu_f4:
   case NVPTX::BI__nvvm_ldu_d:
   case NVPTX::BI__nvvm_ldu_d2:
-return MakeLdgLdu(Intrinsic::nvvm_ldu_global_f, *this, E);
+return MakeLdu(Intrinsic::nvvm_ldu_global_f, *this, E);
 
   case NVPTX::BI__nvvm_atom_cta_add_gen_i:
   case NVPTX::BI__nvvm_atom_cta_add_gen_l:
@@ -21176,14 +21192,11 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned 
BuiltinID,
 return MakeHalfType(Intrinsic::nvvm_fmin_xorsign_abs_f16x2, BuiltinID, E,
 *this);
   case NVPTX::BI__nvvm_ldg_h

[clang] [llvm] [NVPTX] Remove nvvm.ldg.global.* intrinsics (PR #112834)

2024-10-21 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean updated 
https://github.com/llvm/llvm-project/pull/112834

>From 3c21269ad0b7be617b06cde5debe405f99ef17ef Mon Sep 17 00:00:00 2001
From: Alex Maclean 
Date: Thu, 17 Oct 2024 16:49:24 +
Subject: [PATCH 1/2] [NVPTX] Remove nvvm.ldg.global.* intrinsics

---
 clang/lib/CodeGen/CGBuiltin.cpp   |  45 +++--
 .../builtins-nvptx-native-half-type-native.c  |   4 +-
 .../CodeGen/builtins-nvptx-native-half-type.c |   4 +-
 clang/test/CodeGen/builtins-nvptx.c   |  72 +++
 llvm/include/llvm/IR/IntrinsicsNVVM.td|  18 +-
 llvm/lib/IR/AutoUpgrade.cpp   |  14 ++
 llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp   | 189 +++---
 llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp   |  55 +
 llvm/lib/Target/NVPTX/NVPTXISelLowering.h |   2 -
 .../Assembler/auto_upgrade_nvvm_intrinsics.ll |  31 +++
 10 files changed, 188 insertions(+), 246 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 1ad950798c2118..40a875ab29c900 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -20485,7 +20485,7 @@ static NVPTXMmaInfo getNVPTXMmaInfo(unsigned BuiltinID) 
{
 #undef MMA_VARIANTS_B1_XOR
 }
 
-static Value *MakeLdgLdu(unsigned IntrinsicID, CodeGenFunction &CGF,
+static Value *MakeLdu(unsigned IntrinsicID, CodeGenFunction &CGF,
  const CallExpr *E) {
   Value *Ptr = CGF.EmitScalarExpr(E->getArg(0));
   QualType ArgType = E->getArg(0)->getType();
@@ -20496,6 +20496,21 @@ static Value *MakeLdgLdu(unsigned IntrinsicID, 
CodeGenFunction &CGF,
   {Ptr, ConstantInt::get(CGF.Builder.getInt32Ty(), Align.getQuantity())});
 }
 
+static Value *MakeLdg(CodeGenFunction &CGF, const CallExpr *E) {
+  Value *Ptr = CGF.EmitScalarExpr(E->getArg(0));
+  QualType ArgType = E->getArg(0)->getType();
+  clang::CharUnits AlignV = CGF.CGM.getNaturalPointeeTypeAlignment(ArgType);
+  llvm::Type *ElemTy = CGF.ConvertTypeForMem(ArgType->getPointeeType());
+
+  // Use addrspace(1) for NVPTX ADDRESS_SPACE_GLOBAL
+  auto *ASC = CGF.Builder.CreateAddrSpaceCast(Ptr, CGF.Builder.getPtrTy(1));
+  auto *LD = CGF.Builder.CreateAlignedLoad(ElemTy, ASC, AlignV.getAsAlign());
+  MDNode *MD = MDNode::get(CGF.Builder.getContext(), {});
+  LD->setMetadata(LLVMContext::MD_invariant_load, MD);
+
+  return LD;
+}
+
 static Value *MakeScopedAtomic(unsigned IntrinsicID, CodeGenFunction &CGF,
const CallExpr *E) {
   Value *Ptr = CGF.EmitScalarExpr(E->getArg(0));
@@ -20529,9 +20544,11 @@ static Value *MakeHalfType(unsigned IntrinsicID, 
unsigned BuiltinID,
 return nullptr;
   }
 
-  if (IntrinsicID == Intrinsic::nvvm_ldg_global_f ||
-  IntrinsicID == Intrinsic::nvvm_ldu_global_f)
-return MakeLdgLdu(IntrinsicID, CGF, E);
+  if (BuiltinID == NVPTX::BI__nvvm_ldg_h || BuiltinID == 
NVPTX::BI__nvvm_ldg_h2)
+return MakeLdg(CGF, E);
+
+  if (IntrinsicID == Intrinsic::nvvm_ldu_global_f)
+return MakeLdu(IntrinsicID, CGF, E);
 
   SmallVector Args;
   auto *F = CGF.CGM.getIntrinsic(IntrinsicID);
@@ -20668,16 +20685,15 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned 
BuiltinID,
   case NVPTX::BI__nvvm_ldg_ul2:
   case NVPTX::BI__nvvm_ldg_ull:
   case NVPTX::BI__nvvm_ldg_ull2:
-// PTX Interoperability section 2.2: "For a vector with an even number of
-// elements, its alignment is set to number of elements times the alignment
-// of its member: n*alignof(t)."
-return MakeLdgLdu(Intrinsic::nvvm_ldg_global_i, *this, E);
   case NVPTX::BI__nvvm_ldg_f:
   case NVPTX::BI__nvvm_ldg_f2:
   case NVPTX::BI__nvvm_ldg_f4:
   case NVPTX::BI__nvvm_ldg_d:
   case NVPTX::BI__nvvm_ldg_d2:
-return MakeLdgLdu(Intrinsic::nvvm_ldg_global_f, *this, E);
+// PTX Interoperability section 2.2: "For a vector with an even number of
+// elements, its alignment is set to number of elements times the alignment
+// of its member: n*alignof(t)."
+return MakeLdg(*this, E);
 
   case NVPTX::BI__nvvm_ldu_c:
   case NVPTX::BI__nvvm_ldu_sc:
@@ -20708,13 +20724,13 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned 
BuiltinID,
   case NVPTX::BI__nvvm_ldu_ul2:
   case NVPTX::BI__nvvm_ldu_ull:
   case NVPTX::BI__nvvm_ldu_ull2:
-return MakeLdgLdu(Intrinsic::nvvm_ldu_global_i, *this, E);
+return MakeLdu(Intrinsic::nvvm_ldu_global_i, *this, E);
   case NVPTX::BI__nvvm_ldu_f:
   case NVPTX::BI__nvvm_ldu_f2:
   case NVPTX::BI__nvvm_ldu_f4:
   case NVPTX::BI__nvvm_ldu_d:
   case NVPTX::BI__nvvm_ldu_d2:
-return MakeLdgLdu(Intrinsic::nvvm_ldu_global_f, *this, E);
+return MakeLdu(Intrinsic::nvvm_ldu_global_f, *this, E);
 
   case NVPTX::BI__nvvm_atom_cta_add_gen_i:
   case NVPTX::BI__nvvm_atom_cta_add_gen_l:
@@ -21188,14 +21204,11 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned 
BuiltinID,
 return MakeHalfType(Intrinsic::nvvm_fmin_xorsign_abs_f16x2, BuiltinID, E,
 *this);
   case NVPTX::BI__nvvm_ldg_h

[clang] [llvm] [NVPTX] Remove nvvm.ldg.global.* intrinsics (PR #112834)

2024-10-17 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean created 
https://github.com/llvm/llvm-project/pull/112834

Remove these intrinsics which can be better represented by load instructions 
with `!invariant.load` metadata:

- llvm.nvvm.ldg.global.i
- llvm.nvvm.ldg.global.f
- llvm.nvvm.ldg.global.p

>From 0b43fa7364bf45515905d98cd0731c5509de5196 Mon Sep 17 00:00:00 2001
From: Alex Maclean 
Date: Thu, 17 Oct 2024 16:49:24 +
Subject: [PATCH] [NVPTX] Remove nvvm.ldg.global.* intrinsics

---
 clang/lib/CodeGen/CGBuiltin.cpp   |  45 +++--
 .../builtins-nvptx-native-half-type-native.c  |   4 +-
 .../CodeGen/builtins-nvptx-native-half-type.c |   4 +-
 clang/test/CodeGen/builtins-nvptx.c   |  72 +++
 llvm/include/llvm/IR/IntrinsicsNVVM.td|  18 +-
 llvm/lib/IR/AutoUpgrade.cpp   |  14 ++
 llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp   | 189 +++---
 llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp   |  55 +
 llvm/lib/Target/NVPTX/NVPTXISelLowering.h |   2 -
 .../Assembler/auto_upgrade_nvvm_intrinsics.ll |  31 +++
 10 files changed, 188 insertions(+), 246 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index f6d7db2c204c12..3b42977b578e15 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -20473,7 +20473,7 @@ static NVPTXMmaInfo getNVPTXMmaInfo(unsigned BuiltinID) 
{
 #undef MMA_VARIANTS_B1_XOR
 }
 
-static Value *MakeLdgLdu(unsigned IntrinsicID, CodeGenFunction &CGF,
+static Value *MakeLdu(unsigned IntrinsicID, CodeGenFunction &CGF,
  const CallExpr *E) {
   Value *Ptr = CGF.EmitScalarExpr(E->getArg(0));
   QualType ArgType = E->getArg(0)->getType();
@@ -20484,6 +20484,21 @@ static Value *MakeLdgLdu(unsigned IntrinsicID, 
CodeGenFunction &CGF,
   {Ptr, ConstantInt::get(CGF.Builder.getInt32Ty(), Align.getQuantity())});
 }
 
+static Value *MakeLdg(CodeGenFunction &CGF, const CallExpr *E) {
+  Value *Ptr = CGF.EmitScalarExpr(E->getArg(0));
+  QualType ArgType = E->getArg(0)->getType();
+  clang::CharUnits AlignV = CGF.CGM.getNaturalPointeeTypeAlignment(ArgType);
+  llvm::Type *ElemTy = CGF.ConvertTypeForMem(ArgType->getPointeeType());
+
+  // Use addrspace(1) for NVPTX ADDRESS_SPACE_GLOBAL
+  auto *ASC = CGF.Builder.CreateAddrSpaceCast(Ptr, CGF.Builder.getPtrTy(1));
+  auto *LD = CGF.Builder.CreateAlignedLoad(ElemTy, ASC, AlignV.getAsAlign());
+  MDNode *MD = MDNode::get(CGF.Builder.getContext(), {});
+  LD->setMetadata(LLVMContext::MD_invariant_load, MD);
+
+  return LD;
+}
+
 static Value *MakeScopedAtomic(unsigned IntrinsicID, CodeGenFunction &CGF,
const CallExpr *E) {
   Value *Ptr = CGF.EmitScalarExpr(E->getArg(0));
@@ -20517,9 +20532,11 @@ static Value *MakeHalfType(unsigned IntrinsicID, 
unsigned BuiltinID,
 return nullptr;
   }
 
-  if (IntrinsicID == Intrinsic::nvvm_ldg_global_f ||
-  IntrinsicID == Intrinsic::nvvm_ldu_global_f)
-return MakeLdgLdu(IntrinsicID, CGF, E);
+  if (BuiltinID == NVPTX::BI__nvvm_ldg_h || BuiltinID == 
NVPTX::BI__nvvm_ldg_h2)
+return MakeLdg(CGF, E);
+
+  if (IntrinsicID == Intrinsic::nvvm_ldu_global_f)
+return MakeLdu(IntrinsicID, CGF, E);
 
   SmallVector Args;
   auto *F = CGF.CGM.getIntrinsic(IntrinsicID);
@@ -20656,16 +20673,15 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned 
BuiltinID,
   case NVPTX::BI__nvvm_ldg_ul2:
   case NVPTX::BI__nvvm_ldg_ull:
   case NVPTX::BI__nvvm_ldg_ull2:
-// PTX Interoperability section 2.2: "For a vector with an even number of
-// elements, its alignment is set to number of elements times the alignment
-// of its member: n*alignof(t)."
-return MakeLdgLdu(Intrinsic::nvvm_ldg_global_i, *this, E);
   case NVPTX::BI__nvvm_ldg_f:
   case NVPTX::BI__nvvm_ldg_f2:
   case NVPTX::BI__nvvm_ldg_f4:
   case NVPTX::BI__nvvm_ldg_d:
   case NVPTX::BI__nvvm_ldg_d2:
-return MakeLdgLdu(Intrinsic::nvvm_ldg_global_f, *this, E);
+// PTX Interoperability section 2.2: "For a vector with an even number of
+// elements, its alignment is set to number of elements times the alignment
+// of its member: n*alignof(t)."
+return MakeLdg(*this, E);
 
   case NVPTX::BI__nvvm_ldu_c:
   case NVPTX::BI__nvvm_ldu_sc:
@@ -20696,13 +20712,13 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned 
BuiltinID,
   case NVPTX::BI__nvvm_ldu_ul2:
   case NVPTX::BI__nvvm_ldu_ull:
   case NVPTX::BI__nvvm_ldu_ull2:
-return MakeLdgLdu(Intrinsic::nvvm_ldu_global_i, *this, E);
+return MakeLdu(Intrinsic::nvvm_ldu_global_i, *this, E);
   case NVPTX::BI__nvvm_ldu_f:
   case NVPTX::BI__nvvm_ldu_f2:
   case NVPTX::BI__nvvm_ldu_f4:
   case NVPTX::BI__nvvm_ldu_d:
   case NVPTX::BI__nvvm_ldu_d2:
-return MakeLdgLdu(Intrinsic::nvvm_ldu_global_f, *this, E);
+return MakeLdu(Intrinsic::nvvm_ldu_global_f, *this, E);
 
   case NVPTX::BI__nvvm_atom_cta_add_gen_i:
   case NVPTX::BI__nvvm_atom_cta_add_gen_l:
@@ -21176,14 +21192,11 @@ Value *CodeGenFunction:

[clang] [llvm] [NVPTX] Remove nvvm.ldg.global.* intrinsics (PR #112834)

2024-10-27 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean closed 
https://github.com/llvm/llvm-project/pull/112834
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Remove nvvm.bitcast.* intrinsics (PR #107936)

2024-09-23 Thread Alex MacLean via cfe-commits


@@ -599,14 +599,6 @@ TARGET_BUILTIN(__nvvm_e4m3x2_to_f16x2_rn_relu, "V2hs", "", 
AND(SM_89,PTX81))
 TARGET_BUILTIN(__nvvm_e5m2x2_to_f16x2_rn, "V2hs", "", AND(SM_89,PTX81))
 TARGET_BUILTIN(__nvvm_e5m2x2_to_f16x2_rn_relu, "V2hs", "", AND(SM_89,PTX81))
 
-// Bitcast

AlexMaclean wrote:

Thanks! I've just confirmed these do not work in nvcc.

https://github.com/llvm/llvm-project/pull/107936
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Remove nvvm.bitcast.* intrinsics (PR #107936)

2024-09-23 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean closed 
https://github.com/llvm/llvm-project/pull/107936
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Remove nvvm.bitcast.* intrinsics (PR #107936)

2024-09-23 Thread Alex MacLean via cfe-commits


@@ -599,14 +599,6 @@ TARGET_BUILTIN(__nvvm_e4m3x2_to_f16x2_rn_relu, "V2hs", "", 
AND(SM_89,PTX81))
 TARGET_BUILTIN(__nvvm_e5m2x2_to_f16x2_rn, "V2hs", "", AND(SM_89,PTX81))
 TARGET_BUILTIN(__nvvm_e5m2x2_to_f16x2_rn_relu, "V2hs", "", AND(SM_89,PTX81))
 
-// Bitcast

AlexMaclean wrote:

@jlebar can you confirm it is okay to remove builtins like this? I'm doing this 
based on your commit 46624a822d3a3df4a4b6dff0d231acb45d269853. Just want to 
make sure I'm not missing something.

https://github.com/llvm/llvm-project/pull/107936
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NVPTX] Switch front-ends and tests to ptx_kernel cc (PR #120806)

2025-01-07 Thread Alex MacLean via cfe-commits

AlexMaclean wrote:

@Artem-B ping for review

https://github.com/llvm/llvm-project/pull/120806
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NVPTX] Switch front-ends and tests to ptx_kernel cc (PR #120806)

2025-01-07 Thread Alex MacLean via cfe-commits


@@ -10,8 +10,14 @@
 // CHECK-NEXT:[[TMP0:%.*]] = load ptr, ptr [[RET_ADDR]], align 8
 // CHECK-NEXT:store i32 1, ptr [[TMP0]], align 4
 // CHECK-NEXT:ret void
+//
 __attribute__((nvptx_kernel)) void foo(int *ret) {
   *ret = 1;
 }
 
-// CHECK: !0 = !{ptr @foo, !"kernel", i32 1}
+//.
+// CHECK: attributes #[[ATTR0]] = { convergent noinline nounwind optnone 
"no-trapping-math"="true" "stack-protector-buffer-size"="8" 
"target-cpu"="sm_61" "target-features"="+ptx32,+sm_61" }
+//.

AlexMaclean wrote:

Yep

https://github.com/llvm/llvm-project/pull/120806
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NVPTX] Switch front-ends and tests to ptx_kernel cc (PR #120806)

2025-01-07 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean closed 
https://github.com/llvm/llvm-project/pull/120806
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NVPTX] Convert scalar function nvvm.annotations to attributes (PR #125908)

2025-02-05 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean updated 
https://github.com/llvm/llvm-project/pull/125908

>From 12bdf8bfa72b10d1e8ccc305cd57c337f2799e52 Mon Sep 17 00:00:00 2001
From: Alex Maclean 
Date: Wed, 5 Feb 2025 18:46:03 +
Subject: [PATCH 1/2] [NVPTX] Convert scalar function nvvm.annotations to
 attributes

---
 clang/lib/CodeGen/Targets/NVPTX.cpp   | 15 ++---
 clang/test/CodeGenCUDA/launch-bounds.cu   | 32 ++
 llvm/docs/NVPTXUsage.rst  | 37 +++
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp |  2 +-
 llvm/lib/IR/AutoUpgrade.cpp   | 16 +
 .../Target/NVPTX/NVPTXCtorDtorLowering.cpp|  9 +--
 llvm/lib/Target/NVPTX/NVPTXUtilities.cpp  | 13 +++-
 .../KernelInfo/launch-bounds/nvptx.ll |  4 +-
 llvm/test/CodeGen/NVPTX/annotations.ll| 12 +---
 llvm/test/CodeGen/NVPTX/lower-ctor-dtor.ll| 16 +++--
 llvm/test/CodeGen/NVPTX/maxclusterrank.ll |  8 +--
 .../CodeGen/NVPTX/upgrade-nvvm-annotations.ll | 64 +++
 .../Dialect/NVVM/NVVMToLLVMIRTranslation.cpp  |  7 +-
 .../LLVMIR/external-func-dialect-attr.mlir|  4 +-
 mlir/test/Target/LLVMIR/nvvmir.mlir   | 21 +++---
 15 files changed, 160 insertions(+), 100 deletions(-)

diff --git a/clang/lib/CodeGen/Targets/NVPTX.cpp 
b/clang/lib/CodeGen/Targets/NVPTX.cpp
index b82e4ddb9f3f2b..f89d32d4e13fe9 100644
--- a/clang/lib/CodeGen/Targets/NVPTX.cpp
+++ b/clang/lib/CodeGen/Targets/NVPTX.cpp
@@ -375,11 +375,8 @@ void 
CodeGenModule::handleCUDALaunchBoundsAttr(llvm::Function *F,
 if (MinBlocks > 0) {
   if (MinBlocksVal)
 *MinBlocksVal = MinBlocks.getExtValue();
-  if (F) {
-// Create !{, metadata !"minctasm", i32 } node
-NVPTXTargetCodeGenInfo::addNVVMMetadata(F, "minctasm",
-MinBlocks.getExtValue());
-  }
+  if (F)
+F->addFnAttr("nvvm.minctasm", llvm::utostr(MinBlocks.getExtValue()));
 }
   }
   if (Attr->getMaxBlocks()) {
@@ -388,11 +385,9 @@ void 
CodeGenModule::handleCUDALaunchBoundsAttr(llvm::Function *F,
 if (MaxBlocks > 0) {
   if (MaxClusterRankVal)
 *MaxClusterRankVal = MaxBlocks.getExtValue();
-  if (F) {
-// Create !{, metadata !"maxclusterrank", i32 } node
-NVPTXTargetCodeGenInfo::addNVVMMetadata(F, "maxclusterrank",
-MaxBlocks.getExtValue());
-  }
+  if (F)
+F->addFnAttr("nvvm.maxclusterrank",
+ llvm::utostr(MaxBlocks.getExtValue()));
 }
   }
 }
diff --git a/clang/test/CodeGenCUDA/launch-bounds.cu 
b/clang/test/CodeGenCUDA/launch-bounds.cu
index 31ca9216b413e9..72f7857264f8cf 100644
--- a/clang/test/CodeGenCUDA/launch-bounds.cu
+++ b/clang/test/CodeGenCUDA/launch-bounds.cu
@@ -9,6 +9,25 @@
 #define MAX_BLOCKS_PER_MP 4
 #endif
 
+// CHECK: @Kernel1() #[[ATTR0:[0-9]+]]
+// CHECK: @{{.*}}Kernel4{{.*}}() #[[ATTR0]]
+// CHECK: @{{.*}}Kernel5{{.*}}() #[[ATTR1:[0-9]+]]
+// CHECK: @{{.*}}Kernel6{{.*}}() #[[ATTR0]]
+// CHECK: @{{.*}}Kernel8{{.*}}() #[[ATTR3:[0-9]+]]
+
+// CHECK: attributes #[[ATTR0]] = {{{.*}} "nvvm.minctasm"="2" {{.*}}}
+// CHECK: attributes #[[ATTR1]] = {{{.*}} "nvvm.minctasm"="258" {{.*}}}
+// CHECK: attributes #[[ATTR3]] = {{{.*}} "nvvm.minctasm"="12" {{.*}}}
+
+// CHECK_MAX_BLOCKS: @Kernel1_sm_90() #[[ATTR4:[0-9]+]]
+// CHECK_MAX_BLOCKS: @{{.*}}Kernel4_sm_90{{.*}} #[[ATTR4]]
+// CHECK_MAX_BLOCKS: @{{.*}}Kernel5_sm_90{{.*}} #[[ATTR5:[0-9]+]]
+// CHECK_MAX_BLOCKS: @{{.*}}Kernel8_sm_90{{.*}} #[[ATTR6:[0-9]+]]
+
+// CHECK_MAX_BLOCKS: attributes #[[ATTR4]] = {{{.*}} "nvvm.maxclusterrank"="4" 
"nvvm.minctasm"="2" {{.*}}}
+// CHECK_MAX_BLOCKS: attributes #[[ATTR5]] = {{{.*}} 
"nvvm.maxclusterrank"="260" "nvvm.minctasm"="258" {{.*}}}
+// CHECK_MAX_BLOCKS: attributes #[[ATTR6]] = {{{.*}} 
"nvvm.maxclusterrank"="14" "nvvm.minctasm"="12" {{.*}}}
+
 // Test both max threads per block and Min cta per sm.
 extern "C" {
 __global__ void
@@ -19,7 +38,6 @@ Kernel1()
 }
 
 // CHECK: !{{[0-9]+}} = !{ptr @Kernel1, !"maxntidx", i32 256}
-// CHECK: !{{[0-9]+}} = !{ptr @Kernel1, !"minctasm", i32 2}
 
 #ifdef USE_MAX_BLOCKS
 // Test max threads per block and min/max cta per sm.
@@ -32,8 +50,6 @@ Kernel1_sm_90()
 }
 
 // CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"maxntidx", i32 256}
-// CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"minctasm", i32 2}
-// CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"maxclusterrank", 
i32 4}
 #endif // USE_MAX_BLOCKS
 
 // Test only max threads per block. Min cta per sm defaults to 0, and
@@ -67,7 +83,6 @@ Kernel4()
 template __global__ void Kernel4();
 
 // CHECK: !{{[0-9]+}} = !{ptr @{{.*}}Kernel4{{.*}}, !"maxntidx", i32 256}
-// CHECK: !{{[0-9]+}} = !{ptr @{{.*}}Kernel4{{.*}}, !"minctasm", i32 2}
 
 #ifdef USE_MAX_BLOCKS
 template 
@@ -79,8 +94,6 @@ Kernel4_sm_90()
 template __global__ void Kernel4_sm_90();
 
 // CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @{{.*}}Kernel4_s

[clang] [llvm] [mlir] [NVPTX] Convert scalar function nvvm.annotations to attributes (PR #125908)

2025-02-05 Thread Alex MacLean via cfe-commits


@@ -179,6 +179,13 @@ static bool argHasNVVMAnnotation(const Value &Val,
   return false;
 }
 
+static std::optional getFnAttrParsedIntOrNull(const Function &F,
+StringRef Attr) {
+  if (F.hasFnAttribute(Attr))
+return F.getFnAttributeAsParsedInteger(Attr);
+  return std::nullopt;

AlexMaclean wrote:

Had to be a little more explicit to make the compiler happy but I've switched 
to a ternary as requested. 

https://github.com/llvm/llvm-project/pull/125908
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NVPTX] Convert scalar function nvvm.annotations to attributes (PR #125908)

2025-02-05 Thread Alex MacLean via cfe-commits


@@ -179,6 +179,13 @@ static bool argHasNVVMAnnotation(const Value &Val,
   return false;
 }
 
+static std::optional getFnAttrParsedIntOrNull(const Function &F,

AlexMaclean wrote:

Removed

https://github.com/llvm/llvm-project/pull/125908
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NVPTX] Convert scalar function nvvm.annotations to attributes (PR #125908)

2025-02-05 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean created 
https://github.com/llvm/llvm-project/pull/125908

Replace some more nvvm.annotations with function attributes, auto-upgrading the 
annotations as needed. These new attributes will be more idiomatic and 
compile-time efficient than the annotations. 

- !"maxclusterrank" / !"cluster_max_blocks" -> "nvvm.maxclusterrank"
- !"minctasm" -> "nvvm.minctasm"
- !"maxnreg" -> "nvvm.maxnreg"

>From 8dd9f3bbd91678ca8a56c5c62d65008faf5ff21f Mon Sep 17 00:00:00 2001
From: Alex Maclean 
Date: Wed, 5 Feb 2025 18:46:03 +
Subject: [PATCH] [NVPTX] Convert scalar function nvvm.annotations to
 attributes

---
 clang/lib/CodeGen/Targets/NVPTX.cpp   | 15 ++---
 clang/test/CodeGenCUDA/launch-bounds.cu   | 32 ++
 llvm/docs/NVPTXUsage.rst  | 37 +++
 llvm/lib/IR/AutoUpgrade.cpp   | 16 +
 .../Target/NVPTX/NVPTXCtorDtorLowering.cpp|  9 +--
 llvm/lib/Target/NVPTX/NVPTXUtilities.cpp  | 13 +++-
 .../KernelInfo/launch-bounds/nvptx.ll |  4 +-
 llvm/test/CodeGen/NVPTX/annotations.ll| 12 +---
 llvm/test/CodeGen/NVPTX/lower-ctor-dtor.ll| 16 +++--
 llvm/test/CodeGen/NVPTX/maxclusterrank.ll |  8 +--
 .../CodeGen/NVPTX/upgrade-nvvm-annotations.ll | 64 +++
 .../Dialect/NVVM/NVVMToLLVMIRTranslation.cpp  |  7 +-
 mlir/test/Target/LLVMIR/nvvmir.mlir   | 21 +++---
 13 files changed, 157 insertions(+), 97 deletions(-)

diff --git a/clang/lib/CodeGen/Targets/NVPTX.cpp 
b/clang/lib/CodeGen/Targets/NVPTX.cpp
index b82e4ddb9f3f2b..f89d32d4e13fe9 100644
--- a/clang/lib/CodeGen/Targets/NVPTX.cpp
+++ b/clang/lib/CodeGen/Targets/NVPTX.cpp
@@ -375,11 +375,8 @@ void 
CodeGenModule::handleCUDALaunchBoundsAttr(llvm::Function *F,
 if (MinBlocks > 0) {
   if (MinBlocksVal)
 *MinBlocksVal = MinBlocks.getExtValue();
-  if (F) {
-// Create !{, metadata !"minctasm", i32 } node
-NVPTXTargetCodeGenInfo::addNVVMMetadata(F, "minctasm",
-MinBlocks.getExtValue());
-  }
+  if (F)
+F->addFnAttr("nvvm.minctasm", llvm::utostr(MinBlocks.getExtValue()));
 }
   }
   if (Attr->getMaxBlocks()) {
@@ -388,11 +385,9 @@ void 
CodeGenModule::handleCUDALaunchBoundsAttr(llvm::Function *F,
 if (MaxBlocks > 0) {
   if (MaxClusterRankVal)
 *MaxClusterRankVal = MaxBlocks.getExtValue();
-  if (F) {
-// Create !{, metadata !"maxclusterrank", i32 } node
-NVPTXTargetCodeGenInfo::addNVVMMetadata(F, "maxclusterrank",
-MaxBlocks.getExtValue());
-  }
+  if (F)
+F->addFnAttr("nvvm.maxclusterrank",
+ llvm::utostr(MaxBlocks.getExtValue()));
 }
   }
 }
diff --git a/clang/test/CodeGenCUDA/launch-bounds.cu 
b/clang/test/CodeGenCUDA/launch-bounds.cu
index 31ca9216b413e9..72f7857264f8cf 100644
--- a/clang/test/CodeGenCUDA/launch-bounds.cu
+++ b/clang/test/CodeGenCUDA/launch-bounds.cu
@@ -9,6 +9,25 @@
 #define MAX_BLOCKS_PER_MP 4
 #endif
 
+// CHECK: @Kernel1() #[[ATTR0:[0-9]+]]
+// CHECK: @{{.*}}Kernel4{{.*}}() #[[ATTR0]]
+// CHECK: @{{.*}}Kernel5{{.*}}() #[[ATTR1:[0-9]+]]
+// CHECK: @{{.*}}Kernel6{{.*}}() #[[ATTR0]]
+// CHECK: @{{.*}}Kernel8{{.*}}() #[[ATTR3:[0-9]+]]
+
+// CHECK: attributes #[[ATTR0]] = {{{.*}} "nvvm.minctasm"="2" {{.*}}}
+// CHECK: attributes #[[ATTR1]] = {{{.*}} "nvvm.minctasm"="258" {{.*}}}
+// CHECK: attributes #[[ATTR3]] = {{{.*}} "nvvm.minctasm"="12" {{.*}}}
+
+// CHECK_MAX_BLOCKS: @Kernel1_sm_90() #[[ATTR4:[0-9]+]]
+// CHECK_MAX_BLOCKS: @{{.*}}Kernel4_sm_90{{.*}} #[[ATTR4]]
+// CHECK_MAX_BLOCKS: @{{.*}}Kernel5_sm_90{{.*}} #[[ATTR5:[0-9]+]]
+// CHECK_MAX_BLOCKS: @{{.*}}Kernel8_sm_90{{.*}} #[[ATTR6:[0-9]+]]
+
+// CHECK_MAX_BLOCKS: attributes #[[ATTR4]] = {{{.*}} "nvvm.maxclusterrank"="4" 
"nvvm.minctasm"="2" {{.*}}}
+// CHECK_MAX_BLOCKS: attributes #[[ATTR5]] = {{{.*}} 
"nvvm.maxclusterrank"="260" "nvvm.minctasm"="258" {{.*}}}
+// CHECK_MAX_BLOCKS: attributes #[[ATTR6]] = {{{.*}} 
"nvvm.maxclusterrank"="14" "nvvm.minctasm"="12" {{.*}}}
+
 // Test both max threads per block and Min cta per sm.
 extern "C" {
 __global__ void
@@ -19,7 +38,6 @@ Kernel1()
 }
 
 // CHECK: !{{[0-9]+}} = !{ptr @Kernel1, !"maxntidx", i32 256}
-// CHECK: !{{[0-9]+}} = !{ptr @Kernel1, !"minctasm", i32 2}
 
 #ifdef USE_MAX_BLOCKS
 // Test max threads per block and min/max cta per sm.
@@ -32,8 +50,6 @@ Kernel1_sm_90()
 }
 
 // CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"maxntidx", i32 256}
-// CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"minctasm", i32 2}
-// CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"maxclusterrank", 
i32 4}
 #endif // USE_MAX_BLOCKS
 
 // Test only max threads per block. Min cta per sm defaults to 0, and
@@ -67,7 +83,6 @@ Kernel4()
 template __global__ void Kernel4();
 
 // CHECK: !{{[0-9]+}} = !{ptr @{{.*}}Kernel4{{.*}}, !"maxntidx", i32 256}
-// CHECK: !{{[0-9]+}} = !{

[clang] [llvm] [mlir] [NVPTX] Convert scalar function nvvm.annotations to attributes (PR #125908)

2025-02-05 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean updated 
https://github.com/llvm/llvm-project/pull/125908

>From d66d8adac5cf32f7f9f5878799c0167d39f41df7 Mon Sep 17 00:00:00 2001
From: Alex Maclean 
Date: Wed, 5 Feb 2025 18:46:03 +
Subject: [PATCH] [NVPTX] Convert scalar function nvvm.annotations to
 attributes

---
 clang/lib/CodeGen/Targets/NVPTX.cpp   | 15 ++---
 clang/test/CodeGenCUDA/launch-bounds.cu   | 32 ++
 llvm/docs/NVPTXUsage.rst  | 37 +++
 llvm/lib/IR/AutoUpgrade.cpp   | 16 +
 .../Target/NVPTX/NVPTXCtorDtorLowering.cpp|  9 +--
 llvm/lib/Target/NVPTX/NVPTXUtilities.cpp  | 13 +++-
 .../KernelInfo/launch-bounds/nvptx.ll |  4 +-
 llvm/test/CodeGen/NVPTX/annotations.ll| 12 +---
 llvm/test/CodeGen/NVPTX/lower-ctor-dtor.ll| 16 +++--
 llvm/test/CodeGen/NVPTX/maxclusterrank.ll |  8 +--
 .../CodeGen/NVPTX/upgrade-nvvm-annotations.ll | 64 +++
 .../Dialect/NVVM/NVVMToLLVMIRTranslation.cpp  |  7 +-
 mlir/test/Target/LLVMIR/nvvmir.mlir   | 21 +++---
 13 files changed, 157 insertions(+), 97 deletions(-)

diff --git a/clang/lib/CodeGen/Targets/NVPTX.cpp 
b/clang/lib/CodeGen/Targets/NVPTX.cpp
index b82e4ddb9f3f2b..f89d32d4e13fe9 100644
--- a/clang/lib/CodeGen/Targets/NVPTX.cpp
+++ b/clang/lib/CodeGen/Targets/NVPTX.cpp
@@ -375,11 +375,8 @@ void 
CodeGenModule::handleCUDALaunchBoundsAttr(llvm::Function *F,
 if (MinBlocks > 0) {
   if (MinBlocksVal)
 *MinBlocksVal = MinBlocks.getExtValue();
-  if (F) {
-// Create !{, metadata !"minctasm", i32 } node
-NVPTXTargetCodeGenInfo::addNVVMMetadata(F, "minctasm",
-MinBlocks.getExtValue());
-  }
+  if (F)
+F->addFnAttr("nvvm.minctasm", llvm::utostr(MinBlocks.getExtValue()));
 }
   }
   if (Attr->getMaxBlocks()) {
@@ -388,11 +385,9 @@ void 
CodeGenModule::handleCUDALaunchBoundsAttr(llvm::Function *F,
 if (MaxBlocks > 0) {
   if (MaxClusterRankVal)
 *MaxClusterRankVal = MaxBlocks.getExtValue();
-  if (F) {
-// Create !{, metadata !"maxclusterrank", i32 } node
-NVPTXTargetCodeGenInfo::addNVVMMetadata(F, "maxclusterrank",
-MaxBlocks.getExtValue());
-  }
+  if (F)
+F->addFnAttr("nvvm.maxclusterrank",
+ llvm::utostr(MaxBlocks.getExtValue()));
 }
   }
 }
diff --git a/clang/test/CodeGenCUDA/launch-bounds.cu 
b/clang/test/CodeGenCUDA/launch-bounds.cu
index 31ca9216b413e9..72f7857264f8cf 100644
--- a/clang/test/CodeGenCUDA/launch-bounds.cu
+++ b/clang/test/CodeGenCUDA/launch-bounds.cu
@@ -9,6 +9,25 @@
 #define MAX_BLOCKS_PER_MP 4
 #endif
 
+// CHECK: @Kernel1() #[[ATTR0:[0-9]+]]
+// CHECK: @{{.*}}Kernel4{{.*}}() #[[ATTR0]]
+// CHECK: @{{.*}}Kernel5{{.*}}() #[[ATTR1:[0-9]+]]
+// CHECK: @{{.*}}Kernel6{{.*}}() #[[ATTR0]]
+// CHECK: @{{.*}}Kernel8{{.*}}() #[[ATTR3:[0-9]+]]
+
+// CHECK: attributes #[[ATTR0]] = {{{.*}} "nvvm.minctasm"="2" {{.*}}}
+// CHECK: attributes #[[ATTR1]] = {{{.*}} "nvvm.minctasm"="258" {{.*}}}
+// CHECK: attributes #[[ATTR3]] = {{{.*}} "nvvm.minctasm"="12" {{.*}}}
+
+// CHECK_MAX_BLOCKS: @Kernel1_sm_90() #[[ATTR4:[0-9]+]]
+// CHECK_MAX_BLOCKS: @{{.*}}Kernel4_sm_90{{.*}} #[[ATTR4]]
+// CHECK_MAX_BLOCKS: @{{.*}}Kernel5_sm_90{{.*}} #[[ATTR5:[0-9]+]]
+// CHECK_MAX_BLOCKS: @{{.*}}Kernel8_sm_90{{.*}} #[[ATTR6:[0-9]+]]
+
+// CHECK_MAX_BLOCKS: attributes #[[ATTR4]] = {{{.*}} "nvvm.maxclusterrank"="4" 
"nvvm.minctasm"="2" {{.*}}}
+// CHECK_MAX_BLOCKS: attributes #[[ATTR5]] = {{{.*}} 
"nvvm.maxclusterrank"="260" "nvvm.minctasm"="258" {{.*}}}
+// CHECK_MAX_BLOCKS: attributes #[[ATTR6]] = {{{.*}} 
"nvvm.maxclusterrank"="14" "nvvm.minctasm"="12" {{.*}}}
+
 // Test both max threads per block and Min cta per sm.
 extern "C" {
 __global__ void
@@ -19,7 +38,6 @@ Kernel1()
 }
 
 // CHECK: !{{[0-9]+}} = !{ptr @Kernel1, !"maxntidx", i32 256}
-// CHECK: !{{[0-9]+}} = !{ptr @Kernel1, !"minctasm", i32 2}
 
 #ifdef USE_MAX_BLOCKS
 // Test max threads per block and min/max cta per sm.
@@ -32,8 +50,6 @@ Kernel1_sm_90()
 }
 
 // CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"maxntidx", i32 256}
-// CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"minctasm", i32 2}
-// CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"maxclusterrank", 
i32 4}
 #endif // USE_MAX_BLOCKS
 
 // Test only max threads per block. Min cta per sm defaults to 0, and
@@ -67,7 +83,6 @@ Kernel4()
 template __global__ void Kernel4();
 
 // CHECK: !{{[0-9]+}} = !{ptr @{{.*}}Kernel4{{.*}}, !"maxntidx", i32 256}
-// CHECK: !{{[0-9]+}} = !{ptr @{{.*}}Kernel4{{.*}}, !"minctasm", i32 2}
 
 #ifdef USE_MAX_BLOCKS
 template 
@@ -79,8 +94,6 @@ Kernel4_sm_90()
 template __global__ void Kernel4_sm_90();
 
 // CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @{{.*}}Kernel4_sm_90{{.*}}, 
!"maxntidx", i32 256}
-// CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @{{.*}}Kernel4_sm_90{{.*}}, 
!"minctas

[clang] [llvm] [mlir] [NVPTX] Convert scalar function nvvm.annotations to attributes (PR #125908)

2025-02-06 Thread Alex MacLean via cfe-commits


@@ -179,6 +179,13 @@ static bool argHasNVVMAnnotation(const Value &Val,
   return false;
 }
 
+static std::optional getFnAttrParsedInt(const Function &F,
+  StringRef Attr) {
+  return F.hasFnAttribute(Attr)
+ ? std::optional(F.getFnAttributeAsParsedInteger(Attr))
+ : std::nullopt;

AlexMaclean wrote:

No worries! I agree it is basically a wash and will leave it as it currently 
is. 

https://github.com/llvm/llvm-project/pull/125908
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NVPTX] Convert scalar function nvvm.annotations to attributes (PR #125908)

2025-02-11 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean updated 
https://github.com/llvm/llvm-project/pull/125908

>From cb6ac07e72cc1361343470842793cf9bc4995a19 Mon Sep 17 00:00:00 2001
From: Alex Maclean 
Date: Wed, 5 Feb 2025 18:46:03 +
Subject: [PATCH 1/2] [NVPTX] Convert scalar function nvvm.annotations to
 attributes

---
 clang/lib/CodeGen/Targets/NVPTX.cpp   | 15 ++---
 clang/test/CodeGenCUDA/launch-bounds.cu   | 32 ++
 llvm/docs/NVPTXUsage.rst  | 37 +++
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp |  2 +-
 llvm/lib/IR/AutoUpgrade.cpp   | 16 +
 .../Target/NVPTX/NVPTXCtorDtorLowering.cpp|  9 +--
 llvm/lib/Target/NVPTX/NVPTXUtilities.cpp  | 13 +++-
 .../KernelInfo/launch-bounds/nvptx.ll |  4 +-
 llvm/test/CodeGen/NVPTX/annotations.ll| 12 +---
 llvm/test/CodeGen/NVPTX/lower-ctor-dtor.ll| 16 +++--
 llvm/test/CodeGen/NVPTX/maxclusterrank.ll |  8 +--
 .../CodeGen/NVPTX/upgrade-nvvm-annotations.ll | 64 +++
 .../Dialect/NVVM/NVVMToLLVMIRTranslation.cpp  |  7 +-
 .../LLVMIR/external-func-dialect-attr.mlir|  4 +-
 mlir/test/Target/LLVMIR/nvvmir.mlir   | 21 +++---
 15 files changed, 160 insertions(+), 100 deletions(-)

diff --git a/clang/lib/CodeGen/Targets/NVPTX.cpp 
b/clang/lib/CodeGen/Targets/NVPTX.cpp
index b82e4ddb9f3f2..f89d32d4e13fe 100644
--- a/clang/lib/CodeGen/Targets/NVPTX.cpp
+++ b/clang/lib/CodeGen/Targets/NVPTX.cpp
@@ -375,11 +375,8 @@ void 
CodeGenModule::handleCUDALaunchBoundsAttr(llvm::Function *F,
 if (MinBlocks > 0) {
   if (MinBlocksVal)
 *MinBlocksVal = MinBlocks.getExtValue();
-  if (F) {
-// Create !{, metadata !"minctasm", i32 } node
-NVPTXTargetCodeGenInfo::addNVVMMetadata(F, "minctasm",
-MinBlocks.getExtValue());
-  }
+  if (F)
+F->addFnAttr("nvvm.minctasm", llvm::utostr(MinBlocks.getExtValue()));
 }
   }
   if (Attr->getMaxBlocks()) {
@@ -388,11 +385,9 @@ void 
CodeGenModule::handleCUDALaunchBoundsAttr(llvm::Function *F,
 if (MaxBlocks > 0) {
   if (MaxClusterRankVal)
 *MaxClusterRankVal = MaxBlocks.getExtValue();
-  if (F) {
-// Create !{, metadata !"maxclusterrank", i32 } node
-NVPTXTargetCodeGenInfo::addNVVMMetadata(F, "maxclusterrank",
-MaxBlocks.getExtValue());
-  }
+  if (F)
+F->addFnAttr("nvvm.maxclusterrank",
+ llvm::utostr(MaxBlocks.getExtValue()));
 }
   }
 }
diff --git a/clang/test/CodeGenCUDA/launch-bounds.cu 
b/clang/test/CodeGenCUDA/launch-bounds.cu
index 31ca9216b413e..72f7857264f8c 100644
--- a/clang/test/CodeGenCUDA/launch-bounds.cu
+++ b/clang/test/CodeGenCUDA/launch-bounds.cu
@@ -9,6 +9,25 @@
 #define MAX_BLOCKS_PER_MP 4
 #endif
 
+// CHECK: @Kernel1() #[[ATTR0:[0-9]+]]
+// CHECK: @{{.*}}Kernel4{{.*}}() #[[ATTR0]]
+// CHECK: @{{.*}}Kernel5{{.*}}() #[[ATTR1:[0-9]+]]
+// CHECK: @{{.*}}Kernel6{{.*}}() #[[ATTR0]]
+// CHECK: @{{.*}}Kernel8{{.*}}() #[[ATTR3:[0-9]+]]
+
+// CHECK: attributes #[[ATTR0]] = {{{.*}} "nvvm.minctasm"="2" {{.*}}}
+// CHECK: attributes #[[ATTR1]] = {{{.*}} "nvvm.minctasm"="258" {{.*}}}
+// CHECK: attributes #[[ATTR3]] = {{{.*}} "nvvm.minctasm"="12" {{.*}}}
+
+// CHECK_MAX_BLOCKS: @Kernel1_sm_90() #[[ATTR4:[0-9]+]]
+// CHECK_MAX_BLOCKS: @{{.*}}Kernel4_sm_90{{.*}} #[[ATTR4]]
+// CHECK_MAX_BLOCKS: @{{.*}}Kernel5_sm_90{{.*}} #[[ATTR5:[0-9]+]]
+// CHECK_MAX_BLOCKS: @{{.*}}Kernel8_sm_90{{.*}} #[[ATTR6:[0-9]+]]
+
+// CHECK_MAX_BLOCKS: attributes #[[ATTR4]] = {{{.*}} "nvvm.maxclusterrank"="4" 
"nvvm.minctasm"="2" {{.*}}}
+// CHECK_MAX_BLOCKS: attributes #[[ATTR5]] = {{{.*}} 
"nvvm.maxclusterrank"="260" "nvvm.minctasm"="258" {{.*}}}
+// CHECK_MAX_BLOCKS: attributes #[[ATTR6]] = {{{.*}} 
"nvvm.maxclusterrank"="14" "nvvm.minctasm"="12" {{.*}}}
+
 // Test both max threads per block and Min cta per sm.
 extern "C" {
 __global__ void
@@ -19,7 +38,6 @@ Kernel1()
 }
 
 // CHECK: !{{[0-9]+}} = !{ptr @Kernel1, !"maxntidx", i32 256}
-// CHECK: !{{[0-9]+}} = !{ptr @Kernel1, !"minctasm", i32 2}
 
 #ifdef USE_MAX_BLOCKS
 // Test max threads per block and min/max cta per sm.
@@ -32,8 +50,6 @@ Kernel1_sm_90()
 }
 
 // CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"maxntidx", i32 256}
-// CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"minctasm", i32 2}
-// CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @Kernel1_sm_90, !"maxclusterrank", 
i32 4}
 #endif // USE_MAX_BLOCKS
 
 // Test only max threads per block. Min cta per sm defaults to 0, and
@@ -67,7 +83,6 @@ Kernel4()
 template __global__ void Kernel4();
 
 // CHECK: !{{[0-9]+}} = !{ptr @{{.*}}Kernel4{{.*}}, !"maxntidx", i32 256}
-// CHECK: !{{[0-9]+}} = !{ptr @{{.*}}Kernel4{{.*}}, !"minctasm", i32 2}
 
 #ifdef USE_MAX_BLOCKS
 template 
@@ -79,8 +94,6 @@ Kernel4_sm_90()
 template __global__ void Kernel4_sm_90();
 
 // CHECK_MAX_BLOCKS: !{{[0-9]+}} = !{ptr @{{.*}}Kernel4_sm_90

[clang] [llvm] [mlir] [NVPTX] Convert scalar function nvvm.annotations to attributes (PR #125908)

2025-02-12 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean closed 
https://github.com/llvm/llvm-project/pull/125908
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)

2024-12-10 Thread Alex MacLean via cfe-commits


@@ -1270,77 +1270,21 @@ exit:
 ; MODULE: attributes #[[ATTR1:[0-9]+]] = { convergent nocallback nounwind }
 ; MODULE: attributes #[[ATTR2:[0-9]+]] = { convergent nocallback nofree 
nounwind willreturn }
 ; MODULE: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind 
willreturn memory(inaccessiblemem: write) }
-; MODULE: attributes #[[ATTR4]] = { "kernel" }
-; MODULE: attributes #[[ATTR5]] = { nosync memory(none) }
+; MODULE: attributes #[[ATTR4]] = { "kernel" "nvvm.kernel" }
+; MODULE: attributes #[[ATTR5]] = { "kernel" }
+; MODULE: attributes #[[ATTR6]] = { nosync memory(none) }
 ;.
 ; CGSCC: attributes #[[ATTR0]] = { "llvm.assume"="ompx_aligned_barrier" }
 ; CGSCC: attributes #[[ATTR1:[0-9]+]] = { convergent nocallback nounwind }
 ; CGSCC: attributes #[[ATTR2:[0-9]+]] = { convergent nocallback nofree 
nounwind willreturn }
 ; CGSCC: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind 
willreturn memory(inaccessiblemem: write) }
-; CGSCC: attributes #[[ATTR4]] = { "kernel" }
-; CGSCC: attributes #[[ATTR5]] = { nosync memory(none) }
+; CGSCC: attributes #[[ATTR4]] = { "kernel" "nvvm.kernel" }

AlexMaclean wrote:

The problem is that OpenMP seems to need to be able to draw a distinction 
between OpenMP kernels and nvvm kernels. For example here it seems like OpenMP 
only wants to look at "kernel" not "nvvm.kernel". As a result it seems like 
these attributes cannot be easily unified. 
https://github.com/llvm/llvm-project/blob/c835b48a4d72227b174bcd86f071238a1583803a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp#L5932-L5938

https://github.com/llvm/llvm-project/pull/119261
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)

2024-12-10 Thread Alex MacLean via cfe-commits


@@ -324,14 +326,15 @@ MaybeAlign getAlign(const Function &F, unsigned Index) {
   F.getAttributes().getAttributes(Index).getStackAlignment())
 return StackAlign;
 
-  // If that is missing, check the legacy nvvm metadata
-  std::vector Vs;
-  bool retval = findAllNVVMAnnotation(&F, "align", Vs);
-  if (!retval)
-return std::nullopt;
-  for (unsigned V : Vs)
-if ((V >> 16) == Index)
-  return Align(V & 0x);
+  // check the legacy nvvm metadata only for the return value since llvm does
+  // not support stackalign attribute for this.
+  if (Index == 0) {
+std::vector Vs;
+if (findAllNVVMAnnotation(&F, "align", Vs))

AlexMaclean wrote:

Yea, I agree the NVVM annotation APIs could be cleaned up significantly, 
hopefully this work will remove the need for them altogether though.

https://github.com/llvm/llvm-project/pull/119261
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)

2024-12-10 Thread Alex MacLean via cfe-commits


@@ -5022,6 +5022,69 @@ bool llvm::UpgradeDebugInfo(Module &M) {
   return Modified;
 }
 
+bool static upgradeSingleNVVMAnnotation(GlobalValue *GV, StringRef K,
+const Metadata *V) {
+  if (K == "kernel") {
+assert(mdconst::extract(V)->getZExtValue() == 1);
+cast(GV)->addFnAttr("nvvm.kernel");
+return true;
+  }
+  if (K == "align") {
+const uint64_t AlignBits = 
mdconst::extract(V)->getZExtValue();
+const unsigned Idx = (AlignBits >> 16);
+const Align StackAlign = Align(AlignBits & 0x);

AlexMaclean wrote:

Fixed

https://github.com/llvm/llvm-project/pull/119261
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)

2024-12-09 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean updated 
https://github.com/llvm/llvm-project/pull/119261

>From f9f30a77f5e7232f968a3063c34338c9dfc7bac5 Mon Sep 17 00:00:00 2001
From: Alex Maclean 
Date: Fri, 8 Nov 2024 22:39:34 +
Subject: [PATCH 1/3] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy
 annotations

---
 llvm/lib/Target/NVPTX/CMakeLists.txt  |   1 +
 llvm/lib/Target/NVPTX/NVPTX.h |   5 +
 llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp  |   4 +
 llvm/lib/Target/NVPTX/NVPTXUtilities.cpp  |   9 +-
 .../Target/NVPTX/NVVMUpgradeAnnotations.cpp   | 130 ++
 .../CodeGen/NVPTX/upgrade-nvvm-annotations.ll |  30 
 6 files changed, 177 insertions(+), 2 deletions(-)
 create mode 100644 llvm/lib/Target/NVPTX/NVVMUpgradeAnnotations.cpp
 create mode 100644 llvm/test/CodeGen/NVPTX/upgrade-nvvm-annotations.ll

diff --git a/llvm/lib/Target/NVPTX/CMakeLists.txt 
b/llvm/lib/Target/NVPTX/CMakeLists.txt
index 693365161330f5..bb2e4ad48b51d8 100644
--- a/llvm/lib/Target/NVPTX/CMakeLists.txt
+++ b/llvm/lib/Target/NVPTX/CMakeLists.txt
@@ -39,6 +39,7 @@ set(NVPTXCodeGen_sources
   NVVMReflect.cpp
   NVPTXProxyRegErasure.cpp
   NVPTXCtorDtorLowering.cpp
+  NVVMUpgradeAnnotations.cpp
   )
 
 add_llvm_target(NVPTXCodeGen
diff --git a/llvm/lib/Target/NVPTX/NVPTX.h b/llvm/lib/Target/NVPTX/NVPTX.h
index ca915cd3f3732f..53418148be3615 100644
--- a/llvm/lib/Target/NVPTX/NVPTX.h
+++ b/llvm/lib/Target/NVPTX/NVPTX.h
@@ -52,6 +52,7 @@ FunctionPass *createNVPTXLowerUnreachablePass(bool 
TrapUnreachable,
   bool NoTrapAfterNoreturn);
 MachineFunctionPass *createNVPTXPeephole();
 MachineFunctionPass *createNVPTXProxyRegErasurePass();
+ModulePass *createNVVMUpgradeAnnotationsPass();
 
 struct NVVMIntrRangePass : PassInfoMixin {
   PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
@@ -74,6 +75,10 @@ struct NVPTXCopyByValArgsPass : 
PassInfoMixin {
   PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
 };
 
+struct NVVMUpgradeAnnotationsPass : PassInfoMixin {
+  PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
+};
+
 namespace NVPTX {
 enum DrvInterface {
   NVCL,
diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp 
b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
index a5c5e9420ee737..b4fd36625adc9c 100644
--- a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
@@ -254,6 +254,8 @@ void 
NVPTXTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) {
 
   PB.registerPipelineStartEPCallback(
   [this](ModulePassManager &PM, OptimizationLevel Level) {
+PM.addPass(NVVMUpgradeAnnotationsPass());
+
 FunctionPassManager FPM;
 FPM.addPass(NVVMReflectPass(Subtarget.getSmVersion()));
 // Note: NVVMIntrRangePass was causing numerical discrepancies at one
@@ -349,6 +351,8 @@ void NVPTXPassConfig::addIRPasses() {
   AAR.addAAResult(WrapperPass->getResult());
   }));
 
+  addPass(createNVVMUpgradeAnnotationsPass());
+
   // NVVMReflectPass is added in addEarlyAsPossiblePasses, so hopefully running
   // it here does nothing.  But since we need it for correctness when lowering
   // to NVPTX, run it here too, in case whoever built our pass pipeline didn't
diff --git a/llvm/lib/Target/NVPTX/NVPTXUtilities.cpp 
b/llvm/lib/Target/NVPTX/NVPTXUtilities.cpp
index 98bffd92a087b6..04e83576cbf958 100644
--- a/llvm/lib/Target/NVPTX/NVPTXUtilities.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXUtilities.cpp
@@ -311,11 +311,16 @@ std::optional getMaxNReg(const Function &F) {
 }
 
 bool isKernelFunction(const Function &F) {
+  if (F.getCallingConv() == CallingConv::PTX_Kernel)
+return true;
+
+  if (F.hasFnAttribute("nvvm.kernel"))
+return true;
+
   if (const auto X = findOneNVVMAnnotation(&F, "kernel"))
 return (*X == 1);
 
-  // There is no NVVM metadata, check the calling convention
-  return F.getCallingConv() == CallingConv::PTX_Kernel;
+  return false;
 }
 
 MaybeAlign getAlign(const Function &F, unsigned Index) {
diff --git a/llvm/lib/Target/NVPTX/NVVMUpgradeAnnotations.cpp 
b/llvm/lib/Target/NVPTX/NVVMUpgradeAnnotations.cpp
new file mode 100644
index 00..ca550434835a2c
--- /dev/null
+++ b/llvm/lib/Target/NVPTX/NVVMUpgradeAnnotations.cpp
@@ -0,0 +1,130 @@
+//===- NVVMUpgradeAnnotations.cpp - Upgrade NVVM Annotations 
--===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This pass replaces deprecated metadata in nvvm.annotation with a more modern
+// IR representation.
+//
+//===--===//
+
+#include "NVPTX.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/

[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)

2024-12-09 Thread Alex MacLean via cfe-commits


@@ -302,6 +299,19 @@ void NVPTXTargetCodeGenInfo::addNVVMMetadata(
   llvm::ConstantAsMetadata::get(GV), llvm::MDString::get(Ctx, Name),
   llvm::ConstantAsMetadata::get(
   llvm::ConstantInt::get(llvm::Type::getInt32Ty(Ctx), Operand))};
+  // Append metadata to nvvm.annotations
+  MD->addOperand(llvm::MDNode::get(Ctx, MDVals));
+}
+
+void NVPTXTargetCodeGenInfo::addNVVMGridConstantMetadata(
+llvm::GlobalValue *GV, const SmallVectorImpl &GridConstantArgs) {
+  llvm::Module *M = GV->getParent();
+  llvm::LLVMContext &Ctx = M->getContext();
+
+  // Get "nvvm.annotations" metadata node
+  llvm::NamedMDNode *MD = M->getOrInsertNamedMetadata("nvvm.annotations");

AlexMaclean wrote:

Yea, I completely agree, I think almost all the other nvvm.annotations can be 
converted as well. For this MR I want to lay down the framework for a couple 
and once that is setup it should be fairly trivial to convert all the others.

Specifically for grid_constant can we just upgrade to the existing `readonly` 
parameter attribute? 

https://github.com/llvm/llvm-project/pull/119261
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)

2024-12-10 Thread Alex MacLean via cfe-commits


@@ -1270,77 +1270,21 @@ exit:
 ; MODULE: attributes #[[ATTR1:[0-9]+]] = { convergent nocallback nounwind }
 ; MODULE: attributes #[[ATTR2:[0-9]+]] = { convergent nocallback nofree 
nounwind willreturn }
 ; MODULE: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind 
willreturn memory(inaccessiblemem: write) }
-; MODULE: attributes #[[ATTR4]] = { "kernel" }
-; MODULE: attributes #[[ATTR5]] = { nosync memory(none) }
+; MODULE: attributes #[[ATTR4]] = { "kernel" "nvvm.kernel" }
+; MODULE: attributes #[[ATTR5]] = { "kernel" }
+; MODULE: attributes #[[ATTR6]] = { nosync memory(none) }
 ;.
 ; CGSCC: attributes #[[ATTR0]] = { "llvm.assume"="ompx_aligned_barrier" }
 ; CGSCC: attributes #[[ATTR1:[0-9]+]] = { convergent nocallback nounwind }
 ; CGSCC: attributes #[[ATTR2:[0-9]+]] = { convergent nocallback nofree 
nounwind willreturn }
 ; CGSCC: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind 
willreturn memory(inaccessiblemem: write) }
-; CGSCC: attributes #[[ATTR4]] = { "kernel" }
-; CGSCC: attributes #[[ATTR5]] = { nosync memory(none) }
+; CGSCC: attributes #[[ATTR4]] = { "kernel" "nvvm.kernel" }

AlexMaclean wrote:

Unfortunately, I think we do. "kernel" is really more like "OpenMP kernel" and 
the semantics for this do not seem to be a perfect match for "nvvm.kernel". For 
example, `@multiple_blocks_functions_non_kernel_effects_2` in this test has 
"kernel" but is not an nvvm kernel. I'm vary unfamiliar with the OpenMP 
semantics so I thought keeping it separate would be the safest approach, it 
also may be clearest to have a common "nvvm.*" prefix for all attributes 
currently represented as nvvm.annotations. 

https://github.com/llvm/llvm-project/pull/119261
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)

2024-12-10 Thread Alex MacLean via cfe-commits


@@ -5022,6 +5022,69 @@ bool llvm::UpgradeDebugInfo(Module &M) {
   return Modified;
 }
 
+bool static upgradeSingleNVVMAnnotation(GlobalValue *GV, StringRef K,
+const Metadata *V) {
+  if (K == "kernel") {
+assert(mdconst::extract(V)->getZExtValue() == 1);
+cast(GV)->addFnAttr("nvvm.kernel");

AlexMaclean wrote:

Some annotations (such as "texture") are applied to global variables, not 
functions. I cannot unconditionally cast to a Function until confirming the 
annotation kind.

https://github.com/llvm/llvm-project/pull/119261
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)

2024-12-10 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean edited 
https://github.com/llvm/llvm-project/pull/119261
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)

2024-12-10 Thread Alex MacLean via cfe-commits


@@ -10,7 +10,7 @@
 extern "C"
 __device__ void device_function() {}
 
-// CHECK-LABEL: define{{.*}} void @global_function
+// CHECK: define{{.*}} void @global_function{{.*}} #[[ATTR0:[0-9]+]]

AlexMaclean wrote:

Fixed

https://github.com/llvm/llvm-project/pull/119261
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)

2024-12-10 Thread Alex MacLean via cfe-commits


@@ -5022,6 +5022,69 @@ bool llvm::UpgradeDebugInfo(Module &M) {
   return Modified;
 }
 
+bool static upgradeSingleNVVMAnnotation(GlobalValue *GV, StringRef K,
+const Metadata *V) {
+  if (K == "kernel") {
+assert(mdconst::extract(V)->getZExtValue() == 1);
+cast(GV)->addFnAttr("nvvm.kernel");
+return true;
+  }
+  if (K == "align") {
+const uint64_t AlignBits = 
mdconst::extract(V)->getZExtValue();
+const unsigned Idx = (AlignBits >> 16);
+const Align StackAlign = Align(AlignBits & 0x);
+// TODO: Skip adding the stackalign attribute for returns, for now.
+if (!Idx)
+  return false;
+cast(GV)->addAttributeAtIndex(
+Idx, Attribute::getWithStackAlignment(GV->getContext(), StackAlign));
+return true;
+  }
+
+  return false;
+}
+
+void llvm::UpgradeNVVMAnnotations(Module &M) {
+  NamedMDNode *NamedMD = M.getNamedMetadata("nvvm.annotations");
+  if (!NamedMD)
+return;
+
+  SmallVector NewNodes;
+  SmallSet SeenNodes;
+  for (MDNode *MD : NamedMD->operands()) {
+if (SeenNodes.contains(MD))
+  continue;
+SeenNodes.insert(MD);
+
+auto *F = mdconst::dyn_extract_or_null(MD->getOperand(0));
+if (!F)
+  continue;
+
+assert(MD && "Invalid MDNode for annotation");
+assert((MD->getNumOperands() % 2) == 1 && "Invalid number of operands");
+
+SmallVector NewOperands;
+// start index = 1, to skip the global variable key
+// increment = 2, to skip the value for each property-value pairs

AlexMaclean wrote:

Fixed

https://github.com/llvm/llvm-project/pull/119261
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)

2024-12-10 Thread Alex MacLean via cfe-commits


@@ -5022,6 +5022,69 @@ bool llvm::UpgradeDebugInfo(Module &M) {
   return Modified;
 }
 
+bool static upgradeSingleNVVMAnnotation(GlobalValue *GV, StringRef K,
+const Metadata *V) {
+  if (K == "kernel") {
+assert(mdconst::extract(V)->getZExtValue() == 1);
+cast(GV)->addFnAttr("nvvm.kernel");
+return true;
+  }
+  if (K == "align") {
+const uint64_t AlignBits = 
mdconst::extract(V)->getZExtValue();
+const unsigned Idx = (AlignBits >> 16);
+const Align StackAlign = Align(AlignBits & 0x);
+// TODO: Skip adding the stackalign attribute for returns, for now.
+if (!Idx)
+  return false;
+cast(GV)->addAttributeAtIndex(
+Idx, Attribute::getWithStackAlignment(GV->getContext(), StackAlign));
+return true;
+  }
+
+  return false;
+}
+
+void llvm::UpgradeNVVMAnnotations(Module &M) {
+  NamedMDNode *NamedMD = M.getNamedMetadata("nvvm.annotations");
+  if (!NamedMD)
+return;
+
+  SmallVector NewNodes;
+  SmallSet SeenNodes;
+  for (MDNode *MD : NamedMD->operands()) {
+if (SeenNodes.contains(MD))
+  continue;
+SeenNodes.insert(MD);
+
+auto *F = mdconst::dyn_extract_or_null(MD->getOperand(0));
+if (!F)
+  continue;
+
+assert(MD && "Invalid MDNode for annotation");
+assert((MD->getNumOperands() % 2) == 1 && "Invalid number of operands");
+
+SmallVector NewOperands;
+// start index = 1, to skip the global variable key
+// increment = 2, to skip the value for each property-value pairs
+for (unsigned j = 1, je = MD->getNumOperands(); j < je; j += 2) {
+  MDString *K = cast(MD->getOperand(j));
+  const MDOperand &V = MD->getOperand(j + 1);
+  bool Upgraded = upgradeSingleNVVMAnnotation(F, K->getString(), V);
+  if (!Upgraded)
+NewOperands.append({K, V});
+}
+
+if (!NewOperands.empty()) {
+  NewOperands.insert(NewOperands.begin(), MD->getOperand(0));

AlexMaclean wrote:

Fixed

https://github.com/llvm/llvm-project/pull/119261
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)

2024-12-10 Thread Alex MacLean via cfe-commits


@@ -5911,31 +5911,21 @@ bool llvm::omp::isOpenMPKernel(Function &Fn) {
 
 KernelSet llvm::omp::getDeviceKernels(Module &M) {
   // TODO: Create a more cross-platform way of determining device kernels.
-  NamedMDNode *MD = M.getNamedMetadata("nvvm.annotations");
   KernelSet Kernels;
 
-  if (!MD)
-return Kernels;
-
-  for (auto *Op : MD->operands()) {
-if (Op->getNumOperands() < 2)
-  continue;
-MDString *KindID = dyn_cast(Op->getOperand(1));
-if (!KindID || KindID->getString() != "kernel")
-  continue;
-
-Function *KernelFn =
-mdconst::dyn_extract_or_null(Op->getOperand(0));
-if (!KernelFn)
-  continue;
-
-// We are only interested in OpenMP target regions. Others, such as kernels
-// generated by CUDA but linked together, are not interesting to this pass.
-if (isOpenMPKernel(*KernelFn)) {
-  ++NumOpenMPTargetRegionKernels;
-  Kernels.insert(KernelFn);
-} else
-  ++NumNonOpenMPTargetRegionKernels;
+  for (auto &F : M) {
+// TODO: unify this check with isKernelFunction in NVPTXUtilities.
+if (F.hasFnAttribute("nvvm.kernel")) {
+

AlexMaclean wrote:

Fixed

https://github.com/llvm/llvm-project/pull/119261
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)

2024-12-10 Thread Alex MacLean via cfe-commits


@@ -324,14 +326,17 @@ MaybeAlign getAlign(const Function &F, unsigned Index) {
   F.getAttributes().getAttributes(Index).getStackAlignment())
 return StackAlign;
 
-  // If that is missing, check the legacy nvvm metadata
-  std::vector Vs;
-  bool retval = findAllNVVMAnnotation(&F, "align", Vs);
-  if (!retval)
-return std::nullopt;
-  for (unsigned V : Vs)
-if ((V >> 16) == Index)
-  return Align(V & 0x);
+  // check the legacy nvvm metadata only for the return value since llvm does
+  // not support stackalign attribute for this.
+  if (Index == 0) {
+std::vector Vs;
+bool retval = findAllNVVMAnnotation(&F, "align", Vs);
+if (!retval)

AlexMaclean wrote:

Fixed

https://github.com/llvm/llvm-project/pull/119261
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations (PR #119261)

2024-12-10 Thread Alex MacLean via cfe-commits


@@ -5022,6 +5022,69 @@ bool llvm::UpgradeDebugInfo(Module &M) {
   return Modified;
 }
 
+bool static upgradeSingleNVVMAnnotation(GlobalValue *GV, StringRef K,
+const Metadata *V) {
+  if (K == "kernel") {
+assert(mdconst::extract(V)->getZExtValue() == 1);

AlexMaclean wrote:

Sounds good, fixed

https://github.com/llvm/llvm-project/pull/119261
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Auto-Upgrade some nvvm.annotations to attributes (PR #119261)

2024-12-12 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean edited 
https://github.com/llvm/llvm-project/pull/119261
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Auto-Upgrade some nvvm.annotations to attributes (PR #119261)

2024-12-12 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean edited 
https://github.com/llvm/llvm-project/pull/119261
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Auto-Upgrade some nvvm.annotations to attributes (PR #119261)

2024-12-13 Thread Alex MacLean via cfe-commits


@@ -1270,77 +1270,21 @@ exit:
 ; MODULE: attributes #[[ATTR1:[0-9]+]] = { convergent nocallback nounwind }
 ; MODULE: attributes #[[ATTR2:[0-9]+]] = { convergent nocallback nofree 
nounwind willreturn }
 ; MODULE: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind 
willreturn memory(inaccessiblemem: write) }
-; MODULE: attributes #[[ATTR4]] = { "kernel" }
-; MODULE: attributes #[[ATTR5]] = { nosync memory(none) }
+; MODULE: attributes #[[ATTR4]] = { "kernel" "nvvm.kernel" }
+; MODULE: attributes #[[ATTR5]] = { "kernel" }
+; MODULE: attributes #[[ATTR6]] = { nosync memory(none) }
 ;.
 ; CGSCC: attributes #[[ATTR0]] = { "llvm.assume"="ompx_aligned_barrier" }
 ; CGSCC: attributes #[[ATTR1:[0-9]+]] = { convergent nocallback nounwind }
 ; CGSCC: attributes #[[ATTR2:[0-9]+]] = { convergent nocallback nofree 
nounwind willreturn }
 ; CGSCC: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind 
willreturn memory(inaccessiblemem: write) }
-; CGSCC: attributes #[[ATTR4]] = { "kernel" }
-; CGSCC: attributes #[[ATTR5]] = { nosync memory(none) }
+; CGSCC: attributes #[[ATTR4]] = { "kernel" "nvvm.kernel" }

AlexMaclean wrote:

There is a `ptx_kernel` calling convention which is an alternative to 
`nvvm.annoations` `!"kernel"` already. However, I don't think we can safely 
auto-upgrade to this in all cases, in the openMP example @jhuber6 provided 
above the function has both `amdgpu_kernel` and `"nvvm.kernel"` which would not 
be possible with `ptx_kernel` CC. Is there any way around this? if not an 
attribute seems like the only option. 

> The metadata use useful if we have cases where we really want fast lookup of 
> all the kernels in the TU.

I don't think there are any cases where we do this, there isn't even a function 
to traverse the metadata and find all the kernels (that I know of).  It's far 
more important to be able to quickly check if a function is a kernel, which the 
metadata solution is fairly slow for (there is a cache hacked on to try to 
mitigate this but that has other issues). In addition metadata should not be 
used to carry semantic information like this. 

https://github.com/llvm/llvm-project/pull/119261
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Auto-Upgrade some nvvm.annotations to attributes (PR #119261)

2024-12-13 Thread Alex MacLean via cfe-commits


@@ -1270,77 +1270,21 @@ exit:
 ; MODULE: attributes #[[ATTR1:[0-9]+]] = { convergent nocallback nounwind }
 ; MODULE: attributes #[[ATTR2:[0-9]+]] = { convergent nocallback nofree 
nounwind willreturn }
 ; MODULE: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind 
willreturn memory(inaccessiblemem: write) }
-; MODULE: attributes #[[ATTR4]] = { "kernel" }
-; MODULE: attributes #[[ATTR5]] = { nosync memory(none) }
+; MODULE: attributes #[[ATTR4]] = { "kernel" "nvvm.kernel" }
+; MODULE: attributes #[[ATTR5]] = { "kernel" }
+; MODULE: attributes #[[ATTR6]] = { nosync memory(none) }
 ;.
 ; CGSCC: attributes #[[ATTR0]] = { "llvm.assume"="ompx_aligned_barrier" }
 ; CGSCC: attributes #[[ATTR1:[0-9]+]] = { convergent nocallback nounwind }
 ; CGSCC: attributes #[[ATTR2:[0-9]+]] = { convergent nocallback nofree 
nounwind willreturn }
 ; CGSCC: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind 
willreturn memory(inaccessiblemem: write) }
-; CGSCC: attributes #[[ATTR4]] = { "kernel" }
-; CGSCC: attributes #[[ATTR5]] = { nosync memory(none) }
+; CGSCC: attributes #[[ATTR4]] = { "kernel" "nvvm.kernel" }

AlexMaclean wrote:

I agree that `"omp_kernel"` seems like a much better name for the meaning we're 
currently signifying with the `"kernel"` attribute. 

> Realistically this should be a calling convention and not an attribute, but 
> there's a lot of historical cruft around it.

@jhuber6 are you saying that the `"kernel"` attribute should be a calling 
convention? or that `"nvvm.kernel"` should be (similar to `amdgpu_kernel`)?

https://github.com/llvm/llvm-project/pull/119261
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NVPTX] Switch front-ends and tests to ptx_kernel cc (PR #120806)

2024-12-20 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean edited 
https://github.com/llvm/llvm-project/pull/120806
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NVPTX] Switch front-ends and tests to ptx_kernel cc (PR #120806)

2024-12-20 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean ready_for_review 
https://github.com/llvm/llvm-project/pull/120806
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NVPTX] Switch front-ends and tests to ptx_kernel cc (PR #120806)

2024-12-20 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean edited 
https://github.com/llvm/llvm-project/pull/120806
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Auto-Upgrade some nvvm.annotations to attributes (PR #119261)

2024-12-20 Thread Alex MacLean via cfe-commits


@@ -1270,77 +1270,21 @@ exit:
 ; MODULE: attributes #[[ATTR1:[0-9]+]] = { convergent nocallback nounwind }
 ; MODULE: attributes #[[ATTR2:[0-9]+]] = { convergent nocallback nofree 
nounwind willreturn }
 ; MODULE: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind 
willreturn memory(inaccessiblemem: write) }
-; MODULE: attributes #[[ATTR4]] = { "kernel" }
-; MODULE: attributes #[[ATTR5]] = { nosync memory(none) }
+; MODULE: attributes #[[ATTR4]] = { "kernel" "nvvm.kernel" }
+; MODULE: attributes #[[ATTR5]] = { "kernel" }
+; MODULE: attributes #[[ATTR6]] = { nosync memory(none) }
 ;.
 ; CGSCC: attributes #[[ATTR0]] = { "llvm.assume"="ompx_aligned_barrier" }
 ; CGSCC: attributes #[[ATTR1:[0-9]+]] = { convergent nocallback nounwind }
 ; CGSCC: attributes #[[ATTR2:[0-9]+]] = { convergent nocallback nofree 
nounwind willreturn }
 ; CGSCC: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind 
willreturn memory(inaccessiblemem: write) }
-; CGSCC: attributes #[[ATTR4]] = { "kernel" }
-; CGSCC: attributes #[[ATTR5]] = { nosync memory(none) }
+; CGSCC: attributes #[[ATTR4]] = { "kernel" "nvvm.kernel" }

AlexMaclean wrote:

Okay, fair enough. I'll start switch us over to a calling convention in 
https://github.com/llvm/llvm-project/pull/120806

https://github.com/llvm/llvm-project/pull/119261
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NVPTX] Switch front-ends and tests to ptx_kernel cc (PR #120806)

2024-12-20 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean edited 
https://github.com/llvm/llvm-project/pull/120806
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [OpenMP] Replace nvvm.annotation usage with kernel calling conventions (PR #122320)

2025-01-24 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean closed 
https://github.com/llvm/llvm-project/pull/122320
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NVPTX] Switch front-ends and tests to ptx_kernel cc (PR #120806)

2024-12-23 Thread Alex MacLean via cfe-commits


@@ -556,19 +556,16 @@ llvm.func @kernel_func() attributes {nvvm.kernel} {
   llvm.return
 }
 
-// CHECK: !nvvm.annotations =
-// CHECK-NOT: {ptr @nvvm_special_regs, !"kernel", i32 1}
-// CHECK: {ptr @kernel_func, !"kernel", i32 1}
+// CHECK: ptx_kernel void @kernel_func

AlexMaclean wrote:

This change does not remove support for specifying a kernel via the metadata. 
It simply updates frontends and tests to use a different one of the two already 
supported methods for marking kernels. Long term I hope to remove the support 
for metadata, so downstream users should move the calling-convention, but this 
change does not yet force that. 

https://github.com/llvm/llvm-project/pull/120806
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NVPTX] Switch front-ends and tests to ptx_kernel cc (PR #120806)

2024-12-23 Thread Alex MacLean via cfe-commits

AlexMaclean wrote:

> In MLIR, we also have other NVVM metadata such as `reqntid` and `maxntid`, 
> among others. What is the plan for these? Will they remain as metadata, or 
> will they be expressed differently?

Eventually, I hope to migrate all !nvvm.annotations, including `reqntid` and 
`maxntid`, to a more modern mechanism such as attributes, or at least metadata 
attached directly to the function/GV. !nvvm.annotations was added around llvm 3 
when target-specific attributes were not yet present. 

> Could you please elaborate on the compile-time improvements?

Auto-upgrading kernel metadata and no longer traversing !nvvm.annotations lead 
to around a 2% improvement in compile time for several cases in nvcc. This 
change alone won't have the same impact, since we still traverse the metadata 
for functions that do not have the `ptx_kernel` cc but it at least lets up bail 
out early some of the time and lays the foundation for bigger improvements.

https://github.com/llvm/llvm-project/pull/120806
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [OpenMP] Replace nvvm.annotation usage with kernel calling conventions (PR #122320)

2025-01-09 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean edited 
https://github.com/llvm/llvm-project/pull/122320
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [OpenMP] Replace nvvm.annotation usage with kernel calling conventions (PR #122320)

2025-01-15 Thread Alex MacLean via cfe-commits

AlexMaclean wrote:

@jdoerfert / @arsenm ping for review when you have a moment

https://github.com/llvm/llvm-project/pull/122320
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NVPTX] Convert scalar function nvvm.annotations to attributes (PR #125908)

2025-02-12 Thread Alex MacLean via cfe-commits


@@ -227,14 +228,14 @@ class NVVMDialectLLVMIRTranslationInterface
 } else if (attribute.getName() ==

AlexMaclean wrote:

Yes, I plan to replace all !nvvm.annotations with attributes. This change is 
already fairly large and I would prefer to avoid a single monolithic PR to make 
debugging any issues easier and to prevent unnecessary churn if it needs to be 
reverted. Would it be alright to address these now and the others in separate 
follow ups?

https://github.com/llvm/llvm-project/pull/125908
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean approved this pull request.

LGTM, please wait for @Artem-B's approval before landing.

https://github.com/llvm/llvm-project/pull/134416
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Auto-Upgrade llvm.nvvm.atomic.load.{inc,dec}.32 (PR #134111)

2025-04-08 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean closed 
https://github.com/llvm/llvm-project/pull/134111
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)

2025-04-08 Thread Alex MacLean via cfe-commits


@@ -1548,6 +1548,45 @@ let TargetPrefix = "nvvm" in {
   Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrNoCallback]>;
   def int_nvvm_e5m2x2_to_f16x2_rn_relu : 
ClangBuiltin<"__nvvm_e5m2x2_to_f16x2_rn_relu">,
   Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrNoCallback]>;
+  
+  def int_nvvm_ff_to_e2m3x2_rn : ClangBuiltin<"__nvvm_ff_to_e2m3x2_rn">,
+  Intrinsic<[llvm_i16_ty], [llvm_float_ty, llvm_float_ty], [IntrNoMem, 
IntrNoCallback]>;
+  def int_nvvm_ff_to_e2m3x2_rn_relu : 
ClangBuiltin<"__nvvm_ff_to_e2m3x2_rn_relu">,

AlexMaclean wrote:

It seems like `f32x2` would be a clearer name than `ff` for these. This would 
also be more consistent with the affix used for f16.

https://github.com/llvm/llvm-project/pull/134345
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)

2025-04-08 Thread Alex MacLean via cfe-commits


@@ -1548,6 +1548,45 @@ let TargetPrefix = "nvvm" in {
   Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrNoCallback]>;
   def int_nvvm_e5m2x2_to_f16x2_rn_relu : 
ClangBuiltin<"__nvvm_e5m2x2_to_f16x2_rn_relu">,
   Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrNoCallback]>;
+  
+  def int_nvvm_ff_to_e2m3x2_rn : ClangBuiltin<"__nvvm_ff_to_e2m3x2_rn">,

AlexMaclean wrote:

It looks like there is a lot of copy/paste boilerplate here that can be folded 
away with a few foreach loops or multi-classes.

https://github.com/llvm/llvm-project/pull/134345
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)

2025-04-08 Thread Alex MacLean via cfe-commits


@@ -1944,6 +1944,62 @@ def : Pat<(int_nvvm_e5m2x2_to_f16x2_rn Int16Regs:$a),
 def : Pat<(int_nvvm_e5m2x2_to_f16x2_rn_relu Int16Regs:$a),
   (CVT_f16x2_e5m2x2 $a, CvtRN_RELU)>;
 
+def : Pat<(int_nvvm_ff_to_e2m3x2_rn f32:$a, f32:$b),
+  (CVT_e2m3x2_f32 $a, $b, CvtRN)>,
+  Requires<[hasPTX<86>, hasSM<100>, hasArchAccelFeatures]>;
+def : Pat<(int_nvvm_ff_to_e2m3x2_rn_relu f32:$a, f32:$b),
+  (CVT_e2m3x2_f32 $a, $b, CvtRN_RELU)>,
+  Requires<[hasPTX<86>, hasSM<100>, hasArchAccelFeatures]>;
+def : Pat<(int_nvvm_ff_to_e3m2x2_rn f32:$a, f32:$b),
+  (CVT_e3m2x2_f32 $a, $b, CvtRN)>,
+  Requires<[hasPTX<86>, hasSM<100>, hasArchAccelFeatures]>;
+def : Pat<(int_nvvm_ff_to_e3m2x2_rn_relu f32:$a, f32:$b),
+  (CVT_e3m2x2_f32 $a, $b, CvtRN_RELU)>,
+  Requires<[hasPTX<86>, hasSM<100>, hasArchAccelFeatures]>;
+
+def : Pat<(int_nvvm_e2m3x2_to_f16x2_rn Int16Regs:$a),

AlexMaclean wrote:

Instead of using a Register class in the input pattern, use whatever type we 
expect this to be, in this case `i16` I assume.

https://github.com/llvm/llvm-project/pull/134345
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)

2025-04-08 Thread Alex MacLean via cfe-commits


@@ -703,6 +703,53 @@ let hasSideEffects = false in {
   defm CVT_to_tf32_rz_satf : CVT_TO_TF32<"rz.satfinite", [hasPTX<86>, 
hasSM<100>]>;
   defm CVT_to_tf32_rn_relu_satf  : CVT_TO_TF32<"rn.relu.satfinite", 
[hasPTX<86>, hasSM<100>]>;
   defm CVT_to_tf32_rz_relu_satf  : CVT_TO_TF32<"rz.relu.satfinite", 
[hasPTX<86>, hasSM<100>]>;
+
+  // FP6 conversions.
+  multiclass CVT_TO_F6X2 {

AlexMaclean wrote:

Since this only has a single string parameter and is called twice, I think it 
would be a bit clearer to simply use a foreach loop here (and for the below 
cases as well).

https://github.com/llvm/llvm-project/pull/134345
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Auto-Upgrade llvm.nvvm.atomic.load.{inc,dec}.32 (PR #134111)

2025-04-08 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean updated 
https://github.com/llvm/llvm-project/pull/134111

>From 46de785e801bf8ca87e01aee9ad0a13ac07a47d6 Mon Sep 17 00:00:00 2001
From: Alex Maclean 
Date: Tue, 1 Apr 2025 20:22:24 +
Subject: [PATCH] [NVPTX] Auto-Upgrade llvm.nvvm.atomic.load.{inc,dec}.32

---
 clang/lib/CodeGen/TargetBuiltins/NVPTX.cpp| 18 ++-
 clang/test/CodeGen/builtins-nvptx.c   |  4 +-
 llvm/include/llvm/IR/IntrinsicsNVVM.td| 10 +---
 .../include/llvm/Target/TargetSelectionDAG.td |  2 +
 llvm/lib/IR/AutoUpgrade.cpp   |  9 
 llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp   | 15 --
 llvm/lib/Target/NVPTX/NVPTXIntrinsics.td  |  4 +-
 .../Target/NVPTX/NVPTXTargetTransformInfo.cpp | 52 +--
 .../Assembler/auto_upgrade_nvvm_intrinsics.ll | 16 +-
 llvm/test/CodeGen/NVPTX/atomics.ll| 36 -
 10 files changed, 107 insertions(+), 59 deletions(-)

diff --git a/clang/lib/CodeGen/TargetBuiltins/NVPTX.cpp 
b/clang/lib/CodeGen/TargetBuiltins/NVPTX.cpp
index aaac19b229905..0f7ab9fd3b099 100644
--- a/clang/lib/CodeGen/TargetBuiltins/NVPTX.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/NVPTX.cpp
@@ -481,21 +481,11 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned 
BuiltinID,
AtomicOrdering::SequentiallyConsistent);
   }
 
-  case NVPTX::BI__nvvm_atom_inc_gen_ui: {
-Value *Ptr = EmitScalarExpr(E->getArg(0));
-Value *Val = EmitScalarExpr(E->getArg(1));
-Function *FnALI32 =
-CGM.getIntrinsic(Intrinsic::nvvm_atomic_load_inc_32, Ptr->getType());
-return Builder.CreateCall(FnALI32, {Ptr, Val});
-  }
+  case NVPTX::BI__nvvm_atom_inc_gen_ui:
+return MakeBinaryAtomicValue(*this, llvm::AtomicRMWInst::UIncWrap, E);
 
-  case NVPTX::BI__nvvm_atom_dec_gen_ui: {
-Value *Ptr = EmitScalarExpr(E->getArg(0));
-Value *Val = EmitScalarExpr(E->getArg(1));
-Function *FnALD32 =
-CGM.getIntrinsic(Intrinsic::nvvm_atomic_load_dec_32, Ptr->getType());
-return Builder.CreateCall(FnALD32, {Ptr, Val});
-  }
+  case NVPTX::BI__nvvm_atom_dec_gen_ui:
+return MakeBinaryAtomicValue(*this, llvm::AtomicRMWInst::UDecWrap, E);
 
   case NVPTX::BI__nvvm_ldg_c:
   case NVPTX::BI__nvvm_ldg_sc:
diff --git a/clang/test/CodeGen/builtins-nvptx.c 
b/clang/test/CodeGen/builtins-nvptx.c
index ffa41c85c2734..71b29849618b6 100644
--- a/clang/test/CodeGen/builtins-nvptx.c
+++ b/clang/test/CodeGen/builtins-nvptx.c
@@ -333,10 +333,10 @@ __device__ void nvvm_atom(float *fp, float f, double 
*dfp, double df,
   // CHECK: atomicrmw fadd ptr {{.*}} seq_cst, align 4
   __nvvm_atom_add_gen_f(fp, f);
 
-  // CHECK: call i32 @llvm.nvvm.atomic.load.inc.32.p0
+  // CHECK: atomicrmw uinc_wrap ptr {{.*}} seq_cst, align 4
   __nvvm_atom_inc_gen_ui(uip, ui);
 
-  // CHECK: call i32 @llvm.nvvm.atomic.load.dec.32.p0
+  // CHECK: atomicrmw udec_wrap ptr {{.*}} seq_cst, align 4
   __nvvm_atom_dec_gen_ui(uip, ui);
 
 
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td 
b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index 3e9588a515c9e..4aeb1d8a2779e 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -124,6 +124,8 @@
 //   * llvm.nvvm.ldg.global.f--> ibid.
 //   * llvm.nvvm.ldg.global.p--> ibid.
 //   * llvm.nvvm.swap.lo.hi.b64  --> llvm.fshl(x, x, 32)
+//   * llvm.nvvm.atomic.load.inc.32  --> atomicrmw uinc_wrap
+//   * llvm.nvvm.atomic.load.dec.32  --> atomicrmw udec_wrap
 
 def llvm_global_ptr_ty  : LLVMQualPointerType<1>;  // (global)ptr
 def llvm_shared_ptr_ty  : LLVMQualPointerType<3>;  // (shared)ptr
@@ -1633,14 +1635,6 @@ let TargetPrefix = "nvvm" in {
   DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty, 
llvm_i32_ty],
 [IntrNoMem]>;
 
-// Atomics not available as llvm intrinsics.
-  def int_nvvm_atomic_load_inc_32 : Intrinsic<[llvm_i32_ty],
-  [llvm_anyptr_ty, llvm_i32_ty],
-  [IntrArgMemOnly, IntrNoCallback, 
NoCapture>]>;
-  def int_nvvm_atomic_load_dec_32 : Intrinsic<[llvm_i32_ty],
-  [llvm_anyptr_ty, llvm_i32_ty],
-  [IntrArgMemOnly, IntrNoCallback, 
NoCapture>]>;
-
   class SCOPED_ATOMIC2_impl
 : Intrinsic<[elty],
   [llvm_anyptr_ty, LLVMMatchType<0>],
diff --git a/llvm/include/llvm/Target/TargetSelectionDAG.td 
b/llvm/include/llvm/Target/TargetSelectionDAG.td
index 42a5fbec95174..9c241b6c4df0f 100644
--- a/llvm/include/llvm/Target/TargetSelectionDAG.td
+++ b/llvm/include/llvm/Target/TargetSelectionDAG.td
@@ -1825,6 +1825,8 @@ defm atomic_load_min  : binary_atomic_op;
 defm atomic_load_max  : binary_atomic_op;
 defm atomic_load_umin : binary_atomic_op;
 defm atomic_load_umax : binary_atomic_op;
+defm atomic_load_uinc_wrap : binary_atomic_op;
+defm atomic_load_udec_wrap : binary_atomic_op;
 defm atomic_cmp_swap  : ternary_atomic_op;
 
 /// Atomic load which zeroes the excess high bits.
diff --git

[clang] [llvm] [NVPTX] Auto-Upgrade llvm.nvvm.atomic.load.{inc,dec}.32 (PR #134111)

2025-04-08 Thread Alex MacLean via cfe-commits


@@ -2314,6 +2317,12 @@ static Value *upgradeNVVMIntrinsicCall(StringRef Name, 
CallBase *CI,
 Value *Val = CI->getArgOperand(1);
 Rep = Builder.CreateAtomicRMW(AtomicRMWInst::FAdd, Ptr, Val, MaybeAlign(),
   AtomicOrdering::SequentiallyConsistent);
+  } else if (Name.consume_front("atomic.load.") && Name.consume_back(".32")) {
+Value *Ptr = CI->getArgOperand(0);
+Value *Val = CI->getArgOperand(1);
+auto Op = Name == "inc" ? AtomicRMWInst::UIncWrap : 
AtomicRMWInst::UDecWrap;
+Rep = Builder.CreateAtomicRMW(Op, Ptr, Val, MaybeAlign(),
+  AtomicOrdering::SequentiallyConsistent);

AlexMaclean wrote:

Okay, sounds like there is a larger issue to address around the scope and 
semantics of atomics in NVPTX. This change maintains consistency with all other 
`atomicrmw` instructions and I think the larger bug can be addressed 
separately. 

https://github.com/llvm/llvm-project/pull/134111
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang][NVVM] Support `-f[no-]cuda-prec-sqrt` and propagate precision flag to `NVVMReflect` (PR #134244)

2025-04-08 Thread Alex MacLean via cfe-commits

AlexMaclean wrote:

It seems like we already have perhaps too many mechanisms to control how sqrt 
gets lowered. There is the `__nv_sqrtf` libdevice function which chooses 
between specific (1:1 to PTX) intrinsics based on NVVMReflect and then there is 
also `llvm.sqrt` and `nvvm.sqrt.f` which are lowered and optimized based on 
command-line options and function and instruction level flags, each in its own 
way. 

I think for more fine grained responsiveness to instruction and function level 
options it makes sense to use the existing intrinsics. While, it is consistent 
with the existing design to treat NVVMReflect as operating globally across the 
entire module. I'm not sure it makes sense to introduce a new module flag and 
clang cl opt though...

I personally agree with @Artem-B that `__nv_sqrtf`+NVVMReflect may not be the 
way to go. Using one of the intrinsics seems like a better approach but I may 
be missing something.

https://github.com/llvm/llvm-project/pull/134244
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Cleanup and document nvvm.fabs intrinsics, adding f16 support (PR #135644)

2025-04-17 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean closed 
https://github.com/llvm/llvm-project/pull/135644
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-18 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean edited 
https://github.com/llvm/llvm-project/pull/135444
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-18 Thread Alex MacLean via cfe-commits


@@ -0,0 +1,48 @@
+; RUN: llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_80 | FileCheck %s 
-check-prefixes=ALL,NOPTRCONV,CLS64

AlexMaclean wrote:

Use update_llc_test_checks for this test.

https://github.com/llvm/llvm-project/pull/135444
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-18 Thread Alex MacLean via cfe-commits


@@ -2381,29 +2387,41 @@ def INT_PTX_LDG_G_v4i32_ELE : VLDG_G_ELE_V4<"u32", 
Int32Regs>;
 def INT_PTX_LDG_G_v4f32_ELE : VLDG_G_ELE_V4<"f32", Float32Regs>;
 
 
-multiclass NG_TO_G {
+multiclass NG_TO_G Preds = []> {
def "" : NVPTXInst<(outs Int32Regs:$result), (ins Int32Regs:$src),
-  "cvta." # Str # ".u32 \t$result, $src;", []>;
+  "cvta." # Str # ".u32 \t$result, $src;", []>, Requires;
+   def _64 : NVPTXInst<(outs Int64Regs:$result), (ins Int64Regs:$src),
+  "cvta." # Str # ".u64 \t$result, $src;", []>, Requires;
+}
+
+multiclass NG_TO_G_64 Preds = []> {
def _64 : NVPTXInst<(outs Int64Regs:$result), (ins Int64Regs:$src),
-  "cvta." # Str # ".u64 \t$result, $src;", []>;
+  "cvta." # Str # ".u64 \t$result, $src;", []>, Requires;
 }

AlexMaclean wrote:

I think it would be cleaner to just add a bit to the `NG_TO_G` class for 
`supports_32`

https://github.com/llvm/llvm-project/pull/135444
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-18 Thread Alex MacLean via cfe-commits


@@ -3019,8 +3019,42 @@ SDValue NVPTXTargetLowering::LowerADDRSPACECAST(SDValue 
Op,
   unsigned SrcAS = N->getSrcAddressSpace();
   unsigned DestAS = N->getDestAddressSpace();
   if (SrcAS != llvm::ADDRESS_SPACE_GENERIC &&
-  DestAS != llvm::ADDRESS_SPACE_GENERIC)
+  DestAS != llvm::ADDRESS_SPACE_GENERIC) {
+// Shared and SharedCluster can be converted to each other through generic
+// space
+if (SrcAS == llvm::ADDRESS_SPACE_SHARED &&

AlexMaclean wrote:

This `if` and the one below look essentially duplicated. Can you fold them 
together?

https://github.com/llvm/llvm-project/pull/135444
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-18 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean commented:

Getting close to ready, a couple more places to update:
- NVPTXTargetTransformInfo.cpp: evaluateIsSpace
- NVPTXUsage.rst: Address Space section, add intrinsics you're modifying, such 
as `mapa`, to the spec

https://github.com/llvm/llvm-project/pull/135444
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-18 Thread Alex MacLean via cfe-commits


@@ -0,0 +1,329 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py 
UTC_ARGS: --version 5
+; RUN: llc < %s -o - -mcpu=sm_90 -mattr=+ptx78 | FileCheck %s
+; RUN: %if ptxas-12.0 %{ llc < %s -mcpu=sm_90 -mattr=+ptx78| %ptxas-verify 
-arch=sm_90 %}
+
+target triple = "nvptx64-nvidia-cuda"
+
+@llvm.used = appending global [5 x ptr] [
+  ptr @test_distributed_shared_cluster_common,

AlexMaclean wrote:

our lit tests generally don't use `@llvm.used`, can you remove this?

https://github.com/llvm/llvm-project/pull/135444
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-22 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean approved this pull request.


https://github.com/llvm/llvm-project/pull/135444
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-22 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean commented:

llvm changes LGTM, though I'm not too familiar with the MLIR portion of this 
change. 

https://github.com/llvm/llvm-project/pull/135444
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)

2025-04-10 Thread Alex MacLean via cfe-commits


@@ -703,6 +703,46 @@ let hasSideEffects = false in {
   defm CVT_to_tf32_rz_satf : CVT_TO_TF32<"rz.satfinite", [hasPTX<86>, 
hasSM<100>]>;
   defm CVT_to_tf32_rn_relu_satf  : CVT_TO_TF32<"rn.relu.satfinite", 
[hasPTX<86>, hasSM<100>]>;
   defm CVT_to_tf32_rz_relu_satf  : CVT_TO_TF32<"rz.relu.satfinite", 
[hasPTX<86>, hasSM<100>]>;
+
+  // FP6 conversions.
+  class CVT_f32_to_f6x2
+: NVPTXInst<(outs Int16Regs:$dst),
+(ins Float32Regs:$src1, Float32Regs:$src2, CvtMode:$mode),
+!strconcat("cvt${mode:base}.satfinite${mode:relu}.",

AlexMaclean wrote:

Nit: for simple cases like this, `#` is preferable over `!strconcat`

https://github.com/llvm/llvm-project/pull/134345
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)

2025-04-10 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean edited 
https://github.com/llvm/llvm-project/pull/134345
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Alex MacLean via cfe-commits

AlexMaclean wrote:

Merging on behalf of @YonahGoldberg at his request offline. 

https://github.com/llvm/llvm-project/pull/134416
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Alex MacLean via cfe-commits

https://github.com/AlexMaclean closed 
https://github.com/llvm/llvm-project/pull/134416
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)

2025-04-10 Thread Alex MacLean via cfe-commits


@@ -1548,6 +1548,45 @@ let TargetPrefix = "nvvm" in {
   Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrNoCallback]>;
   def int_nvvm_e5m2x2_to_f16x2_rn_relu : 
ClangBuiltin<"__nvvm_e5m2x2_to_f16x2_rn_relu">,
   Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrNoCallback]>;
+  
+  def int_nvvm_ff_to_e2m3x2_rn : ClangBuiltin<"__nvvm_ff_to_e2m3x2_rn">,
+  Intrinsic<[llvm_i16_ty], [llvm_float_ty, llvm_float_ty], [IntrNoMem, 
IntrNoCallback]>;

AlexMaclean wrote:

Can all these intrinisics be made DefaultAttrsIntrinsics? 

https://github.com/llvm/llvm-project/pull/134345
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-12 Thread Alex MacLean via cfe-commits


@@ -982,8 +982,9 @@ void NVPTXDAGToDAGISel::SelectAddrSpaceCast(SDNode *N) {
 case ADDRESS_SPACE_SHARED:
   Opc = TM.is64Bit() ? NVPTX::cvta_shared_64 : NVPTX::cvta_shared;
   break;
-case ADDRESS_SPACE_DSHARED:
-  Opc = TM.is64Bit() ? NVPTX::cvta_dshared_64 : NVPTX::cvta_dshared;
+case ADDRESS_SPACE_SHARED_CLUSTER:
+  Opc = TM.is64Bit() ? NVPTX::cvta_shared_cluster_64
+ : NVPTX::cvta_shared_cluster;

AlexMaclean wrote:

My understanding is that cluster is not supported until sm_90, and that sm_90+ 
do not support 32bit compilation. Is there something I'm missing? If not we 
should never select the 32-bit version here and instead check to ensure we're 
compiling for sm_90+.

https://github.com/llvm/llvm-project/pull/135444
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


  1   2   >