[clang] [NVPTX][AMDGPU][CodeGen] Fix `local_space nullptr` handling for NVPTX and local/private `nullptr` value for AMDGPU. (PR #78759)

2024-01-25 Thread Victor Lomuller via cfe-commits


@@ -285,6 +289,20 @@ void NVPTXTargetCodeGenInfo::addNVVMMetadata(llvm::GlobalValue *GV,
 bool NVPTXTargetCodeGenInfo::shouldEmitStaticExternCAliases() const {
   return false;
 }
+
+llvm::Constant *
+NVPTXTargetCodeGenInfo::getNullPointer(const CodeGen::CodeGenModule &CGM,
+                                       llvm::PointerType *PT,
+                                       QualType QT) const {
+  auto &Ctx = CGM.getContext();
+  if (PT->getAddressSpace() != Ctx.getTargetAddressSpace(LangAS::opencl_local))
+    return llvm::ConstantPointerNull::get(PT);
+
+  auto NPT = llvm::PointerType::get(
+      PT->getContext(), Ctx.getTargetAddressSpace(LangAS::opencl_generic));
+  return llvm::ConstantExpr::getAddrSpaceCast(
+      llvm::ConstantPointerNull::get(NPT), PT);
+}

Naghasan wrote:

Hi @Artem-B 

I'm chiming in at @mmoadeli's request. I advised him on the resolution of his 
issue.

> I don't quite understand what's going on here.

So it is a similar story as for the AMDGPU backend: `0` as a pointer to shared 
memory is a valid one and points to the root of the shared memory, which means 
we cannot use this value as `nullptr`. AMDGPU uses -1 (all bits set) for 
this, but we couldn't find anything equivalent in the CUDA/PTX documentation. 
After some investigation, we found that the most stable way to do this is 
simply to insert this expression.

Note that `0` as a pointer to the generic address space is the expected value 
for `nullptr`.

> Why are we ASC'ing all null pointers to LangAS::opencl_generic?

The patch isn't doing this. If the pointer type *is* to the CUDA shared address 
space (OpenCL's local address space), then we do ASC; otherwise this emits the 
simple `llvm::ConstantPointerNull`.
We used `LangAS::opencl_generic` as a way to emphasize that there is a 
generic-to-shared address space cast going on. The other solution here would be 
to use `LangAS::Default` to retrieve the target address space, but `Default` 
doesn't sound right to me, as you have to know it maps to NVPTX's generic 
target address space. Either way, we don't have a strong opinion on what to 
use, but a comment is probably needed regardless.

> Will it work for CUDA (as in the CUDA language)? I think this code should be 
> restricted to apply the ASC only for OpenCL and leave CUDA/HIP with the 
> default.

So yes and no. To the `Will it work for CUDA?` part: yes, it will, because you 
actually cannot hit this return. CUDA doesn't expose address spaces, so you 
can't have that nullptr as an address in the CUDA shared address space; the 
`if` above will always evaluate to true in CUDA.

For the `leave CUDA/HIP with the default` part: you could force things and use 
target address spaces, as is done in the clang headers for CUDA, and this 
change would capture that. However, as explained before, `0` in address 
space 3 (NVPTX backend) is a valid address, and this is very easy to highlight 
in SASS.

https://github.com/llvm/llvm-project/pull/78759
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NVPTX][AMDGPU][CodeGen] Fix `local_space nullptr` handling for NVPTX and local/private `nullptr` value for AMDGPU. (PR #78759)

2024-01-25 Thread Victor Lomuller via cfe-commits


@@ -418,8 +418,10 @@ class LLVM_LIBRARY_VISIBILITY AMDGPUTargetInfo final : public TargetInfo {
   // value ~0.
   uint64_t getNullPointerValue(LangAS AS) const override {
 // FIXME: Also should handle region.
-    return (AS == LangAS::opencl_local || AS == LangAS::opencl_private)
-               ? ~0 : 0;
+    return (AS == LangAS::opencl_local || AS == LangAS::opencl_private ||
+            AS == LangAS::sycl_local || AS == LangAS::sycl_private)

Naghasan wrote:

The split is the result of long discussions with the OpenCL code owner.
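For illustration, a hedged usage sketch of what the change means through the 
`TargetInfo` API (the function and variable names here are hypothetical, and 
`TI` is assumed to be a `clang::TargetInfo` for an amdgcn triple):

```cpp
#include "clang/Basic/AddressSpaces.h"
#include "clang/Basic/TargetInfo.h"

// With the change above, SYCL's local/private address spaces get the same
// all-ones null sentinel that OpenCL's local/private already had on AMDGPU,
// while other address spaces keep 0.
void dumpAMDGPUNullValues(const clang::TargetInfo &TI) {
  uint64_t LocalNull = TI.getNullPointerValue(clang::LangAS::sycl_local);     // ~0
  uint64_t PrivateNull = TI.getNullPointerValue(clang::LangAS::sycl_private); // ~0
  uint64_t GlobalNull = TI.getNullPointerValue(clang::LangAS::sycl_global);   // 0
  (void)LocalNull; (void)PrivateNull; (void)GlobalNull;
}
```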

https://github.com/llvm/llvm-project/pull/78759
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][NVPTX] Define macro indicating the PTX version (PR #94934)

2024-06-09 Thread Victor Lomuller via cfe-commits

https://github.com/Naghasan created 
https://github.com/llvm/llvm-project/pull/94934

Define the __PTX_VERSION__ macro to indicate the PTX version in use.

Usually each new PTX version brings a new sm version and the associated 
instructions. However, some of these instructions can also be made available to 
older sms. This allows applications to check more accurately for available 
instructions.

From 52623029bf504c10ee4e8df749c1e33eaa564cc2 Mon Sep 17 00:00:00 2001
From: Victor Lomuller 
Date: Thu, 6 Jun 2024 21:53:01 +0100
Subject: [PATCH] [clang][NVPTX] Define macro indicating the PTX version

Define the __PTX_VERSION__ macro to indicate the PTX version in use.

Usually each new PTX version brings a new sm version and the associated 
instructions.
However, some of these instructions can also be made available to older sms.
This allows applications to check more accurately for available instructions.
---
 clang/lib/Basic/Targets/NVPTX.cpp  |  1 +
 clang/test/Preprocessor/cuda-ptx-versioning.cu | 11 +++
 2 files changed, 12 insertions(+)
 create mode 100644 clang/test/Preprocessor/cuda-ptx-versioning.cu

diff --git a/clang/lib/Basic/Targets/NVPTX.cpp b/clang/lib/Basic/Targets/NVPTX.cpp
index ff7d2f1f92aa4..ebb1839d8cfd1 100644
--- a/clang/lib/Basic/Targets/NVPTX.cpp
+++ b/clang/lib/Basic/Targets/NVPTX.cpp
@@ -173,6 +173,7 @@ void NVPTXTargetInfo::getTargetDefines(const LangOptions &Opts,
                                        MacroBuilder &Builder) const {
   Builder.defineMacro("__PTX__");
   Builder.defineMacro("__NVPTX__");
+  Builder.defineMacro("__PTX_VERSION__", Twine(PTXVersion));
 
   // Skip setting architecture dependent macros if undefined.
   if (GPU == CudaArch::UNUSED && !HostTarget)
diff --git a/clang/test/Preprocessor/cuda-ptx-versioning.cu b/clang/test/Preprocessor/cuda-ptx-versioning.cu
new file mode 100644
index 0..2d7eb9b172b58
--- /dev/null
+++ b/clang/test/Preprocessor/cuda-ptx-versioning.cu
@@ -0,0 +1,11 @@
+// RUN: %clang_cc1 %s -E -dM -o - -x cuda -fcuda-is-device -triple nvptx64 \
+// RUN: | FileCheck -match-full-lines %s --check-prefix=CHECK-CUDA32
+// CHECK-CUDA32: #define __PTX_VERSION__ 32
+
+// RUN: %clang_cc1 %s -E -dM -o - -x cuda -fcuda-is-device -triple nvptx64 -target-feature +ptx78 \
+// RUN:  -target-cpu sm_90 | FileCheck -match-full-lines %s --check-prefix=CHECK-CUDA78
+// CHECK-CUDA78: #define __PTX_VERSION__ 78
+
+// RUN: %clang_cc1 %s -E -dM -o - -x cuda -fcuda-is-device -triple nvptx64 -target-feature +ptx80 \
+// RUN:  -target-cpu sm_80 | FileCheck -match-full-lines %s --check-prefix=CHECK-CUDA80
+// CHECK-CUDA80: #define __PTX_VERSION__ 80

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][NVPTX] Define macro indicating the PTX version (PR #94934)

2024-06-18 Thread Victor Lomuller via cfe-commits

Naghasan wrote:

@Artem-B could you have a look? I think you are the most relevant reviewer for 
this, thanks. (Sorry, I can't manage assignments.)

https://github.com/llvm/llvm-project/pull/94934
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][NVPTX] Define macro indicating the PTX version (PR #94934)

2024-07-09 Thread Victor Lomuller via cfe-commits

Naghasan wrote:

ping

https://github.com/llvm/llvm-project/pull/94934
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][NVPTX] Define macro indicating the PTX version (PR #94934)

2024-06-27 Thread Victor Lomuller via cfe-commits

Naghasan wrote:

Thanks for setting the reviewer.

> Can you please include rationale for why this name, e.g. why not 
> __NVPTX_VERSION__?

`NVPTX` is the name of the LLVM backend; `PTX` is the name of the assembly, 
which has a version, hence `__PTX_VERSION__`. Happy to use a better name, but 
it is pretty descriptive of what it represents. We could use 
`__NVPTX_PTX_VERSION__` to avoid a potential clash with future nvcc.

> How does it relate to __CUDA_ARCH__

It doesn't directly relate to `__CUDA_ARCH__`. `__CUDA_ARCH__` is the minimal 
GPU on which you want to run; the proposed macro indicates the assembly version 
used and so ties to the ptxas version.

> and why is __CUDA_ARCH__ not sufficient?

As said in the description: usually each new PTX version brings a new sm 
version and the associated instructions. However, some of these instructions 
can also be made available to older sms. This allows applications to check more 
accurately for available instructions.
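For example (a hedged sketch; the guarded feature is a placeholder), 
application code could then select a code path on the PTX version rather than 
the SM version:

```cpp
// Pick a code path based on the PTX ISA the TU is being compiled for,
// independently of __CUDA_ARCH__: newer ptxas releases can expose some
// instructions to older SMs.
#if defined(__PTX_VERSION__) && __PTX_VERSION__ >= 80
// ... use an instruction introduced in PTX ISA 8.0 ...
#else
// ... portable fallback ...
#endif
```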



https://github.com/llvm/llvm-project/pull/94934
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][NVPTX] Define macro indicating the PTX version (PR #94934)

2024-06-27 Thread Victor Lomuller via cfe-commits

https://github.com/Naghasan edited 
https://github.com/llvm/llvm-project/pull/94934
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][NVPTX] Define macro indicating the PTX version (PR #94934)

2024-06-27 Thread Victor Lomuller via cfe-commits

Naghasan wrote:

Still, I forgot to answer this point as well...

> Are there ever point releases that might mean +ptx78 should actually expand to 
> 780 rather than 78?

Not sure what exactly you mean by your question. I guess we could mirror the 
CUDA arch macro, so major * 100 + minor * 10; no opinion here.
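A hedged sketch of that encoding (mirroring __CUDA_ARCH__'s style; purely 
illustrative, not what the patch currently emits):

```cpp
// PTX 7.8 would expand to 780 rather than 78, leaving room for hypothetical
// point releases (e.g. a "7.8.1" could become 781).
constexpr int encodePTXVersion(int Major, int Minor) {
  return Major * 100 + Minor * 10;
}
static_assert(encodePTXVersion(7, 8) == 780);
```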

https://github.com/llvm/llvm-project/pull/94934
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)

2024-10-16 Thread Victor Lomuller via cfe-commits


@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
   setRequiresStructuredCFG(false);
 }
 
+enum AddressSpace {
+  Function = storageClassToAddressSpace(SPIRV::StorageClass::Function),
+  CrossWorkgroup =
+      storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup),
+  UniformConstant =
+      storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant),
+  Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup),
+  Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic)
+};
+
+unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const {
+  const auto *LD = dyn_cast<LoadInst>(V);
+  if (!LD)
+    return UINT32_MAX;
+
+  // It must be a load from a pointer to Generic.
+  assert(V->getType()->isPointerTy() &&
+         V->getType()->getPointerAddressSpace() == AddressSpace::Generic);
+
+  const auto *Ptr = LD->getPointerOperand();
+  if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant)
+    return UINT32_MAX;
+  // For a loaded from a pointer to UniformConstant, we can infer CrossWorkgroup
+  // storage, as this could only have been legally initialised with a
+  // CrossWorkgroup (aka device) constant pointer.
+  return AddressSpace::CrossWorkgroup;
+}
+
+std::pair<const Value *, unsigned>
+SPIRVTargetMachine::getPredicatedAddrSpace(const Value *V) const {
+  using namespace PatternMatch;
+
+  if (auto *II = dyn_cast<IntrinsicInst>(V)) {
+    switch (II->getIntrinsicID()) {
+    case Intrinsic::amdgcn_is_shared:
+      return std::pair(II->getArgOperand(0), AddressSpace::Workgroup);
+    case Intrinsic::amdgcn_is_private:
+      return std::pair(II->getArgOperand(0), AddressSpace::Function);
+    default:
+      break;
+    }
+    return std::pair(nullptr, UINT32_MAX);
+  }
+  // Check the global pointer predication based on
+  // (!is_share(p) && !is_private(p)). Note that logic 'and' is commutative and
+  // the order of 'is_shared' and 'is_private' is not significant.
+  Value *Ptr;
+  if (getTargetTriple().getVendor() == Triple::VendorType::AMD &&
+      match(
+          const_cast<Value *>(V),
+          m_c_And(m_Not(m_Intrinsic<Intrinsic::amdgcn_is_shared>(m_Value(Ptr))),
+                  m_Not(m_Intrinsic<Intrinsic::amdgcn_is_private>(
+                      m_Deferred(Ptr))

Naghasan wrote:

> I do think we need to add a poison-if-known-invalid-cast flag to addrspacecast

+1 (so does SPIR-V but that's another story)

https://github.com/llvm/llvm-project/pull/110897
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)

2024-11-28 Thread Victor Lomuller via cfe-commits


@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
   setRequiresStructuredCFG(false);
 }
 
+enum AddressSpace {
+  Function = storageClassToAddressSpace(SPIRV::StorageClass::Function),
+  CrossWorkgroup =
+      storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup),
+  UniformConstant =
+      storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant),
+  Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup),
+  Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic)
+};
+
+unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const {
+  const auto *LD = dyn_cast<LoadInst>(V);
+  if (!LD)
+    return UINT32_MAX;
+
+  // It must be a load from a pointer to Generic.
+  assert(V->getType()->isPointerTy() &&
+         V->getType()->getPointerAddressSpace() == AddressSpace::Generic);
+
+  const auto *Ptr = LD->getPointerOperand();
+  if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant)
+    return UINT32_MAX;
+  // For a loaded from a pointer to UniformConstant, we can infer CrossWorkgroup
+  // storage, as this could only have been legally initialised with a
+  // CrossWorkgroup (aka device) constant pointer.
+  return AddressSpace::CrossWorkgroup;
+}
+
+std::pair<const Value *, unsigned>
+SPIRVTargetMachine::getPredicatedAddrSpace(const Value *V) const {
+  using namespace PatternMatch;
+
+  if (auto *II = dyn_cast<IntrinsicInst>(V)) {
+    switch (II->getIntrinsicID()) {
+    case Intrinsic::amdgcn_is_shared:
+      return std::pair(II->getArgOperand(0), AddressSpace::Workgroup);
+    case Intrinsic::amdgcn_is_private:
+      return std::pair(II->getArgOperand(0), AddressSpace::Function);
+    default:
+      break;
+    }
+    return std::pair(nullptr, UINT32_MAX);
+  }
+  // Check the global pointer predication based on
+  // (!is_share(p) && !is_private(p)). Note that logic 'and' is commutative and
+  // the order of 'is_shared' and 'is_private' is not significant.
+  Value *Ptr;
+  if (getTargetTriple().getVendor() == Triple::VendorType::AMD &&
+      match(
+          const_cast<Value *>(V),
+          m_c_And(m_Not(m_Intrinsic<Intrinsic::amdgcn_is_shared>(m_Value(Ptr))),
+                  m_Not(m_Intrinsic<Intrinsic::amdgcn_is_private>(
+                      m_Deferred(Ptr))

Naghasan wrote:

Oh, just seeing this comment @AlexVlx

> I think that we just need to implement the AS predicates (is_local / 
> is_private & friends) atop `OpGenericPtrMemSemantics`

Is that for AMDGCN or something more general? If the latter, the spec doesn't 
offer enough guarantees to do that.

https://github.com/llvm/llvm-project/pull/110897
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)

2024-11-28 Thread Victor Lomuller via cfe-commits


@@ -92,6 +98,63 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
   setRequiresStructuredCFG(false);
 }
 
+enum AddressSpace {
+  Function = storageClassToAddressSpace(SPIRV::StorageClass::Function),
+  CrossWorkgroup =
+      storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup),
+  UniformConstant =
+      storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant),
+  Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup),
+  Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic)
+};
+
+unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const {
+  // TODO: we only enable this for AMDGCN flavoured SPIR-V, where we know it to
+  //       be correct; this might be relaxed in the future.
+  if (getTargetTriple().getVendor() != Triple::VendorType::AMD)
+    return UINT32_MAX;
+
+  const auto *LD = dyn_cast<LoadInst>(V);
+  if (!LD)
+    return UINT32_MAX;
+
+  // It must be a load from a pointer to Generic.
+  assert(V->getType()->isPointerTy() &&
+         V->getType()->getPointerAddressSpace() == AddressSpace::Generic);
+
+  const auto *Ptr = LD->getPointerOperand();
+  if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant)
+    return UINT32_MAX;
+  // For a loaded from a pointer to UniformConstant, we can infer CrossWorkgroup
Naghasan wrote:

```suggestion
  // For a load from a pointer to UniformConstant, we can infer CrossWorkgroup
```

https://github.com/llvm/llvm-project/pull/110897
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)

2024-11-28 Thread Victor Lomuller via cfe-commits


@@ -92,6 +98,63 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
   setRequiresStructuredCFG(false);
 }
 
+enum AddressSpace {
+  Function = storageClassToAddressSpace(SPIRV::StorageClass::Function),
+  CrossWorkgroup =
+      storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup),
+  UniformConstant =
+      storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant),
+  Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup),
+  Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic)
+};
+
+unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const {
+  // TODO: we only enable this for AMDGCN flavoured SPIR-V, where we know it to
+  //       be correct; this might be relaxed in the future.
+  if (getTargetTriple().getVendor() != Triple::VendorType::AMD)
+    return UINT32_MAX;
+
+  const auto *LD = dyn_cast<LoadInst>(V);
+  if (!LD)
+    return UINT32_MAX;
+
+  // It must be a load from a pointer to Generic.
+  assert(V->getType()->isPointerTy() &&
+         V->getType()->getPointerAddressSpace() == AddressSpace::Generic);
+
+  const auto *Ptr = LD->getPointerOperand();
+  if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant)
+    return UINT32_MAX;
+  // For a loaded from a pointer to UniformConstant, we can infer CrossWorkgroup
+  // storage, as this could only have been legally initialised with a
+  // CrossWorkgroup (aka device) constant pointer.
+  return AddressSpace::CrossWorkgroup;
+}
+
+bool SPIRVTargetMachine::isNoopAddrSpaceCast(unsigned SrcAS,
+                                             unsigned DestAS) const {
+  if (SrcAS != AddressSpace::Generic && SrcAS != AddressSpace::CrossWorkgroup)
+    return false;
+  return DestAS == AddressSpace::Generic ||

Naghasan wrote:

This only makes sense for the AMDGCN flavoured version.
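A hedged sketch of one way to address this, gating the query on the 
AMDGCN-flavoured triple like `getAssumedAddrSpace` does (the exact guard is an 
assumption, not the final patch):

```cpp
bool SPIRVTargetMachine::isNoopAddrSpaceCast(unsigned SrcAS,
                                             unsigned DestAS) const {
  // Only AMDGCN-flavoured SPIR-V guarantees Generic <-> CrossWorkgroup casts
  // are bit-preserving; be conservative for every other vendor.
  if (getTargetTriple().getVendor() != Triple::VendorType::AMD)
    return false;
  if (SrcAS != AddressSpace::Generic && SrcAS != AddressSpace::CrossWorkgroup)
    return false;
  return DestAS == AddressSpace::Generic ||
         DestAS == AddressSpace::CrossWorkgroup;
}
```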

https://github.com/llvm/llvm-project/pull/110897
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)

2024-11-28 Thread Victor Lomuller via cfe-commits


@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
   setRequiresStructuredCFG(false);
 }
 
+enum AddressSpace {
+  Function = storageClassToAddressSpace(SPIRV::StorageClass::Function),
+  CrossWorkgroup =
+      storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup),
+  UniformConstant =
+      storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant),
+  Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup),
+  Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic)
+};
+
+unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const {

Naghasan wrote:

I think the routine is OK for a vanilla OpenCL environment, but extensions may 
make it invalid.

https://github.com/llvm/llvm-project/pull/110897
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)

2024-12-02 Thread Victor Lomuller via cfe-commits


@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
   setRequiresStructuredCFG(false);
 }
 
+enum AddressSpace {
+  Function = storageClassToAddressSpace(SPIRV::StorageClass::Function),
+  CrossWorkgroup =
+      storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup),
+  UniformConstant =
+      storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant),
+  Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup),
+  Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic)
+};
+
+unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const {
+  const auto *LD = dyn_cast<LoadInst>(V);
+  if (!LD)
+    return UINT32_MAX;
+
+  // It must be a load from a pointer to Generic.
+  assert(V->getType()->isPointerTy() &&
+         V->getType()->getPointerAddressSpace() == AddressSpace::Generic);
+
+  const auto *Ptr = LD->getPointerOperand();
+  if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant)
+    return UINT32_MAX;
+  // For a loaded from a pointer to UniformConstant, we can infer CrossWorkgroup
+  // storage, as this could only have been legally initialised with a
+  // CrossWorkgroup (aka device) constant pointer.
+  return AddressSpace::CrossWorkgroup;
+}
+
+std::pair<const Value *, unsigned>
+SPIRVTargetMachine::getPredicatedAddrSpace(const Value *V) const {
+  using namespace PatternMatch;
+
+  if (auto *II = dyn_cast<IntrinsicInst>(V)) {
+    switch (II->getIntrinsicID()) {
+    case Intrinsic::amdgcn_is_shared:
+      return std::pair(II->getArgOperand(0), AddressSpace::Workgroup);
+    case Intrinsic::amdgcn_is_private:
+      return std::pair(II->getArgOperand(0), AddressSpace::Function);
+    default:
+      break;
+    }
+    return std::pair(nullptr, UINT32_MAX);
+  }
+  // Check the global pointer predication based on
+  // (!is_share(p) && !is_private(p)). Note that logic 'and' is commutative and
+  // the order of 'is_shared' and 'is_private' is not significant.
+  Value *Ptr;
+  if (getTargetTriple().getVendor() == Triple::VendorType::AMD &&
+      match(
+          const_cast<Value *>(V),
+          m_c_And(m_Not(m_Intrinsic<Intrinsic::amdgcn_is_shared>(m_Value(Ptr))),
+                  m_Not(m_Intrinsic<Intrinsic::amdgcn_is_private>(
+                      m_Deferred(Ptr))

Naghasan wrote:

> My interpretation (which could be wrong) is that the bits returned in the 
> mask actually indicate the pointee's AS, so the generic predicates would 
> lower to (handwavium alert) OpGenericPtrMemSemantics + bitwise AND.

The returned value is guaranteed to be a valid combination for the AS, but an 
implementation can use the same combination for different ASes.

https://github.com/llvm/llvm-project/pull/110897
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)

2024-12-10 Thread Victor Lomuller via cfe-commits


@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
   setRequiresStructuredCFG(false);
 }
 
+enum AddressSpace {
+  Function = storageClassToAddressSpace(SPIRV::StorageClass::Function),
+  CrossWorkgroup =
+      storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup),
+  UniformConstant =
+      storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant),
+  Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup),
+  Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic)
+};
+
+unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const {
+  const auto *LD = dyn_cast<LoadInst>(V);
+  if (!LD)
+    return UINT32_MAX;
+
+  // It must be a load from a pointer to Generic.
+  assert(V->getType()->isPointerTy() &&
+         V->getType()->getPointerAddressSpace() == AddressSpace::Generic);
+
+  const auto *Ptr = LD->getPointerOperand();
+  if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant)
+    return UINT32_MAX;
+  // For a loaded from a pointer to UniformConstant, we can infer CrossWorkgroup
+  // storage, as this could only have been legally initialised with a
+  // CrossWorkgroup (aka device) constant pointer.
+  return AddressSpace::CrossWorkgroup;
+}
+
+std::pair<const Value *, unsigned>
+SPIRVTargetMachine::getPredicatedAddrSpace(const Value *V) const {
+  using namespace PatternMatch;
+
+  if (auto *II = dyn_cast<IntrinsicInst>(V)) {
+    switch (II->getIntrinsicID()) {
+    case Intrinsic::amdgcn_is_shared:
+      return std::pair(II->getArgOperand(0), AddressSpace::Workgroup);
+    case Intrinsic::amdgcn_is_private:
+      return std::pair(II->getArgOperand(0), AddressSpace::Function);
+    default:
+      break;
+    }
+    return std::pair(nullptr, UINT32_MAX);
+  }
+  // Check the global pointer predication based on
+  // (!is_share(p) && !is_private(p)). Note that logic 'and' is commutative and
+  // the order of 'is_shared' and 'is_private' is not significant.
+  Value *Ptr;
+  if (getTargetTriple().getVendor() == Triple::VendorType::AMD &&
+      match(
+          const_cast<Value *>(V),
+          m_c_And(m_Not(m_Intrinsic<Intrinsic::amdgcn_is_shared>(m_Value(Ptr))),
+                  m_Not(m_Intrinsic<Intrinsic::amdgcn_is_private>(
+                      m_Deferred(Ptr))

Naghasan wrote:

> what is an implementation in this case?

A tool consuming the SPIR-V module, such as an OpenCL driver.

> It would be rather odd to have a valid implementation use e.g. setting the 
> WorkGroup bit to denote CrossWorkGroup, would it not?

It is, but it may not make a difference on all platforms (e.g. CPUs don't 
typically have dedicated workgroup memory), and checking what you are dealing 
with can be somewhat expensive or complex for no clear benefit down the line.

https://github.com/llvm/llvm-project/pull/110897
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][SYCL] Add AOT compilation support for Intel GPUs in clang-sycl-linker (PR #133194)

2025-04-04 Thread Victor Lomuller via cfe-commits


@@ -0,0 +1,131 @@
+//===--- SYCL.h -*- C++ 
-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_CLANG_BASIC_SYCL_H
+#define LLVM_CLANG_BASIC_SYCL_H
+
+#include "clang/Basic/Cuda.h"
+
+namespace llvm {
+class StringRef;
+template  class SmallString;
+} // namespace llvm
+
+namespace clang {
+// List of architectures (Intel CPUs and Intel GPUs)
+// that support SYCL offloading.
+enum class SYCLSupportedIntelArchs {

Naghasan wrote:

> perhaps the file it is defined in (Cuda.h) should be renamed to something more 
> appropriate

+1, it would probably make sense to move the non-CUDA stuff into an 
`Offloading.h` file (there is SYCL, OpenMP, CUDA and HIP after all).

https://github.com/llvm/llvm-project/pull/133194
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][SYCL] Add AOT compilation support for Intel GPUs in clang-sycl-linker (PR #133194)

2025-04-03 Thread Victor Lomuller via cfe-commits


@@ -0,0 +1,131 @@
+//===--- SYCL.h -*- C++ 
-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_CLANG_BASIC_SYCL_H
+#define LLVM_CLANG_BASIC_SYCL_H
+
+#include "clang/Basic/Cuda.h"
+
+namespace llvm {
+class StringRef;
+template  class SmallString;
+} // namespace llvm
+
+namespace clang {
+// List of architectures (Intel CPUs and Intel GPUs)
+// that support SYCL offloading.
+enum class SYCLSupportedIntelArchs {
+  // Intel CPUs
+  UNKNOWN,
+  SKYLAKEAVX512,
+  COREAVX2,
+  COREI7AVX,
+  COREI7,
+  WESTMERE,
+  SANDYBRIDGE,
+  IVYBRIDGE,
+  BROADWELL,
+  COFFEELAKE,
+  ALDERLAKE,
+  SKYLAKE,
+  SKX,
+  CASCADELAKE,
+  ICELAKECLIENT,
+  ICELAKESERVER,
+  SAPPHIRERAPIDS,
+  GRANITERAPIDS,
+  // Intel GPUs
+  BDW,
+  SKL,
+  KBL,
+  CFL,
+  APL,
+  BXT,
+  GLK,
+  WHL,
+  AML,
+  CML,
+  ICLLP,
+  ICL,
+  EHL,
+  JSL,
+  TGLLP,
+  TGL,
+  RKL,
+  ADL_S,
+  RPL_S,
+  ADL_P,
+  ADL_N,
+  DG1,
+  ACM_G10,
+  DG2_G10,
+  ACM_G11,
+  DG2_G11,
+  ACM_G12,
+  DG2_G12,
+  PVC,
+  PVC_VG,
+  MTL_U,
+  MTL_S,
+  ARL_U,
+  ARL_S,
+  MTL_H,
+  ARL_H,
+  BMG_G21,
+  LNL_M,
+};
+
+// Check if the given Arch value is a Generic AMD GPU.
+// Currently GFX*_GENERIC AMD GPUs do not support SYCL offloading.
+// This list is used to filter out GFX*_GENERIC AMD GPUs in
+// `IsSYCLSupportedAMDGPUArch`.
+static inline bool IsAMDGenericGPUArch(OffloadArch Arch) {

Naghasan wrote:

(purely FYI) I don't think this has been tested at all. AFAIK, generic versions 
will prevent a few instructions from being used, so it is not totally a runtime 
thing, and I can see a few potential issues with the builtin implementations in 
libclc (in the intel/llvm repo).

https://github.com/llvm/llvm-project/pull/133194
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits