[clang] [NVPTX][AMDGPU][CodeGen] Fix `local_space nullptr` handling for NVPTX and local/private `nullptr` value for AMDGPU. (PR #78759)
@@ -285,6 +289,20 @@ void NVPTXTargetCodeGenInfo::addNVVMMetadata(llvm::GlobalValue *GV,
 bool NVPTXTargetCodeGenInfo::shouldEmitStaticExternCAliases() const {
   return false;
 }
+
+llvm::Constant *
+NVPTXTargetCodeGenInfo::getNullPointer(const CodeGen::CodeGenModule &CGM,
+                                       llvm::PointerType *PT,
+                                       QualType QT) const {
+  auto &Ctx = CGM.getContext();
+  if (PT->getAddressSpace() != Ctx.getTargetAddressSpace(LangAS::opencl_local))
+    return llvm::ConstantPointerNull::get(PT);
+
+  auto NPT = llvm::PointerType::get(
+      PT->getContext(), Ctx.getTargetAddressSpace(LangAS::opencl_generic));
+  return llvm::ConstantExpr::getAddrSpaceCast(
+      llvm::ConstantPointerNull::get(NPT), PT);
+}

Naghasan wrote:

Hi @Artem-B

I'm chiming in at @mmoadeli's request; I advised him on the resolution of his issue.

> I don't quite understand what's going on here.

It is a similar story to the AMDGPU backend: `0` as a pointer to shared memory is a valid address that points to the root of shared memory, so we cannot use this value as `nullptr`. AMDGPU uses -1 (all bits set) for this, but we couldn't find anything equivalent in the CUDA/PTX documentation. After some investigation, we found that the most robust way to handle this is simply to insert this expression. Note that `0` as a pointer to the generic address space is the expected value for `nullptr`.

> Why are we ASC'ing all null pointers to LangAS::opencl_generic?

The patch isn't doing this: only if the pointer type *is* to the CUDA shared address space (OpenCL's local address space) do we emit the ASC. Otherwise this emits the plain `llvm::ConstantPointerNull`.

We used `LangAS::opencl_generic` to emphasize that there is a generic-to-shared address space cast going on. The other solution here would be to use `LangAS::Default` to retrieve the target address space, but `Default` doesn't sound right to me, as you have to know it maps to NVPTX's generic target address space. Either way, we don't have a strong opinion on what to use, but a comment is probably needed regardless.

> Will it work for CUDA (as in the CUDA language)? I think this code should be restricted to apply the ASC only for OpenCL and leave CUDA/HIP with the default.

So yes and no. To the "will it work for CUDA?" part: yes, it will, because you actually cannot hit this return. CUDA doesn't expose address spaces, so you can't have a `nullptr` in the CUDA shared address space, and the `if` above will always evaluate to true in CUDA. For the "leave CUDA/HIP with the default" part: you could force things and use target address spaces, as is done in the clang headers for CUDA, and this change would capture that. However, as explained above, `0` in address space 3 (NVPTX backend) is a valid address, and this is very easy to observe in the generated SASS.

https://github.com/llvm/llvm-project/pull/78759 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
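For readers following along, here is a minimal, self-contained sketch (not the clang code itself) of how such a sentinel is built with the LLVM C++ API; the function name and the hard-coded address-space numbers (0 for generic, 3 for shared on NVPTX) are assumptions for illustration:

```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"

// Build the null constant for a pointer into NVPTX's shared address space (3):
// take the generic-address-space (0) null, which really is address 0, and
// addrspacecast it to addrspace(3), so the backend can later materialise a
// sentinel value instead of the valid shared-memory address 0.
static llvm::Constant *getSharedNullSketch(llvm::LLVMContext &Ctx) {
  auto *GenericPtrTy = llvm::PointerType::get(Ctx, /*AddressSpace=*/0);
  auto *SharedPtrTy = llvm::PointerType::get(Ctx, /*AddressSpace=*/3);
  return llvm::ConstantExpr::getAddrSpaceCast(
      llvm::ConstantPointerNull::get(GenericPtrTy), SharedPtrTy);
}
```

In the emitted IR this shows up as `addrspacecast (ptr null to ptr addrspace(3))`, which is what the hook quoted above produces for pointers in the OpenCL local address space.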
[clang] [NVPTX][AMDGPU][CodeGen] Fix `local_space nullptr` handling for NVPTX and local/private `nullptr` value for AMDGPU. (PR #78759)
@@ -418,8 +418,10 @@ class LLVM_LIBRARY_VISIBILITY AMDGPUTargetInfo final : public TargetInfo {
   // value ~0.
   uint64_t getNullPointerValue(LangAS AS) const override {
     // FIXME: Also should handle region.
-    return (AS == LangAS::opencl_local || AS == LangAS::opencl_private)
-               ? ~0 : 0;
+    return (AS == LangAS::opencl_local || AS == LangAS::opencl_private ||
+            AS == LangAS::sycl_local || AS == LangAS::sycl_private)

Naghasan wrote:

The split is the result of long discussions with the OpenCL code owner.

https://github.com/llvm/llvm-project/pull/78759 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
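To make the convention above concrete, here is a small stand-alone sketch (illustrative only, not the clang hook) of the resulting null-pointer bit patterns; the enum is invented for the example:

```cpp
#include <cstdint>

// AMDGPU convention mirrored by the hunk above: pointers to the local or
// private address spaces use an all-ones bit pattern as their null value
// (0xFFFFFFFF once truncated to these 32-bit address spaces), because
// address 0 is a valid object address there; every other address space
// uses 0 as null.
enum class ExampleAS { Global, Constant, Local, Private };

static uint64_t nullPointerValueSketch(ExampleAS AS) {
  return (AS == ExampleAS::Local || AS == ExampleAS::Private) ? ~0ull : 0;
}
```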
[clang] [clang][NVPTX] Define macro indicating the PTX version (PR #94934)
https://github.com/Naghasan created https://github.com/llvm/llvm-project/pull/94934

Define the __PTX_VERSION__ macro to indicate the PTX version in use. Usually each new PTX version brings a new sm version and the associated instructions. However, some of these instructions can also be made available to older sm. This allows applications to check more accurately for available instructions.

>From 52623029bf504c10ee4e8df749c1e33eaa564cc2 Mon Sep 17 00:00:00 2001
From: Victor Lomuller
Date: Thu, 6 Jun 2024 21:53:01 +0100
Subject: [PATCH] [clang][NVPTX] Define macro indicating the PTX version

Define __PTX_VERSION__ macro to indicate the used PTX version.

Usually each new PTX version brings a new sm version and the associated
instructions. However, some of these instructions can also be made
available to older sm. This allows applications to check more accurately
for available instructions.
---
 clang/lib/Basic/Targets/NVPTX.cpp              |  1 +
 clang/test/Preprocessor/cuda-ptx-versioning.cu | 11 +++
 2 files changed, 12 insertions(+)
 create mode 100644 clang/test/Preprocessor/cuda-ptx-versioning.cu

diff --git a/clang/lib/Basic/Targets/NVPTX.cpp b/clang/lib/Basic/Targets/NVPTX.cpp
index ff7d2f1f92aa4..ebb1839d8cfd1 100644
--- a/clang/lib/Basic/Targets/NVPTX.cpp
+++ b/clang/lib/Basic/Targets/NVPTX.cpp
@@ -173,6 +173,7 @@ void NVPTXTargetInfo::getTargetDefines(const LangOptions &Opts,
                                        MacroBuilder &Builder) const {
   Builder.defineMacro("__PTX__");
   Builder.defineMacro("__NVPTX__");
+  Builder.defineMacro("__PTX_VERSION__", Twine(PTXVersion));

   // Skip setting architecture dependent macros if undefined.
   if (GPU == CudaArch::UNUSED && !HostTarget)
diff --git a/clang/test/Preprocessor/cuda-ptx-versioning.cu b/clang/test/Preprocessor/cuda-ptx-versioning.cu
new file mode 100644
index 0..2d7eb9b172b58
--- /dev/null
+++ b/clang/test/Preprocessor/cuda-ptx-versioning.cu
@@ -0,0 +1,11 @@
+// RUN: %clang_cc1 %s -E -dM -o - -x cuda -fcuda-is-device -triple nvptx64 \
+// RUN: | FileCheck -match-full-lines %s --check-prefix=CHECK-CUDA32
+// CHECK-CUDA32: #define __PTX_VERSION__ 32
+
+// RUN: %clang_cc1 %s -E -dM -o - -x cuda -fcuda-is-device -triple nvptx64 -target-feature +ptx78 \
+// RUN: -target-cpu sm_90 | FileCheck -match-full-lines %s --check-prefix=CHECK-CUDA78
+// CHECK-CUDA78: #define __PTX_VERSION__ 78
+
+// RUN: %clang_cc1 %s -E -dM -o - -x cuda -fcuda-is-device -triple nvptx64 -target-feature +ptx80 \
+// RUN: -target-cpu sm_80 | FileCheck -match-full-lines %s --check-prefix=CHECK-CUDA80
+// CHECK-CUDA80: #define __PTX_VERSION__ 80

___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
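As a rough illustration of the intended use (hypothetical code, not part of the patch): device-side code could gate a code path on the PTX ISA version it is compiled for, rather than on `__CUDA_ARCH__` alone.

```cpp
// Hypothetical compile-time guard: choose a code path based on the PTX ISA
// version this TU is compiled for, independently of the minimum SM selected
// via __CUDA_ARCH__. Shown as plain C++; in practice this would sit in CUDA
// device code.
#if defined(__PTX_VERSION__) && __PTX_VERSION__ >= 78
// PTX ISA 7.8+ makes some newer instruction sequences available even when
// targeting an older sm_* architecture.
constexpr bool kHasNewerPtxSequences = true;
#else
constexpr bool kHasNewerPtxSequences = false;
#endif
```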
[clang] [clang][NVPTX] Define macro indicating the PTX version (PR #94934)
Naghasan wrote:

@Artem-B, could you have a look? I think you are the most relevant reviewer for this, thanks. (Sorry, I can't manage assignments.)

https://github.com/llvm/llvm-project/pull/94934 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][NVPTX] Define macro indicating the PTX version (PR #94934)
Naghasan wrote: ping https://github.com/llvm/llvm-project/pull/94934 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][NVPTX] Define macro indicating the PTX version (PR #94934)
Naghasan wrote:

Thanks for setting the reviewer.

> Can you please include rationale for why this name, e.g. why not __NVPTX_VERSION__?

`NVPTX` is the name of the LLVM backend; `PTX` is the name of the assembly, which has a version, hence `__PTX_VERSION__`. Happy to use a better name, but it is pretty descriptive of what it represents. We could use `__NVPTX_PTX_VERSION__` to avoid a potential clash with a future nvcc.

> How does it relate to __CUDA_ARCH__

It doesn't directly relate to `__CUDA_ARCH__`. `__CUDA_ARCH__` is the minimal GPU on which you want to run; the proposed macro indicates the assembly version used and so is tied to the ptxas version.

> and why is __CUDA_ARCH__ not sufficient?

As said in the description: usually each new PTX version brings a new sm version and the associated instructions. However, some of these instructions can also be made available to older sm. This allows applications to check more accurately for available instructions.

https://github.com/llvm/llvm-project/pull/94934 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][NVPTX] Define macro indicating the PTX version (PR #94934)
https://github.com/Naghasan edited https://github.com/llvm/llvm-project/pull/94934 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][NVPTX] Define macro indicating the PTX version (PR #94934)
Naghasan wrote:

Still, I forgot to answer this point as well...

> Are there ever point releases that might mean +ptx78 should actually expand to 780 rather than 78?

I'm not sure what exactly you mean by your question. I guess we can mirror the CUDA arch macro, so major * 100 + minor * 10; no strong opinion here.

https://github.com/llvm/llvm-project/pull/94934 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
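For what it's worth, a tiny sketch of that hypothetical encoding (mirroring `__CUDA_ARCH__`'s scheme), compared with the value the patch currently emits:

```cpp
// Hypothetical __CUDA_ARCH__-style encoding: major * 100 + minor * 10, so
// PTX ISA 7.8 would expand to 780 and 8.0 to 800. The patch as posted
// instead exposes clang's PTXVersion value directly (7.8 -> 78).
constexpr unsigned encodePtxVersionSketch(unsigned Major, unsigned Minor) {
  return Major * 100 + Minor * 10;
}
static_assert(encodePtxVersionSketch(7, 8) == 780, "7.8 encodes as 780");
static_assert(encodePtxVersionSketch(8, 0) == 800, "8.0 encodes as 800");
```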
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT, setRequiresStructuredCFG(false); } +enum AddressSpace { + Function = storageClassToAddressSpace(SPIRV::StorageClass::Function), + CrossWorkgroup = + storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup), + UniformConstant = + storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant), + Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup), + Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic) +}; + +unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const { + const auto *LD = dyn_cast(V); + if (!LD) +return UINT32_MAX; + + // It must be a load from a pointer to Generic. + assert(V->getType()->isPointerTy() && + V->getType()->getPointerAddressSpace() == AddressSpace::Generic); + + const auto *Ptr = LD->getPointerOperand(); + if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant) +return UINT32_MAX; + // For a loaded from a pointer to UniformConstant, we can infer CrossWorkgroup + // storage, as this could only have been legally initialised with a + // CrossWorkgroup (aka device) constant pointer. + return AddressSpace::CrossWorkgroup; +} + +std::pair +SPIRVTargetMachine::getPredicatedAddrSpace(const Value *V) const { + using namespace PatternMatch; + + if (auto *II = dyn_cast(V)) { +switch (II->getIntrinsicID()) { +case Intrinsic::amdgcn_is_shared: + return std::pair(II->getArgOperand(0), AddressSpace::Workgroup); +case Intrinsic::amdgcn_is_private: + return std::pair(II->getArgOperand(0), AddressSpace::Function); +default: + break; +} +return std::pair(nullptr, UINT32_MAX); + } + // Check the global pointer predication based on + // (!is_share(p) && !is_private(p)). Note that logic 'and' is commutative and + // the order of 'is_shared' and 'is_private' is not significant. + Value *Ptr; + if (getTargetTriple().getVendor() == Triple::VendorType::AMD && + match( + const_cast(V), + m_c_And(m_Not(m_Intrinsic(m_Value(Ptr))), +m_Not(m_Intrinsic( +m_Deferred(Ptr)) Naghasan wrote: > I do think we need to add a poison-if-known-invalid-cast flag to addrspacecast +1 (so does SPIR-V but that's another story) https://github.com/llvm/llvm-project/pull/110897 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT, setRequiresStructuredCFG(false); } +enum AddressSpace { + Function = storageClassToAddressSpace(SPIRV::StorageClass::Function), + CrossWorkgroup = + storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup), + UniformConstant = + storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant), + Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup), + Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic) +}; + +unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const { + const auto *LD = dyn_cast(V); + if (!LD) +return UINT32_MAX; + + // It must be a load from a pointer to Generic. + assert(V->getType()->isPointerTy() && + V->getType()->getPointerAddressSpace() == AddressSpace::Generic); + + const auto *Ptr = LD->getPointerOperand(); + if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant) +return UINT32_MAX; + // For a loaded from a pointer to UniformConstant, we can infer CrossWorkgroup + // storage, as this could only have been legally initialised with a + // CrossWorkgroup (aka device) constant pointer. + return AddressSpace::CrossWorkgroup; +} + +std::pair +SPIRVTargetMachine::getPredicatedAddrSpace(const Value *V) const { + using namespace PatternMatch; + + if (auto *II = dyn_cast(V)) { +switch (II->getIntrinsicID()) { +case Intrinsic::amdgcn_is_shared: + return std::pair(II->getArgOperand(0), AddressSpace::Workgroup); +case Intrinsic::amdgcn_is_private: + return std::pair(II->getArgOperand(0), AddressSpace::Function); +default: + break; +} +return std::pair(nullptr, UINT32_MAX); + } + // Check the global pointer predication based on + // (!is_share(p) && !is_private(p)). Note that logic 'and' is commutative and + // the order of 'is_shared' and 'is_private' is not significant. + Value *Ptr; + if (getTargetTriple().getVendor() == Triple::VendorType::AMD && + match( + const_cast(V), + m_c_And(m_Not(m_Intrinsic(m_Value(Ptr))), +m_Not(m_Intrinsic( +m_Deferred(Ptr)) Naghasan wrote: Oh just seeing this comment @AlexVlx > I think that we just need to implement the AS predicates (is_local / > is_private & friends) atop `OpGenericPtrMemSemantics` is that for AMDGCN or something more general ? If the latter, the spec doesn't offer enough guarantee to do that. https://github.com/llvm/llvm-project/pull/110897 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -92,6 +98,63 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT, setRequiresStructuredCFG(false); } +enum AddressSpace { + Function = storageClassToAddressSpace(SPIRV::StorageClass::Function), + CrossWorkgroup = + storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup), + UniformConstant = + storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant), + Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup), + Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic) +}; + +unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const { + // TODO: we only enable this for AMDGCN flavoured SPIR-V, where we know it to + // be correct; this might be relaxed in the future. + if (getTargetTriple().getVendor() != Triple::VendorType::AMD) +return UINT32_MAX; + + const auto *LD = dyn_cast(V); + if (!LD) +return UINT32_MAX; + + // It must be a load from a pointer to Generic. + assert(V->getType()->isPointerTy() && + V->getType()->getPointerAddressSpace() == AddressSpace::Generic); + + const auto *Ptr = LD->getPointerOperand(); + if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant) +return UINT32_MAX; + // For a loaded from a pointer to UniformConstant, we can infer CrossWorkgroup Naghasan wrote: ```suggestion // For a load from a pointer to UniformConstant, we can infer CrossWorkgroup ``` https://github.com/llvm/llvm-project/pull/110897 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -92,6 +98,63 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT, setRequiresStructuredCFG(false); } +enum AddressSpace { + Function = storageClassToAddressSpace(SPIRV::StorageClass::Function), + CrossWorkgroup = + storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup), + UniformConstant = + storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant), + Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup), + Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic) +}; + +unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const { + // TODO: we only enable this for AMDGCN flavoured SPIR-V, where we know it to + // be correct; this might be relaxed in the future. + if (getTargetTriple().getVendor() != Triple::VendorType::AMD) +return UINT32_MAX; + + const auto *LD = dyn_cast(V); + if (!LD) +return UINT32_MAX; + + // It must be a load from a pointer to Generic. + assert(V->getType()->isPointerTy() && + V->getType()->getPointerAddressSpace() == AddressSpace::Generic); + + const auto *Ptr = LD->getPointerOperand(); + if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant) +return UINT32_MAX; + // For a loaded from a pointer to UniformConstant, we can infer CrossWorkgroup + // storage, as this could only have been legally initialised with a + // CrossWorkgroup (aka device) constant pointer. + return AddressSpace::CrossWorkgroup; +} + +bool SPIRVTargetMachine::isNoopAddrSpaceCast(unsigned SrcAS, + unsigned DestAS) const { + if (SrcAS != AddressSpace::Generic && SrcAS != AddressSpace::CrossWorkgroup) +return false; + return DestAS == AddressSpace::Generic || Naghasan wrote: This only makes sense for the AMDGCN flavoured version https://github.com/llvm/llvm-project/pull/110897 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
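A sketch of the kind of guard being asked for here, reusing the vendor check that `getAssumedAddrSpace` in the same hunk already performs (illustrative only; the exact placement is up to the patch author):

```cpp
// Only treat Generic <-> CrossWorkgroup casts as no-ops for AMDGCN-flavoured
// SPIR-V, where the numeric address-space mapping is known; for other SPIR-V
// flavours, stay conservative and report the cast as potentially non-trivial.
bool SPIRVTargetMachine::isNoopAddrSpaceCast(unsigned SrcAS,
                                             unsigned DestAS) const {
  if (getTargetTriple().getVendor() != Triple::VendorType::AMD)
    return false;
  if (SrcAS != AddressSpace::Generic && SrcAS != AddressSpace::CrossWorkgroup)
    return false;
  return DestAS == AddressSpace::Generic ||
         DestAS == AddressSpace::CrossWorkgroup;
}
```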
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT, setRequiresStructuredCFG(false); } +enum AddressSpace { + Function = storageClassToAddressSpace(SPIRV::StorageClass::Function), + CrossWorkgroup = + storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup), + UniformConstant = + storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant), + Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup), + Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic) +}; + +unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const { Naghasan wrote: I think the routine is ok for a vanilla OpenCL environment but extensions may make it invalid. https://github.com/llvm/llvm-project/pull/110897 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT, setRequiresStructuredCFG(false); } +enum AddressSpace { + Function = storageClassToAddressSpace(SPIRV::StorageClass::Function), + CrossWorkgroup = + storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup), + UniformConstant = + storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant), + Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup), + Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic) +}; + +unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const { + const auto *LD = dyn_cast(V); + if (!LD) +return UINT32_MAX; + + // It must be a load from a pointer to Generic. + assert(V->getType()->isPointerTy() && + V->getType()->getPointerAddressSpace() == AddressSpace::Generic); + + const auto *Ptr = LD->getPointerOperand(); + if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant) +return UINT32_MAX; + // For a loaded from a pointer to UniformConstant, we can infer CrossWorkgroup + // storage, as this could only have been legally initialised with a + // CrossWorkgroup (aka device) constant pointer. + return AddressSpace::CrossWorkgroup; +} + +std::pair +SPIRVTargetMachine::getPredicatedAddrSpace(const Value *V) const { + using namespace PatternMatch; + + if (auto *II = dyn_cast(V)) { +switch (II->getIntrinsicID()) { +case Intrinsic::amdgcn_is_shared: + return std::pair(II->getArgOperand(0), AddressSpace::Workgroup); +case Intrinsic::amdgcn_is_private: + return std::pair(II->getArgOperand(0), AddressSpace::Function); +default: + break; +} +return std::pair(nullptr, UINT32_MAX); + } + // Check the global pointer predication based on + // (!is_share(p) && !is_private(p)). Note that logic 'and' is commutative and + // the order of 'is_shared' and 'is_private' is not significant. + Value *Ptr; + if (getTargetTriple().getVendor() == Triple::VendorType::AMD && + match( + const_cast(V), + m_c_And(m_Not(m_Intrinsic(m_Value(Ptr))), +m_Not(m_Intrinsic( +m_Deferred(Ptr)) Naghasan wrote: > My interpretation (which could be wrong) is that the bits returned in the > mask actually indicate the pointee's AS, so the generic predicates would > lower to (handwavium alert) OpGenericPtrMemSemantics + bitwise AND. The returned value is guaranteed to be a valid combination for the AS but an impl can use the same combination for different AS. https://github.com/llvm/llvm-project/pull/110897 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT, setRequiresStructuredCFG(false); } +enum AddressSpace { + Function = storageClassToAddressSpace(SPIRV::StorageClass::Function), + CrossWorkgroup = + storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup), + UniformConstant = + storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant), + Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup), + Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic) +}; + +unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const { + const auto *LD = dyn_cast(V); + if (!LD) +return UINT32_MAX; + + // It must be a load from a pointer to Generic. + assert(V->getType()->isPointerTy() && + V->getType()->getPointerAddressSpace() == AddressSpace::Generic); + + const auto *Ptr = LD->getPointerOperand(); + if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant) +return UINT32_MAX; + // For a loaded from a pointer to UniformConstant, we can infer CrossWorkgroup + // storage, as this could only have been legally initialised with a + // CrossWorkgroup (aka device) constant pointer. + return AddressSpace::CrossWorkgroup; +} + +std::pair +SPIRVTargetMachine::getPredicatedAddrSpace(const Value *V) const { + using namespace PatternMatch; + + if (auto *II = dyn_cast(V)) { +switch (II->getIntrinsicID()) { +case Intrinsic::amdgcn_is_shared: + return std::pair(II->getArgOperand(0), AddressSpace::Workgroup); +case Intrinsic::amdgcn_is_private: + return std::pair(II->getArgOperand(0), AddressSpace::Function); +default: + break; +} +return std::pair(nullptr, UINT32_MAX); + } + // Check the global pointer predication based on + // (!is_share(p) && !is_private(p)). Note that logic 'and' is commutative and + // the order of 'is_shared' and 'is_private' is not significant. + Value *Ptr; + if (getTargetTriple().getVendor() == Triple::VendorType::AMD && + match( + const_cast(V), + m_c_And(m_Not(m_Intrinsic(m_Value(Ptr))), +m_Not(m_Intrinsic( +m_Deferred(Ptr)) Naghasan wrote: > what is an implementation in this case? a tool consuming the SPIR-V module like an opencl driver > It would be rather odd to have a valid implementation use e.g. setting the > WorkGroup bit to denote CrossWorkGroup, would it not? it is, but it may not make a difference for all platforms (e.g. CPUs don't typically have a dedicated workgroup memory) and checking what you are dealing can be somehow expensive or complex for no clear benefit down the line. https://github.com/llvm/llvm-project/pull/110897 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][SYCL] Add AOT compilation support for Intel GPUs in clang-sycl-linker (PR #133194)
@@ -0,0 +1,131 @@ +//===--- SYCL.h -*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef LLVM_CLANG_BASIC_SYCL_H +#define LLVM_CLANG_BASIC_SYCL_H + +#include "clang/Basic/Cuda.h" + +namespace llvm { +class StringRef; +template class SmallString; +} // namespace llvm + +namespace clang { +// List of architectures (Intel CPUs and Intel GPUs) +// that support SYCL offloading. +enum class SYCLSupportedIntelArchs { Naghasan wrote: > perhaps the file it is defined in (Cuda.h) should be named to something more > appropriate +1, it would probably make sense to move the non cuda stuff in a `Offloading.h` file (there is SYCL, OpenMP, CUDA and HIP after all) https://github.com/llvm/llvm-project/pull/133194 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][SYCL] Add AOT compilation support for Intel GPUs in clang-sycl-linker (PR #133194)
@@ -0,0 +1,131 @@ +//===--- SYCL.h -*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef LLVM_CLANG_BASIC_SYCL_H +#define LLVM_CLANG_BASIC_SYCL_H + +#include "clang/Basic/Cuda.h" + +namespace llvm { +class StringRef; +template class SmallString; +} // namespace llvm + +namespace clang { +// List of architectures (Intel CPUs and Intel GPUs) +// that support SYCL offloading. +enum class SYCLSupportedIntelArchs { + // Intel CPUs + UNKNOWN, + SKYLAKEAVX512, + COREAVX2, + COREI7AVX, + COREI7, + WESTMERE, + SANDYBRIDGE, + IVYBRIDGE, + BROADWELL, + COFFEELAKE, + ALDERLAKE, + SKYLAKE, + SKX, + CASCADELAKE, + ICELAKECLIENT, + ICELAKESERVER, + SAPPHIRERAPIDS, + GRANITERAPIDS, + // Intel GPUs + BDW, + SKL, + KBL, + CFL, + APL, + BXT, + GLK, + WHL, + AML, + CML, + ICLLP, + ICL, + EHL, + JSL, + TGLLP, + TGL, + RKL, + ADL_S, + RPL_S, + ADL_P, + ADL_N, + DG1, + ACM_G10, + DG2_G10, + ACM_G11, + DG2_G11, + ACM_G12, + DG2_G12, + PVC, + PVC_VG, + MTL_U, + MTL_S, + ARL_U, + ARL_S, + MTL_H, + ARL_H, + BMG_G21, + LNL_M, +}; + +// Check if the given Arch value is a Generic AMD GPU. +// Currently GFX*_GENERIC AMD GPUs do not support SYCL offloading. +// This list is used to filter out GFX*_GENERIC AMD GPUs in +// `IsSYCLSupportedAMDGPUArch`. +static inline bool IsAMDGenericGPUArch(OffloadArch Arch) { Naghasan wrote: (purely FYI) I don't think this has been tested at all. AFAIK, generic version will prevent a few instructions from being used, so not totally a runtime thing and I can see a few potential issues with the builtin implementations in libclc (in the intel/llvm repo). https://github.com/llvm/llvm-project/pull/133194 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits