[llvm-branch-commits] [llvm] InferAddressSpaces: Handle llvm.is.constant (PR #102010)
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/102010
[llvm-branch-commits] [llvm] InferAddressSpaces: Handle llvm.is.constant (PR #102010)
@@ -429,6 +430,15 @@ void InferAddressSpacesImpl::collectRewritableIntrinsicOperands(
     appendsFlatAddressExpressionToPostorderStack(II->getArgOperand(0),
                                                  PostorderStack, Visited);
     break;
+  case Intrinsic::is_constant: {
+    Value *Ptr = II->getArgOperand(0);
+    if (Ptr->getType()->isPtrOrPtrVectorTy()) {
+      appendsFlatAddressExpressionToPostorderStack(Ptr, PostorderStack,
+                                                   Visited);
+    }

Artem-B wrote:
> It should never be wrong to include braces.

Having to deal with both Google style, which demands braces everywhere, and LLVM style, which does not want them, my personal choice is "whatever the style guide says". It may not always be the perfect choice, but it's not worth anyone's time to argue over specific instances where the right choice is ambiguous or a matter of personal preference.

I wish we could delegate braces/no-braces decisions to clang-format, too, but I don't think it currently handles that.

I'd stick with the style guide defaults and either have the braces removed or a comment added to the body. Perhaps making the function name shorter and avoiding the line wrapping would address your readability concerns about braces/no-braces here, too.

https://github.com/llvm/llvm-project/pull/102010
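To make the style point concrete, here is a toy, self-contained illustration (not the actual InferAddressSpaces patch; `append` and `handleOperand` are made-up stand-ins for the long helper name): with a short callee the single-statement `if` body fits on one line, which is the form the LLVM style default would produce without braces.

```cpp
// Toy illustration only -- stand-in names, not InferAddressSpaces code.
#include <vector>

void append(std::vector<int> &Stack, int V) { Stack.push_back(V); }

void handleOperand(std::vector<int> &Stack, int V, bool IsPtr) {
  // With a short callee the body is a single unwrapped statement, so the
  // LLVM style default (no braces) stays easy to read.
  if (IsPtr)
    append(Stack, V);
}
```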
[llvm-branch-commits] [llvm] InferAddressSpaces: Handle masked load and store intrinsics (PR #102007)
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/102007
[llvm-branch-commits] [llvm] InferAddressSpaces: Handle llvm.is.constant (PR #102010)
Artem-B wrote:
> From the clang-format documentation:
> https://clang.llvm.org/docs/ClangFormatStyleOptions.html#removebracesllvm

Thank you for the pointer. Unfortunately, it looks like there's a good reason it is not on by default:

> Setting this option to true could lead to incorrect code formatting due to
> clang-format's lack of complete semantic information.

https://github.com/llvm/llvm-project/pull/102010
[llvm-branch-commits] [clang] clang/AMDGPU: Set noalias.addrspace metadata on atomicrmw (PR #102462)
@@ -647,6 +647,14 @@ class LangOptions : public LangOptionsBase {
     return ConvergentFunctions;
   }

+  /// Return true if atomicrmw operations targeting allocations in private
+  /// memory are undefined.
+  bool threadPrivateMemoryAtomicsAreUndefined() const {
+    // Should be false for OpenMP.
+    // TODO: Should this be true for SYCL?
+    return OpenCL || CUDA;

Artem-B wrote:
@gonzalobg -- Does NVIDIA define what happens if atomics are used on the local address space?

@arsenm The atomics/AS relationship seems to be a property of the target, not the language. I.e. we could potentially have a different answer for HIP on AMDGPU and CUDA on NVPTX, even though both would have `LangOpts.CUDA=true`.

https://github.com/llvm/llvm-project/pull/102462
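For reference, the predicate being discussed, restated as a small self-contained program with a mock type (not clang's real `LangOptions`). This is what makes the language-vs-target question above interesting: HIP compilations also set the CUDA language option, so HIP-on-AMDGPU and CUDA-on-NVPTX get the same answer.

```cpp
#include <cstdio>

// Mock stand-in mirroring only the hunk above, not clang's real class.
struct MockLangOpts {
  bool OpenCL = false, CUDA = false, OpenMP = false;
  // As written in the patch: true for OpenCL and CUDA (and therefore HIP,
  // which sets the CUDA language option), deliberately false for OpenMP,
  // with SYCL still an open TODO.
  bool threadPrivateMemoryAtomicsAreUndefined() const {
    return OpenCL || CUDA;
  }
};

int main() {
  MockLangOpts Hip, Omp;
  Hip.CUDA = true;   // HIP and CUDA share the CUDA language option
  Omp.OpenMP = true;
  std::printf("HIP/CUDA: %d  OpenMP: %d\n",
              Hip.threadPrivateMemoryAtomicsAreUndefined(),
              Omp.threadPrivateMemoryAtomicsAreUndefined());
  return 0;
}
```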
[llvm-branch-commits] [clang] clang/AMDGPU: Set noalias.addrspace metadata on atomicrmw (PR #102462)
@@ -550,6 +551,16 @@ AMDGPUTargetCodeGenInfo::getLLVMSyncScopeID(const LangOptions &LangOpts,

 void AMDGPUTargetCodeGenInfo::setTargetAtomicMetadata(
     CodeGenFunction &CGF, llvm::AtomicRMWInst &RMW) const {
+
+  if (RMW.getPointerAddressSpace() == llvm::AMDGPUAS::FLAT_ADDRESS &&
+      CGF.CGM.getLangOpts().threadPrivateMemoryAtomicsAreUndefined()) {
+    llvm::MDBuilder MDHelper(CGF.getLLVMContext());
+    llvm::MDNode *ASRange = MDHelper.createRange(
+        llvm::APInt(32, llvm::AMDGPUAS::PRIVATE_ADDRESS),
+        llvm::APInt(32, llvm::AMDGPUAS::PRIVATE_ADDRESS + 1));
+    RMW.setMetadata(llvm::LLVMContext::MD_noalias_addrspace, ASRange);

Artem-B wrote:
What's the plan for this metadata? It does not seem to exist or be used for anything yet. I do not see it in https://github.com/llvm/llvm-project/blob/6b5308b7924108d63149d7c521f21c5e90da7a09/llvm/include/llvm/IR/FixedMetadataKinds.def#L21

Does this patch miss some changes, or does it depend on other patches?

https://github.com/llvm/llvm-project/pull/102462
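For context, a hedged sketch of the source-level situation this hunk targets (HIP-style device code; the function below is illustrative and not taken from the patch or its tests):

```cpp
// Compiled as HIP/CUDA device code. `p` is a generic (flat) pointer, so the
// atomic below becomes a flat-address-space atomicrmw. Per the hunk above,
// when threadPrivateMemoryAtomicsAreUndefined() is true, that atomicrmw gets
// tagged with !noalias.addrspace over [PRIVATE_ADDRESS, PRIVATE_ADDRESS + 1),
// i.e. a promise that the pointer does not refer to private (stack) memory.
__device__ int incrementFlat(int *p) {
  return __atomic_fetch_add(p, 1, __ATOMIC_RELAXED);
}
```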
[llvm-branch-commits] [clang] 127091b - [CUDA] Normalize handling of defaulted dtor.
Author: Artem Belevich Date: 2021-01-21T10:48:07-08:00 New Revision: 127091bfd5edf10495fee4724fd21c666e5d79c1 URL: https://github.com/llvm/llvm-project/commit/127091bfd5edf10495fee4724fd21c666e5d79c1 DIFF: https://github.com/llvm/llvm-project/commit/127091bfd5edf10495fee4724fd21c666e5d79c1.diff LOG: [CUDA] Normalize handling of defauled dtor. Defaulted destructor was treated inconsistently, compared to other compiler-generated functions. When Sema::IdentifyCUDATarget() got called on just-created dtor which didn't have implicit __host__ __device__ attributes applied yet, it would treat it as a host function. That happened to (sometimes) hide the error when dtor referred to a host-only functions. Even when we had identified defaulted dtor as a HD function, we still treated it inconsistently during selection of usual deallocators, where we did not allow referring to wrong-side functions, while it is allowed for other HD functions. This change brings handling of defaulted dtors in line with other HD functions. Differential Revision: https://reviews.llvm.org/D94732 Added: Modified: clang/lib/Sema/SemaCUDA.cpp clang/lib/Sema/SemaExprCXX.cpp clang/test/CodeGenCUDA/usual-deallocators.cu clang/test/SemaCUDA/usual-deallocators.cu Removed: diff --git a/clang/lib/Sema/SemaCUDA.cpp b/clang/lib/Sema/SemaCUDA.cpp index 0f06adf38f7a..ee91eb4c5deb 100644 --- a/clang/lib/Sema/SemaCUDA.cpp +++ b/clang/lib/Sema/SemaCUDA.cpp @@ -123,7 +123,8 @@ Sema::CUDAFunctionTarget Sema::IdentifyCUDATarget(const FunctionDecl *D, return CFT_Device; } else if (hasAttr(D, IgnoreImplicitHDAttr)) { return CFT_Host; - } else if (D->isImplicit() && !IgnoreImplicitHDAttr) { + } else if ((D->isImplicit() || !D->isUserProvided()) && + !IgnoreImplicitHDAttr) { // Some implicit declarations (like intrinsic functions) are not marked. // Set the most lenient target on them for maximal flexibility. return CFT_HostDevice; diff --git a/clang/lib/Sema/SemaExprCXX.cpp b/clang/lib/Sema/SemaExprCXX.cpp index 1ee52107c3da..d91db60f17a0 100644 --- a/clang/lib/Sema/SemaExprCXX.cpp +++ b/clang/lib/Sema/SemaExprCXX.cpp @@ -1527,9 +1527,24 @@ Sema::BuildCXXTypeConstructExpr(TypeSourceInfo *TInfo, bool Sema::isUsualDeallocationFunction(const CXXMethodDecl *Method) { // [CUDA] Ignore this function, if we can't call it. const FunctionDecl *Caller = dyn_cast(CurContext); - if (getLangOpts().CUDA && - IdentifyCUDAPreference(Caller, Method) <= CFP_WrongSide) -return false; + if (getLangOpts().CUDA) { +auto CallPreference = IdentifyCUDAPreference(Caller, Method); +// If it's not callable at all, it's not the right function. +if (CallPreference < CFP_WrongSide) + return false; +if (CallPreference == CFP_WrongSide) { + // Maybe. We have to check if there are better alternatives. + DeclContext::lookup_result R = + Method->getDeclContext()->lookup(Method->getDeclName()); + for (const auto *D : R) { +if (const auto *FD = dyn_cast(D)) { + if (IdentifyCUDAPreference(Caller, FD) > CFP_WrongSide) +return false; +} + } + // We've found no better variants. 
+} + } SmallVector PreventedBy; bool Result = Method->isUsualDeallocationFunction(PreventedBy); diff --git a/clang/test/CodeGenCUDA/usual-deallocators.cu b/clang/test/CodeGenCUDA/usual-deallocators.cu index 7e7752497f34..6f4cc267a23f 100644 --- a/clang/test/CodeGenCUDA/usual-deallocators.cu +++ b/clang/test/CodeGenCUDA/usual-deallocators.cu @@ -12,6 +12,19 @@ extern "C" __host__ void host_fn(); extern "C" __device__ void dev_fn(); extern "C" __host__ __device__ void hd_fn(); +// Destructors are handled a bit diff erently, compared to regular functions. +// Make sure we do trigger kernel generation on the GPU side even if it's only +// referenced by the destructor. +template __global__ void f(T) {} +template struct A { + ~A() { f<<<1, 1>>>(T()); } +}; + +// HOST-LABEL: @a +A a; +// HOST-LABEL: define linkonce_odr void @_ZN1AIiED1Ev +// search further down for the deice-side checks for @_Z1fIiEvT_ + struct H1D1 { __host__ void operator delete(void *) { host_fn(); }; __device__ void operator delete(void *) { dev_fn(); }; @@ -95,6 +108,9 @@ __host__ __device__ void tests_hd(void *t) { test_hd(t); } +// Make sure that we've generated the kernel used by A::~A. +// DEVICE-LABEL: define dso_local void @_Z1fIiEvT_ + // Make sure we've picked deallocator for the correct side of compilation. // COMMON-LABEL: define linkonce_odr void @_ZN4H1D1dlEPv(i8* %0) @@ -131,3 +147,5 @@ __host__ __device__ void tests_hd(void *t) { // COMMON-LABEL: define linkonce_odr void @_ZN8H1H2D1D2dlEPv(i8* %0) // DEVICE: call void @dev_fn() // HOST: call void @host_fn() + +// DEVICE: !0 = !{void (i32)* @_Z1fIiEvT_, !"kernel", i32 1} diff --git
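A hedged, self-contained illustration of the behavior being normalized, modeled loosely on the usual-deallocators tests above (the names here are made up, not taken from the tests):

```cpp
// Compiled as CUDA. The class has side-specific usual deallocation functions
// and a non-user-provided (defaulted) destructor. With this patch the
// defaulted dtor is identified as __host__ __device__ like other
// compiler-generated functions, and usual-deallocator selection for it works
// the same way as for any other HD function, so each side of the compilation
// picks its own operator delete.
extern "C" __host__ void host_free(void *);
extern "C" __device__ void device_free(void *);

struct HD {
  __host__ void operator delete(void *p) { host_free(p); }
  __device__ void operator delete(void *p) { device_free(p); }
  ~HD() = default; // not user-provided, now treated as __host__ __device__
};

__host__ __device__ void destroy(HD *p) {
  delete p; // host compilation uses host_free, device compilation device_free
}
```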
[llvm-branch-commits] [clang] 0936655 - [CUDA] Do not diagnose host/device variable access in dependent types.
Author: Artem Belevich Date: 2020-12-14T11:53:18-08:00 New Revision: 0936655bac78f6e9cb84dc3feb30c32012100839 URL: https://github.com/llvm/llvm-project/commit/0936655bac78f6e9cb84dc3feb30c32012100839 DIFF: https://github.com/llvm/llvm-project/commit/0936655bac78f6e9cb84dc3feb30c32012100839.diff LOG: [CUDA] Do not diagnose host/device variable access in dependent types. `isCUDADeviceBuiltinSurfaceType()`/`isCUDADeviceBuiltinTextureType()` do not work on dependent types as they rely on specific type attributes. Differential Revision: https://reviews.llvm.org/D92893 Added: Modified: clang/include/clang/Basic/Attr.td clang/test/SemaCUDA/device-use-host-var.cu Removed: diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td index 51f654fc7613..79902c8f5b89 100644 --- a/clang/include/clang/Basic/Attr.td +++ b/clang/include/clang/Basic/Attr.td @@ -1079,6 +1079,7 @@ def CUDADeviceBuiltinSurfaceType : InheritableAttr { let LangOpts = [CUDA]; let Subjects = SubjectList<[CXXRecord]>; let Documentation = [CUDADeviceBuiltinSurfaceTypeDocs]; + let MeaningfulToClassTemplateDefinition = 1; } def CUDADeviceBuiltinTextureType : InheritableAttr { @@ -1087,6 +1088,7 @@ def CUDADeviceBuiltinTextureType : InheritableAttr { let LangOpts = [CUDA]; let Subjects = SubjectList<[CXXRecord]>; let Documentation = [CUDADeviceBuiltinTextureTypeDocs]; + let MeaningfulToClassTemplateDefinition = 1; } def CUDAGlobal : InheritableAttr { diff --git a/clang/test/SemaCUDA/device-use-host-var.cu b/clang/test/SemaCUDA/device-use-host-var.cu index cf5514610a42..c8ef7dbbb18d 100644 --- a/clang/test/SemaCUDA/device-use-host-var.cu +++ b/clang/test/SemaCUDA/device-use-host-var.cu @@ -158,3 +158,23 @@ void dev_lambda_capture_by_copy(int *out) { }); } +// Texture references are special. As far as C++ is concerned they are host +// variables that are referenced from device code. However, they are handled +// very diff erently by the compiler under the hood and such references are +// allowed. Compiler should produce no warning here, but it should diagnose the +// same case without the device_builtin_texture_type attribute. +template +struct __attribute__((device_builtin_texture_type)) texture { + static texture ref; + __device__ int c() { +auto &x = ref; + } +}; + +template +struct not_a_texture { + static not_a_texture ref; + __device__ int c() { +auto &x = ref; // dev-error {{reference to __host__ variable 'ref' in __device__ function}} + } +}; ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] 4326792 - [CUDA] Another attempt to fix early inclusion of <new> from libstdc++
Author: Artem Belevich Date: 2020-12-04T12:03:35-08:00 New Revision: 43267929423bf768bbbcc65e47a07e37af7f4e22 URL: https://github.com/llvm/llvm-project/commit/43267929423bf768bbbcc65e47a07e37af7f4e22 DIFF: https://github.com/llvm/llvm-project/commit/43267929423bf768bbbcc65e47a07e37af7f4e22.diff LOG: [CUDA] Another attempt to fix early inclusion of from libstdc++ Previous patch (9a465057a64dba) did not fix the problem. https://bugs.llvm.org/show_bug.cgi?id=48228 If the is included too early, before CUDA-specific defines are available, just include-next the standard and undo the include guard. CUDA-specific variants of operator new/delete will be declared if/when is used from the CUDA source itself, when all CUDA-related macros are available. Differential Revision: https://reviews.llvm.org/D91807 Added: Modified: clang/lib/Headers/cuda_wrappers/new Removed: diff --git a/clang/lib/Headers/cuda_wrappers/new b/clang/lib/Headers/cuda_wrappers/new index 47690f1152fe..7f255314056a 100644 --- a/clang/lib/Headers/cuda_wrappers/new +++ b/clang/lib/Headers/cuda_wrappers/new @@ -26,6 +26,13 @@ #include_next +#if !defined(__device__) +// The header has been included too early from the standard C++ library +// and CUDA-specific macros are not available yet. +// Undo the include guard and try again later. +#undef __CLANG_CUDA_WRAPPERS_NEW +#else + #pragma push_macro("CUDA_NOEXCEPT") #if __cplusplus >= 201103L #define CUDA_NOEXCEPT noexcept @@ -33,76 +40,67 @@ #define CUDA_NOEXCEPT #endif -#pragma push_macro("__DEVICE__") -#if defined __device__ -#define __DEVICE__ __device__ -#else -// has been included too early from the standard libc++ headers and the -// standard CUDA macros are not available yet. We have to define our own. -#define __DEVICE__ __attribute__((device)) -#endif - // Device overrides for non-placement new and delete. -__DEVICE__ inline void *operator new(__SIZE_TYPE__ size) { +__device__ inline void *operator new(__SIZE_TYPE__ size) { if (size == 0) { size = 1; } return ::malloc(size); } -__DEVICE__ inline void *operator new(__SIZE_TYPE__ size, +__device__ inline void *operator new(__SIZE_TYPE__ size, const std::nothrow_t &) CUDA_NOEXCEPT { return ::operator new(size); } -__DEVICE__ inline void *operator new[](__SIZE_TYPE__ size) { +__device__ inline void *operator new[](__SIZE_TYPE__ size) { return ::operator new(size); } -__DEVICE__ inline void *operator new[](__SIZE_TYPE__ size, +__device__ inline void *operator new[](__SIZE_TYPE__ size, const std::nothrow_t &) { return ::operator new(size); } -__DEVICE__ inline void operator delete(void* ptr) CUDA_NOEXCEPT { +__device__ inline void operator delete(void* ptr) CUDA_NOEXCEPT { if (ptr) { ::free(ptr); } } -__DEVICE__ inline void operator delete(void *ptr, +__device__ inline void operator delete(void *ptr, const std::nothrow_t &) CUDA_NOEXCEPT { ::operator delete(ptr); } -__DEVICE__ inline void operator delete[](void* ptr) CUDA_NOEXCEPT { +__device__ inline void operator delete[](void* ptr) CUDA_NOEXCEPT { ::operator delete(ptr); } -__DEVICE__ inline void operator delete[](void *ptr, +__device__ inline void operator delete[](void *ptr, const std::nothrow_t &) CUDA_NOEXCEPT { ::operator delete(ptr); } // Sized delete, C++14 only. 
#if __cplusplus >= 201402L -__DEVICE__ inline void operator delete(void *ptr, +__device__ inline void operator delete(void *ptr, __SIZE_TYPE__ size) CUDA_NOEXCEPT { ::operator delete(ptr); } -__DEVICE__ inline void operator delete[](void *ptr, +__device__ inline void operator delete[](void *ptr, __SIZE_TYPE__ size) CUDA_NOEXCEPT { ::operator delete(ptr); } #endif // Device overrides for placement new and delete. -__DEVICE__ inline void *operator new(__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT { +__device__ inline void *operator new(__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT { return __ptr; } -__DEVICE__ inline void *operator new[](__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT { +__device__ inline void *operator new[](__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT { return __ptr; } -__DEVICE__ inline void operator delete(void *, void *) CUDA_NOEXCEPT {} -__DEVICE__ inline void operator delete[](void *, void *) CUDA_NOEXCEPT {} +__device__ inline void operator delete(void *, void *) CUDA_NOEXCEPT {} +__device__ inline void operator delete[](void *, void *) CUDA_NOEXCEPT {} -#pragma pop_macro("__DEVICE__") #pragma pop_macro("CUDA_NOEXCEPT") +#endif // __device__ #endif // include guard ___ llvm-branch-commits mailing list llvm-branch-c
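The include-guard trick from this patch, reduced to a generic sketch (hypothetical wrapper and macro names; the real header is clang/lib/Headers/cuda_wrappers/new):

```cpp
// my_wrapper_new.h -- illustrative sketch only, not the shipped header.
#ifndef MY_CUDA_WRAPPER_NEW
#define MY_CUDA_WRAPPER_NEW

#include_next <new> // always defer to the standard header first

#if !defined(__device__)
// Included too early, before the CUDA macros exist: undo our include guard
// so a later inclusion (once __device__ is defined) processes this file
// again and can declare the device-side operator new/delete overloads.
#undef MY_CUDA_WRAPPER_NEW
#else
// ... CUDA-specific declarations that rely on __device__ go here ...
#endif

#endif // MY_CUDA_WRAPPER_NEW
```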
[llvm-branch-commits] [clang] 016e4eb - [DWARF] Allow toolchain to adjust specified DWARF version.
Author: Artem Belevich Date: 2020-12-09T16:34:34-08:00 New Revision: 016e4ebfde28d6bb1ab6399fc8abd8cfc6a1d9fd URL: https://github.com/llvm/llvm-project/commit/016e4ebfde28d6bb1ab6399fc8abd8cfc6a1d9fd DIFF: https://github.com/llvm/llvm-project/commit/016e4ebfde28d6bb1ab6399fc8abd8cfc6a1d9fd.diff LOG: [DWARF] Allow toolchain to adjust specified DWARF version. This is needed for CUDA compilation where NVPTX back-end only supports DWARF2, but host compilation should be allowed to use newer DWARF versions. Differential Revision: https://reviews.llvm.org/D92617 Added: clang/test/Driver/cuda-omp-unsupported-debug-options.cu clang/test/Driver/dwarf-target-version-clamp.cu Modified: clang/include/clang/Basic/DiagnosticDriverKinds.td clang/include/clang/Driver/ToolChain.h clang/lib/Driver/ToolChains/Clang.cpp clang/lib/Driver/ToolChains/Cuda.h Removed: clang/test/Driver/cuda-unsupported-debug-options.cu clang/test/Driver/openmp-unsupported-debug-options.c diff --git a/clang/include/clang/Basic/DiagnosticDriverKinds.td b/clang/include/clang/Basic/DiagnosticDriverKinds.td index 8fd7a805589d..8ca176d3bb43 100644 --- a/clang/include/clang/Basic/DiagnosticDriverKinds.td +++ b/clang/include/clang/Basic/DiagnosticDriverKinds.td @@ -292,6 +292,9 @@ def warn_drv_unsupported_opt_for_target : Warning< def warn_drv_unsupported_debug_info_opt_for_target : Warning< "debug information option '%0' is not supported for target '%1'">, InGroup; +def warn_drv_dwarf_version_limited_by_target : Warning< + "debug information option '%0' is not supported. It needs DWARF-%2 but target '%1' only provides DWARF-%3.">, + InGroup; def warn_c_kext : Warning< "ignoring -fapple-kext which is valid for C++ and Objective-C++ only">; def warn_ignoring_fdiscard_for_bitcode : Warning< diff --git a/clang/include/clang/Driver/ToolChain.h b/clang/include/clang/Driver/ToolChain.h index 7aa8ba7b1da9..28c37a44e1eb 100644 --- a/clang/include/clang/Driver/ToolChain.h +++ b/clang/include/clang/Driver/ToolChain.h @@ -27,6 +27,7 @@ #include "llvm/Support/VersionTuple.h" #include "llvm/Target/TargetOptions.h" #include +#include #include #include #include @@ -489,6 +490,11 @@ class ToolChain { // to the contrary. virtual unsigned GetDefaultDwarfVersion() const { return 4; } + // Some toolchains may have diff erent restrictions on the DWARF version and + // may need to adjust it. E.g. NVPTX may need to enforce DWARF2 even when host + // compilation uses DWARF5. + virtual unsigned getMaxDwarfVersion() const { return UINT_MAX; } + // True if the driver should assume "-fstandalone-debug" // in the absence of an option specifying otherwise, // provided that debugging was requested in the first place. diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index 6c78a5d9555c..b092252791ff 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -3821,26 +3821,33 @@ static void RenderDebugOptions(const ToolChain &TC, const Driver &D, } } - unsigned DWARFVersion = 0; + unsigned RequestedDWARFVersion = 0; // DWARF version requested by the user + unsigned EffectiveDWARFVersion = 0; // DWARF version TC can generate. It may + // be lower than what the user wanted. 
unsigned DefaultDWARFVersion = ParseDebugDefaultVersion(TC, Args); if (EmitDwarf) { // Start with the platform default DWARF version -DWARFVersion = TC.GetDefaultDwarfVersion(); -assert(DWARFVersion && "toolchain default DWARF version must be nonzero"); +RequestedDWARFVersion = TC.GetDefaultDwarfVersion(); +assert(RequestedDWARFVersion && + "toolchain default DWARF version must be nonzero"); // If the user specified a default DWARF version, that takes precedence // over the platform default. if (DefaultDWARFVersion) - DWARFVersion = DefaultDWARFVersion; + RequestedDWARFVersion = DefaultDWARFVersion; // Override with a user-specified DWARF version if (GDwarfN) if (auto ExplicitVersion = DwarfVersionNum(GDwarfN->getSpelling())) -DWARFVersion = ExplicitVersion; +RequestedDWARFVersion = ExplicitVersion; +// Clamp effective DWARF version to the max supported by the toolchain. +EffectiveDWARFVersion = +std::min(RequestedDWARFVersion, TC.getMaxDwarfVersion()); } // -gline-directives-only supported only for the DWARF debug info. - if (DWARFVersion == 0 && DebugInfoKind == codegenoptions::DebugDirectivesOnly) + if (RequestedDWARFVersion == 0 && + DebugInfoKind == codegenoptions::DebugDirectivesOnly) DebugInfoKind = codegenoptions::NoDebugInfo; // We ignore flag -gstrict-dwarf for now. @@ -3900,9 +3907,15 @@ static void RenderDebugOptions(const ToolChain &TC, const Driver &D,
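The clamping logic from this hunk, restated as a small self-contained model. The `NVPTXLikeToolChain` class is illustrative (the real override lives in clang/lib/Driver/ToolChains/Cuda.h); the defaults and the `std::min` clamp mirror the diff above.

```cpp
#include <algorithm>
#include <climits>
#include <cstdio>

// Mirrors the ToolChain defaults quoted above.
struct ToolChain {
  virtual unsigned GetDefaultDwarfVersion() const { return 4; }
  virtual unsigned getMaxDwarfVersion() const { return UINT_MAX; }
  virtual ~ToolChain() = default;
};

// Illustrative: the NVPTX back-end only supports DWARF2, so its toolchain
// caps the version while the host toolchain keeps the user's choice.
struct NVPTXLikeToolChain : ToolChain {
  unsigned getMaxDwarfVersion() const override { return 2; }
};

int main() {
  NVPTXLikeToolChain DeviceTC;
  unsigned RequestedDWARFVersion = 5; // e.g. the user passed -gdwarf-5
  unsigned EffectiveDWARFVersion =
      std::min(RequestedDWARFVersion, DeviceTC.getMaxDwarfVersion());
  std::printf("requested DWARF-%u, device side emits DWARF-%u\n",
              RequestedDWARFVersion, EffectiveDWARFVersion);
  return 0;
}
```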
[llvm-branch-commits] [clang] 0df1362 - [CUDA] Fix order of memcpy arguments in __shfl_*(<64-bit type>).
Author: Artem Belevich Date: 2020-01-24T15:07:22-08:00 New Revision: 0df13627c6a4006de39e5f01d81a338793b0e82b URL: https://github.com/llvm/llvm-project/commit/0df13627c6a4006de39e5f01d81a338793b0e82b DIFF: https://github.com/llvm/llvm-project/commit/0df13627c6a4006de39e5f01d81a338793b0e82b.diff LOG: [CUDA] Fix order of memcpy arguments in __shfl_*(<64-bit type>). Wrong argument order resulted in broken shfl ops for 64-bit types. (cherry picked from commit cc14de88da27a8178976972bdc8211c31f7ca9ae) Added: Modified: clang/lib/Headers/__clang_cuda_intrinsics.h Removed: diff --git a/clang/lib/Headers/__clang_cuda_intrinsics.h b/clang/lib/Headers/__clang_cuda_intrinsics.h index b67461a146fc..c7bff6a9d8fe 100644 --- a/clang/lib/Headers/__clang_cuda_intrinsics.h +++ b/clang/lib/Headers/__clang_cuda_intrinsics.h @@ -45,7 +45,7 @@ _Static_assert(sizeof(__val) == sizeof(__Bits)); \ _Static_assert(sizeof(__Bits) == 2 * sizeof(int)); \ __Bits __tmp; \ -memcpy(&__val, &__tmp, sizeof(__val)); \ +memcpy(&__tmp, &__val, sizeof(__val));\ __tmp.__a = ::__FnName(__tmp.__a, __offset, __width); \ __tmp.__b = ::__FnName(__tmp.__b, __offset, __width); \ long long __ret; \ @@ -129,7 +129,7 @@ __MAKE_SHUFFLES(__shfl_xor, __nvvm_shfl_bfly_i32, __nvvm_shfl_bfly_f32, 0x1f, _Static_assert(sizeof(__val) == sizeof(__Bits)); \ _Static_assert(sizeof(__Bits) == 2 * sizeof(int)); \ __Bits __tmp; \ -memcpy(&__val, &__tmp, sizeof(__val)); \ +memcpy(&__tmp, &__val, sizeof(__val)); \ __tmp.__a = ::__FnName(__mask, __tmp.__a, __offset, __width); \ __tmp.__b = ::__FnName(__mask, __tmp.__b, __offset, __width); \ long long __ret; \ ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
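A standalone illustration of the bug class fixed here: `memcpy` writes to its first argument, so the pre-fix argument order copied uninitialized `__tmp` over `__val` instead of splitting `__val` into two 32-bit halves. The names below are simplified stand-ins for the header's macro locals.

```cpp
#include <cassert>
#include <cstring>

struct Halves { int a, b; };

long long splitAndRejoin(long long val) {
  Halves tmp;
  static_assert(sizeof(tmp) == sizeof(val), "size mismatch");
  std::memcpy(&tmp, &val, sizeof(val)); // fixed order: destination first
  // The real header runs a 32-bit __shfl_* on tmp.a and tmp.b here; this
  // sketch just round-trips the value to show the type-punning pattern.
  long long ret;
  std::memcpy(&ret, &tmp, sizeof(ret));
  return ret;
}

int main() {
  assert(splitAndRejoin(0x0123456789abcdefLL) == 0x0123456789abcdefLL);
  return 0;
}
```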
[llvm-branch-commits] [llvm] [NVPTX] Promote v2i8 to v2i16 (#111189) (PR #115081)
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/115081
[llvm-branch-commits] [llvm] [NVPTX] Promote v2i8 to v2i16 (#111189) (PR #115081)
https://github.com/Artem-B updated https://github.com/llvm/llvm-project/pull/115081 >From ed0fe30e7d4da94b13018e563971524e013c512f Mon Sep 17 00:00:00 2001 From: Manasij Mukherjee Date: Fri, 4 Oct 2024 15:15:30 -0600 Subject: [PATCH] [NVPTX] Promote v2i8 to v2i16 (#89) Promote v2i8 to v2i16, fixes a crash. Re-enable a test in NVPTX/vector-returns.ll Partial cherry-pick of fda2fea w/o the test which does not exist in release/19.x https://github.com/llvm/llvm-project/issues/104864 --- llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp | 4 1 file changed, 4 insertions(+) diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp index 6975412ce5d35b..b2153a7afe7365 100644 --- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp +++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp @@ -229,6 +229,10 @@ static void ComputePTXValueVTs(const TargetLowering &TLI, const DataLayout &DL, // v*i8 are formally lowered as v4i8 EltVT = MVT::v4i8; NumElts = (NumElts + 3) / 4; + } else if (EltVT.getSimpleVT() == MVT::i8 && NumElts == 2) { +// v2i8 is promoted to v2i16 +NumElts = 1; +EltVT = MVT::v2i16; } for (unsigned j = 0; j != NumElts; ++j) { ValueVTs.push_back(EltVT); ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [NVPTX] Promote v2i8 to v2i16 (#111189) (PR #115081)
Artem-B wrote:
The CI check was unhappy because the PR was based on the tree before 19.1.4. I've rebased it on top of the most recent commit.

https://github.com/llvm/llvm-project/pull/115081
[llvm-branch-commits] [llvm] [NVPTX] Promote v2i8 to v2i16 (#111189) (PR #115081)
Artem-B wrote: Thank you for merging it. I do not think the fix is interesting enough for that. https://github.com/llvm/llvm-project/pull/115081
[llvm-branch-commits] [llvm] [NVPTX] Promote v2i8 to v2i16 (#111189) (PR #115081)
Artem-B wrote: Done. https://github.com/llvm/llvm-project/pull/115081
[llvm-branch-commits] [llvm] [NVPTX] Promote v2i8 to v2i16 (#111189) (PR #115081)
https://github.com/Artem-B updated https://github.com/llvm/llvm-project/pull/115081 >From e0a9e99b9efbeae7eb975cc4ebaf1805566195f6 Mon Sep 17 00:00:00 2001 From: Manasij Mukherjee Date: Fri, 4 Oct 2024 15:15:30 -0600 Subject: [PATCH] [NVPTX] Promote v2i8 to v2i16 (#89) Promote v2i8 to v2i16, fixes a crash. Re-enable a test in NVPTX/vector-returns.ll Partial cherry-pick of fda2fea w/o the test which does not exist in release/19.x https://github.com/llvm/llvm-project/issues/104864 --- llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp | 4 1 file changed, 4 insertions(+) diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp index 6975412ce5d35b..b2153a7afe7365 100644 --- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp +++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp @@ -229,6 +229,10 @@ static void ComputePTXValueVTs(const TargetLowering &TLI, const DataLayout &DL, // v*i8 are formally lowered as v4i8 EltVT = MVT::v4i8; NumElts = (NumElts + 3) / 4; + } else if (EltVT.getSimpleVT() == MVT::i8 && NumElts == 2) { +// v2i8 is promoted to v2i16 +NumElts = 1; +EltVT = MVT::v2i16; } for (unsigned j = 0; j != NumElts; ++j) { ValueVTs.push_back(EltVT); ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [NVPTX] Promote v2i8 to v2i16 (#111189) (PR #115081)
https://github.com/Artem-B updated https://github.com/llvm/llvm-project/pull/115081 >From 02930b87faeb490505b22f588757a18744248b6f Mon Sep 17 00:00:00 2001 From: Manasij Mukherjee Date: Fri, 4 Oct 2024 15:15:30 -0600 Subject: [PATCH] [NVPTX] Promote v2i8 to v2i16 (#89) Promote v2i8 to v2i16, fixes a crash. Re-enable a test in NVPTX/vector-returns.ll Partial cherry-pick of fda2fea w/o the test which does not exist in release/19.x https://github.com/llvm/llvm-project/issues/104864 --- llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp | 4 1 file changed, 4 insertions(+) diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp index 6975412ce5d35b..b2153a7afe7365 100644 --- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp +++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp @@ -229,6 +229,10 @@ static void ComputePTXValueVTs(const TargetLowering &TLI, const DataLayout &DL, // v*i8 are formally lowered as v4i8 EltVT = MVT::v4i8; NumElts = (NumElts + 3) / 4; + } else if (EltVT.getSimpleVT() == MVT::i8 && NumElts == 2) { +// v2i8 is promoted to v2i16 +NumElts = 1; +EltVT = MVT::v2i16; } for (unsigned j = 0; j != NumElts; ++j) { ValueVTs.push_back(EltVT); ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [NVPTX] Promote v2i8 to v2i16 (#111189) (PR #115081)
https://github.com/Artem-B updated https://github.com/llvm/llvm-project/pull/115081 >From ed0fe30e7d4da94b13018e563971524e013c512f Mon Sep 17 00:00:00 2001 From: Manasij Mukherjee Date: Fri, 4 Oct 2024 15:15:30 -0600 Subject: [PATCH] [NVPTX] Promote v2i8 to v2i16 (#89) Promote v2i8 to v2i16, fixes a crash. Re-enable a test in NVPTX/vector-returns.ll Partial cherry-pick of fda2fea w/o the test which does not exist in release/19.x https://github.com/llvm/llvm-project/issues/104864 --- llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp | 4 1 file changed, 4 insertions(+) diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp index 6975412ce5d35b..b2153a7afe7365 100644 --- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp +++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp @@ -229,6 +229,10 @@ static void ComputePTXValueVTs(const TargetLowering &TLI, const DataLayout &DL, // v*i8 are formally lowered as v4i8 EltVT = MVT::v4i8; NumElts = (NumElts + 3) / 4; + } else if (EltVT.getSimpleVT() == MVT::i8 && NumElts == 2) { +// v2i8 is promoted to v2i16 +NumElts = 1; +EltVT = MVT::v2i16; } for (unsigned j = 0; j != NumElts; ++j) { ValueVTs.push_back(EltVT); ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [NVPTX] Promote v2i8 to v2i16 (#111189) (PR #115081)
Artem-B wrote:
That would be me -- I did the review of the original patch and am just doing the legwork here to cherry-pick it into the 19.x branch. If you need somebody else, I'd ask @AlexMaclean.

https://github.com/llvm/llvm-project/pull/115081
[llvm-branch-commits] [llvm] 02930b8 - [NVPTX] Promote v2i8 to v2i16 (#111189)
Author: Manasij Mukherjee Date: 2024-11-19T14:15:13-08:00 New Revision: 02930b87faeb490505b22f588757a18744248b6f URL: https://github.com/llvm/llvm-project/commit/02930b87faeb490505b22f588757a18744248b6f DIFF: https://github.com/llvm/llvm-project/commit/02930b87faeb490505b22f588757a18744248b6f.diff LOG: [NVPTX] Promote v2i8 to v2i16 (#89) Promote v2i8 to v2i16, fixes a crash. Re-enable a test in NVPTX/vector-returns.ll Partial cherry-pick of fda2fea w/o the test which does not exist in release/19.x https://github.com/llvm/llvm-project/issues/104864 Added: Modified: llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp Removed: diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp index 6975412ce5d35b..b2153a7afe7365 100644 --- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp +++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp @@ -229,6 +229,10 @@ static void ComputePTXValueVTs(const TargetLowering &TLI, const DataLayout &DL, // v*i8 are formally lowered as v4i8 EltVT = MVT::v4i8; NumElts = (NumElts + 3) / 4; + } else if (EltVT.getSimpleVT() == MVT::i8 && NumElts == 2) { +// v2i8 is promoted to v2i16 +NumElts = 1; +EltVT = MVT::v2i16; } for (unsigned j = 0; j != NumElts; ++j) { ValueVTs.push_back(EltVT); ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
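A hedged sketch of the kind of input that hits the new case (the actual reproducer is in issue #104864 and the re-enabled NVPTX/vector-returns.ll test; this source-level analogue is an assumption):

```cpp
// A 2 x i8 vector return reaches ComputePTXValueVTs with EltVT == i8 and
// NumElts == 2; with this patch that is lowered as a single v2i16 value
// instead of hitting the previously unhandled (crashing) case.
typedef char v2i8 __attribute__((vector_size(2)));

__device__ v2i8 addBytes(v2i8 a, v2i8 b) { return a + b; }
```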
[llvm-branch-commits] [clang] release/20.x: [CUDA] Add support for sm101 and sm120 target architectures (#127187) (PR #127918)
https://github.com/Artem-B approved this pull request. I was the one proposing to merge this change, so I assumed that it's the release maintainers who'd need to stamp it. I am all for merging it. https://github.com/llvm/llvm-project/pull/127918
[llvm-branch-commits] [clang] release/20.x: [CUDA] Add support for sm101 and sm120 target architectures (#127187) (PR #127918)
Artem-B wrote:
> patch is first reviewed by someone familiar with the code.

That would be me, as I am the maintainer of the CUDA code and had reviewed the original PR.

> They approve the patch, and describe how the fix meets the release branch
> patch requirements (https://llvm.org/docs/HowToReleaseLLVM.html#release-patch-rules).

This patch fits item #3 on the rule list: "or completion of features that were started before the branch was created." These changes allow clang users to compile CUDA code with the just-released CUDA 12.8, which adds these new GPU variants.

https://github.com/llvm/llvm-project/pull/127918
[llvm-branch-commits] [clang] release/20.x: [CUDA] Add support for sm101 and sm120 target architectures (#127187) (PR #127918)
Artem-B wrote:
```
# CUDA
- Clang now supports CUDA compilation with CUDA SDK up to v12.8
- Clang can now target sm_100, sm_101, and sm_120 GPUs (Blackwell)
```
https://github.com/llvm/llvm-project/pull/127918
[llvm-branch-commits] [clang] clang: Read the address space from the ABIArgInfo (PR #138865)
@@ -5384,16 +5384,16 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
         if (!NeedCopy) {
           // Skip the extra memcpy call.
           llvm::Value *V = getAsNaturalPointerTo(Addr, I->Ty);
-          auto *T = llvm::PointerType::get(
-              CGM.getLLVMContext(), CGM.getDataLayout().getAllocaAddrSpace());
+          auto *T = llvm::PointerType::get(CGM.getLLVMContext(),
+                                           ArgInfo.getIndirectAddrSpace());
           // FIXME: This should not depend on the language address spaces, and
           // only the contextual values. If the address space mismatches, see if
           // we can look through a cast to a compatible address space value,
           // otherwise emit a copy.

Artem-B wrote:
Thank you for the details. It makes sense now.

https://github.com/llvm/llvm-project/pull/138865