[llvm] [clang] [CUDA] Add support for CUDA-12.3 and sm_90a (PR #74895)

2023-12-11 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B updated https://github.com/llvm/llvm-project/pull/74895 >From eace5f13ee62c770a84cdaae441d4c1c6eeb07c2 Mon Sep 17 00:00:00 2001 From: Artem Belevich Date: Wed, 6 Dec 2023 12:11:38 -0800 Subject: [PATCH 1/3] [CUDA] Add support for CUDA-12.3 and sm_90a --- clang/docs/

[llvm] [clang] [CUDA] Add support for CUDA-12.3 and sm_90a (PR #74895)

2023-12-11 Thread Artem Belevich via cfe-commits
Artem-B wrote: Tested the changes with the CUDA test-suite, using CUDA 12.1 and 12.3 and targeting `sm_{60,70,80,90,90a}`. https://github.com/llvm/llvm-project/pull/74895 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/ma

[llvm] [clang] [CUDA] Add support for CUDA-12.3 and sm_90a (PR #74895)

2023-12-11 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B closed https://github.com/llvm/llvm-project/pull/74895

[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)

2023-11-14 Thread Artem Belevich via cfe-commits
Artem-B wrote: Just an FYI: recent NVIDIA GPUs have introduced the concept of a [thread block cluster](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#thread-block-clusters). We may need another level of granularity between the block and the device. https://github.com/llvm/llvm-
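
To make the suggested extra level concrete, here is a minimal C++ sketch of an ordered set of memory scopes with a cluster level between block and device. The `Scope` enum and the `isAtLeast` helper are hypothetical illustrations, not clang's scope constants or any NVPTX API.

```cpp
// Hypothetical ordering of memory scopes, narrowest first. A thread block
// cluster groups several blocks, so a "cluster" scope would sit between
// "block" and "device".
enum class Scope : int {
  Thread = 0,
  Warp = 1,
  Block = 2,
  Cluster = 3,
  Device = 4,
  System = 5,
};

// True if scope `a` covers at least as many threads as scope `b`.
constexpr bool isAtLeast(Scope a, Scope b) {
  return static_cast<int>(a) >= static_cast<int>(b);
}

static_assert(isAtLeast(Scope::Cluster, Scope::Block), "a cluster spans blocks");
static_assert(!isAtLeast(Scope::Cluster, Scope::Device),
              "a cluster is narrower than the device");
```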

[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)

2023-11-14 Thread Artem Belevich via cfe-commits
Artem-B wrote: > Nvidia backend doesn't handle scoped atomics at all yet Yeah, it's on my ever-growing todo list. :-( https://github.com/llvm/llvm-project/pull/72280

[clang] [CUDA][HIP] make trivial ctor/dtor host device (PR #72394)

2023-11-15 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/72394

[clang] [CUDA][HIP] make trivial ctor/dtor host device (PR #72394)

2023-11-15 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM with a couple of nits. https://github.com/llvm/llvm-project/pull/72394

[clang] [CUDA][HIP] make trivial ctor/dtor host device (PR #72394)

2023-11-15 Thread Artem Belevich via cfe-commits
@@ -12,7 +12,7 @@ extern "C" void host_fn() {} struct Dummy {}; struct S { - S() {} + S() { x = 1; } Artem-B wrote: Can we make the purpose of the non-trivial constructor more descriptive, here and in other places? E.g. `S() { static int nontrivial_ctor = 1; }

[clang] [CUDA][HIP] make trivial ctor/dtor host device (PR #72394)

2023-11-15 Thread Artem Belevich via cfe-commits
@@ -772,6 +772,26 @@ void Sema::maybeAddCUDAHostDeviceAttrs(FunctionDecl *NewD, NewD->addAttr(CUDADeviceAttr::CreateImplicit(Context)); } +// If a trivial ctor/dtor has no host/device +// attributes, make it implicitly host device function. +void Sema::maybeAddCUDAHostDevice

[clang] [CUDA][HIP] make trivial ctor/dtor host device (PR #72394)

2023-11-18 Thread Artem Belevich via cfe-commits
Artem-B wrote: We've found a problem with the patch. https://godbolt.org/z/jcKo34vzG ``` template class C { explicit C() {}; }; template <> C::C() {}; ``` :6:21: error: __host__ function 'C' cannot overload __host__ __device__ function 'C' 6 | template <> C::C() {}; |
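
The pattern itself is ordinary C++; the error only shows up in CUDA mode. The sketch below is a hypothetical reconstruction, because the template parameter list and the specialized type were lost from the archived snippet: `typename T` and `int` are assumptions, and `struct` replaces `class` so the constructor is accessible. It compiles as plain C++. Judging by the quoted error message, with the patch applied the primary constructor had become implicitly `__host__ __device__` while the explicit specialization remained `__host__`.

```cpp
// Hypothetical reconstruction: the archived snippet lost its template
// arguments, so `typename T` and the `int` specialization are guesses.
// `struct` is used instead of `class` so the constructor is accessible.
template <typename T>
struct C {
  explicit C() {}
};

// Explicit specialization of the constructor for one instantiation.
// This is valid C++; in the CUDA case described above, the two definitions
// apparently ended up with different host/device attributes.
template <>
C<int>::C() {}
```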

[clang] [CUDA][HIP] ignore implicit host/device attr for override (PR #72815)

2023-11-19 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/72815

[clang] [CUDA][HIP] ignore implicit host/device attr for override (PR #72815)

2023-11-19 Thread Artem Belevich via cfe-commits
@@ -1000,13 +1000,9 @@ void Sema::checkCUDATargetOverload(FunctionDecl *NewFD, // should have the same implementation on both sides. if (NewTarget != OldTarget && ((NewTarget == CFT_HostDevice && - !(LangOpts.OffloadImplicitHostDeviceTemplates && -

[clang] [CUDA][HIP] ignore implicit host/device attr for override (PR #72815)

2023-11-19 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM, with one question. https://github.com/llvm/llvm-project/pull/72815

[clang] [CUDA][HIP] ignore implicit host/device attr for override (PR #72815)

2023-11-20 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/72815

[clang] [CUDA][HIP] Fix deduction guide (PR #69366)

2023-10-30 Thread Artem Belevich via cfe-commits
Artem-B wrote: @ldionne - Can you take a look at whether this would have unintended consequences for libc++? https://github.com/llvm/llvm-project/pull/69366

[clang] [HIP] fix stack marking for -fgpu-rdc (PR #72782)

2023-11-27 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/72782

[clang] [CUDA][HIP] allow trivial ctor/dtor in device var init (PR #73140)

2023-11-30 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/73140

[clang] [CUDA] work around more __noinline__ conflicts with libc++ (PR #74123)

2023-12-01 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B created https://github.com/llvm/llvm-project/pull/74123 https://github.com/llvm/llvm-project/pull/73838 >From 71e24fc704c82c11162313613691d09b9a653bd5 Mon Sep 17 00:00:00 2001 From: Artem Belevich Date: Fri, 1 Dec 2023 10:37:08 -0800 Subject: [PATCH] [CUDA] work arou

[clang] [CUDA] work around more __noinline__ conflicts with libc++ (PR #74123)

2023-12-01 Thread Artem Belevich via cfe-commits
Artem-B wrote: Yes, I've mentioned that in https://github.com/llvm/llvm-project/pull/73838. However, we need something to fix the issue right now while we're figuring out a better solution. In any case, `__noinline__` is unlikely to be widely used, so the wrappers may be manageable, at least
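
One common way to shield code that spells the attribute as `__attribute__((__noinline__))` from a `__noinline__` macro is to save, undefine, and restore the macro around that code. This is a minimal sketch of the technique, not necessarily what this PR does: the `#define` at the top stands in for roughly what the CUDA SDK headers provide, and `add_one` is a hypothetical function.

```cpp
// Stand-in for the CUDA SDK, which defines __noinline__ as a macro along
// these lines (assumed here for illustration).
#define __noinline__ __attribute__((noinline))

// Save the current definition and remove it for the code that follows.
#pragma push_macro("__noinline__")
#undef __noinline__

// With the macro active, this would expand to
// __attribute__((__attribute__((noinline)))) and fail to compile.
__attribute__((__noinline__)) constexpr int add_one(int x) { return x + 1; }

// Restore the SDK's definition for the rest of the translation unit.
#pragma pop_macro("__noinline__")

#ifdef __noinline__
constexpr bool kNoinlineMacroRestored = true;
#else
constexpr bool kNoinlineMacroRestored = false;
#endif
```

The appeal of this approach is that it only touches the preprocessor state around the problematic code; the drawback, as the discussion notes, is that every affected include site needs such a wrapper.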

[clang] [CUDA] work around more __noinline__ conflicts with libc++ (PR #74123)

2023-12-01 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B updated https://github.com/llvm/llvm-project/pull/74123 >From 71e24fc704c82c11162313613691d09b9a653bd5 Mon Sep 17 00:00:00 2001 From: Artem Belevich Date: Fri, 1 Dec 2023 10:37:08 -0800 Subject: [PATCH 1/2] [CUDA] work around more __noinline__ conflicts with libc++

[clang] [CUDA] work around more __noinline__ conflicts with libc++ (PR #74123)

2023-12-01 Thread Artem Belevich via cfe-commits
Artem-B wrote: > I think we can find a solution to work around this in libc++ within a > reasonable timeframe OK. I'll hold off on landing the patch. I believe we're not blocked on it at the moment. https://github.com/llvm/llvm-project/pull/74123

[clang] [CUDA] work around more __noinline__ conflicts with libc++ (PR #74123)

2023-12-01 Thread Artem Belevich via cfe-commits
Artem-B wrote: > FWIW I am not thrilled about using `__config` here. That header is an > implementation detail of libc++ and defining it and relying on it is somewhat > brittle. I'm all for having it fixed in libc++ or in CUDA SDK. Barring that, working around the specific implementation deta

[clang] [CUDA] work around more __noinline__ conflicts with libc++ (PR #74123)

2023-12-01 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B updated https://github.com/llvm/llvm-project/pull/74123 >From 71e24fc704c82c11162313613691d09b9a653bd5 Mon Sep 17 00:00:00 2001 From: Artem Belevich Date: Fri, 1 Dec 2023 10:37:08 -0800 Subject: [PATCH 1/3] [CUDA] work around more __noinline__ conflicts with libc++

[clang] [CUDA][Win32] Add `fma(long double,..)` to math forward declares. (PR #73756)

2023-12-04 Thread Artem Belevich via cfe-commits
@@ -70,6 +70,9 @@ __DEVICE__ double floor(double); __DEVICE__ float floor(float); __DEVICE__ double fma(double, double, double); __DEVICE__ float fma(float, float, float); +#ifdef _MSC_VER +__DEVICE__ long double fma(long double, long double, long double); Arte

[clang] [CUDA][Win32] Add `fma(long double,..)` to math forward declares. (PR #73756)

2023-12-04 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/73756

[clang] [AMDGPU] add function attrbute amdgpu-lib-fun (PR #74737)

2024-01-09 Thread Artem Belevich via cfe-commits
Artem-B wrote: This sounds like it may be useful outside of the AMDGPU back-end. @jhuber6, this is something that may come in handy for implementing general library functions. https://github.com/llvm/llvm-project/pull/74737

[clang] [AMDGPU] add function attrbute amdgpu-lib-fun (PR #74737)

2024-01-09 Thread Artem Belevich via cfe-commits
Artem-B wrote: I was thinking of implementing libm/libc for nvptx, which would produce an IR library. We'll still need to keep the functions around even if they are not used explicitly, because we may need them to fulfill libcalls later in the compilation pipeline. Sort of a libdevice replacement
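
A loose host-side analogy for keeping an otherwise unreferenced library function alive is the GNU `used` attribute, which tells the compiler to emit a function even if it appears not to be referenced. This sketch is an illustration under that assumption only; it is not the `amdgpu-lib-fun` attribute, and `demo_lib_muladd` is hypothetical.

```cpp
// Hypothetical library routine that may not be referenced by anything in
// this translation unit. For a `static` function the compiler could
// otherwise discard the definition; `used` asks it to emit the function
// anyway, so a later stage could still resolve a call to it.
__attribute__((used)) static constexpr float demo_lib_muladd(float a, float b,
                                                             float c) {
  return a * b + c;
}
```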

[clang] [AMDGPU] add function attrbute amdgpu-lib-fun (PR #74737)

2024-01-09 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/74737

[clang] fe528e7 - [CUDA] Don't call inferCUDATargetForImplicitSpecialMember too early.

2022-03-31 Thread Artem Belevich via cfe-commits
Author: Artem Belevich Date: 2022-03-31T13:49:12-07:00 New Revision: fe528e72163371e10242f4748dab687eef30a1f9 URL: https://github.com/llvm/llvm-project/commit/fe528e72163371e10242f4748dab687eef30a1f9 DIFF: https://github.com/llvm/llvm-project/commit/fe528e72163371e10242f4748dab687eef30a1f9.diff

[clang] 3e0e556 - [CUDA] Fixed sm version constrain for __bmma_m8n8k128_mma_and_popc_b1.

2022-08-05 Thread Artem Belevich via cfe-commits
Author: Jack Kirk Date: 2022-08-05T12:14:06-07:00 New Revision: 3e0e5568a6a8c744d26f79a1e55360fe2655867c URL: https://github.com/llvm/llvm-project/commit/3e0e5568a6a8c744d26f79a1e55360fe2655867c DIFF: https://github.com/llvm/llvm-project/commit/3e0e5568a6a8c744d26f79a1e55360fe2655867c.diff LOG

[clang] d774b4a - [NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction.

2021-07-15 Thread Artem Belevich via cfe-commits
Author: Artem Belevich Date: 2021-07-15T12:02:09-07:00 New Revision: d774b4aa5eac785ffe40009091667521e183df40 URL: https://github.com/llvm/llvm-project/commit/d774b4aa5eac785ffe40009091667521e183df40 DIFF: https://github.com/llvm/llvm-project/commit/d774b4aa5eac785ffe40009091667521e183df40.diff

[clang] [llvm] [InstCombine] Canonicalize `(sitofp x)` -> `(uitofp x)` if `x >= 0` (PR #82404)

2024-03-14 Thread Artem Belevich via cfe-commits
Artem-B wrote: We happen to have a back-end where we do not have conversion instructions between unsigned int and FP, so this patch complicates things. Would it make sense to enable this canonicalization only if the target wants it? https://github.com/llvm/llvm-project/pull/82404
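
The canonicalization relies on the fact that, for `x >= 0`, signed and unsigned integer-to-float conversions see the same integer value and therefore round to the same float. The C++ sketch below checks that at compile time and also shows one simple way a target without an unsigned-to-FP instruction could expand a 32-bit `uitofp`, assuming it can convert from a signed 64-bit integer. The helper names are hypothetical, and the expansion is an illustration rather than any particular backend's lowering.

```cpp
#include <cstdint>

// For x >= 0, both conversions see the same integer value, so they round to
// the same float. This is what makes sitofp -> uitofp legal in that case.
constexpr float via_signed(std::int32_t x) { return static_cast<float>(x); }
constexpr float via_unsigned(std::int32_t x) {
  return static_cast<float>(static_cast<std::uint32_t>(x));
}

// One possible expansion of a 32-bit uitofp for a target that only has
// signed conversions: every uint32_t value fits in int64_t, so widen and use
// a signed 64-bit conversion. (Illustrative only; 64-bit inputs need a
// different expansion.)
constexpr float uitofp_expanded(std::uint32_t x) {
  return static_cast<float>(static_cast<std::int64_t>(x));
}

static_assert(via_signed(16777217) == via_unsigned(16777217), "x >= 0: same result");
static_assert(via_signed(-1) != via_unsigned(-1), "x < 0: results differ");
```

On such a target, rewriting an existing `sitofp` into `uitofp` trades a single native conversion for an expansion like `uitofp_expanded`, which is the complication raised in the comment.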

[clang] [HIP][NFC] Refactor managed var codegen (PR #85976)

2024-03-20 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/85976

[clang] [HIP][NFC] Refactor managed var codegen (PR #85976)

2024-03-20 Thread Artem Belevich via cfe-commits
@@ -1160,9 +1152,8 @@ void CGNVCUDARuntime::createOffloadingEntries() { // Returns module constructor to be added. llvm::Function *CGNVCUDARuntime::finalizeModule() { + transformManagedVars(); Artem-B wrote: This does not look like "NFC" as we now perform th

[clang] [HIP][NFC] Refactor managed var codegen (PR #85976)

2024-03-20 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM, sans the "NFC" part in the description. https://github.com/llvm/llvm-project/pull/85976

[clang] [llvm] [Offload] Change unregister library to use `atexit` instead of destructor (PR #86830)

2024-03-27 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/86830

[clang] [llvm] [Offload] Change unregister library to use `atexit` instead of destructor (PR #86830)

2024-03-27 Thread Artem Belevich via cfe-commits
@@ -186,57 +186,62 @@ GlobalVariable *createBinDesc(Module &M, ArrayRef> Bufs, ".omp_offloading.descriptor" + Suffix); } -void createRegisterFunction(Module &M, GlobalVariable *BinDesc, -StringRef Suffix) { +Function *cr
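
The general pattern of scheduling cleanup with `std::atexit` from the registration routine, rather than emitting a separate global destructor, can be sketched in C++ as follows. The function and variable names are hypothetical stand-ins, not the offloading runtime's entry points. One property that makes `atexit` attractive here is a standard C++ rule: a handler registered with `std::atexit` runs before the destructors of any static objects whose initialization completed before that `std::atexit` call.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-ins for an offloading runtime's bookkeeping; the real
// register/unregister entry points are not shown in this thread.
static bool g_registered = false;

static void unregister_offload_library() {
  assert(g_registered && "unregister without a matching register");
  g_registered = false;
}

// Meant to be called from a global constructor. Rather than emitting a
// separate global destructor, it schedules its own cleanup with std::atexit.
// Returns false if the handler could not be registered.
bool register_offload_library() {
  g_registered = true;
  return std::atexit(unregister_offload_library) == 0;
}
```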

[clang] [llvm] [Offload] Change unregister library to use `atexit` instead of destructor (PR #86830)

2024-03-27 Thread Artem Belevich via cfe-commits
@@ -186,57 +186,62 @@ GlobalVariable *createBinDesc(Module &M, ArrayRef> Bufs, ".omp_offloading.descriptor" + Suffix); } -void createRegisterFunction(Module &M, GlobalVariable *BinDesc, -StringRef Suffix) { +Function *cr

[clang] [llvm] [Offload] Change unregister library to use `atexit` instead of destructor (PR #86830)

2024-03-27 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/86830

[clang] [llvm] [NVPTX][Draft] Make `__nvvm_nanosleep` a no-op if unsupported (PR #81033)

2024-02-07 Thread Artem Belevich via cfe-commits
Artem-B wrote: > This patch, which simply makes it legal on all architectures but do nothing > is it's older than sm_70. I do not think this is the right thing to do. "do nothing" is not what one would expect from a `nanosleep`. Let's unpack your problem a bit. __nvvm_reflect() is probably c

[clang] [llvm] [NVPTX][Draft] Make `__nvvm_nanosleep` a no-op if unsupported (PR #81033)

2024-02-07 Thread Artem Belevich via cfe-commits
Artem-B wrote: > Okay, `__nvvm_reflect` doesn't work fully here because the `nanosleep` > builtin I added requires `sm_70` at the clang level. Either means I'd need to > go back to inline assembly or remove that requirement at least from clang so > it's a backend failure. The question is -- w

[clang] [llvm] [LinkerWrapper] Allow 'all' as a generic bundled architecture (PR #81193)

2024-02-08 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/81193

[clang] [llvm] [LinkerWrapper] Allow 'all' as a generic bundled architecture (PR #81193)

2024-02-08 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/81193

[clang] [llvm] [NVPTX] Add clang builtin for `__nvvm_reflect` intrinsic (PR #81277)

2024-02-09 Thread Artem Belevich via cfe-commits
Artem-B wrote: > We should expose it as an intrinsic I think you mean `builtin` here. https://github.com/llvm/llvm-project/pull/81277

[clang] [llvm] [NVPTX] Add clang builtin for `__nvvm_reflect` intrinsic (PR #81277)

2024-02-09 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM overall. https://github.com/llvm/llvm-project/pull/81277

[clang] [llvm] [NVPTX] Add clang builtin for `__nvvm_reflect` intrinsic (PR #81277)

2024-02-09 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/81277

[clang] [llvm] [NVPTX] Add clang builtin for `__nvvm_reflect` intrinsic (PR #81277)

2024-02-09 Thread Artem Belevich via cfe-commits
@@ -1624,8 +1624,9 @@ def int_nvvm_compiler_error : def int_nvvm_compiler_warn : Intrinsic<[], [llvm_anyptr_ty], [], "llvm.nvvm.compiler.warn">; -def int_nvvm_reflect : - Intrinsic<[llvm_i32_ty], [llvm_anyptr_ty], [IntrNoMem], "llvm.nvvm.reflect">; +def int_nvvm_reflect :

[clang] [llvm] [NVPTX] Add clang builtin for `__nvvm_reflect` intrinsic (PR #81277)

2024-02-09 Thread Artem Belevich via cfe-commits
@@ -159,6 +159,7 @@ BUILTIN(__nvvm_read_ptx_sreg_pm3, "i", "n") BUILTIN(__nvvm_prmt, "UiUiUiUi", "") BUILTIN(__nvvm_exit, "v", "r") +BUILTIN(__nvvm_reflect, "UicC*", "r") Artem-B wrote: Now that we're exposing it to end users, we should probably document

[clang] [llvm] [NVPTX] Add clang builtin for `__nvvm_reflect` intrinsic (PR #81277)

2024-02-09 Thread Artem Belevich via cfe-commits
Artem-B wrote: LGTM https://github.com/llvm/llvm-project/pull/81277

[clang] [llvm] [NVPTX] Add builtin support for 'globaltimer' (PR #79765)

2024-02-09 Thread Artem Belevich via cfe-commits
@@ -140,6 +140,17 @@ define void @test_exit() { ret void } +; CHECK-LABEL: test_globaltimer +define i64 @test_globaltimer() { +; CHECK: mov.u64 %r{{.*}}, %globaltimer; + %a = tail call i64 @llvm.nvvm.read.ptx.sreg.globaltimer() Artem-B wrote: Thise

[clang] [llvm] [LLVM] Add `__builtin_readsteadycounter` intrinsic and builtin for realtime clocks (PR #81331)

2024-02-12 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/81331

[clang] [llvm] [LLVM] Add `__builtin_readsteadycounter` intrinsic and builtin for realtime clocks (PR #81331)

2024-02-12 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B commented: LGTM, with a few nits for the general and NVPTX parts. https://github.com/llvm/llvm-project/pull/81331

[clang] [llvm] [LLVM] Add `__builtin_readsteadycounter` intrinsic and builtin for realtime clocks (PR #81331)

2024-02-12 Thread Artem Belevich via cfe-commits
@@ -2764,6 +2764,37 @@ Query for this feature with ``__has_builtin(__builtin_readcyclecounter)``. Note that even if present, its use may depend on run-time privilege or other OS controlled state. +``__builtin_readsteadycounter`` +-- + +``__builtin_
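
A portable way to use the new builtin is to guard it with `__has_builtin` and fall back to `std::chrono::steady_clock` on compilers that lack it. This is a sketch under assumptions: the `DEMO_HAVE_READSTEADYCOUNTER` macro and the helper names are hypothetical, and tick units differ between the two paths, so readings should only be compared within the same build.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Use the builtin when the compiler reports it, otherwise fall back to
// std::chrono::steady_clock. DEMO_HAVE_READSTEADYCOUNTER is a hypothetical
// helper macro. The nested #if avoids evaluating __has_builtin(...) on
// compilers that do not define it.
#if defined(__has_builtin)
#if __has_builtin(__builtin_readsteadycounter)
#define DEMO_HAVE_READSTEADYCOUNTER 1
#endif
#endif

std::uint64_t read_steady_ticks() {
#ifdef DEMO_HAVE_READSTEADYCOUNTER
  return __builtin_readsteadycounter();
#else
  return static_cast<std::uint64_t>(
      std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Ticks elapsed since `start`, which must come from read_steady_ticks() in
// the same build (tick units differ between the two paths).
std::uint64_t ticks_since(std::uint64_t start) {
  const std::uint64_t now = read_steady_ticks();
  assert(now >= start && "steady counter went backwards");
  return now - start;
}
```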

[clang] [llvm] [LLVM] Add `__builtin_readsteadycounter` intrinsic and builtin for realtime clocks (PR #81331)

2024-02-12 Thread Artem Belevich via cfe-commits
@@ -104,6 +104,7 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const { case ISD::ATOMIC_STORE: return "AtomicStore"; case ISD::PCMARKER: return "PCMarker"; case ISD::READCYCLECOUNTER: return "ReadCycleCounter"; +

[clang] [CUDA] Correctly set CUDA default architecture (PR #84017)

2024-03-05 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/84017

[clang] [clang][CUDA] Disable float128 diagnostics for device compilation (PR #83918)

2024-03-05 Thread Artem Belevich via cfe-commits
@@ -4877,7 +4877,9 @@ void Sema::AddModeAttr(Decl *D, const AttributeCommonInfo &CI, NewElemTy = Context.getRealTypeForBitwidth(DestWidth, ExplicitType); if (NewElemTy.isNull()) { -Diag(AttrLoc, diag::err_machine_mode) << 1 /*Unsupported*/ << Name; +// Only emit

[clang] [clang][CUDA] Disable float128 diagnostics for device compilation (PR #83918)

2024-03-05 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,9 @@ +// CPU-side compilation on x86 (no errors expected). +// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -aux-triple nvptx64 -x cuda -fsyntax-only -verify %s + +// GPU-side compilation on x86 (no errors expected) +// RUN: %clang_cc1 -triple nvptx64 -aux-triple x

[clang] [clang][CUDA] Disable float128 diagnostics for device compilation (PR #83918)

2024-03-06 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/83918

[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)

2024-03-07 Thread Artem Belevich via cfe-commits
@@ -4625,7 +4625,15 @@ Action *Driver::BuildOffloadingActions(Compilation &C, DDeps.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind); OffloadAction::DeviceDependences DDep; DDep.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind); + + //

[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)

2024-03-07 Thread Artem Belevich via cfe-commits
@@ -4625,7 +4625,15 @@ Action *Driver::BuildOffloadingActions(Compilation &C, DDeps.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind); OffloadAction::DeviceDependences DDep; DDep.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind); + + //

[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)

2024-03-07 Thread Artem Belevich via cfe-commits
Artem-B wrote: > Should I make `shouldIncludePTX` default to `false` for the new driver? Yes, I think that's a better default. https://github.com/llvm/llvm-project/pull/84367

[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)

2024-03-07 Thread Artem Belevich via cfe-commits
Artem-B wrote: > > > Should I make `shouldIncludePTX` default to `false` for the new driver? > > > > > > Yes, I think that's a better default. > > Done, now requires `--cuda-include-ptx=`. This may be worth adding to the release notes. https://github.com/llvm/llvm-project/pull/84367

[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)

2024-03-07 Thread Artem Belevich via cfe-commits
@@ -503,18 +503,20 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA, Exec, CmdArgs, Inputs, Output)); } -static bool shouldIncludePTX(const ArgList &Args, const char *gpu_arch) { - bool includePTX = true; - for (Arg *A : Args) { -if (!(A-
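
A simplified, standalone model of the flag handling discussed here, assuming that the last matching `--cuda-include-ptx=` / `--no-cuda-include-ptx=` flag wins and that `all` matches every architecture. It uses `std::string_view` instead of clang's `ArgList`, takes the default as a parameter (the thread settles on `false` for the new driver), and is not the driver's actual code.

```cpp
#include <initializer_list>
#include <string_view>

constexpr bool hasPrefix(std::string_view s, std::string_view prefix) {
  return s.substr(0, prefix.size()) == prefix;
}

// Hypothetical model: scan the arguments in order; each matching flag for
// `arch` (or for "all") overrides the decision made so far.
constexpr bool shouldIncludePTX(std::initializer_list<std::string_view> args,
                                std::string_view arch, bool defaultValue) {
  constexpr std::string_view kInclude = "--cuda-include-ptx=";
  constexpr std::string_view kExclude = "--no-cuda-include-ptx=";
  bool include = defaultValue;
  for (std::string_view a : args) {
    if (hasPrefix(a, kInclude)) {
      std::string_view value = a.substr(kInclude.size());
      if (value == "all" || value == arch) include = true;
    } else if (hasPrefix(a, kExclude)) {
      std::string_view value = a.substr(kExclude.size());
      if (value == "all" || value == arch) include = false;
    }
  }
  return include;
}
```

With a `false` default, PTX is only embedded when the user asks for it, e.g. `shouldIncludePTX({"--cuda-include-ptx=all"}, "sm_80", false)` yields `true`.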

[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)

2024-03-07 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM overall, with docs/comment nits. https://github.com/llvm/llvm-project/pull/84367

[clang] [llvm] [HIP] add --offload-compression-level= option (PR #83605)

2024-03-08 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/83605

[clang] [llvm] [HIP] add --offload-compression-level= option (PR #83605)

2024-03-08 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM. https://github.com/llvm/llvm-project/pull/83605

[clang] [llvm] [HIP] add --offload-compression-level= option (PR #83605)

2024-03-08 Thread Artem Belevich via cfe-commits
@@ -2863,3 +2863,18 @@ void tools::addOutlineAtomicsArgs(const Driver &D, const ToolChain &TC, CmdArgs.push_back("+outline-atomics"); } } + +void tools::addOffloadCompressArgs(const llvm::opt::ArgList &TCArgs, + llvm::opt::ArgStringList

[clang] [HIP] fix host min/max in header (PR #82956)

2024-02-26 Thread Artem Belevich via cfe-commits
@@ -1306,15 +1306,68 @@ float min(float __x, float __y) { return __builtin_fminf(__x, __y); } __DEVICE__ double min(double __x, double __y) { return __builtin_fmin(__x, __y); } -#if !defined(__HIPCC_RTC__) && !defined(__OPENMP_AMDGCN__) -__host__ inline static int min(int __a

[clang] [HIP] fix host min/max in header (PR #82956)

2024-02-26 Thread Artem Belevich via cfe-commits
@@ -1306,15 +1306,68 @@ float min(float __x, float __y) { return __builtin_fminf(__x, __y); } __DEVICE__ double min(double __x, double __y) { return __builtin_fmin(__x, __y); } -#if !defined(__HIPCC_RTC__) && !defined(__OPENMP_AMDGCN__) -__host__ inline static int min(int __a

[clang] [HIP] fix host min/max in header (PR #82956)

2024-02-26 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/82956

[clang] [HIP] fix host min/max in header (PR #82956)

2024-02-26 Thread Artem Belevich via cfe-commits
@@ -1306,15 +1306,73 @@ float min(float __x, float __y) { return __builtin_fminf(__x, __y); } __DEVICE__ double min(double __x, double __y) { return __builtin_fmin(__x, __y); } -#if !defined(__HIPCC_RTC__) && !defined(__OPENMP_AMDGCN__) -__host__ inline static int min(int __a

[clang] [HIP] fix host min/max in header (PR #82956)

2024-02-26 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/82956

[clang] [HIP] fix host min/max in header (PR #82956)

2024-02-28 Thread Artem Belevich via cfe-commits
Artem-B wrote: > Probably I need to define those functions with mixed args by default to avoid > regressions. Are there any other regressions? Can hipCUB be fixed instead? While their use case is probably benign, I'd rather fix the user code than propagate CUDA bugs into HIP. https://github
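
Why mixed-argument calls regress can be seen with plain C++ overload resolution. In this sketch (hypothetical `demo::min`, not the HIP header), only same-type overloads exist, so `min(int, unsigned)` is ambiguous: each candidate needs exactly one integral conversion. Code that relied on mixed-argument overloads then stops compiling unless such overloads are added or the call site is fixed, for example by casting one argument.

```cpp
#include <type_traits>
#include <utility>

namespace demo {
// Hypothetical same-type host overloads (not the actual HIP header).
constexpr int min(int a, int b) { return a < b ? a : b; }
constexpr unsigned min(unsigned a, unsigned b) { return a < b ? a : b; }
}  // namespace demo

// True if the call demo::min(A, B) is well-formed.
template <class A, class B, class = void>
struct min_is_callable : std::false_type {};

template <class A, class B>
struct min_is_callable<
    A, B, std::void_t<decltype(demo::min(std::declval<A>(), std::declval<B>()))>>
    : std::true_type {};

// min(int, unsigned): each overload needs one integral conversion, so
// neither is better and the call is ambiguous.
static_assert(!min_is_callable<int, unsigned>::value, "");
```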

[clang] [llvm] [HIP] change compress level (PR #83605)

2024-03-01 Thread Artem Belevich via cfe-commits
@@ -942,20 +942,28 @@ CompressedOffloadBundle::compress(const llvm::MemoryBuffer &Input, Input.getBuffer().size()); llvm::compression::Format CompressionFormat; + int Level; - if (llvm::compression::zstd::isAvailable()) + if (llvm::compression::zstd::isAvailable(

[clang] [llvm] [HIP] add --offload-compression-level= option (PR #83605)

2024-03-04 Thread Artem Belevich via cfe-commits
@@ -942,20 +942,28 @@ CompressedOffloadBundle::compress(const llvm::MemoryBuffer &Input, Input.getBuffer().size()); llvm::compression::Format CompressionFormat; + int Level; - if (llvm::compression::zstd::isAvailable()) + if (llvm::compression::zstd::isAvailable(

[clang] [llvm] [HIP] add --offload-compression-level= option (PR #83605)

2024-03-04 Thread Artem Belevich via cfe-commits
@@ -906,6 +906,16 @@ CreateFileHandler(MemoryBuffer &FirstInput, } OffloadBundlerConfig::OffloadBundlerConfig() { + if (llvm::compression::zstd::isAvailable()) { +CompressionFormat = llvm::compression::Format::Zstd; +// Use a high zstd compress level by default for be
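
The default-selection logic can be modeled as follows: prefer zstd when it is available, otherwise fall back to zlib (an assumption, since the fallback branch is not visible in the excerpt), and let an explicit `--offload-compression-level=` value override the default. The default levels are parameters because the excerpt does not show the actual values; `pickCompression` and the types are hypothetical.

```cpp
#include <optional>

// Hypothetical model, not the offload bundler's code.
enum class Format { Zstd, Zlib };

struct CompressionConfig {
  Format format;
  int level;
};

// Prefer zstd when available, otherwise fall back to zlib (assumed), and let
// an explicit user-requested level override the format's default level.
constexpr CompressionConfig pickCompression(bool zstdAvailable,
                                            std::optional<int> userLevel,
                                            int zstdDefaultLevel,
                                            int zlibDefaultLevel) {
  const Format format = zstdAvailable ? Format::Zstd : Format::Zlib;
  int level = zstdAvailable ? zstdDefaultLevel : zlibDefaultLevel;
  if (userLevel) level = *userLevel;
  return {format, level};
}
```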

[clang] [HIP] fix host-used external kernel (PR #83870)

2024-03-04 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM in principle, but I'd run it by someone with more familiarity with linking quirks. @MaskRay PTAL, when you get a chance. https://github.com/llvm/llvm-project/pull/83870

[clang] [HIP] fix host-used external kernel (PR #83870)

2024-03-04 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/83870

[clang] [HIP] fix host-used external kernel (PR #83870)

2024-03-04 Thread Artem Belevich via cfe-commits
@@ -24,6 +24,7 @@ // NEG-NOT: @__clang_gpu_used_external = {{.*}} @_Z7kernel2v // NEG-NOT: @__clang_gpu_used_external = {{.*}} @_Z7kernel3v +// XEG-NOT: @__clang_gpu_used_external = {{.*}} @_Z7kernel5v Artem-B wrote: Did you mean `NEG-NOT`? https://github.c

[clang] [llvm] [HIP] add --offload-compression-level= option (PR #83605)

2024-03-04 Thread Artem Belevich via cfe-commits
@@ -906,6 +906,16 @@ CreateFileHandler(MemoryBuffer &FirstInput, } OffloadBundlerConfig::OffloadBundlerConfig() { + if (llvm::compression::zstd::isAvailable()) { +CompressionFormat = llvm::compression::Format::Zstd; +// Use a high zstd compress level by default for be

[clang] [llvm] [HIP] add --offload-compression-level= option (PR #83605)

2024-03-04 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/83605

[clang] [clang][CUDA] Disable float128 diagnostics for device compilation (PR #83918)

2024-03-04 Thread Artem Belevich via cfe-commits
@@ -4877,7 +4877,9 @@ void Sema::AddModeAttr(Decl *D, const AttributeCommonInfo &CI, NewElemTy = Context.getRealTypeForBitwidth(DestWidth, ExplicitType); if (NewElemTy.isNull()) { -Diag(AttrLoc, diag::err_machine_mode) << 1 /*Unsupported*/ << Name; +// Only emit

[clang] [clang][CUDA] Disable float128 diagnostics for device compilation (PR #83918)

2024-03-04 Thread Artem Belevich via cfe-commits
@@ -4877,7 +4877,9 @@ void Sema::AddModeAttr(Decl *D, const AttributeCommonInfo &CI, NewElemTy = Context.getRealTypeForBitwidth(DestWidth, ExplicitType); if (NewElemTy.isNull()) { -Diag(AttrLoc, diag::err_machine_mode) << 1 /*Unsupported*/ << Name; +// Only emit

[clang] [NVPTX] Allow compiling LLVM-IR without `-march` set (PR #79873)

2024-01-30 Thread Artem Belevich via cfe-commits
Artem-B wrote: Considering that it's for the stand-alone compilation only, I'm not going to block this patch. That said, please add a `TODO` somewhere to address the issue with explicitly targeting the generic variant. https://github.com/llvm/llvm-project/pull/79873

[clang] [NVPTX] Allow compiling LLVM-IR without `-march` set (PR #79873)

2024-01-30 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/79873

[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Artem Belevich via cfe-commits
Artem-B wrote: So, the idea is to carry two separate embedded offloading sections -- one for already fully linked GPU executables, and another for GPU objects to be linked at the final link stage. > We also use a sepcial section called something like omp_offloading_entries Typo in 'special' i

[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/80066

[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM. https://github.com/llvm/llvm-project/pull/80066

[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Artem Belevich via cfe-commits
@@ -20,10 +20,12 @@ using EntryArrayTy = std::pair; /// \param EntryArray Optional pair pointing to the `__start` and `__stop` /// symbols holding the `__tgt_offload_entry` array. /// \param Suffix An optional suffix appended to the emitted symbols. +/// \param Relocatable Indi

[clang] [llvm] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Artem Belevich via cfe-commits
@@ -265,6 +329,11 @@ Error runLinker(ArrayRef Files, const ArgList &Args) { LinkerArgs.push_back(Arg); if (Error Err = executeCommands(LinkerPath, LinkerArgs)) return Err; + + if (Args.hasArg(OPT_relocatable)) +if (Error Err = relocateOffloadSection(Args, Execut

[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Artem Belevich via cfe-commits
Artem-B wrote: Supporting such mixed mode opens an interesting set of issues we may need to consider going forward: * who/where/how runs initializers in the fully linked parts? * Are public functions in the fully linked parts visible to the functions in partially linked parts? In the full-rdc m

[clang] [llvm] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Artem Belevich via cfe-commits
Artem-B wrote: > I'm assuming you're talking about GPU-side constructors? I don't think the > CUDA runtime supports those, but OpenMP runs them when the image is loaded, > so it would handle both independantly. Yes. I'm thinking of the expectations from a C++ user standpoint, and this is one

[clang] [llvm] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Artem Belevich via cfe-commits
Artem-B wrote: > the idea is that it would be the desired effect if someone went out of their > way to do this GPU subset linking thing. That would only be true when someone owns the whole build. That will not be the case in practice. A large enough project is usually a bunch of libraries cre

[clang] [AMDGPU] Diagnose unaligned atomic (PR #80322)

2024-02-01 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. You may want to check that we can still disable the error with `-Wno-error=atomic-alignment` passed via top-level options. Other than that LGTM. https://github.com/llvm/llvm-project/pull/80322

[llvm] [clang] [flang] [InstCombine] Canonicalize constant GEPs to i8 source element type (PR #68882)

2024-02-02 Thread Artem Belevich via cfe-commits
Artem-B wrote: Another corner case here. Untyped GEP resulted in SimplifyCFG producing a `load(gep(argptr, cond ? 24 : 0))` instead of the `load( cond ? gep(argptr, 24) : argptr)` it produced before the patch, and that eventually prevented SROA from processing that load. While it's not a bug in th
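For illustration only, here is a hypothetical C++ analog of the two address shapes described above. The function names are made up for this sketch, and the IR that clang actually emits for them depends on the optimizer. Both functions read the same bytes. The first mirrors the pre-patch `load(select(cond, gep(argptr, 24), argptr))` form: it picks one of two pointers and then loads. The second mirrors the post-patch `load(gep(argptr, select(cond, 24, 0)))` form: it uses a single base pointer with a variable byte offset. In IR, SROA can typically speculate a load through a select of two known pointers. A GEP with a non-constant offset, however, leaves it unable to tell which slice of the alloca is being accessed.

```cpp
#include <cstring>

// Hypothetical analog of the pre-patch shape:
// load(select(cond, gep(argptr, 24), argptr)).
// The select chooses between two pointers, each at a constant offset.
int loadFromSelectedPointer(const char *argptr, bool cond) {
  const char *p = cond ? argptr + 24 : argptr;
  int v;
  std::memcpy(&v, p, sizeof v);  // single load through the chosen pointer
  return v;
}

// Hypothetical analog of the post-patch shape:
// load(gep(argptr, select(cond, 24, 0))).
// One base pointer, but the byte offset itself is now a runtime value.
int loadFromSelectedOffset(const char *argptr, bool cond) {
  const char *p = argptr + (cond ? 24 : 0);
  int v;
  std::memcpy(&v, p, sizeof v);
  return v;
}
```

Given a buffer with an `int` at byte 0 and another at byte 24, both functions return the same value for the same `cond`. Only the form of the address computation differs.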

[clang] [Driver] Test ignored target-specific options for AMDGPU/NVPTX (PR #79222)

2024-01-23 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,7 @@ +/// Some target-specific options are ignored for GPU, so %clang exits with code 0. +// DEFINE: %{gpu_opts} = --cuda-gpu-arch=sm_60 --cuda-path=%S/Inputs/CUDA/usr/local/cuda --no-cuda-version-check Artem-B wrote: +1 for merging them. I'd also re

[clang] [Driver] Test ignored target-specific options for AMDGPU/NVPTX (PR #79222)

2024-01-23 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B deleted https://github.com/llvm/llvm-project/pull/79222

[clang] [Driver] Test ignored target-specific options for AMDGPU/NVPTX (PR #79222)

2024-01-23 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,7 @@ +/// Some target-specific options are ignored for GPU, so %clang exits with code 0. +// DEFINE: %{gpu_opts} = --cuda-gpu-arch=sm_60 --cuda-path=%S/Inputs/CUDA/usr/local/cuda --no-cuda-version-check Artem-B wrote: For the purpose of warning check

[clang] [Driver] Test ignored target-specific options for AMDGPU/NVPTX (PR #79222)

2024-01-23 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,7 @@ +/// Some target-specific options are ignored for GPU, so %clang exits with code 0. +// DEFINE: %{gpu_opts} = --cuda-gpu-arch=sm_60 --cuda-path=%S/Inputs/CUDA/usr/local/cuda --no-cuda-version-check +// DEFINE: %{check} = %clang -### -c %{gpu_opts} -mcmodel=medium

[lldb] [pstl] [llvm] [mlir] [libc] [compiler-rt] [libcxx] [openmp] [clang-tools-extra] [clang] [lld] [Driver] Test ignored target-specific options for AMDGPU/NVPTX (PR #79222)

2024-01-24 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,5 @@ +/// Some target-specific options are ignored for GPU, so %clang exits with code 0. +// DEFINE: %{check} = %clang -### -c -mcmodel=medium Artem-B wrote: > Also, what exactly are we checking here? With `-###` CC1 sub-compilations do > not run and
