[PATCH] D151363: [NVPTX, CUDA] barrier intrinsics and builtins for sm_90

2023-05-24 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. Herald added subscribers: mattd, gchakrabarti, asavonic, bixia, hiraditya, yaxunl. Herald added a project: All. tra updated this revision to Diff 525307. tra added a comment. tra published this revision for review. tra added a reviewer: jlebar. Herald added subscribers:

[PATCH] D151363: [NVPTX, CUDA] barrier intrinsics and builtins for sm_90

2023-05-24 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 525309. tra added a comment. whitespace fix. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D151363/new/ https://reviews.llvm.org/D151363 Files: clang/include/clang/Basic/BuiltinsNVPTX.def clang/lib/CodeGen/CGBu

[PATCH] D151362: [CUDA] Add CUDA wrappers over clang builtins for sm_90.

2023-05-24 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. Herald added subscribers: mattd, bixia, yaxunl. Herald added a project: All. tra updated this revision to Diff 525338. tra added a comment. tra updated this revision to Diff 525340. tra published this revision for review. tra added a reviewer: jlebar. Herald added a proje

[PATCH] D151359: [CUDA] Relax restrictions on variadics in host-side compilation.

2023-05-25 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rG0ad5d40fa19f: [CUDA] Relax restrictions on variadics in host-side compilation. (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D151359/new/

[PATCH] D151361: [CUDA] bump supported CUDA version to 12.1/11.8

2023-05-25 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rGffb635cb2d4e: [CUDA] bump supported CUDA version to 12.1/11.8 (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D151361/new/ https://reviews.l

[PATCH] D151168: [CUDA] plumb through new sm_90-specific builtins.

2023-05-25 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG0a0bae1e9f94: [CUDA] plumb through new sm_90-specific builtins. (authored by tra). Changed prior to commit: https://reviews.llvm.org/D151168?vs=52

[PATCH] D151363: [NVPTX, CUDA] barrier intrinsics and builtins for sm_90

2023-05-25 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rG25708b3df6e3: [NVPTX, CUDA] barrier intrinsics and builtins for sm_90 (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D151363/new/ https://r

[PATCH] D151362: [CUDA] Add CUDA wrappers over clang builtins for sm_90.

2023-05-25 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rG5c082e7e15e3: [CUDA] Add CUDA wrappers over clang builtins for sm_90. (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D151362/new/ https://r

[PATCH] D151503: [CUDA] correctly install cuda_wrappers/bits/shared_ptr_base.h

2023-05-25 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. Herald added subscribers: mattd, carlosgalvezp, bixia, yaxunl. Herald added a project: All. tra edited the summary of this revision. tra edited the summary of this revision. tra published this revision for review. tra added reviewers: qiongsiwu1, jlebar. Herald added a re

[PATCH] D151503: [CUDA] correctly install cuda_wrappers/bits/shared_ptr_base.h

2023-05-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Headers/CMakeLists.txt:516 COMPONENT cuda-resource-headers) install( qiongsiwu1 wrote: > Do we need an install target for `${cuda_wrapper_bits_files}` for the > `cuda-resource-headers` component as well? It

[PATCH] D144911: adding bf16 support to NVPTX

2023-05-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:604 def int_nvvm_f # operation # variant : ClangBuiltin, DefaultAttrsIntrinsic<[llvm_i16_ty], [llvm_i16_ty, llvm_i16_ty], tra wrote: > Availability of these new

[PATCH] D144911: adding bf16 support to NVPTX

2023-05-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Here's a rough proof-of-concept patch coalescing i16/f16/bf16 to use the same Int16Regs register class: https://reviews.llvm.org/D151601 The changes are largely mechanical, replacing `%h` -> `%rs` in the tests and eliminating special cases we previously had for Float16Regis

[PATCH] D151503: [CUDA] correctly install cuda_wrappers/bits/shared_ptr_base.h

2023-05-26 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 526227. tra added a comment. Verified that install works correctly with individual component installations: cmake -DCOMPONENT=cuda-resource-headers -P ./cmake_install.cmake cmake -DCOMPONENT=clang-resource-headers -P ./cmake_install.cmake Repository: rG L

[PATCH] D151503: [CUDA] correctly install cuda_wrappers/bits/shared_ptr_base.h

2023-05-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Headers/CMakeLists.txt:516 COMPONENT cuda-resource-headers) install( qiongsiwu1 wrote: > qiongsiwu1 wrote: > > tra wrote: > > > qiongsiwu1 wrote: > > > > Do we need an install target for `${cuda_wrapper_bits_

[PATCH] D151349: [HIP] emit macro `__HIP_NO_IMAGE_SUPPORT`

2023-05-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Basic/Targets/AMDGPU.cpp:248 + auto ISAVer = llvm::AMDGPU::getIsaVersion(Opts.CPU); + HasImage = ISAVer.Major != 9 || ISAVer.Minor != 4; } My usual nit about negations: `!(ISAVer.Major == 9 && ISAVer.Minor == 4)
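The two spellings in the comment above are equivalent by De Morgan's law. A minimal sketch of that equivalence follows; `IsaVersion` is a hypothetical stand-in that models only the `Major` and `Minor` fields mentioned in the review, not the real LLVM type.

```cpp
#include <cassert>

// Hypothetical stand-in for the ISA version object; only the two fields
// used in the review comment are modeled here.
struct IsaVersion {
  unsigned Major;
  unsigned Minor;
};

// Form from the patch: a disjunction of negated comparisons.
constexpr bool hasImageAsWritten(IsaVersion V) {
  return V.Major != 9 || V.Minor != 4;
}

// Form suggested in the review: negate a single positive condition.
constexpr bool hasImageAsSuggested(IsaVersion V) {
  return !(V.Major == 9 && V.Minor == 4);
}

// Compare the two forms over a small grid of version numbers.
constexpr bool formsAgree() {
  for (unsigned Major = 0; Major < 16; ++Major)
    for (unsigned Minor = 0; Minor < 16; ++Minor)
      if (hasImageAsWritten({Major, Minor}) !=
          hasImageAsSuggested({Major, Minor}))
        return false;
  return true;
}

static_assert(formsAgree(), "both spellings must agree");
```

The suggested form reads as "everything except 9.4", which is arguably easier to scan than two separate inequalities.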

[PATCH] D151606: [NFC][CLANG] Fix Static Code Analyzer Concerns with bad bit right shift operation in getNVPTXLaneID()

2023-05-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In practice we're guaranteed by GPU architecture that the warp size will always be small enough to fit in 32 bits. Also `log2_32` will never return a value larger than 32. Does this assert help with anything else other than potential undefined behavior? CHANGES SINCE LAS

[PATCH] D151503: [CUDA] correctly install cuda_wrappers/bits/shared_ptr_base.h

2023-05-30 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 526697. tra added a comment. Updated according to comments. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D151503/new/ https://reviews.llvm.org/D151503 Files: clang/lib/Headers/CMakeLists.txt Index: clang/lib/H

[PATCH] D151503: [CUDA] correctly install cuda_wrappers/bits/shared_ptr_base.h

2023-05-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. @qiongsiwu1 : I've updated the patch. PTAL. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D151503/new/ https://reviews.llvm.org/D151503 ___ cfe-commits mailing list cfe-commits@lists.

[PATCH] D151503: [CUDA] correctly install cuda_wrappers/bits/shared_ptr_base.h

2023-05-30 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG6cdc07a701ee: [CUDA] correctly install cuda_wrappers/bits/shared_ptr_base.h (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE

[PATCH] D150985: [clang] Allow fp in atomic fetch max/min builtins

2023-05-31 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Sema/SemaChecking.cpp:6576-6578 if (!ValType->isFloatingType()) return false; + if (!(AllowedType & AOAVT_FP)) Collapse into a single if statement: `if (!(ValType->isFloatingType() && (Allowed

[PATCH] D150985: [clang] Allow fp in atomic fetch max/min builtins

2023-05-31 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM with few more test nits. Comment at: clang/test/Sema/atomic-ops.c:134 int *I, const int *CI, int **P, float *D, struct S *s1, struct S *s2) { __c11_atomic_i

[PATCH] D151839: [LinkerWrapper] Fix static library symbol resolution

2023-05-31 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. LGTM in general. Comment at: clang/test/Driver/linker-wrapper-libs.c:27 // // Check that we extract a static library defining an undefined symbol. // How does this test test the functionality of the undefined symbol? E.g. how does it fa

[PATCH] D151839: [LinkerWrapper] Fix static library symbol resolution

2023-05-31 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/Driver/linker-wrapper-libs.c:27 // // Check that we extract a static library defining an undefined symbol. // jhuber6 wrote: > tra wrote: > > How does this test test the functionality of the undefined symbol? E

[PATCH] D151904: [clang-repl][CUDA] Add an unit test for interactive CUDA

2023-06-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/unittests/Interpreter/InteractiveCudaTest.cpp:92 + std::unique_ptr Interp = createInterpreter(); + auto Err = Interp->LoadDynamicLibrary("libcudart.so"); + if (Err) { // CUDA runtime is not installed/usable, cannot continue testing

[PATCH] D151904: [clang-repl][CUDA] Add an unit test for interactive CUDA

2023-06-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/unittests/Interpreter/InteractiveCudaTest.cpp:92 + std::unique_ptr Interp = createInterpreter(); + auto Err = Interp->LoadDynamicLibrary("libcudart.so"); + if (Err) { // CUDA runtime is not installed/usable, cannot continue testing

[PATCH] D151876: [NVPTX] Signed char and (unsigned)long overloads of ldg and ldu

2023-06-01 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. I'd change the patch title: - `[NVPTX]` -> `[cuda, NVPTX]` as these are clang changes, not NVPTX back-end. - `overloads ` -> `builtins` Comment at: clang/include/clang/Basic/Built

[PATCH] D144911: adding bf16 support to NVPTX

2023-06-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D144911#4389187, @manishucsd wrote: > I fail to compile this patch. Please find the compilation error below: > > [build] ./llvm-project/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td:1117:40: > error: Variable not defined: 'hasPTX70'

[PATCH] D151601: [NVPTX] Coalesce register classes for {i16,f16,bf16}, {i32,v2f16,v2bf16}

2023-06-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I've tested the change on a bunch of tensorflow tests and the patch didn't cause any apparent issues. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D151601/new/ https://reviews.llvm.org/D151601

[PATCH] D152027: [CUDA] Update Kepler(sm_3*) support info.

2023-06-02 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. Herald added subscribers: mattd, carlosgalvezp, bixia, yaxunl. Herald added a project: All. tra published this revision for review. tra added a reviewer: jlebar. tra added a comment. Herald added a project: clang. Herald added a subscriber: cfe-commits. Kepler is gone! L

[PATCH] D152027: [CUDA] Update Kepler(sm_3*) support info.

2023-06-02 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG0f49116e261c: [CUDA] Update Kepler(sm_3*) support info. (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https:

[PATCH] D149976: adding bf16 support to NVPTX

2023-05-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. This patch appears to *include* the changes in D144911 (e.g. llvm/test/CodeGen/NVPTX/bf16-instructions.ll is added by both patches). Can you update it as an incremental patch that actually excludes it? Repository: rG LLVM Github Monore

[PATCH] D149978: [Clang][NVPTX] Allow passing arguments to the linker while standalone

2023-05-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:594 - // Add paths specified in LIBRARY_PATH environment variable as -L options. - addDirectoryList(Args, CmdArgs, "-L", "LIBRARY_PATH"); - Is removal of this line intentional? =

[PATCH] D149978: [Clang][NVPTX] Allow passing arguments to the linker while standalone

2023-05-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D149978#4323210, @jhuber6 wrote: > Somewhat annoying, I've discovered that LLVM adds `-Wl,-fcolor-diagnostics` > which obviously isn't supported by `nvlink` so it fails while including this > in `libc`'s CMake. Any clue if there

[PATCH] D149978: [Clang][NVPTX] Allow passing arguments to the linker while standalone

2023-05-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > The latter is a little difficult, The more we dig, the more we want GPU-capable lld. :-) Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:641 + // by nvlink. + if (llvm::any_of(II.getInputArg().getValues(), [](StringRef Arg) { +retu

[PATCH] D149978: [Clang][NVPTX] Allow passing arguments to the linker while standalone

2023-05-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > I've discovered that LLVM adds -Wl,-fcolor-diagnostics Can you tell me where it's done? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D149978/new/ https://reviews.llvm.org/D149978

[PATCH] D149978: [Clang][NVPTX] Allow passing arguments to the linker while standalone

2023-05-05 Thread Artem Belevich via Phabricator via cfe-commits
tra requested changes to this revision. tra added a comment. This revision now requires changes to proceed. In D149978#4323457, @jhuber6 wrote: > `llvm/cmake/modules/HandleLLVMOptions.cmake:994` I do not think that we should work around this particular

[PATCH] D150136: [Clang] Change default triple to LLVM_HOST_TRIPLE for the CUDA toolchain

2023-05-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. The change may be an improvement, but we may still have a potential issue here. E.g. ideally we may want to be able to cross-compile a CUDA app on a powerpc or ARM build host targeting an NVIDIA GPU on an x86 host. So, the compilation tools would need to be found for the powerp

[PATCH] D150136: [Clang] Change default triple to LLVM_HOST_TRIPLE for the CUDA toolchain

2023-05-08 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. > right now all it's used for is HostTriple.isOSWindows() OK. In that case we may want to rename the parameter to `BuildHostTriple` to make it clear which host we have in mind. Repository: rG L

[PATCH] D150718: [CUDA] Relax restrictions on GPU-side variadic functions

2023-05-16 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. Herald added subscribers: mattd, carlosgalvezp, bixia, yaxunl. Herald added a project: All. tra added reviewers: jlebar, yaxunl. tra published this revision for review. Herald added subscribers: cfe-commits, MaskRay. Herald added a project: clang. Allow parsing GPU-side

[PATCH] D150718: [CUDA] Relax restrictions on GPU-side variadic functions

2023-05-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D150718#4348737, @jlebar wrote: > This seems a little dangerous -- we're saying the frontend will accept this > but we can't generate code for it? What happens if we try to generate code? > Do we get some sort of error, or do

[PATCH] D150718: [CUDA] Relax restrictions on GPU-side variadic functions

2023-05-17 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 523109. tra added a comment. test. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D150718/new/ https://reviews.llvm.org/D150718 Files: clang/lib/Driver/ToolChains/Cuda.cpp Index: clang/lib/Driver/ToolChains/Cuda

[PATCH] D150718: [CUDA] Relax restrictions on GPU-side variadic functions

2023-05-17 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGa825f3754b3c: [CUDA] Relax restrictions on GPU-side variadic functions (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST

[PATCH] D100394: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async instructions

2023-05-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Hi. It looks like CUDA-11+ headers need a variant of cp.async intrinsics which provides the optional src_size argument. I'm planning to add it to the existing intrinsics in NVPTX. It's just a heads-up in case you may have existing uses of them that may need to be updated.

[PATCH] D150820: [NVPTX, CUDA] added optional src_size argument to __nvvm_cp_async*

2023-05-17 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. Herald added subscribers: mattd, gchakrabarti, asavonic, bixia, hiraditya. Herald added a project: All. tra updated this revision to Diff 523216. tra added a comment. tra retitled this revision from "[NVPTX] added src_size argument to __nvvm_cp_async* intrinsics." to "[N

[PATCH] D150820: [NVPTX, CUDA] added optional src_size argument to __nvvm_cp_async*

2023-05-18 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 523426. tra added a comment. Actually connected the Sema check for the optional argument, and added a test to cover it. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D150820/new/ https://reviews.llvm.org/D150820 F

[PATCH] D150820: [NVPTX, CUDA] added optional src_size argument to __nvvm_cp_async*

2023-05-18 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 523428. tra added a comment. Cosmetic test cleanup. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D150820/new/ https://reviews.llvm.org/D150820 Files: clang/include/clang/Basic/BuiltinsNVPTX.def clang/include/c

[PATCH] D150820: [NVPTX, CUDA] added optional src_size argument to __nvvm_cp_async*

2023-05-18 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGe7b9c2f00fa0: [NVPTX/CUDA] added an optional src_size argument to __nvvm_cp_async* (authored by tra). Repository: rG LLVM Github Monorepo CHANGES

[PATCH] D150894: [CUDA] provide wrapper functions for new NVCC builtins.

2023-05-18 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. Herald added subscribers: mattd, bixia, yaxunl. Herald added a project: All. tra published this revision for review. tra added reviewers: jlebar, nyalloc. Herald added a project: clang. Herald added a subscriber: cfe-commits. For sm_80 NVCC introduced a handful of builti

[PATCH] D150894: [CUDA] provide wrapper functions for new NVCC builtins.

2023-05-18 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 523466. tra added a comment. Prefix function args with `__`. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D150894/new/ https://reviews.llvm.org/D150894 Files: clang/lib/Headers/__clang_cuda_intrinsics.h Index:

[PATCH] D150894: [CUDA] provide wrapper functions for new NVCC builtins.

2023-05-18 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 523472. tra added a comment. Put the wrappers behind __CUDA_ARCH__ >= 800, as these clang builtins are not available on older GPUs. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D150894/new/ https://reviews.llvm.org
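As a rough illustration of the guard described above: `__CUDA_ARCH__` is only defined during device-side compilation, and holds a value such as 800 for sm_80. The sketch below shows one way such a guard could look. The macro `EXAMPLE_HAVE_SM80_WRAPPERS` and the function `haveSm80Wrappers` are hypothetical names for this example and are not taken from the patch.

```cpp
#include <cassert>

// __CUDA_ARCH__ is undefined during host compilation and during plain C++
// builds, so the guard also checks that the macro exists at all.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
#define EXAMPLE_HAVE_SM80_WRAPPERS 1  // hypothetical macro for this sketch
#else
#define EXAMPLE_HAVE_SM80_WRAPPERS 0
#endif

// Reports whether the guarded region would be compiled in this pass.
constexpr bool haveSm80Wrappers() { return EXAMPLE_HAVE_SM80_WRAPPERS != 0; }
```

Compiled as ordinary host C++, `haveSm80Wrappers()` evaluates to false, which mirrors the intent that the wrappers stay invisible wherever the underlying builtins are unavailable.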

[PATCH] D150820: [NVPTX, CUDA] added optional src_size argument to __nvvm_cp_async*

2023-05-18 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Looks like the extra intrinsic argument broke MLIR. I'll need to figure out how to deal with that. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D150820/new/ https://reviews.llvm.org/D150820 ___

[PATCH] D150820: [NVPTX, CUDA] added optional src_size argument to __nvvm_cp_async*

2023-05-18 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 523566. tra added a comment. Instead of changing existing intrinsic, introduce a new set which takes an additional src_size argument. This should keep existing users working. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm

[PATCH] D150820: [NVPTX, CUDA] added optional src_size argument to __nvvm_cp_async*

2023-05-18 Thread Artem Belevich via Phabricator via cfe-commits
tra requested review of this revision. tra added a comment. PTAL. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D150820/new/ https://reviews.llvm.org/D150820 ___ cfe-commits mailing list cfe-commits@lists

[PATCH] D146389: [clang-repl][CUDA] Initial interactive CUDA support for clang-repl

2023-04-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. lib/CodeGen changes look OK to me. Comment at: clang/lib/CodeGen/CodeGenModule.cpp:6257 + // Device code should not be at top level. + if (LangOpts.CUDA && LangOpts.CUDAIsDevice) +return; Could you give me an example of what exactly w

[PATCH] D149364: [CUDA] Temporarily undefine __noinline__ when including bits/shared_ptr_base.h

2023-04-27 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. Herald added subscribers: mattd, carlosgalvezp, bixia, yaxunl. Herald added a project: All. tra updated this revision to Diff 517656. tra added a comment. tra published this revision for review. tra added reviewers: jlebar, phawkins. Herald added a project: clang. Herald

[PATCH] D149364: [CUDA] Temporarily undefine __noinline__ when including bits/shared_ptr_base.h

2023-04-27 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Fun tidbit: https://github.com/NVIDIA/thrust/issues/1703#issuecomment-1526604000 > Indeed, I believe the nvcc frontend has special handling for that attribute > expansion. clang would need to emulate that "special" handling > > Right. The __attribute__((__attribute__((noinli

[PATCH] D149451: [NVPTX] Add NVPTXCtorDtorLoweringPass to handle global ctors / dtors

2023-05-01 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM overall. Comment at: llvm/lib/Target/NVPTX/NVPTXCtorDtorLowering.cpp:58 +((IsCtor ? "__init_array_object_" : "__fini_array_object_") + + F->getName() + "_" + g

[PATCH] D149451: [NVPTX] Add NVPTXCtorDtorLoweringPass to handle global ctors / dtors

2023-05-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: llvm/lib/Target/NVPTX/NVPTXCtorDtorLowering.cpp:58 +((IsCtor ? "__init_array_object_" : "__fini_array_object_") + + F->getName() + "_" + getHash(M.getName()) + "_" + + std::to_string(Priority)) jhuber

[PATCH] D149451: [NVPTX] Add NVPTXCtorDtorLoweringPass to handle global ctors / dtors

2023-05-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: llvm/lib/Target/NVPTX/NVPTXCtorDtorLowering.cpp:58 +((IsCtor ? "__init_array_object_" : "__fini_array_object_") + + F->getName() + "_" + getHash(M.getName()) + "_" + + std::to_string(Priority)) tra wr

[PATCH] D149364: [CUDA] Temporarily undefine __noinline__ when including bits/shared_ptr_base.h

2023-05-01 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGa50e54fbeb48: [CUDA] Temporarily undefine __noinline__ when including bits/shared_ptr_base.h (authored by tra). Repository: rG LLVM Github Monorep

[PATCH] D149451: [NVPTX] Add NVPTXCtorDtorLoweringPass to handle global ctors / dtors

2023-05-02 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. LGTM. Comment at: llvm/lib/Target/NVPTX/NVPTXCtorDtorLowering.cpp:31 +GlobalStr("nvptx-lower-global-ctor-dtor-id", + cl::desc("Override the name of ctor/dtor globals."), cl::init(""), + cl::Hidden);

[PATCH] D152164: [CUDA][HIP] Externalize device var in anonymous namespace

2023-06-05 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/test/CodeGenCUDA/anon-ns.cu:46 + +// COMMON-DAG: @[[STR1:.*]] = {{.*}} c"[[KERN1]]\00" +// COMMON-DAG: @[[STR2:.*]] = {{.*}} c"[[KERN2]]\00"

[PATCH] D144911: adding bf16 support to NVPTX

2023-06-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. FYI https://reviews.llvm.org/D151601 has landed in https://github.com/llvm/llvm-project/commit/dc90f42ea7b4f6d9e643f5ad2ba663eba2f9e421. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D144911/new/ https://reviews.llvm.org/D14491

[PATCH] D144911: adding bf16 support to NVPTX

2023-06-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp:615 setFP16OperationAction(Op, MVT::v2f16, Legal, Expand); - } - - for (const auto &Op : {ISD::FADD, ISD::FMUL, ISD::FSUB, ISD::FMA}) { setBF16OperationAction(Op, MVT::bf16, Legal, Prom

[PATCH] D99201: [HIP] Diagnose unaligned atomic for amdgpu

2023-06-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:7215 +// warnings as errors. +CmdArgs.push_back("-Werror=atomic-alignment"); } Should it be done from `HIPAMDToolChain::addClangWarningOptions` ? That's where Darwin does sim

[PATCH] D144911: adding bf16 support to NVPTX

2023-06-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp:615 setFP16OperationAction(Op, MVT::v2f16, Legal, Expand); - } - - for (const auto &Op : {ISD::FADD, ISD::FMUL, ISD::FSUB, ISD::FMA}) { setBF16OperationAction(Op, MVT::bf16, Legal, Prom

[PATCH] D152391: [Clang] Allow bitcode linking when the input is LLVM-IR

2023-06-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > clang in.bc -Xclang -mlink-builtin-bitcode -Xclang libdevice.10.bc If that's something we intend to expose to the user, should we consider promoting it to a top-level driver option? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.o

[PATCH] D152403: [Clang][CUDA] Disable diagnostics for neon attrs for GPU-side CUDA compilation

2023-06-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Sema/SemaType.cpp:8168 +IsTargetCUDAAndHostARM = +!AuxTI || AuxTI->getTriple().isAArch64() || AuxTI->getTriple().isARM(); + } Should it be `AuxTI && (AuxTI->getTriple().isAArch64() || AuxTI->getTriple
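The question above hinges on what happens when the auxiliary target pointer is null. A minimal sketch of the difference follows; `FakeTarget` is a hypothetical stand-in with two boolean fields and does not reflect the real `TargetInfo` interface.

```cpp
#include <cassert>

// Hypothetical stand-in for the auxiliary (host) target description.
struct FakeTarget {
  bool IsAArch64;
  bool IsARM;
};

// Expression as written in the patch: a null pointer yields true.
bool asWritten(const FakeTarget *Aux) {
  return !Aux || Aux->IsAArch64 || Aux->IsARM;
}

// Expression raised in the review: a null pointer yields false.
bool asAsked(const FakeTarget *Aux) {
  return Aux && (Aux->IsAArch64 || Aux->IsARM);
}
```

For any non-null pointer the two functions return the same value, so the choice between them only decides how the "no auxiliary target" case is treated.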

[PATCH] D144911: adding bf16 support to NVPTX

2023-06-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Overall looks good with few minor nits and a couple of questions. Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:604 def int_nvvm_f # operation # variant : ClangBuiltin, DefaultAttrsIntrinsic<[llvm_i16_ty], [llvm_i16_ty, llvm_i1

[PATCH] D152403: [Clang][CUDA] Disable diagnostics for neon attrs for GPU-side CUDA compilation

2023-06-08 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM with a nit. Comment at: clang/lib/Sema/SemaType.cpp:8168 +IsTargetCUDAAndHostARM = +!AuxTI || AuxTI->getTriple().isAArch64() || AuxTI->getTriple().isARM(); + } --

[PATCH] D144911: adding bf16 support to NVPTX

2023-06-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp:615 // need to deal with. if (Vector.getSimpleValueType() != MVT::v2f16) return false; This needs to be updated to include v2bf16 Repository: rG LLVM Github Monorepo

[PATCH] D16559: [CUDA] Add -fcuda-allow-variadic-functions.

2023-06-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D16559#4410067, @garymm wrote: > Could you please add this to the documentation? > Could this be made the default? It seems like nvcc does this by default. Clang already does that, though we only allow variadic functions that don'

[PATCH] D144911: adding bf16 support to NVPTX

2023-06-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Almost there. Just few cosmetic nits remaining. Comment at: llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXInstPrinter.cpp:64-69 + case 9: OS << "%h"; break; case 8: + case 10: OS << "%hh"; tra wrote: > Looks like I've forgot t

[PATCH] D144911: adding bf16 support to NVPTX

2023-06-13 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM with few nits. Thank you for your patience with revising the patch. Comment at: llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp:629-631 + const bool IsBFP16FP16x2NegAvailable = S

[PATCH] D144911: adding bf16 support to NVPTX

2023-06-15 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: llvm/lib/Target/NVPTX/NVPTXIntrinsics.td:1271-1287 -def : Pat<(int_nvvm_ff2f16x2_rn Float32Regs:$a, Float32Regs:$b), - (CVT_f16x2_f32 Float32Regs:$a, Float32Regs:$b, CvtRN)>; -def : Pat<(int_nvvm_ff2f16x2_rn_relu Float32Regs:$a, Flo

[PATCH] D151361: [CUDA] bump supported CUDA version to 12.1/11.8

2023-06-15 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/docs/ReleaseNotes.rst:590 +- Clang now supports CUDA SDK up to 12.1 bader wrote: > @tra, could you update llvm/docs/CompileCudaWithLLVM.rst as well, please? Done in d028188412fa54774e2c60e21f0929a0fede

[PATCH] D139045: [HIP] support --offload-arch=native

2022-12-12 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Driver/Driver.cpp:3058 // Collect all offload arch parameters, removing duplicates. + const StringRef NativeArchStr = "native"; std::se

[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2022-12-15 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. LGTM overall. So, essentially the patch refactors what we've already been doing for OpenMP and makes it usable manually, which will be useful for things like GPU-side libc tests. Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:631-632 + + const char *Ex

[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2022-12-15 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D140158#3999716, @jhuber6 wrote: > I just realized the method of copying the `.o` to a `.cubin` doesn't work if > the link step is done in the same compilation because it doesn't exist yet. > To fix this I could either make the

[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2022-12-15 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D140158#3999810, @JonChesterfield wrote: > I don't think we should assume they want implicit behaviour from other > programming models thrown in. Agreed. Also, removing things is often surprisingly hard. Let's keep things simlp

[PATCH] D140226: [NVPTX] Introduce attribute to mark kernels without a language mode

2022-12-16 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/TargetInfo.cpp:7362 + if (FD->hasAttr()) { +addNVVMMetadata(F, "kernel", 1); + } How does AMDGPU track kernels? It may be a good opportunity to stop using metadata for this if we can use a better sui

[PATCH] D140226: [NVPTX] Introduce attribute to mark kernels without a language mode

2022-12-16 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/TargetInfo.cpp:7362 + if (FD->hasAttr()) { +addNVVMMetadata(F, "kernel", 1); + } jhuber6 wrote: > tra wrote: > > How does AMDGPU track kernels? It may be a good opportunity to stop using > > metadata

[PATCH] D140226: [NVPTX] Introduce attribute to mark kernels without a language mode

2022-12-16 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. LGTM. General question -- what happens now that the `global` and `launch_bounds` are target-specific as opposed to language-specific, if they happen to be used in a C++ compilation targeting `x86`? I assume they will still be ignored, right? Comment at:

[PATCH] D28145: [OpenMP] Basic support for a parallel directive in a target region on an NVPTX device.

2017-01-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:511-516 +// Activate workers. +syncCTAThreads(CGF); + +// Barrier at end of parallel region. +syncCTAThreads(CGF); + Are two back-to-back syncCTAThreads() intentional or d

[PATCH] D28145: [OpenMP] Basic support for a parallel directive in a target region on an NVPTX device.

2017-01-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:511-516 +// Activate workers. +syncCTAThreads(CGF); + +// Barrier at end of parallel region. +syncCTAThreads(CGF); + arpith-jacob wrote: > tra wrote: > > Are two back-to-b

[PATCH] D28301: [CUDA] Pre-include sm_60 and sm_61 headers.

2017-01-04 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: jlebar. tra added a subscriber: cfe-commits. CUDA-8.0 comes with new headers which nvcc pre-includes via cuda_runtime.h. Clang now makes them available as well. https://reviews.llvm.org/D28301 Files: lib/Headers/__clang_cuda_runtime_wrapper.h

[PATCH] D28301: [CUDA] Pre-include sm_60 and sm_61 headers.

2017-01-04 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rL290982: [CUDA] Pre-include sm_60 and sm_61 headers. (authored by tra). Changed prior to commit: https://reviews.llvm.org/D28301?vs=83071&id=83083#toc Repository: rL LLVM https://reviews.llvm.org/D28

[PATCH] D28320: [Driver] Driver changes to support CUDA compilation on Windows.

2017-01-04 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Driver/ToolChains.cpp:1819 Args.getLastArgValue(options::OPT_cuda_path_EQ)); - else { + else if (HostTriple.isOSLinux() || HostTriple.isMacOSX(

[PATCH] D28324: [CUDA] Don't define functions that the CUDA headers themselves define on Windows.

2017-01-04 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Headers/__clang_cuda_cmath.h:76 + +// For inscrutible reasons, the CUDA headers define these functions for us on +// Windows. inscrut__a_

[PATCH] D28793: [NVPTX] Auto-upgrade some NVPTX intrinsics to LLVM target-generic code.

2017-01-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Headers/__clang_cuda_runtime_wrapper.h:124 +} +inline long long __nvvm_max_i(long long __a, long long __b) { + return __a >= __b ? __a : __b; Shouldn't that be `_ll` ? That was the name of the max of long long arg

[PATCH] D28793: [NVPTX] Auto-upgrade some NVPTX intrinsics to LLVM target-generic code.

2017-01-20 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Now that we've removed a ton of @llvm.nvvm intrinsics from .td files, we have no easy way to tell that we still do support these intrinsics (mostly for the sake of libdevice?) by upgrading them. Perhaps we should add the list of such intrinsics and what happens to them in a c

[PATCH] D28793: [NVPTX] Auto-upgrade some NVPTX intrinsics to LLVM target-generic code.

2017-01-20 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. There are a couple of bits to be deleted. LGTM otherwise. Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:671-672 def int_nvvm_h2f : GCCBuiltin<"__nvvm_h2f">, Intrinsi

[PATCH] D41788: [DeclPrinter] Fix two cases that crash clang -ast-print.

2018-01-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. @arphaman: ping. https://reviews.llvm.org/D41788 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D41788: [DeclPrinter] Fix two cases that crash clang -ast-print.

2018-01-16 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. @bkramer Can you take a look at the patch? https://reviews.llvm.org/D41788 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D41788: [DeclPrinter] Fix two cases that crash clang -ast-print.

2018-01-17 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rL322742: [DeclPrinter] Fix two cases that crash clang -ast-print. (authored by tra, committed by ). Herald added a subscriber: llvm-commits. Changed prior to commit: https://reviews.llvm.org/D41788?vs=12

[PATCH] D42319: [CUDA] CUDA has no device-side library builtins.

2018-01-19 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: jlebar. Herald added a subscriber: sanjoy. We should (almost) never consider a device-side declaration to match a builtin. If we do, the un-inlined device-side functions provided by CUDA headers that ship with clang may be ignored. We may end up em

[PATCH] D42319: [CUDA] CUDA has no device-side library builtins.

2018-01-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In https://reviews.llvm.org/D42319#983377, @jlebar wrote: > How does this affect e.g. calling memcpy()? There isn't a standard library > implementation of this on nvptx, but we do want calls to memcpy() to be > lowered to llvm.memcpy so that they can be optimized. We imp

[PATCH] D42319: [CUDA] CUDA has no device-side library builtins.

2018-01-23 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rL323239: [CUDA] CUDA has no device-side library builtins. (authored by tra, committed by ). Herald added a subscriber: llvm-commits. Changed prior to commit: https://reviews.llvm.org/D42319?vs=130697&id=

[PATCH] D42452: [CUDA] Disable PGO and coverage instrumentation in NVPTX.

2018-01-23 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: jlebar. Herald added a subscriber: sanjoy. NVPTX does not have runtime support necessary for profiling to work and even call arc collection is prohibitively expensive. Furthermore, there's no easy way to collect the samples. NVPTX also does not supp

[PATCH] D42452: [CUDA] Disable PGO and coverage instrumentation in NVPTX.

2018-01-24 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rC323345: [CUDA] Disable PGO and coverage instrumentation in NVPTX. (authored by tra, committed by ). Changed prior to commit: https://reviews.llvm.org/D42452?vs=131170&id=131298#toc Repository: rC Cla
