[PATCH] D130308: [clang] extend getCommonSugaredType to merge sugar nodes

2022-09-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. The culprit appears to be D111509 . Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D130308/new/ https://reviews.llvm.org/D130308 ___ cfe-commits maili

[PATCH] D111509: [clang] use getCommonSugar in an assortment of places

2022-09-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. It appears that this change may result in clang crashing (e.g. it broke CUDA compilation: https://lab.llvm.org/buildbot/#/builders/55/builds/35047/steps/3/logs/stdio). Reproducer: class { typedef __complex__ a; operator=(double b){c *= b} a c $bin/clang++ -c

[PATCH] D133522: [clang] Fix Complex x Float x Int conversions so it handles type sugar

2022-09-08 Thread Artem Belevich via Phabricator via cfe-commits
tra resigned from this revision. tra added a comment. I can confirm that this patch does address the compiler crash seen on CUDA build bots. I'm not familiar enough with the code to review it otherwise. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/

[PATCH] D133705: [HIP] Fix unbundling archive

2022-09-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > Archives passed by -l: should not be prefixed with > prefix lib and appended with '.a', but still need to be prefixed with > paths in -L options. > Archives passed as input files should not be prefixed > or appended with anything. I'm not sure I understand the requiremen

[PATCH] D133804: Cuda Check for ignored return errors from api calls to cuda

2022-09-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang-tools-extra/test/clang-tidy/check_clang_tidy.py:121 +# Tests should not rely on a certain cuda device being available on the machine, +# or a certain version of it The comment is incorrect. These flags have

[PATCH] D133956: git push Cuda Check for ignored errors after calling a CUDA kernel

2022-09-15 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > Our linter cannot easily check whether the error was reset It can not in principle. Many CUDA errors are 'sticky' and can only be cleared by resetting the GPU or exiting the application and the former is virtually never used beyond toy examples (resetting a GPU would clea

[PATCH] D128752: [CUDA] Stop adding CUDA features twice

2022-06-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > we no longer will have a cached CUDA installation so we will usually create > it twice. Does that result in extra output in case we find an unexpected CUDA version, or when compiler is run with `-v` ? We may want to wrap installation detector creation into some sort of s

[PATCH] D128752: [CUDA] Stop adding CUDA features twice

2022-06-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D128752#3616579 , @jhuber6 wrote: > In D128752#3616553 , @tra wrote: > >>> we no longer will have a cached CUDA installation so we will usually create >>> it twice. >> >> Does that result

[PATCH] D128752: [CUDA] Stop adding CUDA features twice

2022-06-28 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Do we have tests that verify `-target-feature` arguments? It may be worth adding a test case there checking for redundant features. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION

[PATCH] D128752: [CUDA] Stop adding CUDA features twice

2022-06-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D128752#3616837 , @jhuber6 wrote: > In D128752#3616831 , @tra wrote: > >> Do we have tests that verify `-target-feature` arguments? It may be worth >> adding a test case there checking for

[PATCH] D128914: [HIP] Add support for handling HIP in the linker wrapper

2022-06-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Syntax/style looks OK to me with a few nits. Comment at: clang/test/Driver/linker-wrapper.c:109 // RUN: clang-offload-packager -o %t-lib.out \ // RUN: --image=file=%S/Inputs/dummy-elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 \ ---

[PATCH] D122846: [CUDA] Don't call inferCUDATargetForImplicitSpecialMember too early.

2022-03-31 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: yaxunl. Herald added a subscriber: bixia. Herald added a project: All. tra requested review of this revision. Herald added a project: clang. Otherwise we may crash because the special member has not been sufficiently set up yet. Fixes #54537 Rep

[PATCH] D122846: [CUDA] Don't call inferCUDATargetForImplicitSpecialMember too early.

2022-03-31 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGfe528e721633: [CUDA] Don't call inferCUDATargetForImplicitSpecialMember too early. (authored by tra). Repository: rG LLVM Github Monorepo CHANGES

[PATCH] D130800: [clang][Headers] Avoid compiler warnings in builtin headers

2022-08-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Headers/__clang_cuda_builtin_vars.h:37 -#if __cplusplus >= 201103L +#if defined(__cplusplus) && __cplusplus >= 201103L #define __DELETE =delete Are there actual use cases where CUDA headers are used from a non-C

[PATCH] D130800: [clang][Headers] Avoid compiler warnings in builtin headers

2022-08-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: yaxunl. tra added inline comments. Comment at: clang/lib/Headers/__clang_cuda_builtin_vars.h:14 +#ifndef __cplusplus +#error CUDA header must be used from C++ +#endif Nit: the error message sounds a bit odd. Perhaps rephrase it as `This CU

[PATCH] D131265: Fixed sm version for .and bmma operator.

2022-08-05 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Good catch. LGTM. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D131265/new/ https://reviews.llvm.org/D131265

[PATCH] D128441: [CUDA] Do not embed a fatbinary when using the new driver

2022-08-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. This change breaks `clang++ --cuda-device-only` compilation. Clang does not create any output in this case. Reverting the change fixes the problem. Reproducible with: echo '__global__ void k(){}' | bin/clang++ --offload-arch=sm_70 -x cuda - --cuda-device-only -v -c -

[PATCH] D128441: [CUDA] Do not embed a fatbinary when using the new driver

2022-08-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D128441#3702816 , @jhuber6 wrote: > Is it spitting it out as `foo123.cubin` instead? That's the output name it passes to `ptxas`, but it's treated as a temporary file and is removed at the end, so the user gets nothing. Reposit

[PATCH] D131265: Fixed sm version for .and bmma operator.

2022-08-05 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG3e0e5568a6a8: [CUDA] Fixed sm version constrain for __bmma_m8n8k128_mma_and_popc_b1. (authored by JackAKirk, committed by tra). Changed prior to com

[PATCH] D131265: Fixed sm version for .and bmma operator.

2022-08-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Looks like the tests needed to be updated (and I've found one bug which explains how we've missed this). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D131265/new/ https://reviews.llvm.org/D131265 _

[PATCH] D131278: [CUDA] Fix output name being replaced in device-only mode

2022-08-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/Driver/cuda-bindings.cu:148 +// RUN: | FileCheck -check-prefix=D_ONLY %s +// D_ONLY: "foo.o" Ideally we want to test that the output file actually survives the compilation. It's traky to do for in-tree tests as w

[PATCH] D131278: [CUDA] Fix output name being replaced in device-only mode

2022-08-05 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. I can confirm that this patch fixes the issue. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D131278/new/ https://reviews.llvm.org/D131278 ___

[PATCH] D131278: [CUDA] Fix output name being replaced in device-only mode

2022-08-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/Driver/cuda-bindings.cu:148 +// RUN: | FileCheck -check-prefix=D_ONLY %s +// D_ONLY: "foo.o" jhuber6 wrote: > tra wrote: > > Ideally we want to test that the output file actually survives the > > compilation. It'

[PATCH] D105384: [NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction.

2021-07-15 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 359047. tra marked an inline comment as done. tra added a comment. rebase Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D105384/new/ https://reviews.llvm.org/D105384 Files: clang/include/clang/Basic/BuiltinsNVPTX

[PATCH] D105384: [NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction.

2021-07-15 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D105384#2879308 , @steffenlarsen wrote: > LGTM! Are the test failures related to this, you reckon? AFAICT, no, the test failures don't seem to be related. It appears that the test runs rarely succeed in general. They are mostly

[PATCH] D105384: [NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction.

2021-07-15 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGd774b4aa5eac: [NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction. (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE

[PATCH] D106401: [CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA.

2021-07-20 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added reviewers: nikic, fhahn. Herald added subscribers: bixia, hiraditya, yaxunl. tra requested review of this revision. Herald added projects: clang, LLVM. Attempt to enable MemCpyOpt unconditionally in D104801 uncovered the fact

[PATCH] D106401: [CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA.

2021-07-20 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 360293. tra edited the summary of this revision. tra added a comment. Fixed the option name. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D106401/new/ https://reviews.llvm.org/D106401 Files: clang/lib/Driver/Too

[PATCH] D106315: [HIP] Preserve ASAN bitcode library functions

2021-07-22 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM in general. One question -- does it have to be a function calling other functions just for the sake of preserving them? Can it be a flat array of pointers to the functions you need to keep arou

[PATCH] D106315: [HIP] Preserve ASAN bitcode library functions

2021-07-23 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D106315#2899928 , @yaxunl wrote: > Yes that's possible. However that would require FE to know these functions > and declare them, whereas the current approach leave the concern to the > device library. I was thinking of just add

[PATCH] D106401: [CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA.

2021-07-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D106401#2903114 , @nikic wrote: > Would the variant of the original patch at D106769 > be sufficient for your purposes? Or are > you also interested in the optimizations that introduce new memse

[PATCH] D107054: [Clang][CUDA] Add descriptors, mappings, and features for missing CUDA and PTX versions

2021-07-29 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:821 #pragma pop_macro("PTX71") #pragma pop_macro("PTX72") You need to pop new macros here. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.ll

[PATCH] D120132: [HIP] Fix HIP include path

2022-03-09 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. I still don't quite understand your reluctance to use `AddClangSystemIncludeArgs` for adding HIP path, but if this change works for HIP, I'm fine with it. Also, thank you for fixing the issue with

[PATCH] D121765: [CUDA][HIP] Fix hostness check with -fopenmp

2022-03-16 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: rsmith. tra added a comment. @rsmith - this touches generic Sema functionality and could use your input. Overall, the patch looks OK to me. Comment at: clang/include/clang/Sema/Sema.h:3327-3328 + /// a pointer to the function or lambda decl for the fun

[PATCH] D120272: [CUDA] Add driver support for compiling CUDA with the new driver

2022-03-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Basic/Cuda.h:105-107 +constexpr CudaArch DefaultCudaArch = CudaArch::SM_35; +constexpr CudaArch DefaultHIPArch = CudaArch::GFX803; + Nit: those could be just enum values. ``` ... LAST, DefaultCudaAr

[PATCH] D122069: [Clang] Add binary format for bundling offloading metadata

2022-03-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > I decided to use a custom format rather than using an existing format (ELF, > JSON, etc) because of the specialty use-case of this. Will we ever need to process the files with tools built with a different LLVM version? E.g clang and lld may be built separately from differ

[PATCH] D122069: [Clang] Add binary format for bundling offloading metadata

2022-03-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D122069#3397560 , @jhuber6 wrote: > I was definitely thinking about a lot of alternatives to making yet another > on-disk binary format. I had a few discussions with @JonChesterfield on the > subject. There were two things that p

[PATCH] D122734: [HIP] Fix mangling number for local struct

2022-04-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added subscribers: rnk, rsmith. tra added a comment. @rnk, @rsmith -- PTAL. Changing mangling for CXXABI on Windows is way out of my comfort zone. > This patch uses Itanium mangling number for structs in HIP host compilation > on Windows to fix this issue. It does not appear to be HIP-sp

[PATCH] D122734: [HIP] Fix mangling number for local struct

2022-04-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D122734#3435086 , @yaxunl wrote: > This patch takes a similar approach as https://reviews.llvm.org/D69322 has > done for lambda. When doing host compilation for CUDA/HIP on Windows with > MSVC toolchain, mangling number of lambda

[PATCH] D123353: [CUDA][HIP] Externalize kernels in anonymous name space

2022-04-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/CodeGenCUDA/kernel-in-anon-ns.cu:13 + +// CHECK: define weak_odr {{.*}}void @[[KERN:_ZN12_GLOBAL__N_16kernelEv\.anon\..*]]( +// CHECK: @[[STR:.*]] = {{.*}} c"[[KERN]]\00" Will the externalized names be uniquified

[PATCH] D123353: [CUDA][HIP] Externalize kernels in anonymous name space

2022-04-08 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM overall. Comment at: clang/test/CodeGenCUDA/kernel-in-anon-ns.cu:13 + +// CHECK: define weak_odr {{.*}}void @[[KERN:_ZN12_GLOBAL__N_16kernelEv\.anon\..*]]( +// CHECK: @[[STR:

[PATCH] D123441: [CUDA][HIP] Fix host used external kernel in archive

2022-04-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. LGTM in principle. This will keep around the GPU code we do need. That said, it seems to be a rather blunt hammer. I think we'll end up linking almost everything in an archive into the final executable as we'll likely have a host-visible symbol in most of the GPU objects (e

[PATCH] D123471: [CUDA] Create offloading entries when using the new driver

2022-04-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I've mentioned in D123441 that it would be useful to have a list of GPU-side symbols needed by the host and this offload info is pretty close to what we need. The only remaining feature is being able to extract them by external tool, so we

[PATCH] D123441: [CUDA][HIP] Fix host used external kernel in archive

2022-04-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > This approach will only link in kernels and device variables used by host code In the absence of the explicit reference info from the host side, GPU-side linker must link all objects with symbols that **may** be used by the host. E.g if we have a library with three object

[PATCH] D123441: [CUDA][HIP] Fix host used external kernel in archive

2022-04-12 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. In D123441#3446499 , @yaxunl wrote: > You are talking about a use case that actually needs --whole-archive option. > If the main TU does not reference some

[PATCH] D122734: [CUDA][HIP] Fix mangling number for local struct

2022-04-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/AST/ASTContext.cpp:11760-11763 + auto Cutoff = [](unsigned V) { return V > 1 ? V : 1; }; + if (CUDANameMangleCtx.MangleDeviceNameInHostCompilation) +return Cutoff(Res >> 16); + return Cutoff(Res & 0x); I

[PATCH] D123812: [CUDA] Add wrapper code generation for registering CUDA images

2022-04-14 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/tools/clang-linker-wrapper/OffloadWrapper.cpp:351 +/// required for texture / surface / managed variables. +Function *createRegisterGlobalsFunction(Module &M) { + LLVMContext &C = M.getContext(); Do you think ge

[PATCH] D123812: [CUDA] Add wrapper code generation for registering CUDA images

2022-04-14 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/tools/clang-linker-wrapper/OffloadWrapper.cpp:351 +/// required for texture / surface / managed variables. +Function *createRegisterGlobalsFunction(Module &M) { + LLVMContext &C = M.getContext(); jhuber6 wrote:

[PATCH] D120272: [CUDA] Add driver support for compiling CUDA with the new driver

2022-04-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Thank you for adding the compilation pipeline tests. LGTM overall. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:6223-6226 if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false)) CmdArgs.push_back("-fgpu-rdc"); +if (Ar

[PATCH] D122734: [CUDA][HIP] Fix mangling number for local struct

2022-04-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/AST/ASTContext.cpp:11760-11763 + auto Cutoff = [](unsigned V) { return V > 1 ? V : 1; }; + if (CUDANameMangleCtx.MangleDeviceNameInHostCompilation) +return Cutoff(Res >> 16); + return Cutoff(Res & 0x); r

[PATCH] D124189: [CUDA][HIP] Externalize kernels with internal linkage

2022-04-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/AST/ASTContext.cpp:11322 + } else if (Context.getLangOpts().CUDA && Context.getLangOpts().CUDAIsDevice && + !IgnoreCUDAGlobalAttr) { // Device-side functions with __global__ attribute must always be -

[PATCH] D124189: [CUDA][HIP] Externalize kernels with internal linkage

2022-04-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/AST/ASTContext.cpp:12300 + (D->hasAttr() && + GetGVALinkageForFunction(cast(D), + /*IgnoreCUDAGlobalAttr=*/true) == yaxunl wrote: > tra wrote: > > Perhaps we don't

[PATCH] D124189: [CUDA][HIP] Externalize kernels with internal linkage

2022-04-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/AST/ASTContext.cpp:12300 + (D->hasAttr() && + GetGVALinkageForFunction(cast(D), + /*IgnoreCUDAGlobalAttr=*/true) == yaxunl wrote: > tra wrote: > > yaxunl wrote: >

[PATCH] D118949: [HIP] Support code object v5

2022-02-03 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Few nits, LGTM in general. Comment at: clang/lib/Driver/ToolChains/ROCm.h:27 +struct DeviceLibABIVersion { + unsigned Value = 0; + DeviceLibABIVersion(unsigned V) : Value(V) {} -

[PATCH] D118977: [NVPTX] Add more FMA intriniscs/builtins

2022-02-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > They all require PTX 7.0, SM_80. According to https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#half-precision-floating-point-instructions-fma only `fma.relu` and `bf16*` variants require ptx70/sm80: PTX ISA Notes Introduced in PTX ISA version 4.2.

[PATCH] D119026: [HIP] Emit amdgpu_code_object_version module flag

2022-02-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CodeGenModule.cpp:575 +// times 100. +if (getTarget().getTargetOpts().CodeObjectVersion != "none") { + unsigned CodeObjVer; When will it ever be set to `none`? Does the new option parser enforc

[PATCH] D119026: [HIP] Emit amdgpu_code_object_version module flag

2022-02-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CodeGenModule.cpp:575 +// times 100. +if (getTarget().getTargetOpts().CodeObjectVersion != "none") { + unsigned CodeObjVer; yaxunl wrote: > tra wrote: > > When will it ever be set to `none`? Do

[PATCH] D119026: [HIP] Emit amdgpu_code_object_version module flag

2022-02-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:3445 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group, - HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">, - MetaVarName<"">, Values<"2,3,4,5">; +

[PATCH] D119026: [HIP] Emit amdgpu_code_object_version module flag

2022-02-07 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:1166 CmdArgs.insert(CmdArgs.begin() + 1, "-mllvm"); +// -cc1as does not need -mcode-object-version option. +if (!

[PATCH] D119157: [NVPTX] Add ex2 f16 support

2022-02-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Is the patch is ready for review? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D119157/new/ https://reviews.llvm.org/D119157 ___ cfe-commits mailing list cfe-commits@lists.llvm.org h

[PATCH] D118977: [NVPTX] Add more FMA intriniscs/builtins

2022-02-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D118977#3299974 , @jchlanda wrote: >> Target ISA Notes >> Requires sm_53 or higher. I think we do need this constraint applied to the new builtins, too. Right now nothing stops using them on a GPU where they do not exist and that

[PATCH] D118977: [NVPTX] Add more FMA intriniscs/builtins

2022-02-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: llvm/lib/Target/NVPTX/NVPTXIntrinsics.td:937 +class FMA_TUPLE Preds = [hasPTX70, hasSM80]> { + string Variant = V; I think the default should be the most useful/common and the least surprising value. I'd argue that in thi

[PATCH] D117887: [NVPTX] Expose float tys min, max, abs, neg as builtins

2022-02-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp:162 SimplifyAction(Instruction::BinaryOps BinaryOp, FtzRequirementTy FtzReq) : BinaryOp(BinaryOp), FtzRequirement(FtzReq) {} The new 3-argument constructor a

[PATCH] D117887: [NVPTX] Expose float tys min, max, abs, neg as builtins

2022-02-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp:162 SimplifyAction(Instruction::BinaryOps BinaryOp, FtzRequirementTy FtzReq) : BinaryOp(BinaryOp), FtzRequirement(FtzReq) {} jchlanda wrote: > tra wrote: > >

[PATCH] D119615: [CUDA][HIP] Do not promote constexpr var with non-constant initializer

2022-02-12 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Sema/SemaCUDA.cpp:148-150 + if ((Var->isConstexpr() || Var->getType().isConstQualified()) && + Var->hasAttr() && !hasExplicitAttr(Var)) -

[PATCH] D120070: [HIP] Support linking archive of bundled bitcode

2022-02-17 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. Comment at: clang/test/Driver/hip-link-bundle-archive.hip:3 + +// RUN: touch %T/libhipBundled.a + Is this file necessary? `clang -###` should not need the fi

[PATCH] D118977: [NVPTX] Add more FMA intriniscs/builtins

2022-02-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D118977#3329146 , @jchlanda wrote: > @tra I've fixed the test failure (`math-intrins.ll`) the rest seems to be > unrelated timeouts, Thank you. > would you be able to merge those patches in, as I don't have the commit > access

[PATCH] D120272: [CUDA] Add driver support for compiling CUDA with the new driver

2022-04-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:6223-6224 if (IsCuda || IsHIP) { -if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false)) +if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false) || +

[PATCH] D120272: [CUDA] Add driver support for compiling CUDA with the new driver

2022-04-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:6223-6224 if (IsCuda || IsHIP) { -if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false)) +if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false) || +

[PATCH] D120272: [CUDA] Add driver support for compiling CUDA with the new driver

2022-04-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:6223-6224 if (IsCuda || IsHIP) { -if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false)) +if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false) || +

[PATCH] D124189: [CUDA][HIP] Externalize kernels with internal linkage

2022-04-22 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM overal, with few test nits. Comment at: clang/lib/AST/ASTContext.cpp:12300 + (D->hasAttr() && + GetGVALinkageForFunction(cast(D), +

[PATCH] D120272: [CUDA] Add driver support for compiling CUDA with the new driver

2022-04-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: rnk. tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:6225 if (IsCuda || IsHIP) { -if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false)) +if (Args.hasArg(options::OPT_fno_gpu_rdc) && IsCudaDevice &

[PATCH] D124292: [OpenMP] Use CUDA's non-RDC mode when LTO has whole program visibility

2022-04-22 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM with a minor test nit. Comment at: clang/test/Driver/linker-wrapper.c:41 -// LTO: ptxas{{.*}}-m64 -o {{.*}}.cubin -O2 --gpu-name sm_70 -c {{.*}}.s -// LTO: nvlink{{.*}}-m64

[PATCH] D124396: [HIP] Fix diag msg about sanitizer

2022-04-25 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Wording nit. LGTM otherwise. Comment at: clang/include/clang/Basic/DiagnosticDriverKinds.td:113-114 def warn_drv_unsupported_option_for_offload_arch_req_feature : Warning< - "ign

[PATCH] D124382: [Clang] Recognize target address space in superset calculation

2022-04-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/AST/Type.h:486 + bool IsSYCLOrOpenCL = false) { +if (ASMap) { + bool IsATargetAS = false; If A and B are both target AS, we fall through to the code which is

[PATCH] D120272: [CUDA] Add driver support for compiling CUDA with the new driver

2022-04-26 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. In D120272#3475155 , @jhuber6 wrote: > Changing this to simply require that the user manually passes `-fgpu-rdc` in > order to use the new driver. I think t

[PATCH] D124220: [OpenMP] Add options to only compile the host or device when offloading

2022-04-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:2535-2538 +def foffload_device_only : Flag<["-"], "foffload-device-only">, + HelpText<"Only compile for the offloading device.">; +def foffload_host_only : Flag<["-"], "foffload-host-only">, + HelpText

[PATCH] D124220: [OpenMP] Add options to only compile the host or device when offloading

2022-04-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:2535-2538 +def foffload_device_only : Flag<["-"], "foffload-device-only">, + HelpText<"Only compile for the offloading device.">; +def foffload_host_only : Flag<["-"], "foffload-host-only">, + HelpText

[PATCH] D124220: [OpenMP] Add options to only compile the host or device when offloading

2022-04-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:2535-2538 +def foffload_device_only : Flag<["-"], "foffload-device-only">, + HelpText<"Only compile for the offloading device.">; +def foffload_host_only : Flag<["-"], "foffload-host-only">, + HelpText

[PATCH] D124220: [OpenMP] Add options to only compile the host or device when offloading

2022-04-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:2527 HelpText<"Use the static host OpenMP runtime while linking.">; +def offload_device_only : Flag<["-"], "offload-device-only">, + HelpText<"Only compile for the offloading device.">; -

[PATCH] D124545: [HIP] Add HIP runtime library arguments for linker

2022-04-27 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Driver/ToolChains/Linux.cpp:681 + ArgStringList &CmdArgs) const { + CmdArgs.push_back( + Args.MakeArgString(StringRe

[PATCH] D124220: [OpenMP] Add options to only compile the host or device when offloading

2022-04-27 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:2527 HelpText<"Use the static host OpenMP runtime while linking.">; +def offload_device_only : Flag<["-"], "offload-device-only">, + HelpText<"Only compile for the offloading device.">; -

[PATCH] D124220: [OpenMP] Add options to only compile the host or device when offloading

2022-04-27 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM with a minor nit. Comment at: clang/lib/Driver/Driver.cpp:4230 + // compile action to embed it in. If preprocessing only ignore embedding. + if ((!isa(HostAction) && PL.back

[PATCH] D123325: [Clang] Make enabling the new driver more generic

2022-04-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:2535-2538 +def foffload_new_driver : Flag<["-"], "foffload-new-driver">, Flags<[CC1Option]>, Group, + HelpText<"Use the new driver for offloading compilation.">; +def fno_offload_new_driver : Flag<["-"

[PATCH] D123325: [Clang] Make enabling the new driver more generic

2022-04-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:2535-2538 +def foffload_new_driver : Flag<["-"], "foffload-new-driver">, Flags<[CC1Option]>, Group, + HelpText<"Use the new driver for offloading compilation.">; +def fno_offload_new_driver : Flag<["-"

[PATCH] D124721: [OpenMP] Allow compiling multiple target architectures with OpenMP

2022-05-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Comment at: clang/test/Driver/openmp-offload-gpu-new.c:56 + +// RUN: %clang -### --target=x86_64-unknown-linux-gnu -ccc-print-bindings -fopenmp \ +// RUN: -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -Xopenmp-target=nvptx64-nvidia-cuda --

[PATCH] D124842: [NFC][CUDA][HIP] rework mangling number for aux target

2022-05-03 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. Comment at: clang/lib/AST/ASTContext.cpp:11770-11778 + if (!LangOpts.CUDA || LangOpts.CUDAIsDevice) { +assert(!ForAuxTarget && "Only CUDA/HIP host compilation supports m

[PATCH] D124721: [OpenMP] Allow compiling multiple target architectures with OpenMP

2022-05-05 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM in general. Comment at: clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp:313 + if (BoundArch.empty()) +checkSystemForAMDGPU(Args, *this, Arch); DAL->AddJoinedA

[PATCH] D125050: [OpenMP] Try to Infer target triples using the offloading architecture

2022-05-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Just a drive-by style review. I didn't look at the functionality of the changes much. Comment at: clang/lib/Driver/Driver.cpp:788-789 + options::OPT_fno_openmp, false) && + (C.getInputArgs().hasArg(options::OPT_fopenmp_ta

[PATCH] D123471: [CUDA] Create offloading entries when using the new driver

2022-05-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > Type struct __tgt_offload_entry { > > void*addr; // Pointer to the offload entry info. > // (function or global) > char*name; // Name of the function or global. > size_t size; // Size of the entry info (0 if it a function)

[PATCH] D124866: [CUDA][HIP] support __noinline__ as keyword

2022-05-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added a reviewer: rsmith. tra added a subscriber: rsmith. tra added a comment. > I don't know how language extensions come about in CUDA or HIP -- is there an > appropriate standards body (or something similar) that's aware of this > extension and supports it? Summoning @rsmith for his lang

[PATCH] D124382: [Clang] Recognize target address space in superset calculation

2022-05-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/AST/Type.h:508 + static_cast(LangAS::FirstTargetAddressSpace)); +// When dealing with target AS return true if: +// * A is equal to B, or jchlanda wrote: > tra wrote: > > Is t

[PATCH] D123471: [CUDA] Create offloading entries when using the new driver

2022-05-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D123471#3497224 , @jhuber6 wrote: > It's a little tough, I chose that format because it's exactly the same as we > use with OpenMP with a few different flags. I wish whoever initially designed > the struct made the ``reserved`` f

[PATCH] D124866: [CUDA][HIP] support __noinline__ as keyword

2022-05-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > CUDA/HIP do not have language spec. Well. It's not completely true. CUDA programming guide does serve as the de-facto spec for CUDA. It's far from perfect, but it does mention `__noinline__` and `__forceinline__` as function qualifiers: https://docs.nvidia.com/cuda/cuda-

[PATCH] D123471: [CUDA] Create offloading entries when using the new driver

2022-05-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CGCUDARuntime.h:56-70 +enum OffloadRegionEntryKindFlag : uint32_t { + /// Mark the region entry as a kernel. + OffloadRegionKernelEntry = 0x0, +}; + +/// The kind flag of the global variable entry. +

[PATCH] D123471: [CUDA] Create offloading entries when using the new driver

2022-05-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CGCUDARuntime.h:56-70 +enum OffloadRegionEntryKindFlag : uint32_t { + /// Mark the region entry as a kernel. + OffloadRegionKernelEntry = 0x0, +}; + +/// The kind flag of the global variable entry. +

[PATCH] D123471: [CUDA] Create offloading entries when using the new driver

2022-05-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CGCUDARuntime.h:58 + /// Mark the entry as a global variable. + OffloadGlobalEntry = 0x0, + /// Mark the entry as a managed global variable. We're still using the same numeric value for two d

[PATCH] D123471: [CUDA] Create offloading entries when using the new driver

2022-05-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CGCUDARuntime.h:58 + /// Mark the entry as a global variable. + OffloadGlobalEntry = 0x0, + /// Mark the entry as a managed global variable. jhuber6 wrote: > tra wrote: > > We're still using

[PATCH] D128914: [HIP] Add support for handling HIP in the linker wrapper

2022-07-11 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D128914#3643495 , @jhuber6 wrote: > Yes, it's actually pretty difficult to find a CUDA application using > `fgpu-rdc`. It seems much more common to just stick everything that's needed > in the file.I've considered finding a CUDA

[PATCH] D129536: [CUDA][FIX] Make shfl[_sync] for unsigned long long non-recursive

2022-07-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Oops. Thank you for fixing this. Comment at: clang/test/CodeGenCUDA/shuffle_long_long.cu:52 + long long ll = 17; + ull = __shfl(ull, 7, 32); + ll = __shfl(ll, 7, 32); This crashes LLVM when we taget sm_70 where these instructions no long

<    2   3   4   5   6   7   8   9   10   11   >