[PATCH] D114601: Read path to CUDA from env. variable CUDA_PATH on Windows

2021-12-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D114601#3172199 , @mojca wrote: > What can/should I do next in order to proceed with this? Next step would be to commit the patch. If you do not have commit access, I can help you with that. If that's the case, please let me know

[PATCH] D114407: [CUDA][SPIRV] Use OpenCLKernel CC for CUDA -> SPIRV

2021-12-06 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rGabbdc13e6803: [CUDA][SPIRV] Use OpenCLKernel CC for CUDA -> SPIRV (authored by dcastagna, committed by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D11

[PATCH] D110622: [HIPSPV][3/4] Enable SPIR-V emission for HIP

2021-12-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D110622#3176804 , @yaxunl wrote: >>> So, the question is -- what's the right way to specify something like this >>> in a consistent manner? >>> `--offload` option proposed here does not seem to be a good fit. It was >>> intended

[PATCH] D110622: [HIPSPV][3/4] Enable SPIR-V emission for HIP

2021-12-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D110622#3177490 , @yaxunl wrote: > I don't think `--offload=` is restricted to be specified only once. The test > checks `--offload-arch=` and `--offload=` are mutually exclusive. It effectively is. See my inline comment. // R

[PATCH] D110618: [HIPSPV][2/4] Add HIPSPV tool chain

2021-12-14 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG4e94cba5b4e4: [HIPSPV][2/4] Add HIPSPV tool chain (authored by linjamaki, committed by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE LA

[PATCH] D98783: [CUDA][HIP] Remove unused addr space casts

2021-03-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Shouldn't the unused ASCs be eliminated by DCE? We seem to be cleaning up the consequences of a problem that happened somewhere else. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D98783/new/ https://reviews.llvm.org/D98783 _

[PATCH] D98783: [CUDA][HIP] Remove unused addr space casts

2021-03-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I do not see any hanging ASCs in the generated IR with clang@HEAD: https://godbolt.org/z/TE4Yhr What do I miss? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D98783/new/ https://reviews.llvm.org/D98783 ___ cfe-commits ma

[PATCH] D98783: [CUDA][HIP] Remove unused addr space casts

2021-03-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D98783#2632244 , @yaxunl wrote: > In D98783#2632143 , @tra wrote: > >> Shouldn't the unused ASCs be eliminated by DCE? We seem to be cleaning up >> the consequences of a problem that happe

[PATCH] D98814: [CUDA][HIP] Mark device var used by host only

2021-03-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CGCUDANV.cpp:1102 + Info.D->isUsed() && !Info.D->hasAttr()) { +CGM.addCompilerUsedGlobal(Info.Var); + } Do we want to limit it further to only externally-visible variables? I think we

[PATCH] D98867: [HIP] Fix ROCm detection

2021-03-18 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:271 +} +if (LatestROCm < FileName) + LatestROCm = FileName.str(); This will rank `rocm-3.9` higher than `rocm-3.10`. I think you do need to extract and compare the version

[PATCH] D98902: [Clang][OpenMP][NVPTX] Fixed failure in openmp-offload-gpu.c if the system has CUDA

2021-03-18 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > My question is, if DeviceOffloadingKind == Action::OFK_Cuda, and we use -S, > do we also want to skip as well? I do not think so. Libdevice is needed to implement some libcalls that LLVM currently does not know how to handle. We do need it even when we compile with `-S`.

[PATCH] D98867: [HIP] Fix ROCm detection

2021-03-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:267-268 +// rocm-{major}.{minor}.{subMinor}[-{build}] +auto Loc = StringRef(VerStr).rfind('_'); +if (Loc != StringRef::npos) + VerStr[Loc] = '.'; You don't need `String

[PATCH] D98902: [Clang][OpenMP][NVPTX] Fixed failure in openmp-offload-gpu.c if the system has CUDA

2021-03-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D98902#2636308 , @jdoerfert wrote: > @tra, so you think we should not do this? The user will see a link error late > I assume, might be better. If I compile a `__device__ float foo(float f) { return sin(f); }` and it does compile

[PATCH] D98902: [Clang][OpenMP][NVPTX] Fixed failure in openmp-offload-gpu.c if the system has CUDA

2021-03-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D98902#2640477 , @tianshilei1992 wrote: > In D98902#2636308 , @jdoerfert wrote: > >> @tra, so you think we should not do this? The user will see a link error >> late I assume, might be bett

[PATCH] D98814: [CUDA][HIP] Mark device var used by host only

2021-03-22 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/test/CodeGenCUDA/host-used-device-var.cu:31-33 +// Check device-used static device var is not in llvm.compiler.used. +// CHECK-DAG: @_ZL2u4 +static __device__

[PATCH] D98867: [HIP] Fix ROCm detection

2021-03-22 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:269 +if (Loc != StringRef::npos) + VerStr[Loc] = '.'; +V.tryParse(VerStr); yaxunl wrote: > tra

[PATCH] D98783: [CUDA][HIP] Remove unused addr space casts

2021-03-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: rsmith. tra added a comment. We may want to add someone with more expertise with the IR as a reviewer. I'd like an educated opinion on whether the invisible dangling IR is something that needs fixing in general or if it's OK to just clean it up in this particular case. O

[PATCH] D71726: Let clang atomic builtins fetch add/sub support floating point types

2021-03-23 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. @jyknight - James, do you have further concerns about the patch? Comment at: clang/lib/Driver/ToolChains/Clang.cpp:6454 +bool DiagAtomicLibCall = true; +for (auto *A : Args.filtered(options::OPT_W_Joined)) { + if (StringRef(A->getValue()) == "n

[PATCH] D99235: [HIP] Change to code object v4

2021-03-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/HIP.cpp:116 + if (getOrCheckAMDGPUCodeObjectVersion(C.getDriver(), Args) >= 4) +OffloadKind = OffloadKind + "v4"; for (const auto &II : Inputs) { We do not do it for v2/v3. Could you elabo

[PATCH] D99301: [HIP] add __builtin_get_device_side_mangled_name

2021-03-24 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM with a couple of nits. Comment at: clang/include/clang/Basic/Builtins.h:39 OMP_LANG = 0x80,// builtin requires OpenMP. + HIP_LANG = 0x100, // builtin requires HIP.

[PATCH] D99297: [OPENMP]Fix PR49636: Assertion `(!Entry.getAddress() || Entry.getAddress() == Addr) && "Resetting with the new address."' failed.

2021-03-29 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. LGTM for CUDA. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D99297/new/ https://reviews.llvm.org/D99297 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org

[PATCH] D98193: [CUDA][HIP] Allow non-ODR use of host var in device

2021-03-29 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I think we also may want to check that we allow `sizeof(host_var)` in the GPU code. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D98193/new/ https://reviews.llvm.org/D98193 ___ cfe-commits mailing list cfe-commits@lists.

[PATCH] D99233: [HIP] Add option --gpu-inline-threshold

2021-03-29 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I'm concerned that if we make it a top-level option, it's likely to be cargo-culted and (mis)used as a sledgehammer in cases where it's not needed. I think the option should remain hidden. While thresholds do need to be tweaked, in my experience it happens very rarely. W

[PATCH] D99688: [CUDA][HIP] rename -fcuda-flush-denormals-to-zero

2021-04-01 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/test/Driver/cuda-flush-denormals-to-zero.cu:2 // Checks that cuda compilation does the right thing when passed -// -fcuda-flush-denormals-to-zero. This shoul

[PATCH] D99683: [HIP] Support ThinLTO

2021-04-01 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a subscriber: tejohnson. tra added a comment. This revision is now accepted and ready to land. LGTM in general. Please give LTO folks some time to chime in case they have any feedback. @tejohnson: Just a FYI that we're tinkering with LTO on GPUs here. ===

[PATCH] D99683: [HIP] Support ThinLTO

2021-04-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:1904-1907 +def foffload_lto : Flag<["-"], "foffload-lto">, Flags<[CoreOption]>, Group, + HelpText<"Enable LTO in 'full' mode for offload compilation">; +def fno_offload_lto : Flag<["-"], "fno-offload-l

[PATCH] D99235: [HIP] Change to code object v4

2021-04-06 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Driver/ToolChains/HIP.cpp:115 std::string OffloadKind = "hip"; + if (getOrCheckAMDGPUCodeObjectVersion(C.getDriver(), Args) >= 4) +OffloadKind = O

[PATCH] D99235: [HIP] Change to code object v4

2021-04-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Still LGTM. Comment at: clang/test/Driver/hip-code-object-version.hip:24-39 // Check bundle ID for code object v2. // RUN: %clang -### -target x86_64-linux-gnu \ // RUN: -mno-code-object-v3 \ // RUN: --offload-arch=gfx906 -nogpulib \ // RUN: %s

[PATCH] D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions

2021-04-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:460-468 +TARGET_BUILTIN(__nvvm_redux_sync_add_s32, "SiSii", "", SM_80) +TARGET_BUILTIN(__nvvm_redux_sync_min_s32, "SiSii", "", SM_80) +TARGET_BUILTIN(__nvvm_redux_sync_max_s32, "SiSii", "", SM_80)

[PATCH] D100136: Allow applying attributes to subset of allowed subjects.

2021-04-08 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added reviewers: rsmith, arphaman. Herald added a subscriber: bixia. tra requested review of this revision. Herald added a project: clang. The original intent of enforcing exact match between `apply_to` and the attribute's was to `ensure that the user will know wha

[PATCH] D100136: Allow applying attributes to subset of allowed subjects.

2021-04-08 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 336212. tra added a comment. Updated Sema/pragma-attribute-strict-subjects.c Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D100136/new/ https://reviews.llvm.org/D100136 Files: clang/lib/Sema/SemaAttr.cpp clang/

[PATCH] D100136: Allow applying attributes to subset of allowed subjects.

2021-04-08 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 336236. tra added a comment. Addressed comments Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D100136/new/ https://reviews.llvm.org/D100136 Files: clang/lib/Sema/SemaAttr.cpp clang/test/Parser/pragma-attribute.

[PATCH] D100136: Allow applying attributes to subset of allowed subjects.

2021-04-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Sema/SemaAttr.cpp:896 + attr::SubjectMatchRule MatchRule = attr::SubjectMatchRule(Rule.first); + if (auto ParentRule = getParentAttrMatcherRule(MatchRule)) { +if (llvm::any_of(StrictSubjectMatchRuleSet, --

[PATCH] D100136: Allow applying attributes to subset of allowed subjects.

2021-04-12 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rG38cf112a6bc8: Allow applying attributes to subset of allowed subjects. (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D100136/new/ https://

[PATCH] D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions

2021-04-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:460-468 +TARGET_BUILTIN(__nvvm_redux_sync_add_s32, "SiSii", "", SM_80) +TARGET_BUILTIN(__nvvm_redux_sync_min_s32, "SiSii", "", SM_80) +TARGET_BUILTIN(__nvvm_redux_sync_max_s32, "SiSii", "", SM_80)

[PATCH] D98783: [AMDGPU] Add GlobalDCE before internalization pass

2021-04-15 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM with a test nit. Comment at: clang/test/CodeGenCUDA/unused-global-var.cu:23-29 +// CHECK-NOT: @_ZL2v3 +constexpr int v3 = 1; + +// Check managed variables are always kept. + +

[PATCH] D100598: [CUDA, FDO] Filter out profiling options from GPU-side compilations.

2021-04-15 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added reviewers: yaxunl, tejohnson. Herald added subscribers: wenlei, bixia. tra requested review of this revision. Herald added a project: clang. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D100598 Files: clang/lib/Driver/ToolChains/Clang.cpp

[PATCH] D100598: [CUDA, FDO] Filter out profiling options from GPU-side compilations.

2021-04-15 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:5830 // these by hand. - if (Arg *A = getLastProfileSampleUseArg(Args)) { -auto *PGOArg = Args.getLastArg( -options::OPT_fprofile_generate, options::OPT_fprofile_generate_EQ, -opt

[PATCH] D100598: [CUDA, FDO] Filter out profiling options from GPU-side compilations.

2021-04-15 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 337896. tra added a comment. filter the options for AMD GPUs too. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D100598/new/ https://reviews.llvm.org/D100598 Files: clang/lib/Driver/ToolChains/Clang.cpp clang/t

[PATCH] D100552: [HIP] Diagnose compiling kernel without offload arch

2021-04-15 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Enforcing explicit GPU target makes sense. However, I think that singling out a `__global__` as the trigger is not sufficient for the intended purpose. If we can't generate a usable GPU-side binary, then we should produce an error if we need to generate *anything* during G

[PATCH] D100609: [Offload][OpenMP][CUDA] Allow fembed-bitcode for device offload

2021-04-16 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:4442-4446 + std::string CPU = getCPUName(Args, Triple, /*FromAs*/ false); + if (!CPU.empty()) { +CmdArgs.push_back("-target-cpu"); +CmdArgs.push_back(Args.MakeArgString(CPU)); +

[PATCH] D100609: [Offload][OpenMP][CUDA] Allow fembed-bitcode for device offload

2021-04-16 Thread Artem Belevich via Phabricator via cfe-commits
tra requested changes to this revision. tra added inline comments. This revision now requires changes to proceed. Comment at: clang/test/Driver/embed-bitcode-nvptx.cu:1 +// RUN: %clang -Xclang -triple -Xclang nvptx64 -S -Xclang -target-feature -Xclang +ptx70 -fembed-bitcode=all

[PATCH] D100598: [CUDA, FDO] Filter out profiling options from GPU-side compilations.

2021-04-16 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rGeaa9ef075d9b: [CUDA, FDO] Filter out profiling options from GPU-side compilations. (authored by tra). Changed prior to commit: https://reviews.llvm.org/D100598?vs=337896&id=338184#toc Repository: rG

[PATCH] D100609: [Offload][OpenMP][CUDA] Allow fembed-bitcode for device offload

2021-04-16 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/Driver/embed-bitcode-nvptx.cu:1 +// RUN: %clang -Xclang -triple -Xclang nvptx64 -S -Xclang -target-feature -Xclang +ptx70 -fembed-bitcode=all --cuda-device-only -nocudalib -nocudainc %s -o - | FileCheck %s +// REQUIRES: nvptx-re

[PATCH] D98193: [CUDA][HIP] Allow non-ODR use of host var in device

2021-04-19 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Small test nit. LGTM otherwise. Comment at: clang/test/CodeGenCUDA/device-use-host-var.cu:22 + +// CHECK: store i32 1 +// CHECK: store i32 2 Nit: You may want to a

[PATCH] D100794: [HIP] Support overloaded math functions for hipRTC

2021-04-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: jlebar. tra added a comment. LGTM overall. @jlebar: I could use your opinion here. Comment at: clang/lib/Headers/__clang_hip_cmath.h:341 - typedef decltype(__test(std::declval<_Tp>())) type; - static const bool value = !std::is_same::value; + typed

[PATCH] D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions

2021-04-20 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:460-468 +TARGET_BUILTIN(__nvvm_redux_sync_add_s32, "SiSii", "", SM_80) +TARGET_BUILTIN(__nvvm_redux_sync_min_s32, "SiSii", "", SM_80) +TARGET_BUILTIN(__nvvm_redux_sync_max_s32, "SiSii", "", SM_80)

[PATCH] D100794: [HIP] Support overloaded math functions for hipRTC

2021-04-20 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Headers/__clang_hip_cmath.h:341 - typedef decltype(__test(std::declval<_Tp>())) type; - static const bool value = !std::is_same::value; + typedef decltype(__test(_Tp{})) type; + static const bool value = !is_same::value;

[PATCH] D100794: [HIP] Support overloaded math functions for hipRTC

2021-04-20 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. Comment at: clang/lib/Headers/__clang_hip_cmath.h:347 + static const bool value = !is_same::value; }; Nit: `}; // namespace __hip` CHANGES SINCE LAST A

[PATCH] D99233: [HIP] Add option --gpu-inline-threshold

2021-04-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > The planned new option for offloading will be a more generic solution, > however, I expect it will take time to develop and be adopted. Agreed. OK, let's use a hidden option until we have a better way of dealing with this. Comment at: clang/include/cla

[PATCH] D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions

2021-04-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Do you know if any existing code already uses the `__nvvm_*` builtins for `cp.async`? In other words, does nvcc provide them already or is it something we're free to name as we wish? I do not see any relevant intrinsics mentioned in NVVM IR spec: https://docs.nvidia.com/cud

[PATCH] D92954: [clang-offload-bundler] Add option -list

2021-01-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp:176 + /// List bundle IDs in \a Input. + virtual Error listBundleIDs(MemoryBuffer &Input) { +if (Error Err = ReadHeader(Input)) Now that listBundleIDs is only u

[PATCH] D93587: [hip] Fix HIP version parsing.

2021-01-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:91 -void RocmInstallationDetector::ParseHIPVersionFile(llvm::StringRef V) { +bool RocmInstallationDetector::parseHIPVersionFile(llvm::StringRef V) { SmallVector VersionParts; A com

[PATCH] D93638: [hip] Enable HIP compilation with ` on MSVC.

2021-01-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Headers/__clang_hip_runtime_wrapper.h:73-74 +#define __HOST_DEVICE__ \ + static __host__ __device__ inline __attribute__((always_inline)) +__HOST_DEVICE__ double _Cosh(double

[PATCH] D92434: [NFC][AMDGPU] AMDGPU code object V4 ABI documentation

2021-01-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: llvm/docs/AMDGPUUsage.rst:959 + = = = + ``EF_AMDGPU_FEATURE_XNACK_V2``0x01 Indicates if the ``xnack`` + targ

[PATCH] D94123: [NVPTX] Fix debugging information being added to NVPTX target if remarks are enabled

2021-01-06 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D94123/new/ https://reviews.llvm.org/D94123 ___ cfe-co

[PATCH] D93638: [hip] Enable HIP compilation with ` on MSVC.

2021-01-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Headers/__clang_hip_runtime_wrapper.h:73-74 +#define __HOST_DEVICE__ \ + static __host__ __device__ inline __attribute__((always_inline)) +__HOST_DEVICE__ double _Cosh(double

[PATCH] D93587: [hip] Fix HIP version parsing.

2021-01-06 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM overall. Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:92 +// Parse and extract version numbers from `.hipVersion`. Return `true` if +// the parsing fails. +bool RocmInst

[PATCH] D93638: [hip] Enable HIP compilation with ` on MSVC.

2021-01-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Headers/__clang_hip_runtime_wrapper.h:73-74 +#define __HOST_DEVICE__ \ + static __host__ __device__ inline __attribute__((always_inline)) +__HOST_DEVICE__ double _Cosh(double

[PATCH] D93638: [hip] Enable HIP compilation with ` on MSVC.

2021-01-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Headers/__clang_hip_runtime_wrapper.h:73-74 +#define __HOST_DEVICE__ \ + static __host__ __device__ inline __attribute__((always_inline)) +__HOST_DEVICE__ double _Cosh(double

[PATCH] D93638: [hip] Enable HIP compilation with ` on MSVC.

2021-01-07 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. In D93638#2483913 , @hliao wrote: > Forget that C function could be overloaded on Clang with overloadable > extension. With that, we don't need to mark functi

[PATCH] D94337: Add cuda header type for cuh files

2021-01-11 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > This adds a cuda header type with file extension "cuh". The output type file > extension is "cuhi" - not sure if this is a good choice. This allows > language servers to properly handle cuh files without additional arguments. CUDA compilation is ... odd. While I don't have

[PATCH] D94337: Add cuda header type for cuh files

2021-01-11 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D94337#2491269 , @rgreenblatt wrote: > My primary goal for this change was to allow for language servers and other > tooling to properly handle cuda header files. From my understanding the way > that language servers handle c++ he

[PATCH] D94337: Add cuda header type for cuh files

2021-01-11 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D94337#2491761 , @rgreenblatt wrote: > For example consider the following header: > > #pragma once > > __global__ void a() { > unsigned block_idx = blockIdx.x; > unsigned thread_idx = threadIdx.x; > > __shfl_do

[PATCH] D94337: Add cuda header type for cuh files

2021-01-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D94337#2492108 , @rgreenblatt wrote: > In D94337#2491825 , @tra wrote: > >> 'Works' is not exactly the same as 'works correctly'. This example makes >> `a()` look like a regular host functio

[PATCH] D94337: Add cuda header type for cuh files

2021-01-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D94337#2496627 , @rgreenblatt wrote: >> `...` The goal of `__clang_cuda_standalone_defs.h` is to make it possible to >> parse CUDA sources at all w/o having to rely on CUDA SDK. `...` > > Should `__clang_cuda_standalone_defs.h` dep

[PATCH] D94732: [CUDA] Normalize handling of defauled dtor.

2021-01-14 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added reviewers: HAPPY, yaxunl. Herald added a subscriber: bixia. tra requested review of this revision. Herald added a project: clang. Defaulted destructor was treated inconsistently, compared to other compiler-generated functions. When Sema::IdentifyCUDATarget()

[PATCH] D94884: [Clang][OpenMP] Include header for CUDA builtin vars into OpenMP wrapper header

2021-01-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Headers/__clang_cuda_builtin_vars.h:40 +#ifdef __cplusplus #define __CUDA_DEVICE_BUILTIN(FIELD, INTRINSIC) \ Perhaps we should move all C++-related code under `#ifdef __cplusplus`

[PATCH] D85223: [CUDA][HIP] Support accessing static device variable in host code for -fgpu-rdc

2021-01-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I'd propose splitting the patch into two. One with the addition of CUID and the other that changes the way we havdle static vars. CUID is useful on its own and is relatively uncontroversial. Externalizing static vars is a more interesting issue and I'm not sure what's the b

[PATCH] D94884: [Clang][OpenMP] Include header for CUDA builtin vars into OpenMP wrapper header

2021-01-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Headers/__clang_cuda_builtin_vars.h:31 +#ifdef __OPENMP_NVPTX__ +#define DEVICE +#else You should use `__` prefix to avoid unintentional clashes with user-defined macros. `__DEVICE__` ? Comment

[PATCH] D94814: [HIP] Support `__managed__` attribute

2021-01-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Presumably, `__managed__` variables would have to be memory-mapped into the host address space. Comment at: clang/lib/CodeGen/CGCUDANV.cpp:391 + SmallVector, 8> WorkList; + for (auto &&UU : Var->uses()) { +WorkList.push_back({UU.getUser()});

[PATCH] D94884: [Clang][OpenMP] Include header for CUDA builtin vars into OpenMP wrapper header

2021-01-19 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM for `__clang_cuda_builtin_vars.h`. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D94884/new/ https://reviews.llvm.org/D94884

[PATCH] D94884: [Clang][OpenMP] Include header for CUDA builtin vars into OpenMP wrapper header

2021-01-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D94884#2508000 , @JonChesterfield wrote: > I won't object too strongly, as ultimately I don't care about cuda, but I > view intertwining the two implementations as technical debt. +1 A lot of CUDA-releated headers are not intende

[PATCH] D95007: [CUDA][HIP] Add -fuse-cuid

2021-01-20 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Herald added a reviewer: jansvoboda11. Comment at: clang/lib/Frontend/CompilerInvocation.cpp:2307 + // Only alphanumeric and underscore is allowed in -cuid option. + if (auto *A = Args.getLastArg(OPT_cuid_EQ)) { +const char *V = A->getValue(); ---

[PATCH] D94732: [CUDA] Normalize handling of defauled dtor.

2021-01-20 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 317946. tra added a comment. Addressed Richard's comments. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D94732/new/ https://reviews.llvm.org/D94732 Files: clang/lib/Sema/SemaCUDA.cpp clang/lib/Sema/SemaDeclCXX

[PATCH] D94732: [CUDA] Normalize handling of defauled dtor.

2021-01-20 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Sema/SemaDeclCXX.cpp:15162-15170 + bool SkipDtorChecks = VD->getType()->isArrayType(); + + // CUDA: Skip destructor checks for host-only variables during device-side + // compilation + SkipDtorChecks |= + (LangOpts.CUDAIsD

[PATCH] D85223: [CUDA][HIP] Support accessing static device variable in host code for -fgpu-rdc

2021-01-20 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/AST/ASTContext.cpp:11446-11447 bool ASTContext::mayExternalizeStaticVar(const Decl *D) const { - return !getLangOpts().GPURelocatableDeviceCode && + return (!getLangOpts().CUID.empty() || + !getLangOpts().GPURelocatable

[PATCH] D94732: [CUDA] Normalize handling of defauled dtor.

2021-01-20 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Sema/SemaDeclCXX.cpp:15162-15170 + bool SkipDtorChecks = VD->getType()->isArrayType(); + + // CUDA: Skip destructor checks for host-only variables during device-side + // compilation + SkipDtorChecks |= + (LangOpts.CUDAIsD

[PATCH] D94732: [CUDA] Normalize handling of defauled dtor.

2021-01-20 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 318018. tra added a comment. Removed unneeded changes. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D94732/new/ https://reviews.llvm.org/D94732 Files: clang/lib/Sema/SemaCUDA.cpp clang/lib/Sema/SemaExprCXX.cpp

[PATCH] D94732: [CUDA] Normalize handling of defauled dtor.

2021-01-20 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 318052. tra added a comment. Added a test for the corner case Richard has pointed out in the comments. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D94732/new/ https://reviews.llvm.org/D94732 Files: clang/lib/Se

[PATCH] D94814: [HIP] Support `__managed__` attribute

2021-01-21 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D94814/new/ https://reviews.llvm.org/D94814 ___ cfe-commits mailing list cfe-commits@lists.ll

[PATCH] D110781: [CUDA] Make sure is included with original __THROW defined.

2021-10-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. @jdoerfert: Ping. I think you're the best match for reviewing the code related to interaction between CUDA and the system include files, even if this particular header does not have much to do with OpenMP. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION

[PATCH] D110781: [CUDA] Make sure is included with original __THROW defined.

2021-10-06 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 377573. tra edited the summary of this revision. tra added a comment. Added a missing push_macro Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D110781/new/ https://reviews.llvm.org/D110781 Files: clang/lib/Header

[PATCH] D110781: [CUDA] Make sure is included with original __THROW defined.

2021-10-06 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 377587. tra added a comment. re-set __THROW to an empty value. It's still needed for CUDA-7.5 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D110781/new/ https://reviews.llvm.org/D110781 Files: clang/lib/Headers/_

[PATCH] D110781: [CUDA] Make sure is included with original __THROW defined.

2021-10-06 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 377593. tra added a comment. Moved string.h inclusion to the top of the file. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D110781/new/ https://reviews.llvm.org/D110781 Files: clang/lib/Headers/__clang_cuda_runt

[PATCH] D110781: [CUDA] Make sure is included with original __THROW defined.

2021-10-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D110781#3045280 , @jdoerfert wrote: > I first had to read up on pop_macro -.-. Still unsure if it is OK to push it > once and pop it twice, that works fine? Good catch. > Can't we move the string include earlier, grouped with st

[PATCH] D110089: [CUDA] Implement experimental support for texture lookups.

2021-10-06 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 377687. tra added a comment. Removed obsolete comment. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D110089/new/ https://reviews.llvm.org/D110089 Files: clang/lib/Headers/CMakeLists.txt clang/lib/Headers/__cla

[PATCH] D110089: [CUDA] Implement experimental support for texture lookups.

2021-10-06 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 377703. tra added a comment. Use `int` for string hash calculations to avoid dealing with char signedness. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D110089/new/ https://reviews.llvm.org/D110089 Files: clang/

[PATCH] D110089: [CUDA] Implement experimental support for texture lookups.

2021-10-06 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGccfb0555f76b: [CUDA] Implement experimental support for texture lookups. (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE LAS

[PATCH] D110089: [CUDA] Implement experimental support for texture lookups.

2021-10-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D110089#3047161 , @kgk wrote: > Will the new macros in this patch also be useful for supporting the > surface-related methods that also use __nv_tex_surf_handler (from > surface_indirect_functions.h)? > > I gave this new code a t

[PATCH] D110781: [CUDA] Make sure is included with original __THROW defined.

2021-10-07 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG29e00b29f76a: [CUDA] Make sure is included with original __THROW defined. (authored by tra). Repository: rG LLVM Github Monorepo CHANG

[PATCH] D110089: [CUDA] Implement experimental support for texture lookups.

2021-10-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D110089#3049656 , @kgk wrote: > Very cool! I am selfishly curious if support for surface operations is > something you plan to add. I had a go at implementing it myself today based > on this patch, and found it a bit harder than

[PATCH] D111443: [Driver] Fix ToolChain::getSanitizerArgs

2021-10-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I'm curious why we need the cache at all. While the construction of sanitizer args is hairy, it's only called few times from the driver and will be lost in the noise compared to everything else. Comment at: clang/include/clang/Driver/ToolChain.h:165 -

[PATCH] D111443: [Driver] Fix ToolChain::getSanitizerArgs

2021-10-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added a reviewer: eugenis. tra added a subscriber: eugenis. tra added a comment. In D111443#3052381 , @yaxunl wrote: > In D111443#3052099 , @tra wrote: > >> I'm curious why we need the cache at all. While the

[PATCH] D111488: [Clang][clang-nvlink-wrapper] Pass nvlink path to the wrapper

2021-10-11 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:620 + Args.MakeArgString(Twine("--nvlink-command=" + NvlinkBin)); + CmdArgs.push_back(NvlinkPath); + Nit: the variables above are used only once. It would be fine to just `push_bac

[PATCH] D111665: [CUDA] Provide address space conversion builtins.

2021-10-12 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: jdoerfert. Herald added subscribers: bixia, yaxunl. tra requested review of this revision. Herald added a project: clang. CUDA-11 headers rely on these NVCC builtins. Despite having `__nv` previx, those are *not* provided by libdevice. Repository:

[PATCH] D111665: [CUDA] Provide address space conversion builtins.

2021-10-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Tested by verifying that generated code (both PTX and SASS) matches that of NVCC: https://godbolt.org/z/rWYYx63bT Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D111665/new/ https://reviews.llvm.org/D111665

[PATCH] D111665: [CUDA] Provide address space conversion builtins.

2021-10-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D111665#3059427 , @jdoerfert wrote: > Not loving the magic constants here but I don't think we have a enum or > similar right now. Yup. > I also have to question the people that choose `size_t` here... we will end > up with int

[PATCH] D111665: [CUDA] Provide address space conversion builtins.

2021-10-12 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rGf526ee5b8517: [CUDA] Provide address space conversion builtins. (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D111665/new/ https://reviews

[PATCH] D111665: [CUDA] Provide address space conversion builtins.

2021-10-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D111665#3059989 , @jdoerfert wrote: > Except when it doesn't get instcombined away: https://godbolt.org/z/YE4EfEPde Well, it does get translated into sensible PTX, so, while not ideal, it's not too big of a deal. Using an integer

<    13   14   15   16   17   18   19   >