[PATCH] D43461: [CUDA] Include single GPU binary, NFCI.

2018-02-20 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: lib/Frontend/CompilerInvocation.cpp:1044-1045 - Opts.CudaGpuBinaryFileNames = - Args.getAllArgValues(OPT_fcuda_include_gpubinary); + Opts.CudaGpuBinaryFile

[PATCH] D43602: [CUDA] Added missing functions.

2018-02-21 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: jlebar. Herald added a subscriber: sanjoy. Initial commit missed sincos(float), llabs() and a few atomics that we used to pull in from device_functions.hpp, which we no longer include. https://reviews.llvm.org/D43602 Files: clang/lib/Headers/__cl

[PATCH] D43602: [CUDA] Added missing functions.

2018-02-21 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 135348. tra added a comment. Added missing __threadfence_system(). https://reviews.llvm.org/D43602 Files: clang/lib/Headers/__clang_cuda_device_functions.h Index: clang/lib/Headers/__clang_cuda_device_functions.h

[PATCH] D43602: [CUDA] Added missing functions.

2018-02-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In https://reviews.llvm.org/D43602#1015370, @jlebar wrote: > For my information, how are we verifying that we've caught everything? for v in 8.0 9.0 9.1 ; do /usr/local/cuda-$v/bin/nvcc -c -x cu /dev/null -o /tmp/null.o -arch=sm_60 -keep-dir=nvcc-$v -keep -v

[PATCH] D43602: [CUDA] Added missing functions.

2018-02-22 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rC325814: [CUDA] Added missing functions. (authored by tra, committed by ). Changed prior to commit: https://reviews.llvm.org/D43602?vs=135348&id=135466#toc Repository: rC Clang https://reviews.llvm.o

[PATCH] D33108: Generate extra .ll files before/after optimization when using -save-temps.

2017-05-31 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: lib/Driver/Driver.cpp:2600-2603 + // When saving temps, add extra actions to write unoptimized and optimized + // IR besides the normal bitcode outputs if possible. This is not possible + // for multi-arch builds because in

[PATCH] D33108: Generate extra .ll files before/after optimization when using -save-temps.

2017-06-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: lib/Driver/Driver.cpp:2603-2614 + // lipo-able. + if (!MultiArchUniversalBuild) { +if (isSaveTempsEnabled() && Phase == phases::Compile) { + Actions.push_back( + C.MakeAction(Current, types::TY_LLVM_IR

[PATCH] D41521: [CUDA] fixes for __shfl_* intrinsics.

2017-12-21 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: jlebar. Herald added a subscriber: sanjoy. - __shfl_{up,down}* uses `unsigned int` for the third parameter. - added [unsigned] long overloads for non-sync shuffles. Augments r319908 which added long overload for sync shuffles. https://reviews.llv

[PATCH] D41521: [CUDA] fixes for __shfl_* intrinsics.

2017-12-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Added to my todo list. There are a few more gaps that I want to test in order to make sure we don't regress on compatibility with older CUDA versions while changing these wrappers. https://reviews.llvm.org/D41521 ___ cfe-commit

[PATCH] D41521: [CUDA] fixes for __shfl_* intrinsics.

2017-12-21 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rC321326: [CUDA] More fixes for __shfl_* intrinsics. (authored by tra, committed by ). Changed prior to commit: https://reviews.llvm.org/D41521?vs=127950&id=127962#toc Repository: rC Clang https://rev

[PATCH] D41781: [DeclPrinter] Handle built-in C++ types in -ast-print.

2018-01-05 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: arphaman. Herald added subscribers: jlebar, sanjoy. Fixes a crash in clang with -ast-print on code that contained decltype(nullptr). https://reviews.llvm.org/D41781 Files: clang/lib/AST/DeclPrinter.cpp clang/test/Sema/ast-pri

[PATCH] D41781: [DeclPrinter] Handle built-in C++ types in -ast-print.

2018-01-05 Thread Artem Belevich via Phabricator via cfe-commits
tra abandoned this revision. tra added a comment. Never mind. There must be something else going on in the case where I've discovered the crash. The test case in this patch does not really reproduce the issue by itself. :-( https://reviews.llvm.org/D41781

[PATCH] D41788: [DeclPrinter] Fix two cases that crash clang -ast-print.

2018-01-05 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: arphaman. Herald added subscribers: jlebar, sanjoy. Both crashes are related to handling anonymous structures. - clang didn't handle () around an anonymous struct variable. - clang also crashed on syntax errors that could lead to other syntactic c

[PATCH] D44435: Add the module name to __cuda_module_ctor and __cuda_module_dtor for unique function names

2018-03-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: lib/CodeGen/CGCUDANV.cpp:281 + // get name from the module to generate unique ctor name for every module + SmallString<128> ModuleName rjmccall wrote: > Please explain in the comment *why* you're doing this. It's just f
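The idea in the patch title, giving each module its own constructor name, can be sketched in plain C++. This is a hypothetical illustration and not the code in CGCUDANV.cpp: the helper name `makeModuleCtorName` and the rule for sanitizing the module name are assumptions made for the example.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: build a per-module constructor name by appending a
// sanitized module name to the common "__cuda_module_ctor" prefix.
// Characters that are not alphanumeric are replaced with '_' so the result
// stays a valid C identifier (an assumption made for this sketch).
inline std::string makeModuleCtorName(const std::string &moduleName) {
  std::string name = "__cuda_module_ctor_";
  for (unsigned char c : moduleName)
    name += std::isalnum(c) ? static_cast<char>(c) : '_';
  return name;
}
```

Two different module names such as "a.cu" and "b.cu" then yield different function names. Note that this simple sanitizing rule can still map distinct inputs (for example "a.cu" and "a_cu") to the same name, so a real implementation may need a stronger scheme.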

[PATCH] D44188: Misc typos

2018-03-14 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. You may want to split the patch into smaller pieces -- per leaf directory, for example. That way potential reviewers would at least be able to see the changes (hundreds of files in one patch is a bit too much), and the subset of changes would be small enough for someone to

[PATCH] D44435: Add the module name to __cuda_module_ctor and __cuda_module_dtor for unique function names

2018-03-14 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: unittests/CodeGen/IncrementalProcessingTest.cpp:176-178 + +// In CUDA incremental processing, a CUDA ctor or dtor will be generated for +// every statement if a fatbinary file exists. SimeonEhrig wrote: > tra wrote: > > I d

[PATCH] D44435: Add the module name to __cuda_module_ctor and __cuda_module_dtor for unique function names

2018-03-15 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: unittests/CodeGen/IncrementalProcessingTest.cpp:176-178 + +// In CUDA incremental processing, a CUDA ctor or dtor will be generated for +// every statement if a fatbinary file exists. SimeonEhrig wrote: > tra wrote: > > Sim

[PATCH] D44435: Add the module name to __cuda_module_ctor and __cuda_module_dtor for unique function names

2018-03-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: unittests/CodeGen/IncrementalProcessingTest.cpp:176-178 + +// In CUDA incremental processing, a CUDA ctor or dtor will be generated for +// every statement if a fatbinary file exists. SimeonEhrig wrote: > tra wrote: > > Sim

[PATCH] D47394: [OpenMP][Clang][NVPTX] Replace bundling with partial linking for the OpenMP NVPTX device offloading toolchain

2018-05-31 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In https://reviews.llvm.org/D47394#1118223, @gtbercea wrote: > I tried this example > (https://devblogs.nvidia.com/separate-compilation-linking-cuda-device-code/). > It worked with NVCC but not with clang++. I can produce the main.o particle.o > and v.o objects as relocata

[PATCH] D47201: [CUDA] Implement nv_weak attribute for functions

2018-06-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. IIUIC, nv_weak is a synonym for weak (why, oh why did they need it?) You may need to hunt down and change a few other places that deal with the weak attribute. E.g.: https://github.com/llvm-project/llvm-project-20170507/blob/master/clang/lib/AST/Decl.cpp#L4267 https://github.

[PATCH] D47070: [CUDA] Upgrade linked bitcode to enable inlining

2018-06-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. IMO overriding TargetTransformInfo::areInlineCompatible to always return true on NVPTX is what we want to do instead of upgrading everything else. AFAICT, on NVPTX there's no reason to prevent inlining due to those attributes -- we'll never generate code, nor will we ever ex

[PATCH] D47733: [CUDA][HIP] Set kernel calling convention before arrange function

2018-06-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Looks OK overall, but I'm not very familiar with CGCall, so you may want to dig through the history and find someone with more expertise. Comment at: test/CodeGenCUDA/kernel-args-amdgcn.cu:1 +// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -e

[PATCH] D47394: [OpenMP][Clang][NVPTX] Replace bundling with partial linking for the OpenMP NVPTX device offloading toolchain

2018-06-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. With the updated patch description + the discussion I'm OK with the approach from the general "how do we compile/use CUDA" point of view. I'll leave the question of whether the approach works for OpenMP to someone more familiar with it. While I'm not completely convinced t

[PATCH] D47201: [CUDA] Implement nv_weak attribute for functions

2018-06-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I've experimented a bit and I think that we may not need this patch at all. As far as I can tell, nv_weak is only applicable to __device__ functions. It's ignored for __global__ kernels and is apparently forbidden for data. For __device__ functions nvcc produces .weak attribu

[PATCH] D47804: [CUDA] Replace 'nv_weak' attributes in CUDA headers with 'weak'.

2018-06-05 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added reviewers: Hahnfeld, aaron.ballman, jlebar. Herald added subscribers: bixia, sanjoy. An alternative to implementing nv_weak attribute (https://reviews.llvm.org/D47201). The patch should make runtime sub functions to have .weak attribute in PTX and that shoul

[PATCH] D47804: [CUDA] Replace 'nv_weak' attributes in CUDA headers with 'weak'.

2018-06-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I'll wait to see if that fixes @Hahnfeld's problem. AFAICT, nv_weak is not used for anything interesting other than the device-side runtime stubs that return errors (apparently, until they are linked with the proper device runtime which would have strong version of the symb

[PATCH] D47376: [CUDA][HIP] Do not offload for -M

2018-06-05 Thread Artem Belevich via Phabricator via cfe-commits
tra requested changes to this revision. tra added a comment. This revision now requires changes to proceed. I'm not sure this is the right thing to do. What if the user explicitly wants device-side dependencies and runs `clang --cuda-device-only -M`? This patch makes it impossible. I'd rather tell

[PATCH] D47376: [CUDA][HIP] Do not offload for -M

2018-06-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Just to make it clear, I'm not against making a sensible default choice, but rather want to make sure that it is possible to override it if the user needs to. https://reviews.llvm.org/D47376

[PATCH] D47733: [CUDA][HIP] Set kernel calling convention before arrange function

2018-06-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. @rsmith - Richard, can you take a look? Comment at: test/CodeGenCUDA/kernel-args.cu:1-2 +// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -emit-llvm %s -o - | FileCheck -check-prefix=AMDGCN %s +// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda- -

[PATCH] D47555: [HIP] Fix unbundling

2018-06-06 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. A few minor nits/suggestions. LGTM otherwise. Comment at: lib/Driver/Driver.cpp:3895 +if (UI.DependentOffloadKind == Action::OFK_Host) + Arch = StringRef(); +

[PATCH] D47804: [CUDA] Replace 'nv_weak' attributes in CUDA headers with 'weak'.

2018-06-06 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rC334108: [CUDA] Replace 'nv_weak' attributes in CUDA headers with 'weak'. (authored by tra, committed by ). Changed prior to commit: https://reviews.llvm.org/D47804?vs=150055&id=150166#toc Repository:

[PATCH] D47845: [CUDA] Removed unused __nvvm_* builtins with non-generic pointers.

2018-06-06 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added reviewers: jlebar, arsenm. Herald added subscribers: bixia, wdng, sanjoy, jholewinski. They were not even hooked into CGBuiltin's machinery. Even if they were, CUDA does not support AS-specific pointers, so there would be no legal way to call these built

[PATCH] D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls

2018-06-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In https://reviews.llvm.org/D47849#1124638, @Hahnfeld wrote: > IMO this goes into the right direction, we should use the fast implementation > in libdevice. If LLVM doesn't lower these calls in the NVPTX backend, I think > it's ok to use header wrappers as CUDA already does

[PATCH] D47958: [CUDA][HIP] Allow CUDA kernel to have amdgpu kernel attributes

2018-06-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Drive-by review: The patch could use a better description. Something that describes *what* the patch does (E.g. enforce that attributes X/Y/Z are only applied to __global__ functions.) *why* the change is needed is relevant, too, but it's not very useful without the *what*

[PATCH] D47958: [CUDA][HIP] Allow CUDA __global__ functions to have amdgpu kernel attributes

2018-06-12 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Thank you. https://reviews.llvm.org/D47958

[PATCH] D48036: [CUDA] Make min/max shims host+device.

2018-06-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Last comment in the bug pointed out that those overloads should be constexpr in c++14. Maybe in a separate patch, though. https://bugs.llvm.org/show_bug.cgi?id=37753#c5 https://reviews.llvm.org/D48036 ___ cfe-commits mailing l
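The constexpr suggestion can be illustrated with a small sketch. The names `shim_min` and `shim_max` are hypothetical stand-ins and are not the overloads touched by the patch; the `__host__ __device__` attributes that the real shims would carry are left out so the example compiles as ordinary C++.

```cpp
#include <cassert>

// Hypothetical stand-ins for the min/max shims. In CUDA code these would
// also carry __host__ __device__; the attributes are omitted so the sketch
// compiles as plain C++.
template <typename T>
constexpr const T &shim_min(const T &a, const T &b) {
  return b < a ? b : a;
}

template <typename T>
constexpr const T &shim_max(const T &a, const T &b) {
  return a < b ? b : a;
}

// Because the functions are constexpr, they can be used in constant
// expressions such as static_assert.
static_assert(shim_min(3, 7) == 3, "min is usable at compile time");
static_assert(shim_max(3, 7) == 7, "max is usable at compile time");
```

The same functions remain callable at run time, so making them constexpr should not change existing callers.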

[PATCH] D48036: [CUDA] Make min/max shims host+device.

2018-06-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Ack. https://reviews.llvm.org/D48036

[PATCH] D48287: [HIP] Support -fcuda-flush-denormals-to-zero for amdgcn

2018-06-20 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Using OpenCL's flag for the purpose adds a *third* way we handle denormals flushing in clang. Now it would be HIP (which is CUDA-like) using OpenCL's flag for denormals instead of CUDA's one. You could change AMDGPUTargetInfo::adjustTargetOptions() to use CGOpts.getLangOpts

[PATCH] D47845: [CUDA] Removed unused __nvvm_* builtins with non-generic pointers.

2018-06-20 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rC335168: [CUDA] Removed unused __nvvm_* builtins with non-generic pointers. (authored by tra, committed by ). Changed prior to commit: https://reviews.llvm.org/D47845?vs=150194&id=152149#toc Repository:

[PATCH] D57162: [DEBUG_INFO][NVPTX] Generate correct data about variable address class.

2019-01-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: lib/CodeGen/CGDebugInfo.cpp:4235 CGM.getContext().getTargetAddressSpace(D->getType()); +if (CGM.getLangOpts().CUDA && CGM.getLangOpts().CUDAIsDevice) { + if (D->hasAttr()) probinson wrote: > Can a variable

[PATCH] D55673: [darwin] parse the SDK settings from SDKSettings.json if it exists and pass in the -target-sdk-version to the compiler and backend

2019-01-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Would it be OK to use target_sdk_version to pass the *CUDA* SDK version to the CC1 compilations? I have upcoming changes that need to know the version to generate correct glue IR for CUDA. The driver currently figures out the detected CUDA version in lib/Driver/ToolChains/Cuda.cp

[PATCH] D57487: [CUDA] Propagate detected version of CUDA to cc1

2019-01-30 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: jlebar. Herald added subscribers: bixia, sanjoy. ..and use it to control the parts of CUDA compilation that depend on the specific version of CUDA SDK. This patch has a placeholder for a 'new launch API' support which is in a separate patch (I'll

[PATCH] D57488: [CUDA] add support for the new kernel launch API in CUDA-9.2+.

2019-01-30 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: jlebar. Herald added subscribers: bixia, sanjoy. Instead of calling CUDA runtime to arrange function arguments, the new API constructs arguments in a local array and the kernels are launched with __cudaLaunchKernel(). The old API has been depreca
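The calling pattern described in this summary, collecting arguments in a local array and then making one launch call, can be modelled on the host alone. Everything below is a hypothetical mock: `mockLaunchKernel`, `addKernel`, and `launchAdd` are names invented for the sketch and do not correspond to the CUDA runtime or to the code clang generates.

```cpp
#include <cassert>

// Hypothetical stand-in for a runtime launch entry point: it receives the
// kernel and an array of pointers, one per kernel argument.
using KernelFn = void (*)(void **args);

inline void mockLaunchKernel(KernelFn kernel, void **args) { kernel(args); }

// A "kernel" that reads its arguments back out of the pointer array.
inline void addKernel(void **args) {
  int a = *static_cast<int *>(args[0]);
  int b = *static_cast<int *>(args[1]);
  int *out = *static_cast<int **>(args[2]);
  *out = a + b;
}

// Host-side stub in the style the summary describes: collect the addresses
// of the arguments in a local array, then make a single launch call.
inline int launchAdd(int a, int b) {
  int result = 0;
  int *out = &result;
  void *args[] = {&a, &b, &out};
  mockLaunchKernel(addKernel, args);
  return result;
}
```

Calling `launchAdd(2, 3)` passes three pointers through the array and returns the sum written by the mock kernel. The point of the pattern is that the stub needs only one runtime call regardless of how many arguments the kernel takes.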

[PATCH] D57487: [CUDA] Propagate detected version of CUDA to cc1

2019-01-30 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 184412. tra added a comment. Updated the comment about SDKVersion use. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D57487/new/ https://reviews.llvm.org/D57487 Files: clang/include/clang/Basic/Cuda.h clang/include/clang/Basic/TargetOptions.h clan

[PATCH] D55673: [darwin] parse the SDK settings from SDKSettings.json if it exists and pass in the -target-sdk-version to the compiler and backend

2019-01-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > I would be ok with reusing that option, as long as it's documented that there > is a difference in terms of how it can be used. The patch is in https://reviews.llvm.org/D57487 It does not look like we're formally documenting CC1 options anywhere. I've added some comments

[PATCH] D57487: [CUDA] Propagate detected version of CUDA to cc1

2019-01-30 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 184416. tra added a comment. Addressed Justin's comments. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D57487/new/ https://reviews.llvm.org/D57487 Files: clang/include/clang/Basic/Cuda.h clang/include/clang/Basic/TargetOptions.h clang/lib/Basic/C

[PATCH] D57488: [CUDA] add support for the new kernel launch API in CUDA-9.2+.

2019-01-31 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 184543. tra marked 8 inline comments as done. tra edited the summary of this revision. tra added a comment. Addressed Justin's comments. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D57488/new/ https://reviews.llvm.org/D57488 Files: clang/include/cla

[PATCH] D57488: [CUDA] add support for the new kernel launch API in CUDA-9.2+.

2019-01-31 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CGCUDANV.cpp:239 +CGM.Error(CGF.CurFuncDecl->getLocation(), + "Can't find declaration for cudaLaunchKernel()"); // FIXME. +return; jlebar wrote: > Unfixed FIXME? Fixed the comment. :-)

[PATCH] D57488: [CUDA] add support for the new kernel launch API in CUDA-9.2+.

2019-01-31 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 184592. tra added a comment. Updated ASTMatchers unit test. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D57488/new/ https://reviews.llvm.org/D57488 Files: clang/include/clang/Basic/DiagnosticSemaKinds.td clang/include/clang/Sema/Sema.h clang/lib

[PATCH] D54183: [HIP] Change default optimization level to -O3

2018-11-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I'm not convinced that nvcc's behavior is a good guide for clang's defaults. Considering that clang is not compatible with nvcc when it comes to command line options, whoever is using clang to compile CUDA already has to adjust command line options. Explicitly adding `-O3`

[PATCH] D54496: [HIP] Fix device only compilation

2018-11-13 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Do I understand it correctly that the bug appears to affect HIP compilation only? https://reviews.llvm.org/D54496 ___ cfe-commits mailing list cfe-com

[PATCH] D55269: [CUDA][OpenMP] Fix nvidia-cuda-toolkit detection on Debian/Ubuntu

2018-12-04 Thread Artem Belevich via Phabricator via cfe-commits
tra requested changes to this revision. tra added a comment. This revision now requires changes to proceed. I'm not sure that's something that needs to be fixed in clang. IIUIC, Debian has added a shim that pretends to be a monolithic CUDA install: https://bugs.launchpad.net/ubuntu/+source/clang/

[PATCH] D55269: [CUDA][OpenMP] Fix nvidia-cuda-toolkit detection on Debian/Ubuntu

2018-12-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. It appears that what you're trying to do is to add "/usr/lib/cuda" on Ubuntu and Debian when --cuda-path=/usr is specified. This is a rather odd thing to do. In the end only one of those paths will be in effect and that's the path that should be specified via --cuda-path. Th

[PATCH] D55269: [CUDA][OpenMP] Fix nvidia-cuda-toolkit detection on Debian/Ubuntu

2018-12-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D55269#1319319, @jdenny wrote: > My real goal is to get clang and openmp working out of the box on Ubuntu. > Treating --cuda-path=/usr as a special case was just a way to get there. > It's odd apparently because nvidia-cuda-too

[PATCH] D55269: [CUDA][OpenMP] Fix nvidia-cuda-toolkit detection on Debian/Ubuntu

2018-12-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D55269#1319437, @jdenny wrote: > In D55269#1319382, @tra wrote: > > > Let's start with fixing OpenMP's cmake files. Once it no longer insists on > > specifying --cuda-path=/usr, and isUbun

[PATCH] D55269: [CUDA][OpenMP] Fix nvidia-cuda-toolkit detection on Debian/Ubuntu

2018-12-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D55269#1320207, @Hahnfeld wrote: > I think there are some misunderstandings here, or at least I understand > things differently than @tra is describing them: AFAICS this change is NOT > about replacing `nvcc` by `clang` in any CM

[PATCH] D55269: [CUDA] Fix nvidia-cuda-toolkit detection on Ubuntu

2018-12-06 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D55269/new/ https://reviews.llvm.org/D55269 ___ cfe-commits mailing list cfe-commits@lists.

[PATCH] D55456: [CUDA] added missing 'inline' for the functions defined in the header.

2018-12-07 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: jlebar. Herald added subscribers: bixia, sanjoy. https://reviews.llvm.org/D55456 Files: clang/lib/Headers/cuda_wrappers/new Index: clang/lib/Headers/cuda_wrappers/new === --- clang
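Why a function defined in a header needs `inline` can be shown with a minimal sketch. The function `clampToByte` is a hypothetical example and is unrelated to the contents of cuda_wrappers/new.

```cpp
#include <cassert>

// Imagine this snippet lives in a header included by several .cpp files.
// Without 'inline', each translation unit would emit its own external
// definition of the function and the link would fail with a
// multiple-definition error. 'inline' permits identical definitions in
// every translation unit that includes the header.
inline int clampToByte(int v) {  // hypothetical function, for illustration
  return v < 0 ? 0 : (v > 255 ? 255 : v);
}
```

With the keyword in place the header can be included from any number of source files in the same program.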

[PATCH] D55456: [CUDA] added missing 'inline' for the functions defined in the header.

2018-12-07 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. jlebar@ LGTM'ed via email. Landed in rL348662 CHANGES SINCE LAST ACTION https://reviews.llvm.org/D55456/new/ https://reviews.llvm.org/D55456 ___

[PATCH] D58463: [CUDA]Delayed diagnostics for the asm instructions.

2019-02-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. There's a new quirk we've run into after this patch landed. Consider this code: int foo() { int prev; __asm__ __volatile__("whatever" : "=a" (prev)::); return prev; } When we compile for device, the asm constraint is not valid for NVPTX, we emit a delayed diag and

[PATCH] D58463: [CUDA]Delayed diagnostics for the asm instructions.

2019-02-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > Hi Artem, I think we can just delay emission of this warning to solve this > problem. I'm not sure we can always tell whether the warning is real or if it's the consequence of failing to parse inline asm. E.g.: namespace { __host__ __device__ a() { int prev;

[PATCH] D58463: [CUDA]Delayed diagnostics for the asm instructions.

2019-02-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. >> E.g.: >> >> namespace { >> __host__ __device__ a() { >> int prev; >> __asm__ __volatile__("mov %0, 0" : "=a" (prev)::); >> return prev; >> } >> >> __host__ __device__ b() { >> int prev; >> return prev; >> } >> >> } //namespace >> >>

[PATCH] D57716: [CUDA][HIP] Check calling convention based on function target

2019-02-26 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D57716/new/ https://reviews.llvm.org/D57716 ___ cfe-commits mailing list cfe-commits@lists.

[PATCH] D58518: [HIP] change kernel stub name

2019-02-26 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added subscribers: jyknight, bkramer. tra added inline comments. This revision is now accepted and ready to land. Comment at: lib/CodeGen/CodeGenModule.cpp:1059 +FD->hasAttr()) + MangledName = MangledName + ".stub"; +
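The renaming visible in the quoted diff line, appending ".stub" to the mangled name, can be modelled with a short sketch. The function `hostSymbolName` and its boolean parameter are assumptions made for illustration; the actual condition in CodeGenModule.cpp is truncated above and is not reproduced here.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the naming rule in the quoted diff: when emitting
// the host-side stub for a kernel, append ".stub" to the mangled name so
// the stub and the kernel symbol do not share a name.
inline std::string hostSymbolName(const std::string &mangledName,
                                  bool isKernelStub) {
  return isKernelStub ? mangledName + ".stub" : mangledName;
}
```

For a mangled name such as `_Z3foov`, the stub would be emitted as `_Z3foov.stub` while other functions keep their names unchanged.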

[PATCH] D58917: [HIP] Do not unbundle object files for -fno-gpu-rdc

2019-03-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: ABataev. tra added a comment. The change looks OK as far as regular CUDA is concerned. That said, I'm not quite familiar with the use of bundling/unbundling actions and you should probably get someone who uses/depends on them to take a look. I think OpenMP uses them. Perh

[PATCH] D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls

2019-03-20 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > This is, or is very similar to, the problem that the host/device overloading > addresses in CUDA. IIRC the difference was that OpenMP didn't have an explicit notion of host/device functions which made it hard to apply host/device overloading in practice. > It is also the

[PATCH] D59647: [CUDA][HIP] Warn shared var initialization

2019-03-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. This looks like one of the things we should *not* do as it affects correctness -- non-trivial constructor may be arbitrarily complex and the per-TU flag to enable this behavior is way too coarse, IMO. On the other hand, I can believe that someone somewhere did write the code

[PATCH] D59863: [HIP] Support gpu arch gfx906+sram-ecc

2019-03-27 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: lib/Basic/Cuda.cpp:113 + case CudaArch::GFX906_SRAM_ECC: // TBA +return "gfx906+sram-ecc"; case CudaArch::GFX909: // TBA Wording nit: Does it mean `+(SRAM, ECC)` or `+SRAM, -ECC` ? From the rest of the changes I gue

[PATCH] D59900: [Sema] Fix a crash when nonnull checking

2019-03-27 Thread Artem Belevich via Phabricator via cfe-commits
tra added subscribers: jlebar, rsmith. tra added a comment. @rsmith, @jlebar I'm out of my depth here and could use some language lawyering help. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D59900/new/ https://reviews.llvm.org/D59900 _

[PATCH] D59900: [Sema] Fix a crash when nonnull checking

2019-03-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/SemaTemplate/decltype.cpp:1 +// RUN: %clang_cc1 -std=c++11 -fsyntax-only -verify %s +// no crash & no diag Rakete wrote: > test/SemaCXX/nonnull.cpp would be a better place to put this test. `test/SemaCXX/nonnu

[PATCH] D60141: [HIP-Clang] Fat binary should not be produced for non GPU code

2019-04-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. General nit: please use diffs with very large context when you submit patches with Phabricator. https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface Comment at: lib/CodeGen/CGCUDANV.cpp:475-476 return nullptr; + if (IsHIP

[PATCH] D60141: [HIP-Clang] Fat binary should not be produced for non GPU code

2019-04-02 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. Comment at: lib/CodeGen/CGCUDANV.cpp:475-476 return nullptr; + if (IsHIP && EmittedKernels.empty() && DeviceVars.empty()) +return nullptr; // void __{cuda|hip}_register_globals(void* handle); yax

[PATCH] D60141: [HIP-Clang] Fat binary should not be produced for non GPU code

2019-04-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D60141#1452019, @ashi1 wrote: > Hi Artem, I had just committed the change. Is this change OK or should I > revert it? The `if` condition could use some clang-formatting, but other than that the patch still looks OK to me. Rep

[PATCH] D60220: [CUDA][Windows] Final fix for bug 38811 (Step 3 of 3)

2019-04-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Headers/__clang_cuda_cmath.h:81-90 +__DEVICE__ bool isinf(long double __x) { return ::__isinfl(__x); } __DEVICE__ bool isfinite(float __x) { return ::__finitef(__x); } // For inscrutable reasons, __finite(), the double-precision

[PATCH] D60279: [CUDA] Implemented _[bi]mma* builtins.

2019-04-04 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added reviewers: timshen, jlebar. Herald added subscribers: llvm-commits, bixia, hiraditya, jholewinski. Herald added a project: LLVM. These builtins provide access to the new integer and sub-integer variants of MMA (matrix multiply-accumulate) instructions provided

[PATCH] D60279: [CUDA] Implemented _[bi]mma* builtins.

2019-04-04 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 193774. tra edited the summary of this revision. tra added a comment. Cleaned up mma test generation. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60279/new/ https://reviews.llvm.org/D60279 Files: clang/include/clang/Basic/BuiltinsNVPTX.def clang/

[PATCH] D60279: [CUDA] Implemented _[bi]mma* builtins.

2019-04-04 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 193796. tra added a comment. - Fixed minor issues with parameters of the new builtins: - __imma*_st_c_i32 builtins have 'const int * src' - __bmma_m8n8k128_mma_xor_popc_b1 does not have 'satf' argument. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D6

[PATCH] D60279: [CUDA] Implemented _[bi]mma* builtins.

2019-04-04 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 193809. tra added a comment. - Added PTX64 to the list of builtins' constraints. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60279/new/ https://reviews.llvm.org/D60279 Files: clang/include/clang/Basic/BuiltinsNVPTX.def clang/lib/Basic/Targets/NVP

[PATCH] D60220: [CUDA][Windows] Final fix for bug 38811 (Step 3 of 3)

2019-04-05 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Thank you for fixing this! CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60220/new/ https://reviews.llvm.org/D60220 ___ cfe-commits mailing li

[PATCH] D60220: [CUDA][Windows] Final fix for bug 38811 (Step 3 of 3)

2019-04-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. One more thing -- perhaps the `long double` declarations should be put under `#ifndef _MSC_VER` in all the files to make the change unobservable on non-windows platforms. Adding a comment why we only have declarations for these functions would also be helpful. CHANGES SI

[PATCH] D60220: [CUDA][Windows] Final fix for bug 38811 (Step 3 of 3)

2019-04-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D60220#1456430, @emankov wrote: > Oooh, sorry, but I've just pushed the fix. But with the following words: "Add > missing long double device functions' declarations. Provide only declarations > to prevent any use of long double o

[PATCH] D60279: [CUDA] Implemented _[bi]mma* builtins.

2019-04-08 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 194226. tra added a comment. - Converted class to struct+function as Tim suggested. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60279/new/ https://reviews.llvm.org/D60279 Files: clang/include/clang/Basic/BuiltinsNVPTX.def clang/lib/Basic/Targets/

[PATCH] D60620: [HIP] Support -offloading-target-id

2019-04-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: echristo. tra added a comment. It looks like you are solving two problems here. a) you want to create multiple device passes for the same GPU, but with different options. b) you may want to pass different compiler options to different device compilations. The patch effect

[PATCH] D60620: [HIP] Support -offloading-target-id

2019-04-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: arsenm. tra added a comment. @arsenm Matt, FYI, this patch seems to be a continuation of D59863 you've commented on. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60620/new/ https://reviews.llvm.org/D60620 ___

[PATCH] D60818: [CUDA][Windows] restrict long double device functions declarations to Windows

2019-04-17 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. Thank you for cleaning this up. Repository: rC Clang CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60818/new/ https://reviews.llvm.org/D60818 ___

[PATCH] D60985: Fix compatability for cuda sm_75

2019-04-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. FYI, I have almost-ready set of patches to implement missing bits of sm_75 support, including this change: https://reviews.llvm.org/D60279 I expect to land them some time this week. Repository: rC Clang CHANGES SINCE LAST ACTION https://reviews.llvm.org/D60985/new/ h

[PATCH] D60279: [CUDA] Implemented _[bi]mma* builtins.

2019-04-25 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rC359248: [CUDA] Implemented _[bi]mma* builtins. (authored by tra, committed by ). Herald added a subscriber: kristina. Herald added a project: clang. Changed prior to commit: https://reviews.llvm.org/D60

[PATCH] D61274: [Sema][AST] Explicit visibility for OpenCL/CUDA kernels/variables

2019-04-29 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. A kernel function in CUDA is actually two different functions. One is the real kernel we compile for the GPU; the other is a host-side stub that launches the device-side kernel. On the device side both clang and nvcc currently silently ignore `hidden` visibility and force the k

[PATCH] D60907: [OpenMP] Add math functions support in OpenMP offloading

2019-04-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. +1 to Hal's comments. @jdoerfert: > I'd even go as far as to argue that __clang_cuda_device_functions.h should > include the internal math.h wrapper to get all math functions. See also the > next comment. I'd argue the other way around -- include __clang_cuda_device_function

[PATCH] D60907: [OpenMP] Add math functions support in OpenMP offloading

2019-04-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D60907#1484756, @jdoerfert wrote: > I actually don't want to preinclude anything and my arguments are (mostly) > for the OpenMP offloading code path not necessarily Cuda. > Maybe to clarify, what I want is: > > 1. Make sure the `

[PATCH] D61396: [hip] Fix ambiguity from `>>>` of CUDA.

2019-05-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. LGTM, but I've added @rsmith who is way more familiar with this code. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D61396/new/ https://reviews.llvm.org/D61396 ___ cfe-commits maili

[PATCH] D61458: [hip] Relax CUDA call restriction within `decltype` context.

2019-05-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Perhaps we should allow this in all unevaluated contexts? I.e. `int s = sizeof(foo(x));` should also work. Comment at: clang/include/clang/Sema/Sema.h:10411 + auto I = + std::find_if(ExprEvalContexts.rbegin(), ExprEvalContexts.rend(), +
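The point about unevaluated contexts can be illustrated in plain C++, without any CUDA attributes. This is only a sketch of the language rule the comment relies on, not the patch itself: the function `foo` and the helper `size_of_call` are hypothetical names chosen for illustration. Operands of `sizeof` and `decltype` are never evaluated, so no call is ever emitted for them.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical function, declared but deliberately never defined.
// If any call to it were actually evaluated, the program would fail to link.
int foo(double x);

// sizeof does not evaluate its operand; it only needs the type of the call.
std::size_t size_of_call() { return sizeof(foo(1.0)); }

// decltype is likewise an unevaluated context.
using CallResult = decltype(foo(1.0));
static_assert(std::is_same<CallResult, int>::value,
              "foo(double) is declared to return int");
```

Because neither expression generates a call, the program links even though `foo` has no definition, which is presumably why treating `sizeof` the same way as `decltype` seems reasonable.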

[PATCH] D61458: [hip] Relax CUDA call restriction within `decltype` context.

2019-05-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D61458#1488550, @hliao wrote: > In D61458#1488523, @tra wrote: > > > Perhaps we should allow this in all unevaluated contexts? > > I.e. `int s = sizeof(foo(x));` should also work. > > > g

[PATCH] D61458: [hip] Relax CUDA call restriction within `decltype` context.

2019-05-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Sema/Sema.h:10407-10409 bool IsAllowedCUDACall(const FunctionDecl *Caller, const FunctionDecl *Callee) { +if (llvm::any_of(ExprEvalContexts, One more thing. The idea of th

[PATCH] D61470: [CUDA] Do not pass deprecated option fo fatbinary

2019-05-02 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: jlebar. Herald added a subscriber: bixia. CUDA 10.1 tools have deprecated some command line options, so fatbinary no longer needs --cuda parameter. https://reviews.llvm.org/D61470 Files: clang/lib/Driver/ToolChains/Cuda.cpp Index: clang/lib/

[PATCH] D61470: [CUDA] Do not pass deprecated option fo fatbinary

2019-05-02 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rC359838: [CUDA] Do not pass deprecated option fo fatbinary (authored by tra, committed by ). Herald added a project: clang. Changed prior to commit: https://reviews.llvm.org/D61470?vs=197876&id=197880#to

[PATCH] D61458: [hip] Relax CUDA call restriction within `decltype` context.

2019-05-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Sema/Sema.h:10407-10409 bool IsAllowedCUDACall(const FunctionDecl *Caller, const FunctionDecl *Callee) { +if (llvm::any_of(ExprEvalContexts, hliao wrote: > tra wrote: > >

[PATCH] D52179: [clang-tidy] Replace redundant checks with an assert().

2018-09-17 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added reviewers: alexfh, rsmith. Herald added subscribers: bixia, xazax.hun, jlebar, sanjoy. findStyleKind is only called if D is an explicit identifier with a name, so the checks for operators will never return true. The explicit assert() enforces this invariant.
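The general pattern described here, replacing checks that can never fire with an assertion of the caller's guarantee, might look roughly like the sketch below. Everything in it is an assumption made for illustration: `StyleKind`, `NamedEntity`, `classify`, and `try_classify` are hypothetical stand-ins and do not reflect clang-tidy's actual types or the body of `findStyleKind`.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins; these are not clang-tidy's real types.
enum class StyleKind { Function, Variable };

struct NamedEntity {
  std::string name;
  bool is_function;
};

// Precondition: the only caller passes entities with a non-empty name,
// so the assert documents the invariant instead of re-checking it at runtime.
StyleKind classify(const NamedEntity &entity) {
  assert(!entity.name.empty() && "caller must filter out unnamed entities");
  return entity.is_function ? StyleKind::Function : StyleKind::Variable;
}

// The single call site performs the check once, so classify() need not repeat it.
bool try_classify(const NamedEntity &entity, StyleKind &out) {
  if (entity.name.empty())
    return false;
  out = classify(entity);
  return true;
}
```

The assertion costs nothing in release builds (where `NDEBUG` is defined) and fails loudly in debug builds if a new caller ever breaks the invariant.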

[PATCH] D51808: [CUDA] Ignore uncallable functions when we check for usual deallocators.

2018-09-17 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 165794. tra added a comment. Addressed Richard's comments. Moved clang-tidy changes into separate review https://reviews.llvm.org/D52179. https://reviews.llvm.org/D51808 Files: clang/include/clang/AST/DeclCXX.h clang/include/clang/Sema/Sema.h clang/lib/AS

[PATCH] D52179: [clang-tidy] Replace redundant checks with an assert().

2018-09-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In https://reviews.llvm.org/D52179#1237194, @JonasToth wrote: > Is the condition for this assertion checked beforehand or could this create > runtime failures? It's checked by the (only) caller of the function on line 791: if (const auto *Decl = Result.Nodes.getNodeAs
