[PATCH] D62046: [OpenMP][bugfix] Add missing math functions variants for log and abs.

2019-05-17 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. I'd add a comment with a brief explanation for the const variant and a TODO() to remove it. Looks OK to me otherwise. Repository: rC Clang CHANGES SINCE LAST ACTION https://reviews.llvm.org/D

[PATCH] D62603: [CUDA][HIP] Skip setting `externally_initialized` for static device variables.

2019-05-29 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/test/CodeGenCUDA/device-var-init.cu:39 +static __device__ int d_s_v_i = 1; +// DEVICE: @_ZL7d_s_v_i = internal addrspace(1) global i32 1, + P

[PATCH] D62603: [CUDA][HIP] Skip setting `externally_initialized` for static device variables.

2019-05-29 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D62603#1521484 , @hliao wrote: > thanks, but that `static __device__` variable won't have shadow in host > anymore. Why not? Your change only changes whether `externally_initialized` is applied to the variable during device-side

[PATCH] D62603: [CUDA][HIP] Skip setting `externally_initialized` for static device variables.

2019-05-29 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Note for the future -- it would be great if we could finish discussing the patch before landing it. I would still like to see the host-side test. Repository: rC Clang CHANGES SINCE LAST ACTION https://reviews.llvm.org/D62603/new/ https://reviews.llvm.org/D62603 __

[PATCH] D62603: [CUDA][HIP] Skip setting `externally_initialized` for static device variables.

2019-05-29 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. >> NVCC also allows that: https://godbolt.org/z/t78RvM > > BTW, that code posted looks quite weird to me, how the code could make sense > by return a pointer of device variable? or a pointer of shadow host variable? Magic. :-) More practical example would be something like

[PATCH] D62603: [CUDA][HIP] Skip setting `externally_initialized` for static device variables.

2019-05-29 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D62603#1521792 , @hliao wrote: > that should assume that variable is not declared with `static`. that's also > the motivation of this patch. cppreference defines internal linkage as 'The name can be referred to from all scopes i

[PATCH] D62603: [CUDA][HIP] Skip setting `externally_initialized` for static device variables.

2019-05-29 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D62603#1521979 , @yaxunl wrote: > > I think `static __device__` globals would fall into the same category -- > > nominally they should not be visible outside of device-side object file, > > but in practice we do need to make them

[PATCH] D55663: [CUDA] Make all host-side shadows of device-side variables undefined.

2018-12-13 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added reviewers: jlebar, yaxunl. Herald added subscribers: bixia, sanjoy. The host-side code can't (and should not) access the values that may only exist on the device side. E.g. address of a __device__ function does not exist on the host side as we don't generate th

[PATCH] D55663: [CUDA] Make all host-side shadows of device-side variables undef.

2018-12-13 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rL349087: [CUDA] Make all host-side shadows of device-side variables undef. (authored by tra, committed by ). Herald added a subscriber: llvm-commits. Changed prior to commit: https://reviews.llvm.org/D55

[PATCH] D56033: [CUDA] Treat extern global variable shadows same as regular extern vars.

2018-12-21 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: jlebar. Herald added subscribers: bixia, sanjoy. This fixes compiler crash when we attempted to compile this code: extern __device__ int data; __device__ int data = 1; https://reviews.llvm.org/D56033 Files: clang/lib/CodeGen/CodeGenModule.

[PATCH] D56033: [CUDA] Treat extern global variable shadows same as regular extern vars.

2018-12-21 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 179387. tra added a comment. Fixed a typo. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D56033/new/ https://reviews.llvm.org/D56033 Files: clang/lib/CodeGen/CodeGenModule.cpp clang/test/CodeGenCUDA/device-stub.cu Index: clang/test/CodeGenCUDA/dev

[PATCH] D56033: [CUDA] Treat extern global variable shadows same as regular extern vars.

2018-12-21 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rL349981: [CUDA] Treat extern global variable shadows same as regular extern vars. (authored by tra, committed by ). Herald added a subscriber: llvm-commits. Changed prior to commit: https://reviews.llvm.

[PATCH] D67509: [CUDA][HIP] Diagnose defaulted constructor only if it is used

2019-09-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Example of the actual error produced by clang: https://godbolt.org/z/Dl1FfC Ugh. Another corner case of the way we're dealing with implicit `__host__ __device__` functions. :-( LGTM for postponing the error until actual use. Comment at: test/SemaCUDA/defa

[PATCH] D67414: [AST] Treat "inline gnu_inline" the same way as "extern inline gnu_inline" in C++ mode

2019-09-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/SemaCUDA/gnu-inline.cu:8 void foo(); -inline __attribute__((gnu_inline)) void bar() { foo(); } +inline __attribute__((gnu_inline)) void bar() { foo(); } // expected-warning {{'gnu_inline' attribute without 'extern' in C++ treate

[PATCH] D67730: [CUDA][HIP] Fix typo in `BestViableFunction`

2019-09-18 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. LGTM. You may want to wait a bit for Justin's feedback, in case he has some concerns. Comment at: clang/test/SemaCUDA/function-overload.cu:406 + +// Two irrelevant classes with `operator-` defined. One of them is device only. +struct C1 { int m; }; ---

[PATCH] D67509: [CUDA][HIP] Fix hostness of defaulted constructor

2019-09-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: lib/Sema/SemaCUDA.cpp:273-274 + MemberDecl->hasAttr(); + if (!InClass || hasAttr) +return false; + A comment here would be helpful. I think the intent here is to look for implicit special members with

[PATCH] D67509: [CUDA][HIP] Fix hostness of defaulted constructor

2019-09-19 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: lib/Sema/SemaCUDA.cpp:273-274 + MemberDecl->hasAttr(); + if (!InClass || hasAttr) +return false; + yaxunl wrote: > tra wrote:

[PATCH] D67509: [CUDA][HIP] Fix hostness of defaulted constructor

2019-09-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: lib/Sema/SemaCUDA.cpp:387 + // each inference gets same result and not to add duplicate attributes. + auto addBothAttr = [=]() { +assert(MemberDecl->hasAttr() == `addHDAttrIfNeeded` ? We may not even need it. See below

[PATCH] D67509: [CUDA][HIP] Fix hostness of defaulted constructor

2019-09-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: lib/Sema/SemaCUDA.cpp:386-387 + bool NeedsH = true, NeedsD = true; + bool HasH = MemberDecl->hasAttr(); + bool HasD = MemberDecl->hasAttr(); + Nice. Now these can be moved above `HasExpAttr` and then used in its initializ

[PATCH] D67509: [CUDA][HIP] Fix hostness of defaulted constructor

2019-09-19 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. LGTM. Thank you! CHANGES SINCE LAST ACTION https://reviews.llvm.org/D67509/new/ https://reviews.llvm.org/D67509 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bi

[PATCH] D67509: [CUDA][HIP] Fix hostness of defaulted constructor

2019-09-20 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Looks like CUDA test-suite is triggering the assertion added by this patch: http://lab.llvm.org:8011/builders/clang-cuda-build/builds/37301/steps/ninja%20build%20simple%20CUDA%20tests/logs/stdio Repository: rL LLVM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D67

[PATCH] D67509: [CUDA][HIP] Fix hostness of defaulted constructor

2019-09-23 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D67509#1678394 , @yaxunl wrote: > A reduced test case is > > struct A { > A(); > }; > > template > struct B > { > T a; > constexpr B() = default; > }; > > B x; > > > > `B::B()` got implicit `__host__

[PATCH] D67947: [HIP] Support new kernel launching API

2019-09-24 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D67947/new/ https://reviews.llvm.org/D67947 ___ cfe-commits mailing list cfe-commits@lists.

[PATCH] D68031: [CUDA][HIP] Enable kernel function return type deduction.

2019-09-25 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Nice. I'd mention in the commit message that NVCC does not support deduced return type for kernel functions. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm

[PATCH] D68300: [HIP] Add option -fno-hip-link-builtin-bitcode to disable linking device lib

2019-10-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. The functionality looks generic enough. Should it be just `flink_builtin_bitcode`? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D68300/new/ https://reviews.llvm.org/D68300 ___ cfe-commits mailing list cfe-commits@list

[PATCH] D68284: [HIP] Support -emit-llvm for device compilation

2019-10-01 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. LGTM. Does it produce textual IR if used with `-S`? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D68284/new/ https://reviews.llvm.org/D68284 ___ cfe-commits mailing list cfe-commits@lists.ll

[PATCH] D68284: [HIP] Support -emit-llvm for device compilation

2019-10-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Could you also add a check for `-S` w/o `-emit-llvm`, too ? AFAICT it currently wants to produce a bundle, which is not very helpful. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D68284/new/ https://reviews.llvm.org/D68284 ___

[PATCH] D68300: [HIP] Add option -fno-link-builtin-bitcode to disable linking device lib

2019-10-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: include/clang/Driver/Options.td:606 +def flink_builtin_bitcode : Flag<["-"], "flink-builtin-bitcode">, + Flags<[CC1Option]>, HelpText<"Link builtin bitcode for HIP device compilation.">; +def fno_link_builtin_bitcode : Flag<["-"], "fno-lin

[PATCH] D68284: [HIP] Support -emit-llvm for device compilation

2019-10-02 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. LGTM. Thank you for fixing this. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D68284/new/ https://reviews.llvm.org/D68284 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists

[PATCH] D68394: [HIP] Enable specifying different default gpu arch for HIP/CUDA.

2019-10-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/Driver.cpp:2538 +: CudaActionBuilderBase(C, Args, Inputs, Action::OFK_Cuda) { + DefaultCudaArch = CudaArch::SM_20; +} This technically depends on the CUDA version. We do have CUDA version in

[PATCH] D68300: [HIP] Use option -nogpulib to disable linking device lib

2019-10-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: include/clang/Driver/Options.td:606 +def flink_builtin_bitcode : Flag<["-"], "flink-builtin-bitcode">, + Flags<[CC1Option]>, HelpText<"Link builtin bitcode for HIP device compilation.">; +def fno_link_builtin_bitcode : Flag<["-"], "fno-lin

[PATCH] D68394: [HIP] Enable specifying different default gpu arch for HIP/CUDA.

2019-10-03 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Driver/Driver.cpp:2538 +: CudaActionBuilderBase(C, Args, Inputs, Action::OFK_Cuda) { + DefaultCudaArch = CudaArch::SM_20; +}

[PATCH] D45068: [NVPTX, CUDA] Added support for m8n32k16 and m32n8k16 variants of wmma instructions.

2018-04-12 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 142262. tra added a comment. Updated BuiltinsNVPTX.def and tests to deal with changes in the way we deal with required features in TARGET_BUILTIN. https://reviews.llvm.org/D45068 Files: clang/include/clang/Basic/BuiltinsNVPTX.def clang/lib/CodeGen/CGBuilti

[PATCH] D44984: [HIP] Add hip input kind and codegen for kernel launching

2018-04-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: lib/CodeGen/CGCUDANV.cpp:51-52 llvm::Constant *getLaunchFn() const; + std::string addPrefixToName(CodeGenModule &CGM, StringRef FuncName) const; + std::string addUnderscoredPrefixToName(CodeGenModule &CGM, +

[PATCH] D45212: [HIP] Let CUDA toolchain support HIP language mode and amdgpu

2018-04-17 Thread Artem Belevich via Phabricator via cfe-commits
tra requested changes to this revision. tra added a subscriber: jlebar. tra added a comment. This revision now requires changes to proceed. Sorry about the delay. I've been out for few days. Comment at: include/clang/Driver/Options.td:552-553 +def : Joined<["--"], "offload-arch

[PATCH] D45489: [HIP] Add input type for HIP

2018-04-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I'm getting confused about the order of the patches. The patch stack phabricator displays in this patch is different compared to the stack in https://reviews.llvm.org/D44984. Which one should I trust? https://reviews.llvm.org/D45489 _

[PATCH] D45489: [HIP] Add input type for HIP

2018-04-18 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In https://reviews.llvm.org/D45489#1071044, @yaxunl wrote: > In https://reviews.llvm.org/D45489#1070929, @yaxunl wrote: > > > In https://reviews.llvm.org/D45489#1070470, @tra wrote: > > > > > I'm getting confused about the order of the patches. > > > The patch stack phabric

[PATCH] D45780: [CUDA] added missing __ldg(const signed char *)

2018-04-18 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: jlebar. Herald added a subscriber: sanjoy. Until now we only had variants for 'char' and 'unsigned char'. In C++ 'char' 'unsigned char' and 'signed char' are three different types and we need overloads for all of them. https://reviews.llvm.org/D

[PATCH] D45780: [CUDA] added missing __ldg(const signed char *)

2018-04-18 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rC330280: [CUDA] added missing __ldg(const signed char *) (authored by tra, committed by ). Changed prior to commit: https://reviews.llvm.org/D45780?vs=142970&id=142974#toc Repository: rC Clang https:

[PATCH] D45223: [CUDA] Set LLVM calling convention for CUDA kernel

2018-04-18 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. AFAICT this is the replacement for https://reviews.llvm.org/D44747. LGTM. https://reviews.llvm.org/D45223 ___ cfe-commits mailing list cfe-commits@list

[PATCH] D45068: [NVPTX, CUDA] Added support for m8n32k16 and m32n8k16 variants of wmma instructions.

2018-04-18 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 143003. tra added a comment. Updated the way we specify TARGET_BUILTIN feature constraints https://reviews.llvm.org/D45068 Files: clang/include/clang/Basic/BuiltinsNVPTX.def clang/lib/CodeGen/CGBuiltin.cpp clang/lib/Driver/ToolChains/Cuda.cpp clang/test

[PATCH] D45068: [NVPTX, CUDA] Added support for m8n32k16 and m32n8k16 variants of wmma instructions.

2018-04-18 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rL330296: [NVPTX, CUDA] Added support for m8n32k16 and m32n8k16 variants of wmma… (authored by tra, committed by ). Changed prior to commit: https://reviews.llvm.org/D45068?vs=143003&id=143005#toc Reposi

[PATCH] D42922: [CUDA] Register relocatable GPU binaries

2018-04-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: lib/CodeGen/CGCUDANV.cpp:283-285 + llvm::FunctionType *RegisterGlobalsFnTy; + llvm::FunctionType *RegisterLinkedBinaryFnTy; + llvm::Function *DummyCallback; Instead of tracking these through the conditionals of pretty lon

[PATCH] D44435: CUDA ctor/dtor Module-Unique Symbol Name

2018-04-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Could you please resubmit your patch with complete context? https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface https://reviews.llvm.org/D44435 ___ cfe-commits mailing list cfe-commits@lists.llvm.or

[PATCH] D45827: [CUDA] Enable CUDA compilation with CUDA-9.2

2018-04-19 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: jlebar. Herald added a subscriber: sanjoy. https://reviews.llvm.org/D45827 Files: clang/include/clang/Basic/Cuda.h clang/lib/Basic/Cuda.cpp clang/lib/Driver/ToolChains/Cuda.cpp clang/lib/Headers/__clang_cuda_runtime_wrapper.h Index: clang

[PATCH] D44435: CUDA ctor/dtor Module-Unique Symbol Name

2018-04-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: lib/CodeGen/CGCUDANV.cpp:287 +CtorSuffix.append("_"); +CtorSuffix.append(ModuleName); + } There is a general problem with this approach. File name can contain the characters that PTX does not allow. We currently on

[PATCH] D42922: [CUDA] Register relocatable GPU binaries

2018-04-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: lib/CodeGen/CGCUDANV.cpp:364-377 + llvm::Constant *NVModuleIDConstant; + SmallString<64> NVModuleID; + if (RelocatableDeviceCode) { +// Generate a unique module ID. +llvm::raw_svector_ostream OS(NVModuleID); +OS << "__nv_" <<

[PATCH] D44435: CUDA ctor/dtor Module-Unique Symbol Name

2018-04-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: lib/CodeGen/CGCUDANV.cpp:287 +CtorSuffix.append("_"); +CtorSuffix.append(ModuleName); + } SimeonEhrig wrote: > tra wrote: > > There is a general problem with this approach. File name can contain the > > characters

[PATCH] D44984: [HIP] Add hip input kind and codegen for kernel launching

2018-04-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: lib/CodeGen/CGCUDANV.cpp:51-52 llvm::Constant *getLaunchFn() const; + std::string addPrefixToName(CodeGenModule &CGM, StringRef FuncName) const; + std::string addUnderscoredPrefixToName(CodeGenModule &CGM, +

[PATCH] D45827: [CUDA] Enable CUDA compilation with CUDA-9.2

2018-04-24 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rL330753: [CUDA] Enable CUDA compilation with CUDA-9.2 (authored by tra, committed by ). Herald added subscribers: llvm-commits, bixia. Changed prior to commit: https://reviews.llvm.org/D45827?vs=143124&i

[PATCH] D44984: [HIP] Add hip input kind and codegen for kernel launching

2018-04-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: test/CodeGenCUDA/device-stub.cu:2-8 +// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \ +// RUN: -fcuda-include-gpubinary %t -o - | FileCheck %s +// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \ +// RUN: -fcuda-includ

[PATCH] D46148: [CUDA] Added -f[no-]cuda-short-ptr option

2018-04-26 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: jlebar. Herald added subscribers: bixia, sanjoy, jholewinski. The option enables use of 32-bit pointers for accessing const/local/shared memory. The feature is disabled by default. https://reviews.llvm.org/D46148 Files: clang/include/clang/Driv

[PATCH] D46148: [CUDA] Added -f[no-]cuda-short-ptr option

2018-04-26 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 144216. tra added a comment. Removed debug printout. https://reviews.llvm.org/D46148 Files: clang/include/clang/Driver/Options.td clang/lib/Basic/Targets/NVPTX.cpp clang/lib/Driver/ToolChains/Cuda.cpp Index: clang/lib/Driver/ToolChains/Cuda.cpp

[PATCH] D46148: [CUDA] Added -f[no-]cuda-short-ptr option

2018-04-27 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 144419. tra added a subscriber: arsenm. tra added a comment. Do not use subtarget feature to control codegen of short pointers. Instead we need to add a field to TargetOptions which is the only way to pass this info to NVPTXTargetInfo constructor where we calcula

[PATCH] D44435: CUDA ctor/dtor Module-Unique Symbol Name

2018-05-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Perhaps we should take a step back and consider whether this is the right approach to solve your problem. If I understand it correctly, the real issue is that you repeatedly recompile the same module and cling will only use the function from the first module it's seen it i

[PATCH] D48613: [CUDA] Use atexit() to call module destructor.

2018-06-26 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: jlebar. Herald added subscribers: bixia, sanjoy. This matches the way NVCC does it. Doing module cleanup at global destructor phase used to work, but is, apparently, too late for the CUDA runtime in CUDA-9.2, which ends up crashing with double-free.

[PATCH] D48615: [CUDA] Place all CUDA sections in __NV_CUDA segment on Mac.

2018-06-26 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: Hahnfeld. Herald added subscribers: bixia, jlebar, sanjoy. That's where CUDA SDK binaries appear to put them on Mac. https://reviews.llvm.org/D48615 Files: clang/lib/CodeGen/CGCUDANV.cpp Index: clang/lib/CodeGen/CGCUDANV.cpp =

[PATCH] D48613: [CUDA] Use atexit() to call module destructor.

2018-06-27 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 153138. tra added a comment. Make destructor registration conditional -- we generate no destructor with -fcuda-rdc. Updated the test to reflect changes in destructor registration. https://reviews.llvm.org/D48613 Files: clang/lib/CodeGen/CGCUDANV.cpp clang/

[PATCH] D48613: [CUDA] Use atexit() to call module destructor.

2018-06-27 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rC335763: [CUDA] Use atexit() to call module destructor. (authored by tra, committed by ). Changed prior to commit: https://reviews.llvm.org/D48613?vs=153138&id=153144#toc Repository: rC Clang https:/

[PATCH] D48615: [CUDA] Place all CUDA sections in __NV_CUDA segment on Mac.

2018-06-28 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rL335880: [CUDA] Place all CUDA sections in __NV_CUDA segment on Mac. (authored by tra, committed by ). Herald added a subscriber: llvm-commits. Changed prior to commit: https://reviews.llvm.org/D48615?vs

[PATCH] D49083: [HIP] Register/unregister device fat binary only once

2018-07-10 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > HIP generates one fat binary for all devices after linking. However, for each > compilation > unit a ctor function is emitted which register the same fat binary. > Measures need to be taken to make sure the fat binary is only registered > once. Are you saying that for

[PATCH] D49274: [CUDA] Provide integer SIMD functions for CUDA-9.2

2018-07-12 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added reviewers: jlebar, bkramer. Herald added subscribers: bixia, sanjoy. CUDA-9.2 made all integer SIMD functions into compiler builtins, so clang no longer has access to the implementation of these functions in either headers of libdevice and has to provide its ow

[PATCH] D49274: [CUDA] Provide integer SIMD functions for CUDA-9.2

2018-07-18 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I'm in the middle of writing the tests for these as it's very easy to mess things up. I'll update the patch once I run it through the tests. Another problem with the patch in the current form is that these instructions apparently do not accept immediate arguments. PTX is a

[PATCH] D68578: [HIP] Fix device stub name

2019-10-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Could you elaborate on how exactly current implementation does not work? I would expect the kernel and the stub to be two distinct entities, as far as debugger is concerned. It does have enough information to track each independently (different address, .stub suffix, perhap

[PATCH] D68578: [HIP] Fix device stub name

2019-10-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D68578#1697851 , @yaxunl wrote: > In D68578#1697822 , @tra wrote: > > > Could you elaborate on how exactly current implementation does not work? > > > > I would expect the kernel and the stub

[PATCH] D68587: [hip] Assume host-only compilation if the final phase is ahead of `backend`.

2019-10-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I'm fine with this for -E/-M, I would argue that with `-fsyntax-only` we want to know whether our source code, which is common for all sub-compilations, has syntactic errors. The way we compile HIP & CUDA sources, some of the errors will only be reported on one side of the

[PATCH] D68587: [hip] Assume host-only compilation if the final phase is ahead of `backend`.

2019-10-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D68587#1698102 , @hliao wrote: > for most compilation tools, single input and single output are expected. > Without assuming `-fsyntax-only` alone is host-compilation only, that at > least run syntax checking twice. I believe th

[PATCH] D68652: [driver][hip] Skip bundler if host action is nothing.

2019-10-08 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/test/Driver/hip-syntax-only.hip:7 + +// CHECK-DAG: clang{{.*}}" "-cc1" {{.*}} "-fcuda-is-device" +// CHECK-DAG: clang{{.*}}" "-cc1" "-triple" "x86_64" ---

[PATCH] D68587: [hip] Assume host-only compilation if the final phase is ahead of `backend`.

2019-10-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. TL; DR; +1 to formalizing how we want -M*/-E/-S/-emit-llvm/-fsyntax-only to behave. OK with -M/-E/-S defaulting to host, and erroring out if applied to multiple sub-compilations. I'm still convinced that the tooling issue with multiple subcompilations is orthogonal to this

[PATCH] D68660: [tooling] Teach Tooling to understand compilation with offloading.

2019-10-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added a reviewer: klimek. tra added a comment. Added Manuel as someone familiar with tooling. Comment at: clang/lib/Tooling/Tooling.cpp:117 // The one job we find should be to invoke clang again. const auto &Cmd = cast(*Jobs.begin()); if (StringRef(Cmd.getCreator()

[PATCH] D68663: [clang-offload-bundler] Support `.cui` and `.d`.

2019-10-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Maybe, add a TODO to eventually remove `.d` . `.cui` should probably remain as it's yet another variant of preprocessed output that we allow bundling for C/C++. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D68663/new/ https:/

[PATCH] D68652: [driver][hip] Skip bundler if host action is nothing.

2019-10-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/Driver/hip-syntax-only.hip:7 + +// CHECK-DAG: clang{{.*}}" "-cc1" {{.*}} "-fcuda-is-device" +// CHECK-DAG: clang{{.*}}" "-cc1" "-triple" "x86_64" hliao wrote: > tra wrote: > > I'd include `-target ` and a comment

[PATCH] D68665: [HIP] Fix -save-temps

2019-10-08 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: lib/Driver/Driver.cpp:4400 + // HIP image for device compilation with -fno-gpu-rdc is per compilation + // unit. + bool IsHIPNoRDC = JA.getOffloading

[PATCH] D68578: [HIP] Fix device stub name

2019-10-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D68578#1698864 , @t-tye wrote: > From a source language point of view, the device function comprises the code > that is launched as a grid. We need this fact to be present in the symbols > used. Only the device function should hav

[PATCH] D68660: [tooling] Teach Tooling to understand compilation with offloading.

2019-10-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Tooling/Tooling.cpp:117 // The one job we find should be to invoke clang again. const auto &Cmd = cast(*Jobs.begin()); if (StringRef(Cmd.getCreator().getName()) != "clang") { hliao wrote: > tra wrote: > >

[PATCH] D68660: [tooling] Teach Tooling to understand compilation with offloading.

2019-10-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Tooling/Tooling.cpp:117 // The one job we find should be to invoke clang again. const auto &Cmd = cast(*Jobs.begin()); if (StringRef(Cmd.getCreator().getName()) != "clang") { hliao wrote: > tra wrote: > >

[PATCH] D68753: [CUDA][HIP} Add a test for constexpr default ctor

2019-10-10 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: test/SemaCUDA/constexpr-ctor.cu:14-27 +template struct B { + T a; + constexpr B() = default; +}; + +template struct C { + T a; Do we really need three identical templates? If they are needed to let compiler emit multip

[PATCH] D68660: [tooling] Teach Tooling to understand compilation with offloading.

2019-10-10 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Tooling/Tooling.cpp:102 + if (isa(A)) { +assert(Actions.size() == 2); +assert(isa(Actions.front())); Why 2? Will it not be different if user targeted multiple GPUs? Comment

[PATCH] D68823: Fix help message for -ffp-contract

2019-10-10 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: include/clang/Driver/Options.td:1148 + " | on (according to FP_CONTRACT pragma) | off (never fuse). Default" + " is 'fast' for CUDA/HIP and 'on' otherwise.">, Values<"fast,on,off">; Could you point me where we set it? I'

[PATCH] D68753: [CUDA][HIP} Add a test for constexpr default ctor

2019-10-10 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: test/SemaCUDA/constexpr-ctor.cu:14-27 +template struct B { + T a; + constexpr B() = default; +}; + +template struct C { + T a; yaxunl wrote: > tra wrote: > > Do we really need three identical templates? If they are need

[PATCH] D68823: Fix help message for -ffp-contract

2019-10-10 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: include/clang/Driver/Options.td:1148 + " | on (according to FP_CONTRACT pragma) | off (never fuse). Default" + " is 'fast' for CUDA/HIP and 'on' otherwise.">, Val

[PATCH] D68753: [CUDA][HIP} Add a test for constexpr default ctor

2019-10-10 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. Sometimes I wish it would be possible to specify some of the -verify diagnostics in temporal order, as opposed to tying them to locations. I.e. in this case it would be way more useful to see

[PATCH] D68818: [hip][cuda] Fix the extended lambda name mangling issue.

2019-10-15 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. @rsmith Richard, could you take a look, please? Lambdas, mangling, ODR rules & ABI scare me. :-) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D68818/new/ https://reviews.llvm.org/D68818 _

[PATCH] D69124: [clang][driver] Print compilation phases with indentation.

2019-10-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Could you give an example of before/after output? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D69124/new/ https://reviews.llvm.org/D69124 ___ cfe-commits mailing list cfe-commits@

[PATCH] D69124: [clang][driver] Print compilation phases with indentation.

2019-10-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. This is... rather oddly-structured output. My brain refuses to accept that the most-indented phase is the input. Perhaps we should do `llvm::errs().indent(MaxIdent-Ident)`. This should give us something like this (withMaxIdent=9), which is somewhat easier to grok, IMO:

[PATCH] D69124: [clang][driver] Print compilation phases with indentation.

2019-10-18 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Neat. I like the visual cues showing what gets passed on to the next processing stage. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D69124/new/ https

[PATCH] D69268: [HIP] Add option -fgpu-allow-device-init

2019-10-21 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Nice. I wish we could do that for CUDA. Comment at: lib/Frontend/CompilerInvocation.cpp:2530 Opts.GPURelocatableDeviceCode = Args.hasArg(OPT_fgpu_rdc); + Opts.GPUAllowDeviceIni

[PATCH] D69268: [HIP] Add option -fgpu-allow-device-init

2019-10-22 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. Thank you for adding the warning. One small nit about the name. LGTM otherwise. Comment at: include/clang/Basic/DiagnosticCommonKinds.td:308 +// HIP +def warn_ignore_hip_only_option : Warning< + "'%0' is ignored since it is only

[PATCH] D69389: [hip] Allow the declaration of functions with variadic arguments in HIP.

2019-10-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/TargetInfo.cpp:7764 + QualType Ty) const { + llvm_unreachable("AMDGPU does not support varargs"); +} llvm_unreachable() should be used to indicate an error in compiler's own

[PATCH] D69389: [hip] Allow the declaration of functions with variadic arguments in HIP.

2019-10-24 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Perhaps we should rename `-fcuda-allow-variadic-functions` to `-fgpu-allow-variadic-functions` after this patch. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.l

[PATCH] D60220: [CUDA][Windows] Final fix for bug 38811 (Step 3 of 3)

2019-10-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D60220#1723458 , @emankov wrote: > In D60220#1723350 , @6yearold wrote: > > > I'm seeing quite similar errors on FreeBSD with Clang 8 and 9: > > Any idea how to fix this? > > > It looks like

[PATCH] D69493: Add -fconvergent-functions flag

2019-10-28 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. > The CUDA builtin library is apparently compiled in C++ mode, so the > assumption of convergent needs to be made in a typically non-SPMD > language. I think the key here is that we sometime

[PATCH] D69322: [hip][cuda] Enable extended lambda support on Windows.

2019-10-29 Thread Artem Belevich via Phabricator via cfe-commits
tra added a reviewer: rnk. tra added a subscriber: rnk. tra added a comment. @rnk Reid, would you be the right person to look at the change on the Windows' side? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D69322/new/ https://reviews.llvm.org/D69

[PATCH] D60220: [CUDA][Windows] Final fix for bug 38811 (Step 3 of 3)

2019-10-29 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D60220#1725633 , @dim wrote: > Hm, I would really say that `__isnan` and the other `__` prefixed functions > are Linuxisms, or more accurately, glibc-isms. They also don't exist on e.g. > macOS: > > Why can't the regular `isnan`

[PATCH] D80237: [hip] Ensure pointer in struct argument has proper `addrspacecast`.

2020-05-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CGCall.cpp:1339 + } + for (uint64_t i = 0, e = SrcATy->getNumElements(); i < e; ++i) { +Address EltPtr = CGF.Builder.CreateConstArrayGEP(Dest, i); Is there a limit on array size? We may en

[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-05-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Basic/HIP.cpp:16 +const llvm::SmallVector +getAllPossibleTargetIdFeatures(llvm::StringRef Device) { + llvm::SmallVector Ret; Nit: there's an unfortunate clash with already [[ https://github.com/llvm/llvm-project/

[PATCH] D71726: Let clang atomic builtins fetch add/sub support floating point types

2020-05-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Basic/TargetInfo.h:1418 + /// Whether floating point atomic fetch add/sub is supported. + virtual bool isFPAtomicFetchAddSubSupported() const { return false; } + I think it should be predicated on speci

[PATCH] D71726: Let clang atomic builtins fetch add/sub support floating point types

2020-05-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Basic/TargetInfo.h:1418 + /// Whether floating point atomic fetch add/sub is supported. + virtual bool isFPAtomicFetchAddSubSupported() const { return false; } + yaxunl wrote: > tra wrote: > > I think i

[PATCH] D80464: [CUDA] Missing __syncthreads intrinsic in __clang_cuda_device_functions.h

2020-05-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. `__syncthreads` is clang's built-in and as such should not be in any header file: https://github.com/llvm/llvm-project/blob/master/clang/include/clang/Basic/BuiltinsNVPTX.def#L406 My guess is that the root cause of this problem is that source parsing is done using C++, not

[PATCH] D80450: [CUDA][HIP] Fix implicit HD function resolution

2020-05-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Is this patch supposed to be used with D79526 or instead of it? Comment at: clang/test/SemaCUDA/function-overload.cu:551 + struct a { +__attribute__((device)) a(short); +__attribute__((device)) operator unsigned()

<    1   2   3   4   5   6   7   8   9   10   >