[PATCH] D42452: [CUDA] Disable PGO and coverage instrumentation in NVPTX.

2018-01-24 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rL323345: [CUDA] Disable PGO and coverage instrumentation in NVPTX. (authored by tra, committed by ). Herald added a subscriber: llvm-commits. Changed prior to commit: https://reviews.llvm.org/D42452?vs=1

[PATCH] D42513: [CUDA] Added partial support for CUDA-9.1

2018-01-24 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added reviewers: jlebar, jholewinski. Herald added subscribers: hintonda, mgorny, sanjoy. Clang can use CUDA-9.1 now, though new builtins (__hmma_m32n8k16*) are not implemented yet. The major change is that headers in CUDA-9.1 went through substantial changes that

[PATCH] D42513: [CUDA] Added partial support for CUDA-9.1

2018-01-25 Thread Artem Belevich via Phabricator via cfe-commits
tra marked 2 inline comments as done. tra added inline comments. Comment at: clang/lib/Headers/__clang_cuda_device_functions.h:32 + +#define __DEVICE__ static __device__ __forceinline__ +// There are number of functions that became compiler builtins in CUDA-9 and are ---

[PATCH] D42513: [CUDA] Added partial support for CUDA-9.1

2018-01-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Headers/__clang_cuda_device_functions.h:32 + +#define __DEVICE__ static __device__ __forceinline__ +// There are number of functions that became compiler builtins in CUDA-9 and are jlebar wrote: > tra wrote: > > j

[PATCH] D42581: [NVPTX] Emit debug info in DWARF-2 by default for Cuda devices.

2018-01-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: lib/Driver/ToolChains/Cuda.cpp:353 CmdArgs.push_back(Args.MakeArgString(Output.getFilename())); + if (mustEmitDebugInfo(Args) && Args.hasArg(options::OPT_g_Flag)) +CmdArgs.push_back("-g"); There's more than one -g op

[PATCH] D42581: [NVPTX] Emit debug info in DWARF-2 by default for Cuda devices.

2018-01-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: lib/Driver/ToolChains/Cuda.cpp:353 CmdArgs.push_back(Args.MakeArgString(Output.getFilename())); + if (mustEmitDebugInfo(Args) && Args.hasArg(options::OPT_g_Flag)) +CmdArgs.push_back("-g"); ABataev wrote: > tra wrote:

[PATCH] D42581: [NVPTX] Emit debug info in DWARF-2 by default for Cuda devices.

2018-01-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: lib/Driver/ToolChains/Cuda.cpp:353 CmdArgs.push_back(Args.MakeArgString(Output.getFilename())); + if (mustEmitDebugInfo(Args) && Args.hasArg(options::OPT_g_Flag)) +CmdArgs.push_back("-g"); ABataev wrote: > tra wrote:

[PATCH] D42513: [CUDA] Added partial support for CUDA-9.1

2018-01-26 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 131650. tra added a comment. Addressed Justin's comments. https://reviews.llvm.org/D42513 Files: clang/include/clang/Basic/Cuda.h clang/lib/Basic/Cuda.cpp clang/lib/Basic/Targets/NVPTX.cpp clang/lib/Driver/ToolChains/Cuda.cpp clang/lib/Headers/CMakeLi

[PATCH] D42642: [CUDA] Detect installation in PATH

2018-01-29 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Some Linux distributions integrate CUDA into the standard directory structure. I.e. binaries go into /usr/bin, headers into /usr/include, bitcode goes somewhere else, etc. ptxas will be found, but w

[PATCH] D42642: [CUDA] Detect installation in PATH

2018-01-29 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In https://reviews.llvm.org/D42642#991127, @Hahnfeld wrote: > In https://reviews.llvm.org/D42642#990976, @tra wrote: > > > Some linux distributions integrate CUDA into the standard directory > > structure. I.e. binaries go into /usr/bin, headers into /usr/include, > > bitco

[PATCH] D42513: [CUDA] Added partial support for CUDA-9.1

2018-01-29 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 131894. tra added a comment. Rebased to HEAD. https://reviews.llvm.org/D42513 Files: clang/include/clang/Basic/Cuda.h clang/lib/Basic/Cuda.cpp clang/lib/Basic/Targets/NVPTX.cpp clang/lib/Driver/ToolChains/Cuda.cpp clang/lib/Headers/CMakeLists.txt c

[PATCH] D42513: [CUDA] Added partial support for CUDA-9.1

2018-01-29 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rC323713: [CUDA] Added partial support for CUDA-9.1 (authored by tra, committed by ). Changed prior to commit: https://reviews.llvm.org/D42513?vs=131894&id=131895#toc Repository: rL LLVM https://revie

[PATCH] D42513: [CUDA] Added partial support for CUDA-9.1

2018-01-29 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rL323713: [CUDA] Added partial support for CUDA-9.1 (authored by tra, committed by ). Herald added a subscriber: llvm-commits. Changed prior to commit: https://reviews.llvm.org/D42513?vs=131894&id=131896#

[PATCH] D42642: [CUDA] Detect installation in PATH

2018-01-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: lib/Driver/ToolChains/Cuda.cpp:206 // -nocudalib hasn't been specified. -if (LibDeviceMap.empty() && !Args.hasArg(options::OPT_nocudalib)) +if (CheckLibDevice && LibDeviceMap.empty()) continue; I think th

[PATCH] D42642: [CUDA] Detect installation in PATH

2018-01-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I've thought a bit more about this and there's another quirk -- symlinks. What if we've found /usr/bin/ptxas and it is a symlink pointing to the real ptxas in the CUDA installation? If we add /usr to the list of candidates it will not help us at all. We should probably find th
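
The symlink concern in this comment can be illustrated with a small sketch. It uses standard C++17 `std::filesystem` rather than LLVM's own path helpers, and the names `resolveTool` and `installRootFromTool` are hypothetical, not taken from the patch. It also assumes the conventional `<root>/bin/ptxas` layout, which the thread itself notes does not hold on every distribution.

```cpp
#include <filesystem>
#include <optional>
#include <system_error>

namespace fs = std::filesystem;

// Resolve symlinks so that, for example, /usr/bin/ptxas pointing into a CUDA
// installation yields the real location of the binary.
// Returns std::nullopt if the path cannot be resolved (e.g. it does not exist).
std::optional<fs::path> resolveTool(const fs::path &found) {
  std::error_code ec;
  fs::path real = fs::canonical(found, ec);
  if (ec)
    return std::nullopt;
  return real;
}

// Assumption: the tool lives in <root>/bin/<tool>, so the installation root
// is two levels above the resolved binary.
fs::path installRootFromTool(const fs::path &resolvedTool) {
  return resolvedTool.parent_path().parent_path();
}
```

Resolving first and deriving the root second is the point of the sketch: applying `installRootFromTool` to the unresolved `/usr/bin/ptxas` would give `/usr`, which is exactly the unhelpful candidate described above.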

[PATCH] D42581: [NVPTX] Emit debug info in DWARF-2 by default for Cuda devices.

2018-01-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: lib/Driver/ToolChains/Cuda.cpp:436-437 assert(Output.isNothing() && "Invalid output."); - if (Args.hasArg(options::OPT_g_Flag)) + if (mustEmitDebugInfo(Args) == FullDebug) CmdArgs.push_back("-g"); Do we need to

[PATCH] D42642: [CUDA] Detect installation in PATH

2018-01-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: lib/Driver/ToolChains/Cuda.cpp:96-105 + if (llvm::ErrorOr<std::string> ptxas = + llvm::sys::findProgramByName("ptxas")) { +SmallString<256> ptxasAbsolutePath; +llvm::sys::fs::real_path(*ptxas, ptxasAbsolutePath); + +

[PATCH] D42642: [CUDA] Detect installation in PATH

2018-01-30 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. LGTM. Comment at: lib/Driver/ToolChains/Cuda.cpp:96-105 + if (llvm::ErrorOr<std::string> ptxas = + llvm::sys::findProgramByName("ptxas")) { +SmallString<256> ptxasAbsolutePath; +llvm::sys::fs::real_path(*ptxa

[PATCH] D42800: Let CUDA toolchain support amdgpu target

2018-02-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I don't have enough knowledge about compute on AMD's GPUs and would appreciate it if you could share your thoughts on how you think CUDA on AMD should work. Is there a good document describing how compute currently works (how do I launch a kernel using a rough equivalent of nvidi

[PATCH] D25796: [CUDA] Create __host__ and device variants of standard allocator declarations.

2016-12-06 Thread Artem Belevich via Phabricator via cfe-commits
tra closed this revision. tra added a comment. Landed in r284879. https://reviews.llvm.org/D25796

[PATCH] D25809: [CUDA] Improved target attribute-based overloading.

2016-12-06 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 80522. tra added a comment. Removed HD overloading checks for using declarations. 'using' exposes a number of issues with the way we handle overloading of HD functions vs H/D. The issues will be addressed in a separate patch. https://reviews.llvm.org/D25809 File

[PATCH] D25809: [CUDA] Improved target attribute-based overloading.

2016-12-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In https://reviews.llvm.org/D25809#615485, @jlebar wrote: > If you would like me to have another look at this, is it possible to make an > interdiff of your changes between this and the last version I reviewed? > phab's interdiff is useless because it straddles a rebase.

[PATCH] D25809: [CUDA] Improved target attribute-based overloading.

2016-12-07 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rL288962: [CUDA] Improve target attribute checking for function templates. (authored by tra). Changed prior to commit: https://reviews.llvm.org/D25809?vs=80522&id=80635#toc Repository: rL LLVM https:/

[PATCH] D25845: [CUDA] Ignore implicit target attributes during function template instantiation.

2016-12-07 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 80677. tra marked 4 inline comments as done. tra added a comment. Addressed Justin's comments. https://reviews.llvm.org/D25845 Files: include/clang/Sema/Sema.h lib/Sema/SemaCUDA.cpp lib/Sema/SemaDecl.cpp lib/Sema/SemaTemplate.cpp test/SemaCUDA/functio

[PATCH] D25845: [CUDA] Ignore implicit target attributes during function template instantiation.

2016-12-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: lib/Sema/SemaCUDA.cpp:99 + if (!D->hasAttrs()) +return false; + for (Attr *Attribute : D->getAttrs()) { jlebar wrote: > Is this early return necessary? Yes. Otherwise D->getAttrs() will trigger assert(hasAttrs) if we d
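
The reason for the early return can be shown with a toy model. `MiniDecl` below is a hypothetical stand-in written for this illustration, not clang's `Decl`; it only mimics the property described in the reply, namely that the attribute accessor asserts when no attributes are present.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a declaration (not clang's Decl class).
class MiniDecl {
  bool HasAttrs = false;
  std::vector<std::string> Attrs;

public:
  void addAttr(std::string Name) {
    Attrs.push_back(std::move(Name));
    HasAttrs = true;
  }
  bool hasAttrs() const { return HasAttrs; }
  // Mirrors the behaviour discussed above: asking for the attribute list of
  // a declaration that has none trips an assertion.
  const std::vector<std::string> &getAttrs() const {
    assert(HasAttrs && "No attrs to get!");
    return Attrs;
  }
};

// The guard keeps getAttrs() from being called on an attribute-free decl.
bool hasAttribute(const MiniDecl &D, const std::string &Name) {
  if (!D.hasAttrs())
    return false;
  for (const std::string &A : D.getAttrs())
    if (A == Name)
      return true;
  return false;
}
```

Without the `hasAttrs()` check, the first call on a declaration with no attributes would abort in an assertions-enabled build instead of returning `false`.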

[PATCH] D25845: [CUDA] Ignore implicit target attributes during function template instantiation.

2016-12-08 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 80783. tra marked 3 inline comments as done. tra added a comment. Fixed comments. https://reviews.llvm.org/D25845 Files: include/clang/Sema/Sema.h lib/Sema/SemaCUDA.cpp lib/Sema/SemaDecl.cpp lib/Sema/SemaTemplate.cpp test/SemaCUDA/function-template-ov

[PATCH] D25845: [CUDA] Ignore implicit target attributes during function template instantiation.

2016-12-08 Thread Artem Belevich via Phabricator via cfe-commits
tra marked an inline comment as done. tra added inline comments. Comment at: lib/Sema/SemaTemplate.cpp:7048 // target attributes into account, we perform target match check // here and reject candidates that have different target. if (LangOpts.CUDA && ---

[PATCH] D25845: [CUDA] Ignore implicit target attributes during function template instantiation.

2016-12-08 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. tra marked an inline comment as done. Closed by commit rL289091: [CUDA] Ignore implicit target attributes during function template instantiation. (authored by tra). Changed prior to commit: https://reviews.llvm.org/D25845

[PATCH] D27631: [CUDA,Driver] Added --no-cuda-gpu-arch= option.

2016-12-09 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added reviewers: jlebar, echristo. tra added a subscriber: cfe-commits. Herald added a subscriber: mehdi_amini. This allows us to negate a preceding --cuda-gpu-arch=X. This comes in handy when a user needs to override default flags set for them by the build system. https
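
A minimal sketch of the order-dependent behaviour described here, assuming the flags are processed left to right. `collectGpuArchs` is a hypothetical helper written for this illustration and is not the driver's actual code; using a `std::set` for the result is also an assumption.

```cpp
#include <set>
#include <string>
#include <vector>

// Walk the arguments in command-line order. A --cuda-gpu-arch=X adds X and a
// later --no-cuda-gpu-arch=X removes it again; all other arguments are ignored.
std::set<std::string> collectGpuArchs(const std::vector<std::string> &args) {
  const std::string add = "--cuda-gpu-arch=";
  const std::string remove = "--no-cuda-gpu-arch=";
  std::set<std::string> archs;
  for (const std::string &arg : args) {
    if (arg.compare(0, add.size(), add) == 0)
      archs.insert(arg.substr(add.size()));
    else if (arg.compare(0, remove.size(), remove) == 0)
      archs.erase(arg.substr(remove.size()));
  }
  return archs;
}
```

With a build system that injects `--cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_60`, appending `--no-cuda-gpu-arch=sm_35` leaves only `sm_60` in this model.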

[PATCH] D27631: [CUDA,Driver] Added --no-cuda-gpu-arch= option.

2016-12-09 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 80950. tra added a comment. Removed sorting and extraneous empty lines. https://reviews.llvm.org/D27631 Files: include/clang/Driver/Options.td lib/Driver/Driver.cpp test/Driver/cuda-options.cu Index: test/Driver/cuda-options.cu ==

[PATCH] D27631: [CUDA,Driver] Added --no-cuda-gpu-arch= option.

2016-12-09 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rL289287: [CUDA,Driver] Added --no-cuda-gpu-arch= option. (authored by tra). Changed prior to commit: https://reviews.llvm.org/D27631?vs=80950&id=80961#toc Repository: rL LLVM https://reviews.llvm.org

[PATCH] D117137: [Driver] Add CUDA support for --offload param

2022-01-20 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. LGTM in general, modulo a few nits. Nit: looks like the changes need some clang-formatting. Comment at: clang/lib/Driver/Driver.cpp:112 default: - D.Diag(diag::err_drv_only_one_offload_target_supported_in) << "HIP"; + D.Diag(diag::err_drv_only_

[PATCH] D117887: [NVPTX] Expose float tys min, max, abs, neg as builtins

2022-01-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Looks good overall. Please do check that the generated PTX does get assembled by ptxas. There are a few newer variants of these instructions that appear to be missing. E.g. `{min/max}.xorsign.abs`. If you only intended to add instructions available in PTX-7.0, which, based o

[PATCH] D118023: Corrected fragment size for tf32 LD B matrix.

2022-01-24 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. Should I commit the patch on your behalf? In D118023#3265601, @JackAKirk wrote: > Note that the test, llvm/test/CodeGen/NVPTX/wmma.py line 210, had t

[PATCH] D118023: Corrected fragment size for tf32 LD B matrix.

2022-01-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. If you are not sure if you have commit access, you probably do not. It must be explicitly granted: https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access As for landing the patch, I do it manually with git. Never tried it with `arc`, so I can't say much, except

[PATCH] D118023: Corrected fragment size for tf32 LD B matrix.

2022-01-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. The patch uses a `@gmail.com` email. Should I change it to `JackAKirk ` to match the sign-off email? Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D118023

[PATCH] D117137: [Driver] Add CUDA support for --offload param

2022-01-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/Driver.cpp:153-155 + if (TT->getArch() == llvm::Triple::spirv64 && + TT->getVendor() == llvm::Triple::UnknownVendor && + TT->getOS() == llvm::Triple::UnknownOS) What's expected to happen if someon

[PATCH] D117137: [Driver] Add CUDA support for --offload param

2022-01-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: linjamaki. tra added inline comments. Comment at: clang/lib/Driver/Driver.cpp:153-155 + if (TT->getArch() == llvm::Triple::spirv64 && + TT->getVendor() == llvm::Triple::UnknownVendor && + TT->getOS() == llvm::Triple::UnknownOS)

[PATCH] D118084: [CUDA, NVPTX] Pass byval aggregates directly

2022-01-24 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added reviewers: jdoerfert, yaxunl. Herald added subscribers: asavonic, bixia. tra requested review of this revision. Herald added a project: clang. Changes the NVPTX ABI to pass aggregates directly. Only clang-generated IR is affected. The change does not affect AB

[PATCH] D117137: [Driver] Add CUDA support for --offload param

2022-01-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D117137#3268548, @linjamaki wrote: > SPIR-V target requires that the OS and the environment type is unknown (see > TargetInfo::AllocateTarget and BaseSPIRTargetInfo). The problem is that LLVM's triple parser will set `UnknownVen

[PATCH] D118153: [CUDA][HIP] Do not treat host var address as constant in device compilation

2022-01-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. LGTM. Do we need to do anything special about `__managed__` vars? Comment at: clang/lib/AST/ExprConstant.cpp:2224 + Info.getCtx().getLangOpts().CUDAIsDevice) { +if (!Var->hasAttr() && +!Var->hasAttr() && Nit: N

[PATCH] D118023: Corrected fragment size for tf32 LD B matrix.

2022-01-25 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG0ad19a833177: [CUDA,NVPTX] Corrected fragment size for tf32 LD B matrix. (authored by JackAKirk, committed by tra). Repository: rG LLVM Github Mon

[PATCH] D118084: [CUDA, NVPTX] Pass byval aggregates directly

2022-01-25 Thread Artem Belevich via Phabricator via cfe-commits
tra planned changes to this revision. tra added a comment. Getting rid of byval helps getting rid of locals in quite a few places, but runs into a new problem. 😕 Looks like this change does have unexpected side-effects. When we need to dynamically index into a struct passed directly, there's no
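
To make "dynamically index into a struct passed directly" concrete, here is a small sketch in plain C++ standing in for device code. The struct layout is borrowed from the example quoted later in this thread; the function names are hypothetical. The sketch only shows the two access patterns and does not demonstrate what any particular compiler does with them.

```cpp
struct Params {
  int a;
  int b[2];
};

// Constant indices: every access names a fixed field or element, so the
// aggregate can in principle be treated as separate scalar values.
int constantIndex(Params p) { return p.a + p.b[1]; }

// Dynamic index: which element is read is only known at run time, so the
// array has to be addressable somewhere for the indexing to work.
int dynamicIndex(Params p, int i) { return p.b[i]; }
```

The second function is the shape of code that the comment flags as a problem once the argument is no longer passed by pointer.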

[PATCH] D118084: [CUDA, NVPTX] Pass byval aggregates directly

2022-01-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D118084#3271073, @jdoerfert wrote: > @lebedev.ri wanted to teach SROA how to deal with dynamic indices before, > IIRC. It seems to be generally useful. Interesting. I'd like to hear more. > This patch can wait till then? Yes.

[PATCH] D117137: [Driver] Add CUDA support for --offload param

2022-01-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D117137#3269365, @yaxunl wrote: > Does that mean only "spirv{64}-unknown-unknown" is acceptable, or > "spirv{64}-amd-unknown-unknown" is also acceptable? My point is that the `unknown` part of the triple is a catch-all for `anything

[PATCH] D118084: [CUDA, NVPTX] Pass byval aggregates directly

2022-01-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D118084#3272154, @lebedev.ri wrote: > My last idea was about allowing splitting > > struct { > int a; > int b[2]; > } a; > > into > > // not in a struct anymore! > int a; > int b[2] This looks like it's a somew

[PATCH] D118153: [CUDA][HIP] Do not treat host var address as constant in device compilation

2022-01-27 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: rsmith. tra added a comment. @rsmith -- is there anything else we need to worry about when it comes to treating pointers as constant values (or not)? Comment at: clang/lib/AST/ExprConstant.cpp:2227 +!Var->hasAttr() && +!Var->hasA

[PATCH] D117887: [NVPTX] Expose float tys min, max, abs, neg as builtins

2022-02-02 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. In D117887#3289481, @jchlanda wrote: > `ptxas` is happy with asm generated from both `math-intrins-sm86-ptx72.ll` > and `math-intrins-sm80-ptx70.ll` Thank

[PATCH] D120298: [HIP] Support `--default-stream`

2022-02-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:962 NegFlag>; +def default_stream_EQ : Joined<["--"], "default-stream=">, + HelpText<"Specify default stream. Valid values are 'legacy' and 'per-thread'. The default value is 'legacy'. (HIP only)">,

[PATCH] D120132: [HIP] Fix HIP include path

2022-02-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:531-532 + DriverArgs.hasArg(options::OPT_nostdlibinc)) { +CC1Args.push_back("-internal-isystem"); +CC1Args.push_back(HipIncludePath); + } My impression, after reading the

[PATCH] D120132: [HIP] Fix HIP include path

2022-02-23 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:531-532 + DriverArgs.hasArg(options::OPT_nostdlibinc)) { +CC1Args.push_back("-internal-isystem"); +CC1Args.push_back(HipIncludePath); + } yaxunl wrote: > tra wrote: > > My

[PATCH] D120298: [HIP] Support `-fgpu-default-stream`

2022-02-23 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM with a minor nit. > Also -DHIP_API_PER_THREAD_DEFAULT_STREAM is passed to clang -cc1 to enable > other per-thread stream You may want to rephrase the patch description a bit to match the latest

[PATCH] D117887: [NVPTX] Expose float tys min, max, abs, neg as builtins

2022-02-23 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGe0dc4ac28f00: [NVPTX] Expose float tys min, max, abs, neg as builtins (authored by jchlanda, committed by tra). Changed prior to commit: https://r

[PATCH] D118977: [NVPTX] Add more FMA intriniscs/builtins

2022-02-23 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGbe672934ff88: [NVPTX] Add more FMA intriniscs/builtins (authored by jchlanda, committed by tra). Changed prior to commit: https://reviews.llvm.org

[PATCH] D119157: [NVPTX] Add ex2 f16 support

2022-02-23 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG69a8350c232a: [NVPTX] Add ex2.approx.f16/f16x2 support (authored by npmiller, committed by tra). Changed prior to commit: https://reviews.llvm.org

[PATCH] D120132: [HIP] Fix HIP include path

2022-02-23 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:531-532 + DriverArgs.hasArg(options::OPT_nostdlibinc)) { +CC1Args.push_back("-internal-isystem"); +CC1Args.push_back(HipIncludePath); + } yaxunl wrote: > tra wrote: > > ya

[PATCH] D120499: [NVPTX] Fix nvvm.match.sync*.i64 intrinsics return type (i64 -> i32)

2022-02-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Good catch. Thank you for the fix. Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:477 TARGET_BUILTIN(__nvvm_match_any_sync_i32, "UiUiUi", "", PTX60) -TARGET_BUILTIN(__nvvm_match_any_sync_i64, "WiUiWi", "", PTX60) +TARGET_BUILTIN(__nvvm_match_any_s

[PATCH] D120132: [HIP] Fix HIP include path

2022-02-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:486 -void RocmInstallationDetector::AddHIPIncludeArgs(const ArgList &DriverArgs, - ArgStringList &CC1Args) const { +std::string RocmInstallationDetecto

[PATCH] D120499: [NVPTX] Fix nvvm.match.sync*.i64 intrinsics return type (i64 -> i32)

2022-02-28 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D120499

[PATCH] D120132: [HIP] Fix HIP include path

2022-02-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D120132#3345534, @yaxunl wrote: > I just found one issue with the current patch. It adds HIP include path for > non-HIP programs. > > We should only add HIP include path for JobAction with HIP offloading kind. > However, AddClan

[PATCH] D120132: [HIP] Fix HIP include path

2022-02-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D120132#3349936, @yaxunl wrote: > Users may use clang driver to compile HIP program and C++ program with one > clang driver invocation, e.g. > > clang --offload-arch=gfx906 a.hip b.cpp > > Clang driver will create job actions f

[PATCH] D120132: [HIP] Fix HIP include path

2022-03-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D120132#3351391, @yaxunl wrote: > > If any input file is HIP program, clang driver will use HIP offload kind for > all inputs. This behavior is similar as cuda-clang. I do not think this is the case as illustrated by the exa

[PATCH] D117887: [NVPTX] Expose float tys min, max, abs, neg as builtins

2022-03-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D117887#3351257, @jchlanda wrote: > @tra thank you for landing the patches, it seems that the clang part (builtin > declarations and tests) have been dropped, only `llvm` dir changes made it > through. Is there any way I could f

[PATCH] D120132: [HIP] Fix HIP include path

2022-03-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D120132#3351999, @yaxunl wrote: > In D120132#3351853, @tra wrote: > >> In D120132#3351391, @yaxunl wrote: >> >>> >> >> >> >>> If any input file

[PATCH] D117887: [NVPTX] Expose float tys min, max, abs, neg as builtins

2022-03-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Missing clang-side changes have landed. Please check. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D117887

[PATCH] D117887: [NVPTX] Expose float tys min, max, abs, neg as builtins

2022-03-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D117887#3353653, @jchlanda wrote: > I went with the web interface as described here: > https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface > > with `git diff -U99 ...` didn't want to bite the bu

[PATCH] D120911: [CUDA][HIP] Fix offloading kind for linking C++ programs

2022-03-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. We should probably also check what happens when we specify compilation language explicitly: E.g. `clang -x cuda a.cu -x c++ b.cc`, `clang -x cuda a.cu b.cc` and `clang a.cu -x cuda b.cc`. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D120911/new/ https://review
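
The three command lines in this comment differ only in where `-x` appears. A rough model of the usual driver convention, in which `-x <lang>` applies to every input that follows it until the next `-x` and inputs before any `-x` are classified by file extension, can be sketched as follows. `languageFromExtension` and `assignLanguages` are hypothetical helpers written for this illustration, and the extension table is deliberately tiny.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: guess a language from the file extension.
std::string languageFromExtension(const std::string &file) {
  const std::size_t dot = file.rfind('.');
  const std::string ext =
      dot == std::string::npos ? std::string() : file.substr(dot + 1);
  if (ext == "cu")
    return "cuda";
  if (ext == "cc" || ext == "cpp" || ext == "cxx")
    return "c++";
  if (ext == "c")
    return "c";
  return "unknown";
}

// Pair every input file with the language it would be compiled as.
// "-x none" switches back to extension-based detection.
std::vector<std::pair<std::string, std::string>>
assignLanguages(const std::vector<std::string> &args) {
  std::vector<std::pair<std::string, std::string>> result;
  std::string forced; // empty means "deduce from the extension"
  for (std::size_t i = 0; i < args.size(); ++i) {
    if (args[i] == "-x" && i + 1 < args.size()) {
      const std::string &lang = args[++i];
      forced = (lang == "none") ? std::string() : lang;
      continue;
    }
    result.emplace_back(args[i], forced.empty()
                                     ? languageFromExtension(args[i])
                                     : forced);
  }
  return result;
}
```

In this model `-x cuda a.cu -x c++ b.cc` is the only one of the three invocations that treats `b.cc` as C++; in the other two `b.cc` is compiled as CUDA, which is why the comment asks for all three to be checked.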

[PATCH] D120272: [CUDA] Add driver support for compiling CUDA with the new driver

2022-03-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/Driver.cpp:4107 + options::OPT_no_offload_arch_EQ)) { +C.getDriver().Diag(diag::err_opt_not_valid_with_opt) << "--offload-arch" + << "--offl

[PATCH] D120272: [CUDA] Add driver support for compiling CUDA with the new driver

2022-03-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/Driver.cpp:4132-4134 + Archs.insert(CudaArchToString(CudaArch::SM_35)); +else if (Kind == Action::OFK_HIP) + Archs.insert(CudaArchToString(CudaArch::GFX803)); JonChesterfield wrote: > tra wrote

[PATCH] D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions

2021-04-22 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. In D100124#2707731, @steffenlarsen wrote: >> Do you know if any existing code already uses the __nvvm_* builtins for >> cp.async? In other words, does nvc

[PATCH] D100794: [HIP] Support overloaded math functions for hipRTC

2021-04-22 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. Comment at: clang/lib/Headers/__clang_hip_cmath.h:586-587 _GLIBCXX_BEGIN_NAMESPACE_VERSION #endif #endif Nit: I'd add `// ` here for consistency. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D1

[PATCH] D101630: [HIP] Fix device-only compilation

2021-04-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. CUDA compilation currently errors out if `-o` is used when more than one output would be produced. E.g. % bin/clang++ -x cuda --offload-arch=sm_60 --offload-arch=sm_70 --cuda-path=$HOME/local/cuda-10.2 zz.cu -c -E #... preprocessed output from host and 2 GPU compilati

[PATCH] D101630: [HIP] Fix device-only compilation

2021-04-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. What will happen with this patch in the following scenarios: - `--offload-arch=A -S -o out.s` - `--offload-arch=A --offload-arch=B -S -o out.s` I would expect the first case to produce a plain text assembly file. With this patch the second case will produce a bundle. With s

[PATCH] D101630: [HIP] Fix device-only compilation

2021-05-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: jdoerfert. tra added a comment. In D101630#2730273, @yaxunl wrote: > How about an option -fhip-bundle-device-output. If it is on, device output is > bundled no matter how many GPU arch there are. By default it is on. +1 to the o

[PATCH] D101630: [HIP] Fix device-only compilation

2021-05-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:977 NegFlag>; +defm hip_bundle_device_output : BoolFOption<"hip-bundle-device-output", EmptyKPM, DefaultTrue, + PosFlag, jansvoboda11 wrote: > The TableGen marshalling infrastructur

[PATCH] D101630: [HIP] Fix device-only compilation

2021-05-10 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D101630#2744861, @yaxunl wrote: > [snip] it is the convention for compiler to have one output. > The compilation is like a pipeline. If we break it into stages, users would > expect to use the output from one stage as input for

[PATCH] D102237: [CUDA][HIP] Fix non-ODR-use of static device variable

2021-05-11 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM with a few nits. Comment at: clang/lib/Sema/SemaExpr.cpp:17145 }; -if (Var && Var->hasGlobalStorage() && !IsEmittedOnDeviceSide(Var)) { - SemaRef.targetDiag(Loc, d

[PATCH] D102251: Suppress Deferred Diagnostics in discarded statements.

2021-05-11 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. LGTM for CUDA. This matches the intent of deferred diags -- we only emit them if we get to generate the code for the sources that triggered them, so they should not show up for the false constexpr branches. Repository: rG LLVM Github Monorepo

[PATCH] D102270: [CUDA][HIP] Fix device template variables

2021-05-11 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM in general. Perhaps it would make sense to combine this patch with D102237 as both patches are changing the same code for the same reason, just for slightly d

[PATCH] D116583: Change the default optimisation level of PTXAS from -O0 to -O3. This makes the optimisation levels of PTXAS and the ptxjitcompiler equal (ptxjitcompiler defaults to -O3).

2022-01-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:433 } else { -// If no -O was passed, pass -O0 to ptxas -- no opt flag should correspond -// to no optimizations, but ptxas's default is -O3. -CmdArgs.push_back("-O0"); +// If no -O was

[PATCH] D111047: CUDA/HIP: Allow __int128 on the host side

2022-01-04 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGc99b2c63169d: CUDA/HIP: Allow __int128 on the host side (authored by linjamaki, committed by tra).

[PATCH] D116583: Change the default optimisation level of PTXAS from -O0 to -O3. This makes the optimisation levels of PTXAS and the ptxjitcompiler equal (ptxjitcompiler defaults to -O3).

2022-01-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:433 } else { -// If no -O was passed, pass -O0 to ptxas -- no opt flag should correspond -// to no optimizations, but ptxas's default is -O3. -CmdArgs.push_back("-O0"); +// If no -O was
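The point under discussion in this thread is what the driver passes to ptxas when the user gives no -O flag: the existing comment says -O0, while the patch title proposes -O3. A minimal sketch of that choice follows. `ptxasOptFlag` is a hypothetical helper, not the code in Cuda.cpp, and it assumes a user-supplied level is forwarded unchanged.

```cpp
#include <string>

// Hypothetical helper, not Clang's driver code.
// `userLevel` is the digit from a user-supplied -O<N> flag, or -1 when no -O
// flag was given. `fallback` is the level used in that case: 0 matches the
// behaviour described in the existing comment, 3 matches the proposed change.
std::string ptxasOptFlag(int userLevel, int fallback) {
  int level = (userLevel >= 0) ? userLevel : fallback;
  return "-O" + std::to_string(level);
}
```

With `fallback` set to 0, an invocation without -O yields "-O0", so "no opt flag" means "no optimizations" even though ptxas on its own defaults to -O3.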

[PATCH] D116673: [Clang][NVPTX]Add NVPTX intrinsics and builtins for CUDA PTX cvt sm80 instructions

2022-01-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. LGTM overall. Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:405 +TARGET_BUILTIN(__nvvm_ff2v2bf_rn, "ZUiff", "", AND(SM_80,PTX70)) +TARGET_BUILTIN(__nvvm_ff2v2bf_rn_relu, "ZUiff", "", AND(SM_80,PTX70)) Nit: `ff2v2bf` is a bit har

[PATCH] D114601: Read path to CUDA from env. variable CUDA_PATH on Windows

2022-01-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Ping. @mojca, do you need help landing the patch?

[PATCH] D116673: [Clang][NVPTX]Add NVPTX intrinsics and builtins for CUDA PTX cvt sm80 instructions

2022-01-06 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. Comment at: clang/test/CodeGen/builtins-nvptx.c:760 +// CHECK-LABEL: nvvm_cvt_sm80 +__device__ void nvvm_cvt_sm80() { +#if __CUDA_ARCH__ >= 800 Can you try c

[PATCH] D114601: Read path to CUDA from env. variable CUDA_PATH on Windows

2022-01-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D114601#3224469 , @mojca wrote: > In D114601#3223155 , @tra wrote: > >> Ping. @mojca, do you need help landing the patch? > > Yes, please. I don't have commit access yet. > You can attribut

[PATCH] D112718: Add intrinsics and builtins for PTX atomics with semantic orders

2022-01-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:1057 + +BUILTIN(__nvvm_atom_xchg_global_i, "iiD*i", "n") +TARGET_BUILTIN(__nvvm_atom_cta_xchg_global_i, "iiD*i", "n", SM_60) We need to figure out how address-space-specific builti

[PATCH] D112718: Add intrinsics and builtins for PTX atomics with semantic orders

2022-01-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:1057 + +BUILTIN(__nvvm_atom_xchg_global_i, "iiD*i", "n") +TARGET_BUILTIN(__nvvm_atom_cta_xchg_global_i, "iiD*i", "n", SM_60) t4c1 wrote: > tra wrote: > > We need to figure out how

[PATCH] D116840: [HIP] Fix device only linking for -fgpu-rdc

2022-01-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/Driver.cpp:3173 + AssociatedOffloadKind); +AL.clear(); +// Offload the host object to the host linker. Doing `clear()` in a function intended to append looks suspicious. We

[PATCH] D112718: Add intrinsics and builtins for PTX atomics with semantic orders

2022-01-10 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:1057 + +BUILTIN(__nvvm_atom_xchg_global_i, "iiD*i", "n") +TARGET_BUILTIN(__nvvm_atom_cta_xchg_global_i, "iiD*i", "n", SM_60) t4c1 wrote: > tra wrote: > > t4c1 wrote: > > > tra wrot

[PATCH] D116967: [HIP] Fix device malloc/free

2022-01-10 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Headers/__clang_hip_runtime_wrapper.h:80 +#if HIP_VERSION_MAJOR > 4 || (HIP_VERSION_MAJOR == 4 && HIP_VERSION_MINOR >= 5) +extern "C" __device__ unsigne

[PATCH] D116967: [HIP] Fix device malloc/free

2022-01-11 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Headers/__clang_hip_runtime_wrapper.h:80 +#if HIP_VERSION_MAJOR > 4 || (HIP_VERSION_MAJOR == 4 && HIP_VERSION_MINOR >= 5) +extern "C" __device__ unsigned long long __ockl_dm_alloc(unsigned long long __size); yax

[PATCH] D112718: Add intrinsics and builtins for PTX atomics with semantic orders

2022-01-11 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:1057 + +BUILTIN(__nvvm_atom_xchg_global_i, "iiD*i", "n") +TARGET_BUILTIN(__nvvm_atom_cta_xchg_global_i, "iiD*i", "n", SM_60) Naghasan wrote: > tra wrote: > > t4c1 wrote: > > > tra

[PATCH] D112718: Add intrinsics and builtins for PTX atomics with semantic orders

2022-01-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:1057 + +BUILTIN(__nvvm_atom_xchg_global_i, "iiD*i", "n") +TARGET_BUILTIN(__nvvm_atom_cta_xchg_global_i, "iiD*i", "n", SM_60) t4c1 wrote: > tra wrote: > > Naghasan wrote: > > > tra

[PATCH] D116673: [Clang][NVPTX]Add NVPTX intrinsics and builtins for CUDA PTX cvt sm80 instructions

2022-01-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D116673#3237342 , @JackAKirk wrote: > I thought I should let you know that I do not have commit access to land the > patch. I'm also happy to wait a little longer in case you think other > interested parties might still chime in

[PATCH] D117137: [Driver] Add a flag cuda-device-triple

2022-01-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I think instead of setting the triple directly from the command line, we should start with adding another `--cuda-gpu-arch` (AKA --offload-arch) variant and derive the triple and other parameters from it.
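A rough sketch of the idea of deriving the triple from the architecture name, rather than taking it from a separate flag, might look like the following. `tripleForOffloadArch` is a hypothetical helper, and the prefix-to-triple mapping is an assumption made for illustration; the review does not specify how the derivation would work.

```cpp
#include <string>

// Hypothetical helper, not part of Clang. Maps an offload architecture name
// to a target triple by its prefix. The mapping is an illustrative
// assumption. Returns an empty string for an unrecognized name.
std::string tripleForOffloadArch(const std::string &arch) {
  if (arch.compare(0, 3, "sm_") == 0)
    return "nvptx64-nvidia-cuda"; // NVIDIA GPUs, e.g. sm_80
  if (arch.compare(0, 3, "gfx") == 0)
    return "amdgcn-amd-amdhsa"; // AMD GPUs, e.g. gfx906
  return "";
}
```

The design point is that the user names only the GPU, and everything else (triple, and potentially other parameters) follows from it.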

[PATCH] D116673: [Clang][NVPTX]Add NVPTX intrinsics and builtins for CUDA PTX cvt sm80 instructions

2022-01-13 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rGbef3eb83442a: [Clang][NVPTX]Add NVPTX intrinsics and builtins for CUDA PTX cvt sm80… (authored by JackAKirk, committed by tra).

[PATCH] D116673: [Clang][NVPTX]Add NVPTX intrinsics and builtins for CUDA PTX cvt sm80 instructions

2022-01-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D116673#3237873 , @JackAKirk wrote: >>> I can land the patch on your behalf. Are you OK to use the name/email in >>> this patch or do you prefer to use a different email for the LLVM commit? > > Thanks very much. Yes the name/ema

[PATCH] D110622: [HIPSPV][3/4] Enable SPIR-V emission for HIP

2021-12-06 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Note to self: don't forget to hit "submit". The comments below have been left unsubmitted for two weeks. Sorry about that. The patch looks OK for the time being. That said, I do have concerns that w

[PATCH] D115039: [HIP] Fix -fgpu-rdc for Windows

2021-12-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > Put __hip_gpubin_handle in comdat when it has linkonce_odr linkage. I wonder when would this happen? I'm not sure we ever want gpubin handles from different TUs merged. I think it may result in different TUs attempting to load/init the same GPU binary multiple times. C
