[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2023-01-18 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM with few minor nits and questions. In D140158#4063689 , @jhuber6 wrote: > Addressing some comments. I don't know if there's a cleaner way to mess > ar

[PATCH] D142133: [LinkerWrapper] Use `clang` to perform the device linking

2023-01-19 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. Nice. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D142133/new/ https://reviews.llvm.org/D142133 ___ cfe-commits mailing list cfe-commits@lists.llvm.org h

[PATCH] D142118: [HIP] Unbundler allows missing host entry

2023-01-19 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Driver/OffloadBundler.cpp:1063 // in case host bundle name was provided in command line. - if (!FoundHostBundle && BundlerConfig.HostInputIndex != ~0u

[PATCH] D127673: [OpenMP] Fix offload packager not writing to temps correctly

2022-06-13 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM overall with a couple of nits. Comment at: clang/lib/Driver/Driver.cpp:5420 +/*CreatePrefixForHost=*/isa(A) || +(!!A->getOffloadingHostActiveKinds() && !At

[PATCH] D127686: [Offloading] Embed the target features in the OffloadBinary

2022-06-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:8320 +TC->getDriver().isUsingLTO(/* IsOffload */ true) +? ",feature=" + llvm::join(FeatureArgs, ",feature=") +: ""; This makes a couple of implicit assump

[PATCH] D127686: [Offloading] Embed the target features in the OffloadBinary

2022-06-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/Driver/openmp-offload-gpu-new.c:117 + +// CHECK-LTO-FEATURES: clang-offload-packager{{.*}}feature={{.*}}ptx jhuber6 wrote: > tra wrote: > > This should probably be a bit more specific/verbose. E.g. I'd want to mak

[PATCH] D127686: [Offloading] Embed the target features in the OffloadBinary

2022-06-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:735 + llvm::StringMap LastOpt; + for (unsigned I = 0, N = Features.size(); I < N; ++I) { +StringRef Name = Features[I]; // Record the index of the last occurence

[PATCH] D127707: [Clang] Simplify unifying target features

2022-06-14 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Driver/ToolChains/CommonArgs.cpp:153 +if (UsedFeatures.insert(Feature.drop_front()).second) + UnifiedFeatures.push_back(Feature); } --

[PATCH] D127771: [HIP] fix long double size

2022-06-14 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. AFAICT, the test case you've added works fine with the compiler at HEAD: https://cuda.godbolt.org/z/q3xYMfdeb I guess it only shows up in assertion-enabled builds. Can you check what happens if you

[PATCH] D127901: [LinkerWrapper] Add PTX output to CUDA fatbinary in LTO-mode

2022-06-16 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Playing devil's advocate, I've got to ask -- do we even want to support JIT? JIT brings more trouble than benefits. - substantial start-up time on nontrivial apps. Last time I tried launching a tensorflow app and needed to JIT its kernels, it took about half an hour until

[PATCH] D128022: [HIP] add -fhip-kernel-arg-name

2022-06-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CodeGenModule.cpp:1845-1846 + } + if (getCodeGenOpts().EmitOpenCLArgMetadata || + getCodeGenOpts().HIPSaveKernelArgName) Fn->setMetadata("kernel_arg_name", Should we consolidate both options int

[PATCH] D128022: [HIP] add -fhip-kernel-arg-name

2022-06-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CodeGenModule.cpp:1845-1846 + } + if (getCodeGenOpts().EmitOpenCLArgMetadata || + getCodeGenOpts().HIPSaveKernelArgName) Fn->setMetadata("kernel_arg_name", yaxunl wrote: > yaxunl wrote: > > tra

[PATCH] D127901: [LinkerWrapper] Add PTX output to CUDA fatbinary in LTO-mode

2022-06-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D127901#3602771 , @jdoerfert wrote: > Do we want/need PTX, I do not, but I don't mind having it. Someone will ask > for it eventually. Fair enough. > However, if we embed bitcode via LTO we can use the > single-linked PTX image

[PATCH] D127686: [Offloading] Embed the target features in the OffloadBinary

2022-06-22 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM with a nit. Comment at: clang/test/Driver/openmp-offload-gpu-new.c:117 + +// CHECK-LTO-FEATURES: clang-offload-packager{{.*}}feature={{.*}}ptx tra wrote: > jh

[PATCH] D127901: [LinkerWrapper] Add PTX output to CUDA fatbinary in LTO-mode

2022-06-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D127901#3603118 , @jhuber6 wrote: > In D127901#3603006 , @tra wrote: > >> Then we do need a knob controlling whether we do want to embed PTX or not. >> The default should be "off" IMO. >>

[PATCH] D128022: [HIP] add -fhip-kernel-arg-name

2022-06-23 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/CodeGen/CodeGenModule.cpp:1845-1846 + } + if (getCodeGenOpts().EmitOpenCLArgMetadata || + getCodeGenOpts().HIPSaveKernelArgName) Fn->setMetada

[PATCH] D128441: [CUDA] Do not embed a fatbinary when using the new driver

2022-06-23 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. > The linker wrapper cannot do anything with these embedded PTX files because > we do not know how to link them, Neither, apparently does `nvlink`. It does have `--emip-ptx ` option, but only if LT

[PATCH] D125904: [Cuda] Use fallback method to mangle externalized decls if no CUID given

2022-05-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CodeGenModule.cpp:6836 + + // If the CUID is not specified we try to generate a unique postfix. + if (getLangOpts().CUID.empty()) { jhuber6 wrote: > tra wrote: > > > However, [CUID] is not always availibl

[PATCH] D125904: [Cuda] Use fallback method to mangle externalized decls if no CUID given

2022-05-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I'm still itching to figure out a way to avoid CUID altogether and with the new driver it may be possible. CUID serves two purposes: a) avoid name conflicts during device-side linking ("must be globally unique" part) b) allow host to refer to something in the GPU executable

[PATCH] D126369: [LLVM] Add rcp.approx.ftz.f32 intrinsic

2022-05-25 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. Do you need me to land it? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D126369/new/ https://reviews.llvm.org/D126369 _

[PATCH] D125904: [Cuda] Use fallback method to mangle externalized decls if no CUID given

2022-05-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Is this patch in its current form blocking any of your other work? no-cuid approach, even if we figure out how to do it, will likely take some time. Do you need an interim solution until then? In D125904#3537385 , @jhuber6 wrote: >

[PATCH] D126398: [Clang] Introduce `-dl` option to perform offload device linking

2022-05-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > Currently, we infer the usage of the device linker by the user > specifying an offloading toolchain, e.g. (--offload-arch=...) or > (-fopenmp-targets=...), but this shouldn't be strictly necessary. Yup. Whether we want to perform device link or not is orthogonal to those o

[PATCH] D125904: [Cuda] Use fallback method to mangle externalized decls if no CUID given

2022-05-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. How much work would it take to add cuid generation in the new driver, similar to what the old driver does, using the same logic, however imperfect it is? I'd be OK with that as a possibly permanent solution. I'm somewhat wary of temporary solutions as they tend to become pe

[PATCH] D126398: [Clang] Introduce `-dlink` option to perform offload device linking

2022-05-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:825-826 + HelpText<"Use the new offloading linker to perform the link job.">; +def device_link : Flag<["-"], "dlink">, Group, + Alias; def Xlinker : Separate<["-"], "Xlinker">, Flags<[LinkerInput, Ren

[PATCH] D125904: [Cuda] Use fallback method to mangle externalized decls if no CUID given

2022-05-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D125904#3538163 , @yaxunl wrote: > I am OK with this patch. OK. > That said, this patch provided a default CUID that do not depend on driver, > which is its advantage. It should be OK for most usecases. I agree with this part.

[PATCH] D125904: [Cuda] Use fallback method to mangle externalized decls if no CUID given

2022-05-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D125904#3538742 , @jhuber6 wrote: > In D125904#3538714 , @tra wrote: > >> It would be great to have some compile-time checks for that, if possible. >> Otherwise it will only manifest at ru

[PATCH] D126681: [HIP] Fix static lib name on windows

2022-05-31 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Driver/ToolChains/CommonArgs.cpp:1788 +for (auto Prefix : {"/libdevice/", "/"}) { + if (IsMSVC) { +AOBFileNames.push_back(Twine(LPath + P

[PATCH] D126812: [Binary] Promote OffloadBinary to inherit from Binary

2022-06-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. LGTM in principle, with new nits. Comment at: llvm/include/llvm/Object/OffloadBinary.h:121 const Entry *TheEntry) - : Buffer(Buffer), TheHeader(TheHeader), TheEntry(TheEntry) { - + : Binary(Binary::ID_Offload, Source), Buffer(Sour

[PATCH] D126812: [Binary] Promote OffloadBinary to inherit from Binary

2022-06-01 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: llvm/lib/Object/OffloadBinary.cpp:28 + if (identify_magic(Buf.getBuffer()) != file_magic::offload_binary) return errorCodeToError(llvm::object::object_error::

[PATCH] D126812: [Binary] Promote OffloadBinary to inherit from Binary

2022-06-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: llvm/lib/Object/OffloadBinary.cpp:18 - -namespace llvm { MaskRay wrote: > tra wrote: > > tra wrote: > > > Nit: `using namespace` seems a bit too heavy-handed when you only need > > > one enum > > > `using object_error = l

[PATCH] D125165: [Clang] Introduce clang-offload-packager tool to bundle device files

2022-06-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. @jhuber6 -- @MaskRay has found that `ninja install` is failing in a clean build with: clang: error: unable to execute command: Executable "clang-offload-packager" doesn't exist! It looks like a missing dependency somewhere. Repository: rG LLVM Github Monorepo CHANG

[PATCH] D127142: [HIP] Link with clang_rt.builtins

2022-06-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added a reviewer: MaskRay. tra added a comment. Herald added a subscriber: StephenFan. Looks OK syntax-wise. Library path should probably be fixed, though it appears to be a somewhat separate issue and could be done in its own patch, if the required change is not trivial. ==

[PATCH] D127142: [HIP] Link with clang_rt.builtins

2022-06-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Linux.cpp:684-690 CmdArgs.append( {Args.MakeArgString(StringRef("-L") + RocmInstallation.getLibPath()), "-rpath", Args.MakeArgString(RocmInstallation.getLibPath())}); CmdArgs.push_back("-lam

[PATCH] D127267: [NVPTX] Add setAuxTarget override rather than make a new TargetInfo

2022-06-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I think the root cause of the problem here is that CUDA compilation assumes that the GPU-side looks identical to the host side of the compilation. The test was intended to verify that and, AFAICT, it did its job flagging the issue here. AFAICT, __GCC_ATOMIC_LLONG_LOCK_FREE=

[PATCH] D155539: [CUDA][HIP] Use the same default language std as C++

2023-07-18 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. We should probably update documentation that C++ standard version for CUDA/HIP compilation now matches C++ default instead of previously used c++14. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D155539/new/ https://reviews.llvm.org/D1555

[PATCH] D154559: [clang] Fix constant evaluation about static member function

2023-07-18 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. @rsmith Richard, PTAL. This needs your language lawyering expertise. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D154559/new/ https://reviews.llvm.org/D154559 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https

[PATCH] D156014: [Clang][NVPTX] Permit use of the alias attribute for NVPTX targets

2023-07-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Sema/SemaDeclAttr.cpp:1995 } - if (S.Context.getTargetInfo().getTriple().isNVPTX()) { -S.Diag(AL.getLoc(), diag::err_alias_not_supported_on_nvptx); Allowing or not `noreturn` depends on the CUDA version we'

[PATCH] D156014: [Clang][NVPTX] Permit use of the alias attribute for NVPTX targets

2023-07-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Sema/SemaDeclAttr.cpp:1995 } - if (S.Context.getTargetInfo().getTriple().isNVPTX()) { -S.Diag(AL.getLoc(), diag::err_alias_not_supported_on_nvptx); tra wrote: > Allowing or not `noreturn` depends on the CUD

[PATCH] D157750: Properly handle -fsplit-machine-functions for fatbinary compilation

2023-08-11 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Comment at: clang/test/Driver/fsplit-machine-functions-with-cuda-nvptx.c:9 + +// Check that -fsplit-machine-functions is passed to both x86 and cuda compilation and does not cause driver error. +// MFS2: -fsplit-machine-functions We will

[PATCH] D157750: Properly handle -fsplit-machine-functions for fatbinary compilation

2023-08-11 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/Driver/fsplit-machine-functions-with-cuda-nvptx.c:9 + +// Check that -fsplit-machine-functions is passed to both x86 and cuda compilation and does not cause driver error. +// MFS2: -fsplit-machine-functions shenh

[PATCH] D157750: Properly handle -fsplit-machine-functions for fatbinary compilation

2023-08-14 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/Driver/fsplit-machine-functions-with-cuda-nvptx.c:9 + +// Check that -fsplit-machine-functions is passed to both x86 and cuda compilation and does not cause driver error. +// MFS2: -fsplit-machine-functions shenh

[PATCH] D157750: Properly handle -fsplit-machine-functions for fatbinary compilation

2023-08-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/Driver/fsplit-machine-functions-with-cuda-nvptx.c:16 +// causes a warning. +// RUN: %clang --target=x86_64-unknown-linux-gnu -nogpulib -nogpuinc \ +// RUN: --cuda-gpu-arch=sm_70 -x cuda -fsplit-machine-functions -S %s 2>&1

[PATCH] D158226: [CUDA/NVPTX] Improve handling of memcpy for -Os compilations.

2023-08-17 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. Herald added subscribers: mattd, gchakrabarti, asavonic, bixia, hiraditya, yaxunl. Herald added a project: All. tra published this revision for review. tra added a reviewer: alexfh. Herald added subscribers: llvm-commits, cfe-commits, wangpc, jholewinski. Herald added pr

[PATCH] D158247: [CUDA][HIP] Fix overloading resolution in global variable initializer

2023-08-18 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Same reproducer but for CUDA: https://godbolt.org/z/WhjTMffnx Comment at: clang/include/clang/Sema/Sema.h:4753 + /// Otherwise, use \p D to determiine the host/device target. bool CheckCallingConvAttr(const ParsedAttr &attr, CallingConv &CC, +

[PATCH] D158226: [CUDA/NVPTX] Improve handling of memcpy for -Os compilations.

2023-08-18 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG72757343fa86: [CUDA/NVPTX] Improve handling of memcpy for -Os compilations. (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE

[PATCH] D158238: Implement __builtin_fmaximum/fminimum*

2023-08-18 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. Herald added a subscriber: bixia. Herald added a project: All. tra updated this revision to Diff 551336. tra added a comment. tra updated this revision to Diff 551338. tra published this revision for review. tra added a reviewer: fhahn. Herald added subscribers: cfe-commi

[PATCH] D158238: Implement __builtin_fmaximum/fminimum*

2023-08-18 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. @fhahn who else should take a look at the patch? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D158238/new/ https://reviews.llvm.org/D158238 ___ cfe-commits mailing list cfe-commits@l

[PATCH] D158238: Implement __builtin_fmaximum/fminimum*

2023-08-18 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 551646. tra added a comment. Fixed test RUN lines Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D158238/new/ https://reviews.llvm.org/D158238 Files: clang/include/clang/Basic/Builtins.def clang/lib/AST/ExprCons

[PATCH] D157750: Properly handle -fsplit-machine-functions for fatbinary compilation

2023-08-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/Driver/fsplit-machine-functions-with-cuda-nvptx.c:9 + +// Check that -fsplit-machine-functions is passed to both x86 and cuda compilation and does not cause driver error. +// MFS2: -fsplit-machine-functions MaskR

[PATCH] D156014: [Clang][NVPTX] Permit use of the alias attribute for NVPTX targets

2023-08-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Sema/SemaDeclAttr.cpp:1995 } - if (S.Context.getTargetInfo().getTriple().isNVPTX()) { -S.Diag(AL.getLoc(), diag::err_alias_not_supported_on_nvptx); jhuber6 wrote: > jhuber6 wrote: > > tra wrote: > > > tra w

[PATCH] D141375: [SYCL][OpenMP] Fix compilation errors for unsupported __bf16 intrinsics

2023-09-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Herald added a subscriber: jplehr. Comment at: clang/lib/Sema/Sema.cpp:1978-1979 !Context.getTargetInfo().hasInt128Type()) || +(Ty->isBFloat16Type() && !Context.getTargetInfo().hasBFloat16Type() && + !LangOpts.CUDAIsDevice) ||

[PATCH] D141375: [SYCL][OpenMP] Fix compilation errors for unsupported __bf16 intrinsics

2023-09-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Sema/Sema.cpp:1978-1979 !Context.getTargetInfo().hasInt128Type()) || +(Ty->isBFloat16Type() && !Context.getTargetInfo().hasBFloat16Type() && + !LangOpts.CUDAIsDevice) || LongDoubleMismatched) { --

[PATCH] D154822: Support '-fgpu-default-stream=per-thread' for NVIDIA CUDA

2023-07-10 Thread Artem Belevich via Phabricator via cfe-commits
tra added a reviewer: tra. tra added a comment. Looking at CUDA headers, it appears that changing only compiler-generated-glue may be insufficient. A lot of other CUDA API calls need to be changed to `_ptsz` variant and for that we need to have `CUDA_API_PER_THREAD_DEFAULT_STREAM` defined. Re

[PATCH] D154797: [CUDA][HIP] Rename and fix `-fcuda-approx-transcendentals`

2023-07-10 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Looks good in general. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:7221-7223 +bool UseApproxTranscendentals = false; +if (Args.hasFlag(options::OPT_ffast_math, options::OPT_fno_fast_math, + false)) ``` bool

[PATCH] D154797: [CUDA][HIP] Rename and fix `-fcuda-approx-transcendentals`

2023-07-10 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Frontend/InitPreprocessor.cpp:1294 +if (!LangOpts.HIP) + Builder.defineMacro("__CLANG_CUDA_APPROX_TRANSCENDENTALS__"); +Builder.defineMacro("

[PATCH] D154300: [CUDA][HIP] Fix template argument deduction

2023-07-11 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Sema/SemaOverload.cpp:12758-12764 +std::optional MorePreferableByCUDA = +CheckCUDAPreference(FD, Result); +// If FD has different CUDA preference than Result. +if (MorePreferableByCUDA) { +

[PATCH] D154822: [clang] Support '-fgpu-default-stream=per-thread' for NVIDIA CUDA

2023-07-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > Can anyone push this? I can help with this. How do you want your commit to be attributed? The patch currently has `boxu.zhang `. Do you want it to be changed to something else? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D

[PATCH] D154822: [clang] Support '-fgpu-default-stream=per-thread' for NVIDIA CUDA

2023-07-13 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGf05b58a9468c: [clang] Support '-fgpu-default-stream=per-thread' for NVIDIA CUDA (authored by boxu-zhang, committed by tra). Repository: rG LLVM G

[PATCH] D158778: [CUDA] Propagate __float128 support from the host.

2023-08-24 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. Herald added subscribers: mattd, gchakrabarti, asavonic, kerbowa, bixia, tpr, yaxunl, jvesely. Herald added a project: All. tra edited the summary of this revision. tra published this revision for review. tra added reviewers: jlebar, yaxunl. tra added a comment. Herald a

[PATCH] D158778: [CUDA] Propagate __float128 support from the host.

2023-08-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Also, https://github.com/llvm/llvm-project/issues/46903 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D158778/new/ https://reviews.llvm.org/D158778 ___ cfe-commits mailing list cfe-co

[PATCH] D158778: [CUDA] Propagate __float128 support from the host.

2023-08-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: ABataev. tra added a comment. @ABataev This patch breaks breaks two tests: - github.com/llvm/llvm-project/blob/main/clang/test/OpenMP/nvptx_unsupported_type_codegen.cpp - github.com/llvm/llvm-project/blob/main/clang/test/OpenMP/nvptx_unsupported_type_messages.cpp It's

[PATCH] D158778: [CUDA] Propagate __float128 support from the host.

2023-08-29 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: jhuber6. tra added a comment. In D158778#4624408 , @ABataev wrote: > Just checks removal should be fine Looks like OpenMP handles long double and __float128 differently -- it always insists on using the host's FP format for both.

[PATCH] D158778: [CUDA] Propagate __float128 support from the host.

2023-08-29 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D158778#4626181 , @jhuber6 wrote: > Just doing a simple example here https://godbolt.org/z/Y3E58PKMz shows that > for NVPTX we error out (as I would expect) but for AMDGPU we emit an x86 > 80-bit double. With this patch NVPTX wi

[PATCH] D138651: [CUDA][HIP] Don't diagnose use for __bf16

2022-11-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > Builds were failing because "__bf16" wasn't allowed on the target. For CUDA/NVPTX we've solved the issue by implementing storage-only support for NVPTX: https://reviews.llvm.org/D136311 It's a bit more work, but I think it would be a better fix long-term for AMDGPU, too.

[PATCH] D138862: [Clang] Do not set offload kind in a freestanding build

2022-11-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > we should expect it to be handled externally. Can you give more details on how it will work? Something/somewhere would need to generate the glue code to register kernels w/ CUDA runtime. Are you saying that we'll have some way to tell the final linking stage that all offl

[PATCH] D138862: [Clang] Do not set offload kind in a freestanding build

2022-11-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D138862#3955364 , @jhuber6 wrote: > So this patch was made because I'm trying to create a generic `libc` runtime > for the GPU. OK. That simplifies things. GPU-side non-kernel functions do not need anything from CUDA host runtim

[PATCH] D138862: [Clang] Do not set offload kind in a freestanding build

2022-11-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. OK. So, we do need a flag, at least for now. I think we'll need a new one, because `-freestanding` will likely do a lot of things we don't actually want on the host side. How about `--offload-generic` or `--[no-]offload-runtime`. Repository: rG LLVM Github Monorepo CHA

[PATCH] D138651: [CUDA][HIP] Don't diagnose use for __bf16

2022-11-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/SemaCUDA/amdgpu-bf16.cu:9 + +__device__ void devicefn() { +} We should probably also have a case verifying that actual attempt to use `__bf16` in device code is still diagnosed. Repository: rG LLVM Github Mo

[PATCH] D139045: [HIP] use detected GPU in --offload-arch

2022-11-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > This patch detects system GPU and use them in --offload-arch if not > specified. If system GPU cannot be detected clang will fall back to gfx803. I don't think auto-probing is something we want to do by default. IMO, we should be following `-march=native` behavior here an

[PATCH] D139023: [NFC][Clang][Driver][AMDGPU] Avoid temporary copies of std::string by using Twine and StringRef

2022-12-01 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM in principle. That said, I don't think `StringRef `buys us anything here as none of this code is in the hot path. It also comes with the downside that now one has to worry about lifetimes of t

[PATCH] D138651: [CUDA][HIP] Don't diagnose use for __bf16

2022-12-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/SemaCUDA/amdgpu-bf16.cu:9 + +__device__ void devicefn() { +} Pierre-vh wrote: > tra wrote: > > We should probably also have a case verifying that actual attempt to use > > `__bf16` in device code is still diagnos

[PATCH] D139045: [HIP] use detected GPU in --offload-arch

2022-12-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D139045#3962354 , @yaxunl wrote: > I understand your point. However, when users use gcc or clang and do not > specify target or CPU, the compiled program will execute on their system. > This is because gcc or clang has a default

[PATCH] D76365: [cuda][hip] Add CUDA builtin surface/texture reference support.

2020-03-20 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CGCUDARuntime.h:51 +}; +unsigned Kind : 2; +unsigned Extern : 1; This should be `DeviceVarKind` Comment at: clang/lib/CodeGen/CGCUDARuntime.h:53 +unsigned Extern : 1; +

[PATCH] D76520: [CUDA][HIP] Add -Xarch_device and -Xarch_host options

2020-03-20 Thread Artem Belevich via Phabricator via cfe-commits
tra added a reviewer: echristo. tra added a comment. Does it handle options with values? E.g. if I want to pass `-mframe-pointer=none` ? I vaguely recall the current -Xarch_* implementation had some limitations. It may be worth adding a test for that. CHANGES SINCE LAST ACTION https://revie

[PATCH] D76365: [cuda][hip] Add CUDA builtin surface/texture reference support.

2020-03-20 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CodeGenModule.cpp:701-713 + if (getLangOpts().CUDAIsDevice) { +// As CUDA builtin surface/texture types are replaced, skip generating TBAA +// access info. +if (AccessType->isCUDADeviceBuiltinSurfaceType()) { +

[PATCH] D76520: [CUDA][HIP] Add -Xarch_device and -Xarch_host options

2020-03-23 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D76520#1936837 , @yaxunl wrote: > -Xarch_ works with driver options having value, e.g. > `-fcf-protection=branch`. I added a test for that. > > `-mframe-pointer=none` is a cc1 option. That's why it cannot be passed by > -Xarch_. I

[PATCH] D76520: [CUDA][HIP] Add -Xarch_device and -Xarch_host options

2020-03-23 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D76520#1937294 , @yaxunl wrote: > -Xarch_ does not work for passing -cc1 options in the beginning. This patch > does not change that. > > This requires some further changes about how the options after -Xarch_ are > handled. I woul

[PATCH] D76365: [cuda][hip] Add CUDA builtin surface/texture reference support.

2020-03-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Headers/__clang_cuda_runtime_wrapper.h:82-94 #undef __CUDACC__ #if CUDA_VERSION < 9000 #define __CUDABE__ #else +#define __CUDACC__ #define __CUDA_LIBDEVICE__ #endif hliao wrote: > hliao wrote: > > tra wrote:

[PATCH] D76365: [cuda][hip] Add CUDA builtin surface/texture reference support.

2020-03-26 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. Next step is to figure out what various __nv_tex_surf_handler(...) maps to for various strings (there are ~110 of them in CUDA-10.2) and implement its replacement. I think we should be able t

[PATCH] D76365: [cuda][hip] Add CUDA builtin surface/texture reference support.

2020-03-27 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Looks like the change breaks compilation for us: In file included from :1: In file included from llvm_unstable/toolchain/lib/clang/google3-trunk/include/__clang_cuda_runtime_wrapper.h:104: In file included from cuda/include/cuda_runtime.h:116: cuda/include/cuda_surfac

[PATCH] D76365: [cuda][hip] Add CUDA builtin surface/texture reference support.

2020-03-27 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D76365#1946345 , @tra wrote: > Looks like the change breaks compilation for us: > > In file included from :1: > In file included from > llvm_unstable/toolchain/lib/clang/google3-trunk/include/__clang_cuda_runtime_wrapper.h:104:

[PATCH] D76365: [cuda][hip] Add CUDA builtin surface/texture reference support.

2020-03-27 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D76365#1946415 , @hliao wrote: > That's a partial template specialization needs handling. I am revising that > patch. Please revert it first. Thanks. Reverted in fe8063e1a0e983f1b4d

[PATCH] D76948: [cuda][hip] Add CUDA builtin surface/texture reference support.

2020-03-27 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Would it be possible to update the old review with the new diff? It would make it easier to see the incremental changes you've made. If the old review can be reopened that would be great as it would keep all relevant info in one place, but I'm fine doing the review here, to

[PATCH] D76948: [cuda][hip] Add CUDA builtin surface/texture reference support.

2020-03-27 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D76948#1946878 , @hliao wrote: > I tried that before submitting this one. But, as it's in the closed state, I > cannot submit that anymore. I will attach the difference against the previous > change somewhere. I've reopened it.

[PATCH] D76365: [cuda][hip] Add CUDA builtin surface/texture reference support.

2020-03-27 Thread Artem Belevich via Phabricator via cfe-commits
tra reopened this revision. tra added a comment. This revision is now accepted and ready to land. Reopened for further work Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D76365/new/ https://reviews.llvm.org/D76365

[PATCH] D76365: [cuda][hip] Add CUDA builtin surface/texture reference support.

2020-03-27 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D76365#1946938 , @hliao wrote: > The new revision is accepted, right? Just want to confirm as it seems you > accept it before I posted the new change. The approval was for the old version. I didn't undo it when I reopened the re

[PATCH] D76365: [cuda][hip] Add CUDA builtin surface/texture reference support.

2020-03-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D76365#1947479 , @hliao wrote: > Nice! I'll file a bug with NVIDIA. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D76365/new/ https://reviews.llvm.org/D76365

[PATCH] D76987: Rename options --cuda-gpu-arch and --no-cuda-gpu-arch

2020-03-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a reviewer: echristo. tra accepted this revision. tra added a subscriber: echristo. tra added a comment. This revision is now accepted and ready to land. + @echristo who OK'ed the idea conditional on the actual patch. :-) LGTM overall. CHANGES SINCE LAST ACTION https://reviews.llvm.

[PATCH] D77240: [CUDA] Add missing cmath overloads

2020-04-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. We'll need to make sure that all of these new functions are vailable in all supported CUDA versions. E.g. `acoshf` does not seem to be present in CUDA-9. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77240/new/ https://reviews

[PATCH] D77239: [CUDA][NFCI] Use unqualified lookup for math functions

2020-04-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: jlebar. tra added a comment. > The other macro uses a unqualified lookup already and the qualified one > will cause problems in the OpenMP overlay. There's a bit of inconsitency here. While `__CUDA_CLANG_FN_INTEGER_OVERLOAD_2` indeed uses unqualified lookup, pretty much

[PATCH] D77240: [CUDA] Add missing cmath overloads

2020-04-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D77240#1955724 , @jdoerfert wrote: > At least that one is defined in what is "now" `__clang_cuda_math.h`: Cool. We may be OK, but we still need to verify it. Math headers are rather fragile and we need to make sure it all still w

[PATCH] D77240: [CUDA] Add missing cmath overloads

2020-04-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. We do have a problem. With your patch I see a lot of errors about function redefinitions conflicting with the ones in CUDA's `math_functions.hpp`: E.g: In file included from :1: In file included from /usr/local/google/home/tra/work/llvm/build/release+assert+zapcc/lib/c

[PATCH] D77240: [CUDA] Add missing cmath overloads

2020-04-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D77240#1957386 , @jdoerfert wrote: > I just noticed those as well. I forgot to put the new definitions into the > forward declare header. Will do it in a second. The OpenMP math overlay > doesn't have one so I forgot :( I'm not

[PATCH] D68578: [HIP] Fix device stub name

2020-02-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/AST/GlobalDecl.h:40 + Stub = 1, +}; + rjmccall wrote: > tra wrote: > > rjmccall wrote: > > > tra wrote: > > > > rjmccall wrote: > > > > > The attribute here is `CUDAGlobalAttr`; should this be named in t

[PATCH] D75001: [OpenMP][cmake] ignore warning on unknown CUDA version

2020-02-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D75001#1891284 , @Hahnfeld wrote: > In D75001#1887876 , @jdoerfert wrote: > > > I like this way better. I was hoping we could do it in our cmake only :) > > > > Give others a day or so to com

[PATCH] D75001: [OpenMP][cmake] ignore warning on unknown CUDA version

2020-02-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D75001#1891861 , @kkwli0 wrote: > @tra Will it also include -fopenmp-targets=nvptx64-nvidia-cuda? If you're asking whether the warning will be disabled for OpenMP, then no. This OpenMP target appears to rely on CUDA, and I think

[PATCH] D75310: [CUDA, clang-cl] Filter out unsupported arguments for device-side compilation.

2020-02-27 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: rnk. Herald added subscribers: sanjoy.google, bixia. Herald added a project: clang. Device-side compilation does not support some features and we need to filter them out when command line options enable them for the host. We're already doing this i

[PATCH] D69582: Let clang driver support parallel jobs

2019-10-29 Thread Artem Belevich via Phabricator via cfe-commits
tra added a reviewer: echristo. tra added a subscriber: echristo. tra added a comment. @echristo Eric, any thoughts/concerns on the overall direction for the driver? @yaxunl One concern I have is diagnostics. When the jobs are executed in parallel, I assume all of them will go to the standard er

[PATCH] D69498: IR: Invert convergent attribute handling

2019-10-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Would I be too far off the mark to summarize the situation this way? - current default for unattributed functions is unsound for GPU back-ends, but is fine for most of the other back-ends. - it's easy to unintentionally end up using/mis-optimizing unattributed functions for

[PATCH] D57829: [CUDA][HIP] Disable emitting llvm.linker.options in device compilation

2019-10-31 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Could you, please, give us a bit more context and provide more info for the questions @rjmccall and I asked before? Is the problem specifically because autolink is not supported on device side? Or is disabling autolink just a convoluted way to avoid calling EmitModuleLinkO

<    8   9   10   11   12   13   14   15   16   17   >