[PATCH] D136311: [CUDA,NVPTX] Implement __bf16 support for NVPTX.

2022-10-21 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 469676. tra added a comment. Cosmetic refectoring. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D136311/new/ https://reviews.llvm.org/D136311 Files: clang/lib/Basic/Targets/NVPTX.cpp clang/lib/Basic/Targets/NV

[PATCH] D136311: [CUDA,NVPTX] Implement __bf16 support for NVPTX.

2022-10-21 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 469683. tra added a comment. Added LLVM test for bfloat load/stores. Fixed asm output for bf16 constants. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D136311/new/ https://reviews.llvm.org/D136311 Files: clang/l

[PATCH] D136311: [CUDA,NVPTX] Implement __bf16 support for NVPTX.

2022-10-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. @jchlanda PTAL. You probably have the most context for NVPTX and bf16 instructions there. We need this change to unbreak CUDA compilation after D132329 exposed __bf16 to GPU-side compilation. https://godbolt.org/z/Kz8PYfPj5 Repository:

[PATCH] D136311: [CUDA,NVPTX] Implement __bf16 support for NVPTX.

2022-10-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. @yaxunl It appears that AMDGPU also does not support `__bf16`, but for some reason it does not error out in clang headers: https://godbolt.org/z/GrTGMn49f Any ideas why that may be the case? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.

[PATCH] D136311: [CUDA,NVPTX] Implement __bf16 support for NVPTX.

2022-10-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D136311#3882748 , @yaxunl wrote: > LGTM. Thanks. > > Do you plan to support arithmetic operators for bf16 or implement the FMA > instruction support? Yes. sm_90 has introduced a handful of new bf16 operations that will be eventu

[PATCH] D136311: [CUDA,NVPTX] Implement __bf16 support for NVPTX.

2022-10-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: llvm/lib/Target/NVPTX/NVPTXInstrInfo.td:186 + !eq(name, "v2f16"): Float16x2Regs, + !eq(name, "bf16"): Float16Regs, + !eq(name, "v2bf16"): Float16x2Regs, tra wrote: > Allen wrote: > > sorry for a basic question: w

[PATCH] D136311: [CUDA,NVPTX] Implement __bf16 support for NVPTX.

2022-10-25 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 470544. tra added a comment. whitespace fix Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D136311/new/ https://reviews.llvm.org/D136311 Files: clang/lib/Basic/Targets/NVPTX.cpp clang/lib/Basic/Targets/NVPTX.h

[PATCH] D136701: [LinkerWrapper] Perform device linking steps in parallel

2022-10-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I would argue that parallel compilation and linking may need to be disabled by default. I believe similar patches were discussed in the past regarding sub-compilations, but they are relevant for parallel linking, too. Google search shows D52193

[PATCH] D136701: [LinkerWrapper] Perform device linking steps in parallel

2022-10-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. @yaxunl - Found it: D69582 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D136701/new/ https://reviews.llvm.org/D136701 ___ cfe-commits mailing list c

[PATCH] D136311: [CUDA,NVPTX] Implement __bf16 support for NVPTX.

2022-10-25 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG0e8a414ab3d3: [CUDA, NVPTX] Added basic __bf16 support for NVPTX. (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTIO

[PATCH] D136701: [LinkerWrapper] Perform device linking steps in parallel

2022-10-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D136701#3883300 , @jhuber6 wrote: > However, as an opt-in feature it would be very helpful in some cases. I'm OK with the explicit opt-in. > Like consider someone creating a static library that supports every GPU > architecture

[PATCH] D136854: [HIP] add -fhiplib-add-rpath

2022-10-27 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:4156 HelpText<"Do not add -rpath with architecture-specific resource directory to the linker flags">; +def fhiplib_add_rpath: Flag<["-"], "fhiplib-add-rpath">, Flags<[NoArgumentUnused]>, + HelpText

[PATCH] D137154: Adding nvvm_reflect clang builtin

2022-11-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: yaxunl. tra added a comment. I don't think it's a good idea. `__nvvm_reflect` is a hack that we should not propagate. The only code that currently relies on it is NVIDIA's libdevice bitcode and I'd prefer to keep it that way. Can you elaborate on what motivates this chan

[PATCH] D129507: [OffloadPackager] Add option to extract files from images

2022-08-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/Driver/offload-packager.c:12-14 +// RUN: clang-offload-packager %t.out \ +// RUN: --image=file=%t-sm_70.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 \ +// RUN: --image=file=%t-gfx908.o,kind=openmp,triple=amdgcn-amd-am

[PATCH] D129507: [OffloadPackager] Add option to extract files from images

2022-08-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/Driver/offload-packager.c:12-14 +// RUN: clang-offload-packager %t.out \ +// RUN: --image=file=%t-sm_70.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 \ +// RUN: --image=file=%t-gfx908.o,kind=openmp,triple=amdgcn-amd-am

[PATCH] D129507: [OffloadPackager] Add option to extract files from images

2022-08-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D129507#3730464 , @jhuber6 wrote: > Update, still having problems making the test but I figured I'd just update > now. You could try something like this: https://github.com/llvm/llvm-project/blob/d20e632853ad978e8008747d838d51b51

[PATCH] D132248: [CUDA][OpenMP] Fix the new driver crashing on multiple device-only outputs

2022-08-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > The old driver would put all the outputs in the final action list akin to a > linker job. IIRC that's where HIP and CUDA behaved differently. CUDA compilation does not allow device-only compilation for multiple targets if we have explicitly specified output. It does prod

[PATCH] D132248: [CUDA][OpenMP] Fix the new driver crashing on multiple device-only outputs

2022-08-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D132248#3735900 , @jhuber6 wrote: > Is this an architectural limitation? I'd imagine they'd just behave the same > way here in my implementation. The constraint here is that we have to stick with a single output per compiler inv

[PATCH] D132248: [CUDA][OpenMP] Fix the new driver crashing on multiple device-only outputs

2022-08-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I'm OK with that. @yaxunl -- what are your thoughts on whether this approach would work for HIP? On one hand HIP already has a lot of features that the new driver is intended to provide, so AMD may have no pressure to change to something else. On the other hand, long term

[PATCH] D129507: [OffloadPackager] Add option to extract files from images

2022-08-19 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/test/Driver/offload-packager.c:26 +// RUN: --image=file=%S/Inputs/dummy-elf.o,kind=hip,triple=amdgcn-amd-amdhsa,arch=gfx90c +// RUN: cd $(dirname "%t") &

[PATCH] D132140: [AMDGPU] Add builtin s_sendmsg_rtn

2022-08-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/CodeGenOpenCL/builtins-amdgcn-gfx11.cl:23 + +// Test mismatched argument and return types are handled. + Is there a particular reason for this test? Argument and return value type checks should've been handled by

[PATCH] D132607: [OffloadPackager] Add ability to extract mages from other file types

2022-08-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > Should these be merged into a public interface via Object/OffloadBinary.h? I'm all for consolidating relevant code. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D132607/new/ https://reviews.llvm.org/D132607

[PATCH] D132607: [OffloadPackager] Add ability to extract mages from other file types

2022-08-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > before or after this patch. That would be your call. I personally would be biased towards doing refactoring early. Once something is in place, the temptation not to fix what already works might win. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION http

[PATCH] D133956: Cuda Check for ignored errors after calling a CUDA kernel

2022-09-15 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > The intention with this check is to work a lot in tandem with the one in > D133804 , which therefore prevents most > such cases. > Thus, the check is optimized for lowering false positives during static > checking and for a practice lowe

[PATCH] D133705: [HIP] Fix unbundling archive

2022-09-15 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D133705#3785470 , @yaxunl wrote: >> Also, using `lib*.a` as pattern to tell device libraries from the host-ony >> one will be insufficient. There will be libraries with other extensions >> (e.g. `.lo` is fairly common) and there

[PATCH] D134189: [CUDA][HIP] Fix new driver crashing when using -save-temps in RDC-mode

2022-09-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/Driver.cpp:4391 DDep.add(*PackagerAction, *C.getSingleOffloadToolChain(), - nullptr, Action::OFK_None); + nullptr, C.getActiveOffloadKinds()); } `getActiveOffloadKinds` return

[PATCH] D134189: [CUDA][HIP] Fix new driver crashing when using -save-temps in RDC-mode

2022-09-19 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM with a few nits. Comment at: clang/include/clang/Driver/Action.h:304 +/// Add a action along with the associated toolchain, bound arch, and +/// offload kinds. -

[PATCH] D133705: [HIP] Fix unbundling archive

2022-09-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. It's still not quite clear to me what the patch is intended to do. The description describes the existing issues (we don't unbundle some libraries), but is not clear what we do want to have in the end. Are all library arguments expected to be unbundled? If we do not want to

[PATCH] D134546: [clang-offload-bundler] extracting compatible bundle entry

2022-09-23 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/OffloadBundler.cpp:1008 +auto Output = Worklist.begin(); +for (auto E = Worklist.end(); Output != E; Output++) { + if (isCodeObjectCompatible( The patch description implies that there are at lea

[PATCH] D133705: [HIP] Fix unbundling archive

2022-09-23 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. Comment at: clang/lib/Driver/Driver.cpp:2907-2908 +// which are not object files. Files with extension ".lib" is classified +// as TY_Object but they are actually archives, therefore should not be +//

[PATCH] D134546: [clang-offload-bundler] extracting compatible bundle entry

2022-09-23 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/OffloadBundler.cpp:1008 +auto Output = Worklist.begin(); +for (auto E = Worklist.end(); Output != E; Output++) { + if (isCodeObjectCompatible( saiislam wrote: > tra wrote: > > The patch descript

[PATCH] D133705: [HIP] Fix unbundling archive

2022-09-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/Driver/clang-offload-bundler.c:410-412 +// Check clang-offload-bundler extracts empty archives if the input file +// is not an archive when --allow-missing-bundles is specified, otherwise +// report an error. Is t

[PATCH] D133705: [HIP] Fix unbundling archive

2022-09-26 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. LGTM. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D133705/new/ https://reviews.llvm.org/D133705 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/l

[PATCH] D135076: [Clang] Make offloading flags accept '-' and '--'

2022-10-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D135076#3831563 , @MaskRay wrote: > Another idea is to reject multiple `-o` if some are used as the `Joined` > form. Note: multiple `Separate` `-o` should be allowed. > It will not catch `-offload-*` when `-o output` is not specif

[PATCH] D135269: [AMDGPU] Disable bool range metadata to workaround backend issue

2022-10-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Is there more info about the issue? What does AMDGPU currently emit for the test case? AFAICT from running it on CE (https://godbolt.org/z/ccq3vnbrM) llvm optimizes it to essentially `*y = *x` and generates a 1-byte load+store for both NVPTX and AMDGPU. CHANGES SINCE LAS

[PATCH] D135269: [AMDGPU] Disable bool range metadata to workaround backend issue

2022-10-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CGExpr.cpp:1792 // attach range metadata to the load. - } else if (CGM.getCodeGenOpts().OptimizationLevel > 0) +// TODO: Enable range metadata for AMDGCN after backend issue is fixed. + } else if (CGM.getCodeGenO

[PATCH] D135328: [CUDA] Refactored CUDA version housekeeping to use less boilerplate.

2022-10-05 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. Herald added subscribers: mattd, gchakrabarti, asavonic, bixia, yaxunl. Herald added a project: All. tra updated this revision to Diff 465598. tra added a comment. tra retitled this revision from "Refactored CUDA version housekeeping to use less boilerplate." to "[CUDA]

[PATCH] D135328: [CUDA] Refactored CUDA version housekeeping to use less boilerplate.

2022-10-05 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 465601. tra added a comment. Use int max for the "new" CUDA version value. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D135328/new/ https://reviews.llvm.org/D135328 Files: clang/lib/Basic/Cuda.cpp clang/lib/B

[PATCH] D135306: [CUDA] Add support for CUDA-11.8 and sm_{87,89,90} GPUs.

2022-10-05 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. Herald added subscribers: mattd, gchakrabarti, asavonic, bixia, hiraditya, yaxunl. Herald added a project: All. tra updated this revision to Diff 465512. tra added a comment. tra updated this revision to Diff 465557. tra updated this revision to Diff 465577. tra updated

[PATCH] D135328: [CUDA] Refactored CUDA version housekeeping to use less boilerplate.

2022-10-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Basic/Cuda.cpp:59 +CudaVersion ToCudaVersion(llvm::VersionTuple Version) { + int IVer = Version.getMajor() * 10 + Version.getMinor().value_or(0); + for (auto *I = CudaNameVersionMap; I->Version != CudaVersion::UNKNOWN; ++I) -

[PATCH] D135306: [CUDA] Add support for CUDA-11.8 and sm_{87,89,90} GPUs.

2022-10-05 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 465612. tra added a comment. fixed typos. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D135306/new/ https://reviews.llvm.org/D135306 Files: clang/docs/ReleaseNotes.rst clang/include/clang/Basic/BuiltinsNVPTX.d

[PATCH] D135306: [CUDA] Add support for CUDA-11.8 and sm_{87,89,90} GPUs.

2022-10-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:30-32 +#define SM_89 "sm_87|" SM_90 +#define SM_87 "sm_89|" SM_89 +#define SM_86 "sm_89|" SM_87 yaxunl wrote: > typo? Good catch. Fixed. Repository: rG LLVM Github Monorepo CH

[PATCH] D135328: [CUDA] Refactored CUDA version housekeeping to use less boilerplate.

2022-10-06 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 465799. tra added a comment. Use VersionTuple instead of a manually encoded integer version. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D135328/new/ https://reviews.llvm.org/D135328 Files: clang/lib/Basic/Cuda

[PATCH] D135306: [CUDA] Add support for CUDA-11.8 and sm_{87,89,90} GPUs.

2022-10-06 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 465800. tra added a comment. rebase Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D135306/new/ https://reviews.llvm.org/D135306 Files: clang/docs/ReleaseNotes.rst clang/include/clang/Basic/BuiltinsNVPTX.def c

[PATCH] D135328: [CUDA] Refactored CUDA version housekeeping to use less boilerplate.

2022-10-06 Thread Artem Belevich via Phabricator via cfe-commits
tra marked an inline comment as done. tra added inline comments. Comment at: clang/lib/Basic/Cuda.cpp:59 +CudaVersion ToCudaVersion(llvm::VersionTuple Version) { + int IVer = Version.getMajor() * 10 + Version.getMinor().value_or(0); + for (auto *I = CudaNameVersionMap; I->Versi

[PATCH] D135389: [Clang] Emit a warning for ambiguous joined '-o' arguments

2022-10-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. LGTM in principle. Comment at: clang/lib/Driver/Driver.cpp:337 +std::string Nearest; +if (getOpts().findNearest("-" + ArgString, Nearest, IncludedFlagsBitmask, + ExcludedFlagsBitmask) == 1) This looks fo

[PATCH] D135389: [Clang] Emit a warning for ambiguous joined '-o' arguments

2022-10-06 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Test nit. LGTM otherwise. Comment at: clang/test/Driver/unknown-arg.c:73 +// O-WARN-NEXT: warning: joined argument treated as '-o utput'; did you mean '--output'? +// O-WARN-NOT:

[PATCH] D135389: [Clang] Emit a warning for ambiguous joined '-o' arguments

2022-10-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/Driver/unknown-arg.c:74 + +// RUN: %clang -### --offload-arch=sm_70 -o ffload-device-only -Werror=unknown-argument %s Has this patch been updated with incomplete changes? This RUN line seems to miss `2>&1 | File

[PATCH] D135389: [Clang] Emit a warning for ambiguous joined '-o' arguments

2022-10-06 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. Comment at: clang/test/Driver/unknown-arg.c:74 + +// RUN: %clang -### --offload-arch=sm_70 -o ffload-device-only -Werror=unknown-argument %s jhuber6 wrote: > tra wrote: > > Has this patch been updated with i

[PATCH] D135328: [CUDA] Refactored CUDA version housekeeping to use less boilerplate.

2022-10-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. @yaxunl - Are you OK with the patch? PTAL. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D135328/new/ https://reviews.llvm.org/D135328 ___ cfe-commits mailing list cfe-commits@lists.l

[PATCH] D135306: [CUDA] Add support for CUDA-11.8 and sm_{87,89,90} GPUs.

2022-10-07 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 466171. tra added a comment. Updated a failing test. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D135306/new/ https://reviews.llvm.org/D135306 Files: clang/docs/ReleaseNotes.rst clang/include/clang/Basic/Buil

[PATCH] D135328: [CUDA] Refactored CUDA version housekeeping to use less boilerplate.

2022-10-07 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rGf3a2cbcf97f5: Refactored CUDA version housekeeping to use less boilerplate. (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D135328/new/ htt

[PATCH] D135306: [CUDA] Add support for CUDA-11.8 and sm_{87,89,90} GPUs.

2022-10-07 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG9a01cca66036: Add support for CUDA-11.8 and sm_{87,89,90} GPUs. (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION

[PATCH] D135614: [OpenMP][CUDA][AMDGPU] Accept case insensitive subarchitecture names

2022-10-10 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Is that really something we need/want to do? I've never seen anyone complaining about this particular issue. clang/gcc are case-sensitive for similar options, like `-march`: https://godbolt.org/z/3jKzTda7h I think offloading options should behave consistently in that regard

[PATCH] D135715: [Clang] Do not build the OffloadActionBuilder when using the new driver

2022-10-11 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Driver/Driver.cpp:3987-3988 // required. if (!UseNewOffloadingDriver) - if (OffloadBuilder.addHostDependenceToDeviceActions(Current, InputA

[PATCH] D135724: [HIP] Fix unbundling archive

2022-10-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/CommonArgs.cpp:1839 auto Ext = IsMSVC ? ".lib" : ".a"; - if (!Lib.startswith(":") && llvm::sys::fs::exists(Lib)) { -ArchiveOfBundles = Lib; -FoundAOB = true; + if (!Lib.startswith(":") && !Lib.startsw

[PATCH] D135791: [Clang] Do not crash when an invalid offload architecture is set

2022-10-12 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM with few nits. Comment at: clang/lib/Driver/Driver.cpp:4191 /// or CUDA architecture. -static StringRef getCanonicalArchString(Compilation &C, -

[PATCH] D135796: [HIP] Detect HIP for Debian/Fedora

2022-10-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:309 + ROCmSearchDirs.emplace_back(D.SysRoot + "/usr/local", + /*StrictChecking=*/true); Should it be done for Debian/Fedora only? See clang/include/clan

[PATCH] D135724: [HIP] Fix unbundling archive

2022-10-12 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. LGTM Comment at: clang/lib/Driver/ToolChains/CommonArgs.cpp:1839 auto Ext = IsMSVC ? ".lib" : ".a"; - if (!Lib.startswith(":") && llvm::sys::fs::exists(Lib)) { -ArchiveOfBundles = Lib; -FoundAOB = true; + if (!Lib.st

[PATCH] D135791: [Clang] Do not crash when an invalid offload architecture is set

2022-10-12 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. Comment at: clang/lib/Driver/Driver.cpp:4211 if (IsNVIDIAGpuArch(Arch)) -return Args.MakeArgStringRef(CudaArchToString(Arch)); +return StringRef(Args.MakeArgStringRef(CudaArchToString(Arch))); Th

[PATCH] D135796: [HIP] Detect HIP for Debian/Fedora

2022-10-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:309 + ROCmSearchDirs.emplace_back(D.SysRoot + "/usr/local", + /*StrictChecking=*/true); yaxunl wrote: > tra wrote: > > Should it be done for Debian/Fedor

[PATCH] D135832: Do not append terminating NUL to the string with embedded GPU binary.

2022-10-12 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. Herald added a subscriber: bixia. Herald added a project: All. tra retitled this revision from "Do not append terminating NUL to the binary string with embedded fatbin." to "Do not append terminating NUL to the string with embedded GPU binary.". tra added a reviewer: ya

[PATCH] D135832: Do not append terminating NUL to the string with embedded GPU binary.

2022-10-14 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 467957. tra added a comment. Reworked string/array generation. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D135832/new/ https://reviews.llvm.org/D135832 Files: clang/lib/CodeGen/CGCUDANV.cpp clang/test/CodeGe

[PATCH] D135832: Do not append terminating NUL to the string with embedded GPU binary.

2022-10-14 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CGCUDANV.cpp:95 llvm::ConstantInt::get(SizeTy, 0)}; -auto ConstStr = CGM.GetAddrOfConstantCString(Str, Name.c_str()); +auto ConstStr = CGM.GetAddrOfConstantCString(Str, Name.c_str(),

[PATCH] D135832: Do not append terminating NUL to the string with embedded GPU binary.

2022-10-17 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGa10eb07d1acc: Do not append terminating NUL to the binary string with embedded fatbin. (authored by tra). Repository: rG LLVM Github Monorepo CHA

[PATCH] D140433: [Clang] Add `nvptx-arch` tool to query installed NVIDIA GPUs

2023-01-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/tools/nvptx-arch/CMakeLists.txt:19 +if (NOT CUDA_FOUND OR NOT cuda-library) + message(STATUS "Not building nvptx-arch: cuda runtime not found") + return() Nit: libcuda.so is part of the NVIDIA driver which provides N

[PATCH] D140433: [Clang] Add `nvptx-arch` tool to query installed NVIDIA GPUs

2023-01-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/tools/nvptx-arch/NVPTXArch.cpp:34 + const char *ErrStr = nullptr; + CUresult Result = cuGetErrorString(Err, &ErrStr); + if (Result != CUDA_SUCCESS) jhuber6 wrote: > tra wrote: > > One problem with this approach is t

[PATCH] D141078: [CUDA][HIP] Support '--offload-arch=native' for the new driver

2023-01-05 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/test/Driver/amdgpu-hip-system-arch.c:17 // RUN: | FileCheck %s --check-prefix=NO-OUTPUT-ERROR +// RUN: %clang -### --target=x86_64-unknown-linux-gnu -nog

[PATCH] D141437: [HIP] Use .hipi as preprocessor output extension

2023-01-11 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp:105 + " cui - cuda-cpp-output\n" + " hipi - hip-cpp-outpu\n" +

[PATCH] D137251: [clang][cuda/hip] Allow `__noinline__` lambdas

2022-11-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. LGTM in principle. Please wait for @rsmith's OK. Comment at: clang/lib/Parse/ParseExprCXX.cpp:1297-1308 +while (true) { + if (Tok.is(tok::kw___attribute)) { +ParseGNUAttributes(Attr, nullptr, &D); + } else if (Tok.is(tok::kw___noinline

[PATCH] D136919: [X86][RFC] Change mangle name of __bf16 from u6__bf16 to DF16b

2022-11-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. NVPTX has no current uses of this, so if the mangling needs to change, now would be the good time. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D136919/new/ https://reviews.llvm.org/D136919 ___

[PATCH] D136796: [llvm-objdump][Offload] Use common offload extraction method

2022-11-03 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM, modulo separating clang-related change into a separate patch. Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1262-1265 +if (identify_magic((*BufferO

[PATCH] D137154: Adding nvvm_reflect clang builtin

2022-11-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:827 +BUILTIN(__nvvm_reflect, "icC*", "nc") + Do we actually need this patch at all. `extern "C" int __nvvm_reflect(const char *);` appears to work just fine already: https://go

[PATCH] D136919: [X86][RFC] Change mangle name of __bf16 from u6__bf16 to DF16b

2022-11-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > The mangling should be aligned with its semantics. It's fine to use u6__bf16 > if a target doesn't want to support arithmetic operations. We (speaking for CUDA/NVPTX) do want to support math on bfloat, just didn't get to implementing it yet. NVPTX will likely start suppor

[PATCH] D137154: Adding nvvm_reflect clang builtin

2022-11-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:827 +BUILTIN(__nvvm_reflect, "icC*", "nc") + hdelan wrote: > tra wrote: > > Do we actually need this patch at all. > > > > `extern "C" int __nvvm_reflect(const char *);` appears

[PATCH] D136919: [X86][RFC] Change mangle name of __bf16 from u6__bf16 to DF16b

2022-11-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D136919#3909044 , @rjmccall wrote: > If Steve's argument is true, that suggests that it would be a waste for NVPTX > to directly support BF16 arithmetic, at least not the same way it would > support `float` or `double`. (Providi

[PATCH] D137154: Adding nvvm_reflect clang builtin

2022-11-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: jhuber6. tra added a comment. In D137154#3917692 , @hdelan wrote: > Thanks for feedback. Instead of adding `__nvvm_reflect` as a clang builtin, > would it be acceptable if I modified the NVVMReflect pass That would be less proble

[PATCH] D137154: Adding nvvm_reflect clang builtin

2022-11-11 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > Yes, this probably would become untenable with really large sizes. I think that horse had left the barn long ago. Vast majority of NVIDIA GPU cycles these days spent in libraries shipped by NVIDIA and those are huge (O(1GB)), numerous (cuDNN, TensorRT, cuBLAS, cuSPARSE, c

[PATCH] D142484: [LinkerWrapper] Only import static libraries with needed symbols

2023-01-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: MaskRay. tra added a comment. @MaskRay - we seem to be reinventing the linker here and could use your expertise. Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1217-1218 +/// 1) It defines an undefined symbol in a regular object f

[PATCH] D142459: [clang] Deprecate uses of GlobalObject::getAlignment

2023-01-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CGCUDANV.cpp:491 new llvm::LoadInst(Var->getType(), ManagedVar, "ld.managed", false, - llvm::Align(Var->getAlignment()), I); + Var->getAlign().valueOrOne(),

[PATCH] D142484: [LinkerWrapper] Only import static libraries with needed symbols

2023-01-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1217-1218 +/// 1) It defines an undefined symbol in a regular object filie. +/// 2) It defines a global symbol without hidden visibility that has not +/// yet been defined. +Ex

[PATCH] D142484: [LinkerWrapper] Only import static libraries with needed symbols

2023-01-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. We could also use more test cases. E.g. weak symbols (should not cause object extraction) Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1197 + +Resolved |= ((OldSym & Sym_Undefined && !(OldSym & Sym_Weak)) && + !(*

[PATCH] D142459: [clang] Deprecate uses of GlobalObject::getAlignment

2023-01-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: yaxunl. tra added inline comments. Comment at: clang/lib/CodeGen/CGCUDANV.cpp:491 new llvm::LoadInst(Var->getType(), ManagedVar, "ld.managed", false, - llvm::Align(Var->getAlignment()), I); +

[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-24 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D141717/new/ https://reviews.llvm.org/D141717 ___ cfe-commi

[PATCH] D142484: [LinkerWrapper] Only import static libraries with needed symbols

2023-01-24 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. Please wait a bit before landing it, in case @MaskRay has something to say. Comment at: clang/test/Driver/linker-wrapper-libs.c:38-39 + +// LIBRARY-RESOLVES: clang{{.*}} -o

[PATCH] D142484: [LinkerWrapper] Only import static libraries with needed symbols

2023-01-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > I'm somewhat hoping to get this in before the fork that happens in a few > hours. OK. We can fix it later if it turns out that we've missed something. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D142484/new/ https://review

[PATCH] D123534: [dwarf] Emit a DIGlobalVariable for constant strings.

2023-01-27 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. This patch appears to break CUDA compilation: https://github.com/llvm/llvm-project/issues/58491 We apparently emit the symbol with a character (`.`) that can't be used on the target. Normally such characters get renamed/escaped, as you can see in the generated function nam

[PATCH] D142907: LangRef: Add "dynamic" option to "denormal-fp-math"

2023-01-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. My $.02, mostly on the style and the mechanics of applying the new attribute. FP semantics aspects are above my pay grade. Comment at: clang/lib/CodeGen/CGCall.cpp:2059-2061 + F.removeFnAttrs(AttrsToRemove); + addDenormalModeAttrs(Merged, MergedF32, Func

[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. For what it's worth, NVIDIA has started deprecating 32-bit binaries long ago (https://forums.developer.nvidia.com/t/deprecation-plans-for-32-bit-linux-x86-cuda-toolkit-and-cuda-driver/31356) and the process had finally come to the end with the release of CUDA-12: CUDA-12 r

[PATCH] D142907: LangRef: Add "dynamic" option to "denormal-fp-math"

2023-01-31 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. LGTM for the parts I've commented on. Comment at: clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu:77 + +// CHECK: kernel_f32({{.*}}) #[[$KERNELATTR:[0-9]+]] +__global__ void kernel_f32(float* out, float* a, float* b, float* c) {

[PATCH] D141467: [clang][Driver][CUDA] Get rid of unused LibPath

2023-01-11 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Given that it's indeed unused, I'm fine with removing it. That said, it's somewhat odd that in your setup clang was able to find everything but the library directory. You generally would need to hav

[PATCH] D141555: [CUDA] added cmath wrappers to unbreak CUDA compilation after D79555

2023-01-12 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. Herald added subscribers: mattd, bixia, yaxunl. Herald added a project: All. tra updated this revision to Diff 488733. tra added a comment. tra updated this revision to Diff 488734. tra published this revision for review. tra added reviewers: jlebar, philnik. Herald added

[PATCH] D141555: [CUDA] added cmath wrappers to unbreak CUDA compilation after D79555

2023-01-12 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 488736. tra added a comment. Reformatted the code Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D141555/new/ https://reviews.llvm.org/D141555 Files: clang/lib/Headers/CMakeLists.txt clang/lib/Headers/cuda_wrapp

[PATCH] D141555: [CUDA] added cmath wrappers to unbreak CUDA compilation after D79555

2023-01-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D141555#4048843 , @jlebar wrote: > LGTM. Do we need changes to the test-suite to cover this too? (test-suite > being in a separate repo, so it would be a separate patch.) test-suite on CUDA bots already covering this and the

[PATCH] D141555: [CUDA] added cmath wrappers to unbreak CUDA compilation after D79555

2023-01-12 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG1ad5f6af816a: [CUDA] added cmath wrappers to unbreak CUDA compilation after D79555 (authored by tra). Repository: rG LLVM Github Monorepo CHANGES

[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Textual output for "-S -emit-llvm" is the canonical behavior, so I would prefer it working that way in as many cases as possible and only override it when necessary. Would it be possible to enforce binary IR generation in cases you need it? Or to prove that this is equival

[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D141717#4052587 , @jhuber6 wrote: > Well you'll get textual output for the host output, but the device code > embedded in the host module will be bitcode instead. So the final output from > the compiler is still textual IR. It ju

[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D141717#4052824 , @jhuber6 wrote: > For `-E` we don't embed anything, That was just an exaggerated example of top-level options affecting sub-compilation output where you can't magically tweak it to produce the kind of output yo

[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > So are you suggesting that we complete the whole pipeline? So -S -emit-llvm > gives host IR, but the device will go all the way to object? That would match my expectations and would solve the " it can't be handled by LTO or anything else." issue you've highlighted above.

[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2023-01-18 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. LGTM overall, with few nits. Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:448-450 + // If we are invoking `nvlink` internally we need to output a `.cubin` file. + // Checking if the output is a temporary is the cleanest way to determine + // this. Pu

<    7   8   9   10   11   12   13   14   15   16   >