[PATCH] D128090: [Clang][OpenMP] Process multi-arch compilation options given via -march

2022-07-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > Overloading the meaning of `-march` might not work here. Typically `-march` > is just checked via `Args.getLastArg()`, so repeated uses just override the > last one. I'm not exactly sure what the expected use is however, maybe @tra > can help there. Although we could cons

[PATCH] D128090: [Clang][OpenMP] Process multi-arch compilation options given via -march

2022-07-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D128090#3648999 , @jhuber6 wrote: > Right now there's `CLANG_OPENMP_NVPTX_DEFAULT_ARCH`, which is defined by > CMake to be the architecture of the system used to build clang That does not make sense to me. Most of the time clang

[PATCH] D128090: [Clang][OpenMP] Process multi-arch compilation options given via -march

2022-07-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D128090#3649125 , @jhuber6 wrote: > It just defaults to `sm_35` if CUDA isn't present on the system IIRC. > Alternatively we could ship a tool to derive it at compile time. As it happens, recent CUDA releases ship with `bin/__nvc

[PATCH] D129655: [CUDA] Allow the new driver to compile CUDA in non-RDC mode

2022-07-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Nice. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:6999 +if (IsRDCMode) + CmdArgs.push_back("-fgpu-rdc"); + } else if (IsCuda && !HostOffloadingInputs.empty() && !IsRDCMode) { This should not be necessary -- we've already add

[PATCH] D128090: [Clang][OpenMP] Process multi-arch compilation options given via -march

2022-07-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D128090#3649235 , @jhuber6 wrote: > Interesting, may be worthwhile to query that if it exists, though AMD does > this with `amdgpu-arch` which has led to problems for me in the past. But > even if it could be wrong it will still

[PATCH] D129655: [CUDA] Allow the new driver to compile CUDA in non-RDC mode

2022-07-13 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:7009 + // Host-side offloading compilation receives all device-side outputs. Include + // them in the host compilation depen

[PATCH] D129885: [CUDA] Make the new driver properly ignore non-CUDA inputs

2022-07-15 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D129885/new/ https://reviews.llvm.org/D129885 ___ cfe-c

[PATCH] D129536: [CUDA][FIX] Make shfl[_sync] for unsigned long long non-recursive

2022-07-20 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D129536#3663957 , @jdoerfert wrote: > @tra, unsure about the crash. For me this passes fine (no gpu), is anything > missing? The tests in the patch are running with `-emit-llvm`, so they are not actually lowering to NVPTX and th

[PATCH] D129536: [CUDA][FIX] Make shfl[_sync] for unsigned long long non-recursive

2022-07-20 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D129536#3666860 , @jdoerfert wrote: > The assertion is arguably not great but doesn't really matter, does it? How > would I detect if they are supported? The latest revision of the patch is fine in this regard. My comment pointin

[PATCH] D130387: [CUDA/SPIR-V] Force passing aggregate type byval

2022-07-22 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM with a couple of nits. Comment at: clang/lib/CodeGen/TargetInfo.cpp:10450 + +if (isAggregateTypeForABI(Ty)) { + return getNaturalAlignIndirect(Ty, /* byval */ true);

[PATCH] D130387: [CUDA/SPIR-V] Force passing aggregate type byval

2022-07-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D130387#3672961 , @shangwuyao wrote: > Accidentally submitted early... The landed revision seems to have my comments addressed. Was there something missing? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION ht

[PATCH] D130387: [CUDA/SPIR-V] Force passing aggregate type byval

2022-07-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. It's OK. If something comes up you can address it in a follow-up patch. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D130387/new/ https://reviews.llvm.org/D130387 ___ cfe-commits mai

[PATCH] D124866: [CUDA][HIP] support __noinline__ as keyword

2022-05-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D124866#3501203 , @aaron.ballman wrote: >>> Should we do `__forceinline__` at the same time so that there's consistency? > > Primarily to reduce user confusion. It's kind of weird for `__noinline__` to > be a keyword and `__force

[PATCH] D123471: [CUDA] Create offloading entries when using the new driver

2022-05-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CGCUDARuntime.h:58 + /// Mark the entry as a global variable. + OffloadGlobalEntry = 0x0, + /// Mark the entry as a managed global variable. jhuber6 wrote: > tra wrote: > > jhuber6 wrote: > >

[PATCH] D125265: [OpenMP] Don't include the device wrappers if -nostdinc is used

2022-05-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:1311 if (JA.isDeviceOffloading(Action::OFK_OpenMP) && + !Args.hasArg(options::OPT_nostdlib) && (getToolChain().getTriple().isNVPTX() || Did you mean `nostdinc` ? Reposit

[PATCH] D125265: [OpenMP] Don't include the device wrappers if -nostdinc is used

2022-05-09 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM in general. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:1311 if (JA.isDeviceOffloading(Action::OFK_OpenMP) && + !Args.hasArg(options::OPT_nostdlib) && (ge

[PATCH] D123471: [CUDA] Create offloading entries when using the new driver

2022-05-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CGCUDARuntime.h:58 + /// Mark the entry as a global variable. + OffloadGlobalEntry = 0x0, + /// Mark the entry as a managed global variable. jhuber6 wrote: > tra wrote: > > jhuber6 wrote: > >

[PATCH] D124866: [CUDA][HIP] support __noinline__ as keyword

2022-05-09 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. In D124866#3501641 , @yaxunl wrote: > If we are to add `__forceinline__` as a keyword, I feel it better be a > separate patch to be cleaner. Fine with me.

[PATCH] D125165: [Clang] Introduce clang-offload-binary tool to bundle device files

2022-05-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/docs/ClangOffloadBinary.rst:8 + +.. _clang-offload-binary: + Naming nit: `binary` may not be the best term for what we're trying to do here. Perhaps something like `package`, `container` or `collection` would work bet

[PATCH] D124866: [CUDA][HIP] support __noinline__ as keyword

2022-05-10 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/SemaCUDA/noinline.cu:8 +__attribute__((noinline)) void fun2() { } +__attribute__((__noinline__)) void fun3() { } aaron.ballman wrote: > yaxunl wrote: > > aaron.ballman wrote: > > > yaxunl wrote: > > > > aaron.ball

[PATCH] D125165: [Clang] Introduce clang-offload-packager tool to bundle device files

2022-05-10 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: tstellar. tra added a comment. LGTM in principle. Given that we're introducing a new tool dependency we may want to get a stamp from someone dealing with build and release. @tstellar -- do we need to change anything else for the new binary to ship with clang releases?

[PATCH] D123810: [Cuda] Add initial support for wrapping CUDA images in the new driver.

2022-05-10 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:188 + static inline OffloadKind getEmptyKey() { +return static_cast(0x); + } Extend `enum OffloadKind` to include these special kinds. Com

[PATCH] D125165: [Clang] Introduce clang-offload-packager tool to bundle device files

2022-05-10 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. OK. LGTM. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D125165/new/ https://reviews.llvm.org/D125165 ___

[PATCH] D123810: [Cuda] Add initial support for wrapping CUDA images in the new driver.

2022-05-10 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Please add a TODO around the items that need further work, so we don't forget about them. Review comments tend to fade from memory. Comment at: clang/tools/clang-linker-wrapper/Cl

[PATCH] D123812: [CUDA] Add wrapper code generation for registering CUDA images

2022-05-10 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/tools/clang-linker-wrapper/OffloadWrapper.cpp:351 +/// required for texture / surface / managed variables. +Function *createRegisterGlobalsFunction(Modu

[PATCH] D123810: [Cuda] Add initial support for wrapping CUDA images in the new driver.

2022-05-10 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:193 + } + static unsigned getHashValue(const OffloadKind &Val) { return Val * 37U; } + jhuber6 wrote: > tra wrote: > > Is there a particular reason for multiplying by

[PATCH] D125555: [clang] Add __has_target_feature

2022-05-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/docs/LanguageExtensions.rst:275 + // On amdgcn target + #if __has_target_feature("s-memtime-inst") +x = __builtin_amdgcn_s_memtime(); Do you have a better example? This particular case could've been handled by e

[PATCH] D125555: [clang] Add __has_target_feature

2022-05-16 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Oops. forgot to hit 'submit' last week, so there's some overlap with Aaron's question. Comment at: clang/docs/LanguageExtensions.rst:275 + // On amdgcn target + #if __has_target_feature("s-memtime-inst") +x = __builtin_amdgcn_s_memtime(); ---

[PATCH] D124382: [Clang] Recognize target address space in superset calculation

2022-05-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. My concerns have been addressed. I'll defer the final LGTM to @Anastasia. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D124382/new/ https://reviews.llvm.org/D124382 ___ cfe-commits m

[PATCH] D125829: [clang] Fix __has_builtin

2022-05-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: echristo. tra added a comment. @echristo - FYI, in case you have an opinion on target builtin feature checks. Comment at: clang/include/clang/Basic/Builtins.h:263 + ///false if it is disabled. + bool isRequiredTargetFeaturesEnabled( + unsigned

[PATCH] D125829: [clang] Fix __has_builtin

2022-05-18 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM with a possible nit. Comment at: clang/lib/Basic/BuiltinTargetFeatures.h:32 +/// pairs. +class TargetFeatures { + struct FeatureListStatus { yaxunl wrote: >

[PATCH] D125909: [AMDGPU] emit macro __GFX9__ etc

2022-05-18 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Basic/Targets/AMDGPU.cpp:414 + assert(CanonName.startswith("gfx") && "Invalid amdgcn canonical name"); + Builder.defineMacro(Twine("__") + Twine

[PATCH] D126226: [OpenMP] Add `-Xoffload-linker` to forward input to the device linker

2022-05-23 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. `-Xoffload-linker= ` The syntax is confusing. Normally only `triple` would be the argument for `-Xoffload-linker` option. Having both `-Xoffload-linker` and `-Xoffload-linker=` variants also looks odd to me. In effect you're making `-Xoffload-linker=foo` a full option (as

[PATCH] D126226: [OpenMP] Add `-Xoffload-linker` to forward input to the device linker

2022-05-23 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D126226#3532147 , @jhuber6 wrote: > We already use this approach for the `-Xopenmp-target= ` option > that forwards arguments to the given toolchain, so that's what I was using as > a basis. Yup. I've noticed that. It's unfortun

[PATCH] D126226: [OpenMP] Add `-Xoffload-linker` to forward input to the device linker

2022-05-23 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D126226#3532257 , @MaskRay wrote: > It's better to avoid `JoinedAndSeparate` for new options. It is for `--xxx > val` and `--xxxval` but not intended for the option this patch will add. I'm not sure I understand your argument. Th

[PATCH] D125904: [Cuda] Use fallback method to mangle externalized decls if no CUID given

2022-05-23 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > The one downside to this is that we are no longer stable under multiple > compilations of the same file. This is a moderately serious issue. Some users care about the build reproducibility. Recompiling the same sources and getting different results will trigger all sorts

[PATCH] D126226: [OpenMP] Add `-Xoffload-linker` to forward input to the device linker

2022-05-23 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D126226#3532471 , @jhuber6 wrote: > In D126226#3532423 , @tra wrote: > >> We keep running into the same old underlying issue that we do not have a >> good way to name/reference specific pa

[PATCH] D125904: [Cuda] Use fallback method to mangle externalized decls if no CUID given

2022-05-23 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D125904#3532545 , @jhuber6 wrote: > clang a.c -c -o a-0.o // Has some externalized static variable. > clang a.c -c -o a-1.o > clang a-0.o a-1.o // Redefined symbol error Ah. OK. This is a bit less of a concern. As long as we take

[PATCH] D126226: [OpenMP] Add `-Xoffload-linker` to forward input to the device linker

2022-05-23 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:826 Group; +def Xoffload_linker : Separate<["-"], "Xoffload-linker">, + HelpText<"Pass to the offload linker">, MetaVarName<"">, This option still stands out as a sore thumb. Could

[PATCH] D126226: [OpenMP] Add `-Xoffload-linker` to forward input to the device linker

2022-05-23 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Couple of nits. LGTM otherwise. Comment at: clang/include/clang/Driver/Options.td:827 +def Xoffload_linker : JoinedAndSeparate<["-"], "Xoffload-linker">, + HelpText<"Pass to the

[PATCH] D126158: [MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration.

2022-05-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I would suggest separating it into separate LLVM and MLIR patches. LLVM changes look OK to me. No idea about MLIR. we would probably want to lower fp16 fdiv the same way in LLVM, too, but that would also have to be a separate patch. Repository: rG LLVM Github Monorepo

[PATCH] D125904: [Cuda] Use fallback method to mangle externalized decls if no CUID given

2022-05-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CodeGenModule.cpp:6836 + + // If the CUID is not specified we try to generate a unique postfix. + if (getLangOpts().CUID.empty()) { > However, [CUID] is not always availible. The question is -- when and

[PATCH] D25845: [CUDA] Ignore implicit target attributes during function template instantiation.

2019-08-23 Thread Artem Belevich via Phabricator via cfe-commits
tra marked an inline comment as done. tra added inline comments. Comment at: cfe/trunk/lib/Sema/SemaDecl.cpp:8416 +// in the CheckFunctionTemplateSpecialization() call below. +if (getLangOpts().CUDA & !isFunctionTemplateSpecialization) + maybeAddCUDAHostDeviceAttrs(N

[PATCH] D66665: [CUDA] Use activemask.b32 instruction to implement __activemask w/ CUDA-9.2+

2019-08-23 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: timshen. Herald added subscribers: sanjoy.google, bixia. vote.ballot instruction is gone in recent CUDA versions and vote.sync.ballot can not be used because it needs a thread mask parameter. Fortunately PTX 6.2 (introduced with CUDA-9.2) provides a

[PATCH] D66665: [CUDA] Use activemask.b32 instruction to implement __activemask w/ CUDA-9.2+

2019-09-03 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rL370792: [CUDA] Use activemask.b32 instruction to implement __activemask w/ CUDA-9.2+ (authored by tra, committed by ). Herald added a project: LLVM. Herald added a subscriber: llvm-commits. Changed prior

[PATCH] D84362: [NFC] Add missing functions to PartialDiagnostic

2020-09-14 Thread Artem Belevich via Phabricator via cfe-commits
tra added subscribers: aaron.ballman, rtrieu. tra added a comment. So, the idea here is to do some sort of duck-typing and allow DiagBuilder to work with both `DiagnosticBuilder` and `PartialDiagnostic`. What bothers me is that unlike `Diagnostic` `PartialDiagnostic` seems to be commingling fun

[PATCH] D84362: [NFC] Add missing functions to PartialDiagnostic

2020-09-15 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D84362#2274790 , @yaxunl wrote: > There are use patterns expecting `PartialDiagnosticInst << X << Y` to > continue to be a `PartialDiagnostic&`, e.g. > > PartialDiagnosticAt PDAt(SourceLoc, PartialDiagnosticInst << X << Y); > > H

[PATCH] D84362: [NFC] Add missing functions to PartialDiagnostic

2020-09-15 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D84362#2275029 , @yaxunl wrote: > Perhaps we could inherit `PartialDiagnostic` from `DiagnosticBuilder` base class. This would probably be the least invasive approach as it would preserve existing uses while allowing reuse of co

[PATCH] D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic

2020-09-16 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. This is a very nice cleanup. Thank you, Sam. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D84362/new/ https://reviews.llvm.org/D84362 ___ cfe-co

[PATCH] D87791: [CUDA][HIP] Fix -gsplit-dwarf option

2020-09-16 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Does this naming scheme the same as used for `.o` files? We may want to keep them in sync. Other than that, LGTM. Comment at: clang/lib/Driver/ToolChains/CommonArgs.cpp:909 + au

[PATCH] D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic

2020-09-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Apparently this patch triggers compiler crashes on some of our code. I'll try to create a reproducer, but it would be great to revert the patch for now. *** SIGSEGV (@0x7f68892d8fe8), received by PID 8210 (TID 8225) on cpu 65; stack trace: *** PC: @ 0x7f686d8a169d

[PATCH] D87791: [CUDA][HIP] Fix -gsplit-dwarf option

2020-09-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D87791#2278417 , @yaxunl wrote: > Therefore in either case there is no need to rename the intermediate .o files > since they are temporary files which have unique names. > > The .dwo files are not temporary files. They are supposed

[PATCH] D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic

2020-09-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D84362#2279688 , @tra wrote: > Apparently this patch triggers compiler crashes on some of our code. I'll try > to create a reproducer, but it would be great to revert the patch for now. It's likely the same issue as the one report

[PATCH] D87791: [CUDA][HIP] Fix -gsplit-dwarf option

2020-09-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D87791#2279864 , @yaxunl wrote: > It is requested by our debugger team, so it should work with amdgpu. Is the naming scheme for GPU-side DWO files dictated by debugger? If that's the case, it may be worth adding a comment about th

[PATCH] D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic

2020-09-18 Thread Artem Belevich via Phabricator via cfe-commits
tra reopened this revision. tra added a comment. This revision is now accepted and ready to land. In D84362#2282890 , @yaxunl wrote: > I have a fix for the issue reported in D84364 > . Would you like to try? Thanks. I can

[PATCH] D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic

2020-09-18 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D84362#2282992 , @yaxunl wrote: > The fix is for the change in D84364 . It has > no effect on the change in this review. Are you sure the issue you saw is due > to change in this review instead of

[PATCH] D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic

2020-09-18 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D84362#2283065 , @yaxunl wrote: > I think this is probably a different issue. The issue reported in D84364 > was introduced by change in D84364 > . It's possible.

[PATCH] D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic

2020-09-21 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. In D84362#2283078 , @tra wrote: > It's possible. Unfortunately it's only triggered by our internal tool and > it's hard to create a public reproducer for it. I'll debug and try to fix it > on Monday. It a

[PATCH] D88250: [CUDA] Added dim3/uint3 conversion functions to builtin vars.

2020-09-24 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: jlebar. Herald added subscribers: sanjoy.google, bixia, yaxunl. Herald added a project: clang. tra requested review of this revision. This is needed to compile some headers in CUDA-11 that assume that threadIdx is implicitly convertible to dim3. Wit

[PATCH] D88250: [CUDA] Added dim3/uint3 conversion functions to builtin vars.

2020-09-24 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 294125. tra edited the summary of this revision. tra added a comment. Fixed compatibility with pre-c++11 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D88250/new/ https://reviews.llvm.org/D88250 Files: clang/lib/

[PATCH] D88250: [CUDA] Added dim3/uint3 conversion functions to builtin vars.

2020-09-24 Thread Artem Belevich via Phabricator via cfe-commits
tra marked an inline comment as done. tra added a comment. In D88250#2293270 , @jlebar wrote: > I know it comes in a separate change, but can we add a check to the > test-suite? Will do. Comment at: clang/lib/Headers/__clang_cuda_runt

[PATCH] D88250: [CUDA] Added dim3/uint3 conversion functions to builtin vars.

2020-09-24 Thread Artem Belevich via Phabricator via cfe-commits
tra marked an inline comment as done. tra added a comment. In D88250#2293346 , @tra wrote: > In D88250#2293270 , @jlebar wrote: > >> I know it comes in a separate change, but can we add a check to the >> test-suite

[PATCH] D88250: [CUDA] Added dim3/uint3 conversion functions to builtin vars.

2020-09-24 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG30514f0afa3e: [CUDA] Added conversion functions to builtin vars. (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION

[PATCH] D88345: [CUDA] Allow `static const {__constant__, __device__}` variables.

2020-09-25 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added reviewers: jlebar, yaxunl. Herald added subscribers: sanjoy.google, bixia. Herald added a project: clang. tra requested review of this revision. While CUDA documentation claims that such variables are not allowed[1], NVCC has been accepting them since CUDA-10.0

[PATCH] D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables.

2020-09-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D88345#2296118 , @jlebar wrote: > wha... As you know, `const` doesn't mean anything, that can be const-casted > away. And then you'll be able to observe that this nominally-static variable > is just a normal variable. Yes, I'm a

[PATCH] D88425: Skip -fPIE for AMDGPU and HIP toolchain

2020-09-28 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM Comment at: clang/test/Driver/hip-fpie-option.hip:34 + +// DEV-DAG: {{".*clang.*".* "-triple" "amdgcn-amd-amdhsa".* "-mrelocation-model" "pic" "-pic-level" "[1|2]" "-mframe-p

[PATCH] D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables.

2020-09-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D88345#2298688 , @jlebar wrote: > OK, backing up, what are the semantics of `static` on `__constant__`, > `__device__`, and `__shared__`? > > - My understanding is that `__shared__` behaves the same whether or not it's > static.

[PATCH] D86376: [HIP] Improve kernel launching latency

2020-08-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. How much does this inlining buy you in practice? I.e. what's a typical launch latency before/after the patch? For CUDA, config push/pop is negligible compared to the cost of actually launching the kernel on the GPU. It is measurable if the launch is asynchronous, but queuei

[PATCH] D86376: [HIP] Improve kernel launching latency

2020-08-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I'm OK with how the patch is implemented. I'm still on the fence regarding whether it should be implemented. In D86376#2234458 , @yaxunl wrote: > `__hipPushConfiguration/__hipPopConfiguration' and kernel stub can cause 40 > ns overhe

[PATCH] D86376: [HIP] Improve kernel launching latency

2020-08-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D86376#2234719 , @yaxunl wrote: >> This patch appears to be somewhere in the gray area to me. My prior >> experience with CUDA suggests that it will make little to no difference. On >> the other hand, AMD GPUs may be different eno

[PATCH] D86376: [HIP] Improve kernel launching latency

2020-08-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D86376#2236501 , @yaxunl wrote: > My previous measurements did not warming up, which caused some one time > overhead due to device initialization and loading of device binary. With warm > up, the call of `__hipPushCallConfigure/__

[PATCH] D86376: [HIP] Simplify kernel launching

2020-08-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D86376#2239391 , @yaxunl wrote: > For example, in HIP program, there is a kernel `void foo(int*)`. If a C++ > program wants to launch it, the desirable way is > > void foo(int*); > hipLaunchKernel(foo, grids, blocks, args, shme

[PATCH] D84364: [CUDA][HIP] Defer overloading resolution diagnostics for host device functions

2020-09-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. LGTM. Nice! To sum it up -- the patch introduces `-fgpu-defer-diag` flag which allows deferring overload resolution diagnostics, if overload set included candidates from both sides. We may be deferring cases when we don't have to (e.g. `df()->()callee2()` should've err

[PATCH] D91281: [CUDA][HIP] Diagnose reference of host variable

2020-12-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. It appears that we need to add special handling for texture/surface references. Nominally they are host-side objects, but they are accessed/used from device functions as far as Sema is concerned. E.g. https://godbolt.org/z/z1YnE3 NVCC and older clang compile it, but the re

[PATCH] D92893: [CUDA] Do not diagnose host/device variable access in dependent types.

2020-12-08 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added reviewers: yaxunl, hliao. Herald added a subscriber: bixia. tra requested review of this revision. Herald added a project: clang. `isCUDADeviceBuiltinSurfaceType()`/`isCUDADeviceBuiltinTextureType()` do not work on dependent types as they rely on specific type

[PATCH] D91281: [CUDA][HIP] Diagnose reference of host variable

2020-12-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D91281#2441147 , @tra wrote: > I think `isCUDADeviceBuiltinTextureType` has problem handling texture refs > within templates. Proposed fix: https://reviews.llvm.org/D92893 Repository: rG LLVM Github Monorepo CHANGES SINCE LAS

[PATCH] D92617: [DWARF] Allow toolchain to adjust specified DWARF version.

2020-12-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Basic/DiagnosticDriverKinds.td:296 +def warn_drv_dwarf_version_limited_by_target : Warning< + "debug information option '%0' is not supported. It needs DWARF-%2 but target '%1' only provides DWARF-%3.">, + InGroup; ---

[PATCH] D92954: [clang-offload-bundler] Add option -list

2020-12-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp:763 + +// Create an intermediate temporary file for reading the bundles. +TempFileHandlerRAII TempFiles; Having to create a temporary file in order to *list*

[PATCH] D92617: [DWARF] Allow toolchain to adjust specified DWARF version.

2020-12-09 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 310622. tra added a comment. Herald added a reviewer: jdoerfert. Herald added a subscriber: sstefan1. Reorganized tests for unsupported debug options & dwarf version clamping. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llv

[PATCH] D92617: [DWARF] Allow toolchain to adjust specified DWARF version.

2020-12-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/Driver/cuda-unsupported-debug-options.cu:18 +// RUN: %clang -### -target x86_64-linux-gnu -c %s -gdwarf-5 -gembed-source 2>&1 | FileCheck %s --check-prefix=DWARF-CLAMP +// CHECK: debug information option '{{-gz|-fdebug-info-for-

[PATCH] D92617: [DWARF] Allow toolchain to adjust specified DWARF version.

2020-12-09 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rG016e4ebfde28: [DWARF] Allow toolchain to adjust specified DWARF version. (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D92617/new/ https:/

[PATCH] D92954: [clang-offload-bundler] Add option -list

2020-12-10 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp:188 + + if (Error Err = Func()) +return Err; Now, if we could save the triple in a `BundleInfo` when it's parsed, and pass `BundleInfo` to `Func()` that

[PATCH] D93068: [clang-offload-bundler] Add option -fail-on-missing-bundles

2020-12-10 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Comment at: clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp:980 + if (FailOnMissingBundles && !Worklist.empty()) { +std::string ErrMsg = "Can't find bundles for"; +std::set Sorted; Do we need to report complete list of m

[PATCH] D92893: [CUDA] Do not diagnose host/device variable access in dependent types.

2020-12-10 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 311065. tra edited the summary of this revision. tra added a comment. Herald added a reviewer: aaron.ballman. Found another corner case (reference within a template with the surface/texture attibute.) and figured out a better fix. Added a test case. Repository:

[PATCH] D92893: [CUDA] Do not diagnose host/device variable access in dependent types.

2020-12-10 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: rsmith. tra added a comment. My first variant of the patch only helped with some cases when the surface/texture attribute type was used. Trying to reduce real-world failure resulted in an example that I've added as the test case which was still failing with this patch app

[PATCH] D93181: [NFC][AMDGPU] Reformat AMD GPU targets in cuda.cpp

2020-12-14 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Basic/Cuda.cpp:75 SM(80), // Ampere -GFX(600), // tahiti -GFX(601), // pitcairn, verde -GFX(602), // oland, hainan -GFX(700), // kaveri -GFX(701), // hawaii -GFX(702), // 290,29

[PATCH] D92893: [CUDA] Do not diagnose host/device variable access in dependent types.

2020-12-14 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 311657. tra added a comment. Use __device__ in the test case. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D92893/new/ https://reviews.llvm.org/D92893 Files: clang/include/clang/Basic/Attr.td clang/test/SemaCU

[PATCH] D92893: [CUDA] Do not diagnose host/device variable access in dependent types.

2020-12-14 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG0936655bac78: [CUDA] Do not diagnose host/device variable access in dependent types. (authored by tra). Repository: rG LLVM Github Monorepo CHANG

[PATCH] D93068: [clang-offload-bundler] Add option -allow-missing-bundles

2020-12-14 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. The patch could use an OK with OMP folks, considering that we've changed the way offload bunder is invoked for OMP. Comment at: clang/tools/clang-offload-bundler/ClangOfflo

[PATCH] D92720: [HIP] unbundle bundled preprocessor output

2020-12-14 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. In D92720#2448890 , @yaxunl wrote: > Output of `-E` for HIP combined host/device compilation is a plain text. It > has C++ comments inserted between preproce

[PATCH] D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables.

2020-11-03 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGbe86b6773b6b: [CUDA] Allow local static variables with target attributes. (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE LA

[PATCH] D90436: [Bundler] Use argv[0] as the default choice for the Executable name.

2020-11-03 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGcdbf6bfdc7d1: [HIP] Use argv[0] as the default choice for the Executable name. (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SIN

[PATCH] D90409: [HIP] Math Headers to use type promotion

2020-11-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D90409#2371679 , @jlebar wrote: >> LGTM. I think the change would make sense for CUDA, too. @jlebar - WDYT? > > I agree that the C and C++ standard libraries should behave the same in CUDA > mode and host mode! > > But if doing so

[PATCH] D90409: [HIP] Math Headers to use type promotion

2020-11-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D90409#2371969 , @yaxunl wrote: > nvcc does not support fma(float,float,char) It does, it just needs an explicit flag to match clang's treatment of `constexpr` functions as HD. Repository: rG LLVM Github Monorepo CHANGES SINC

[PATCH] D90409: [HIP] Math Headers to use type promotion

2020-11-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D90409#2372042 , @yaxunl wrote: >> Practically the behavior is the same since they all promote integer types to >> double. This matches the C++ behavior. However the HIP change will make it >> conform to C++ for a target supportin

[PATCH] D91088: [CUDA][HIP] Fix capturing reference to host variable

2020-11-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: rsmith. tra added a comment. The use case in the tests makes sense, but I'd like to have a second opinion on whether it may have unintended consequences. Lambdas tend to bring interesting corner cases. @rsmith Richard, do you see any issues with this approach? ===

[PATCH] D91088: [CUDA][HIP] Fix capturing reference to host variable

2020-11-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Ouch. This is definitely a bug. https://godbolt.org/z/faPWcv Granted, the compilation will eventually fail in most cases due to ptxas failing to resolve the reference to the variable we do not generate on the device, but it's something we should catch in Sema. Filed https

[PATCH] D91088: [CUDA][HIP] Fix capturing reference to host variable

2020-11-11 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D91088#2388496 , @yaxunl wrote: > added diagnosing referencing host variable in device functions Thank you for fixing this. The new changes look good, but I think they should be in a separate patch. Comment at:

[PATCH] D91281: [CUDA][HIP] Diagnose reference of host variable

2020-11-11 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. Comment at: clang/lib/Sema/SemaExpr.cpp:357 + if (LangOpts.CUDAIsDevice) { +auto *FD = dyn_cast_or_null(CurContext); This could use a comment why we o

[PATCH] D90174: [HIP] Fix regressions due to fp contract change

2020-11-11 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D90174#2387518 , @scanon wrote: > Strictly speaking, fp-contract=fast probably should have been a separate flag > entirely (since there's no _expression_ being contracted in fast). > Unfortunately, that ship has sailed, and it doe

<    3   4   5   6   7   8   9   10   11   12   >