[PATCH] D97318: [clang][CodeGen] Allow fp16 arg pass by register

2021-02-23 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/TargetInfo.cpp:2825 + // gcc where 16 bit integer is used in place of _Float16 or __fp16. + Lo = Integer; } Do we need to set `Hi`, too? We do set it for `int128`. CHANGES SINCE LAST ACTION

[PATCH] D97340: [HIP] Support Spack packages

2021-02-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:25-31 +// Look for sub-directory starts with Prefix under Path. If there is one and +// only one matching sub-directory found, append the sub-directory to Path. If +// there is no matching sub-directory
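The lookup described in the quoted comment can be sketched outside of clang with `std::filesystem`. This is only an illustration, not the function under review: `findUniqueSubdir` is a hypothetical name, and since the snippet is cut off before it says what happens when nothing matches, returning an empty result for both the no-match and the multiple-match case is an assumption.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not the code from the patch): returns the single
// sub-directory of `path` whose name starts with `prefix`. Returns
// std::nullopt when there is no match, more than one match, or `path`
// cannot be iterated. The no-match behaviour is an assumption.
std::optional<fs::path> findUniqueSubdir(const fs::path &path,
                                         const std::string &prefix) {
  std::error_code ec;
  std::optional<fs::path> match;
  for (fs::directory_iterator it(path, ec), end; !ec && it != end;
       it.increment(ec)) {
    if (!it->is_directory(ec))
      continue;
    const std::string name = it->path().filename().string();
    if (name.compare(0, prefix.size(), prefix) != 0)
      continue;
    if (match)
      return std::nullopt; // a second match makes the result ambiguous
    match = it->path();
  }
  return match;
}
```

A caller would append the returned directory to its search path only when a value comes back, and otherwise keep the original path.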

[PATCH] D97340: [HIP] Support Spack packages

2021-02-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:32 +static llvm::SmallString<0> findSPACKPackage(const Driver &D, + const llvm::SmallString<0> &Path, + StringRef

[PATCH] D103563: [HIP] Fix amdgcn builtin for long type

2021-06-03 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. Still LGTM. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D103563/new/ https://reviews.llvm.org/D103563 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mai

[PATCH] D103579: [LTO] Fix -fwhole-program-vtables handling after HIP ThinLTO patch

2021-06-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/Driver/hip-options.hip:63 // RUN: %clang -### -target x86_64-unknown-linux-gnu -nogpuinc -nogpulib \ -// RUN: --cuda-gpu-arch=gfx906 -foffload-lto=thin %s 2>&1 \ -// RUN: | FileCheck -check-prefix=THINLTO %s +// RUN: --cuda

[PATCH] D103579: [LTO] Fix -fwhole-program-vtables handling after HIP ThinLTO patch

2021-06-03 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM for CUDA. @yaxunl Sam, does the change make sense for HIP? Comment at: clang/test/Driver/cuda-options.cu:190-192 +// RUN: | FileCheck -check-prefix DEVICE -check-prefix DEVIC

[PATCH] D103658: CUDA/HIP: Change device-use-host-var.cu's NOT "external" check to include "addrspace"

2021-06-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/CodeGenCUDA/device-use-host-var.cu:68 // NEG-NOT: @_ZL13var_host_only -// NEG-NOT: external +// NEG-NOT: external addrspace This may be too specific. What if we end up generating a variable in generic AS which

[PATCH] D101630: [HIP] Fix device-only compilation

2021-06-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D101630#2798975, @yaxunl wrote: > For sure we will need -fgpu-bundle-device-output to control bundling of > intermediate files. Then adding -emit-gpu-object and -emit-gpu-bundle may be > redundant and can cause confusion. What i

[PATCH] D101630: [HIP] Fix device-only compilation

2021-06-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D101630#2799425, @yaxunl wrote: > But how do we control emitting LLVM IR with or without bundle? `-emit-llvm > -emit-gpu-object` or `-emit-llvm -emit-gpu-bundle`? `-emit-*` is usually for > specifying a specific file type. Hmm.

[PATCH] D103108: [CUDA][HIP] Promote const variables to constant

2021-06-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/CodeGenCUDA/device-use-host-var.cu:68 +// NEG-NOT: @_ZL13var_host_only +// NEG-NOT: external yaxunl wrote: > sbc100 wrote: > > This seems to break if the pathname where the llvm checkout lives contains > > the

[PATCH] D101630: [HIP] Add --gpu-bundle-output

2021-06-07 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Driver/Driver.cpp:2903 bool GPUSanitize; +Optional BundleOutput; We should document the behavior we expect from the `--gpu-bun

[PATCH] D104505: [HIP] Defer operator overloading errors

2021-06-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I don't think we want to do this. struct S { S& operator <<(int x); }; __device__ void foo() { S s; s<<1; } :7:6: error: invalid operands to binary expression ('S' and 'int') s<<1; ~^ ~ :2:8: note: candidate function not viable:

[PATCH] D104505: [HIP] Defer operator overloading errors

2021-06-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D104505#2831644, @yaxunl wrote: > However, this does cause source level incompatibilities, i.e. CUDA code that > passes nvcc does not pass clang. This patch somehow addresses that without > compromising clang's more sophisticate

[PATCH] D104505: [HIP] Defer operator overloading errors

2021-06-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D104505#2833271, @yaxunl wrote: > Such host/device overloading resolution induced issue is not limited to > device functions calling host functions. It does not change the fact that the code in the test above is invalid, regard

[PATCH] D104505: [HIP] Defer operator overloading errors

2021-06-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D104505#2833943, @yaxunl wrote: > We don't defer such diags by default. We only defer them under option > -fgpu-defer-diags, which users have to specify explicitly. Thank you for pointing this out. I've missed that all the test

[PATCH] D104505: [HIP] Defer operator overloading errors

2021-06-23 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/test/SemaCUDA/deferred-oeverload.cu:55 callee3(); // dev-error {{no matching function for call to 'callee3'}} callee4(); // com-error {{no matching functi

[PATCH] D102507: [HIP] Support in device code

2021-06-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. The key difference between C++ and CUDA/HIP, as implemented in clang, is that `__host__` and `__device__` attributes are considered during function overloading in CUDA and HIP, so `__host__ void foo()`, `__device__ void foo()` and `__host__ __device__ void foo()` are three

[PATCH] D104847: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 WMMA and MMA instructions

2021-06-24 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Nice. Thank you for adding support for these missing instructions! LGTM, modulo a few cosmetic nits. Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:762 +// Builtins t

[PATCH] D106960: [OffloadArch] Library to query properties of current offload archicture

2021-08-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D106960#2925610, @ye-luo wrote: > my second GPU is NVIDIA 3060Ti (sm_86) > I build my app daily with -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_80. > > About sm_80 binary able ot run on sm_86 > https://docs.nvidia.com/cuda/ampe

[PATCH] D107492: [clang] Replace asm with __asm__ in cuda header

2021-08-04 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM in general. Comment at: clang/lib/Headers/__clang_cuda_device_functions.h:37 #if defined(__cplusplus) -__DEVICE__ void __brkpt() { asm volatile("brkpt;"); } +__DEVICE__ void

[PATCH] D107492: [clang] Replace asm with __asm__ in cuda header

2021-08-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D107492#2926871, @JonChesterfield wrote: > Therefore I'd like to leave it as `__asm__ volatile`. Being the one who introduced inconsistent use of `__volatile__` and `volatile` in this header, I'm pretty sure that PTX's notion o

[PATCH] D106401: [CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA.

2021-08-05 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 364653. tra added a comment. Updated post D106769 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D106401/new/ https://reviews.llvm.org/D106401 Files: clang/lib/Driver/ToolChains/

[PATCH] D106401: [CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA.

2021-08-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I've updated the patch and added a test to verify that the knob does work as expected. Please take a look. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D106401/new/ https://reviews.llvm.org/D106401

[PATCH] D106401: [CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA.

2021-08-06 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 364847. tra added a comment. rebase to HEAD. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D106401/new/ https://reviews.llvm.org/D106401 Files: clang/lib/Driver/ToolChains/Cuda.cpp llvm/lib/Transforms/Scalar/Me

[PATCH] D106401: [CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA.

2021-08-06 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG6a9cf21f5a2d: [CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA. (authored by tra). Repository: rG LLVM Github Monorepo

[PATCH] D107718: [cuda] Mark builtin texture/surface reference variable as 'externally_initialized'.

2021-08-09 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. CUDA lowers texture/surface types into a special handle. I do not think `externally_initialized` matters for it. AFAICT this change is a no-op for CUDA. Repository: rG LLVM Github Monorepo CHANG

[PATCH] D77670: [CUDA] Add partial support for recent CUDA versions.

2021-08-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D77670#2943753, @Hahnfeld wrote: > @tra The split between `LATEST` and `LATEST_SUPPORTED` leads to very weird > warning and error messages: Agreed, it's far from ideal. There's also more than one issue involved. > clang-14: warn

[PATCH] D77670: [CUDA] Add partial support for recent CUDA versions.

2021-08-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D77670#292, @Hahnfeld wrote: >> It's also time to bump the default GPU target to something that's supported >> by the CUDA versions we reasonably expect to see. That should probably be >> sm_35 as that's probably the oldest G

[PATCH] D108235: [CUDA] Bump default GPU architecture to sm_35.

2021-08-17 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: Hahnfeld. Herald added subscribers: ormris, steven_wu, bixia, hiraditya, yaxunl, emaste. tra requested review of this revision. Herald added a project: clang. It's the oldest GPU architecture currently supported by all CUDA versions clang can use.

[PATCH] D108235: [CUDA] Bump default GPU architecture to sm_35.

2021-08-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. cuda-options-freebsd.cu has been removed as it does not test anything useful -- it's a somewhat out of date copy of cuda-options.cu with replaced host triple, which does not matter for processing GPU options. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION

[PATCH] D108239: [CUDA] Add support for CUDA-11.4.

2021-08-17 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: yaxunl. Herald added subscribers: dexonsmith, bixia, hiraditya, jholewinski. tra requested review of this revision. Herald added projects: clang, LLVM. Herald added a subscriber: llvm-commits. Repository: rG LLVM Github Monorepo https://reviews.l

[PATCH] D108247: [CUDA] Improve CUDA version detection and diagnostics.

2021-08-17 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added reviewers: Hahnfeld, yaxunl. Herald added subscribers: dexonsmith, bixia. tra requested review of this revision. Herald added a project: clang. Always use cuda.h to detect CUDA version. It's a more universal approach compared to version.txt which is no longer p

[PATCH] D108248: [CUDA] Bump the latest supported CUDA version to 11.4.

2021-08-17 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added reviewers: Hahnfeld, yaxunl. Herald added subscribers: dexonsmith, bixia. tra requested review of this revision. Herald added a project: clang. This should reduce the amount of noise issued by clang for the recent-ish CUDA versions. Clang still does not supp

[PATCH] D77670: [CUDA] Add partial support for recent CUDA versions.

2021-08-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. See the patch stack at https://reviews.llvm.org/D108248 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77670/new/ https://reviews.llvm.org/D77670

[PATCH] D108247: [CUDA] Improve CUDA version detection and diagnostics.

2021-08-18 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 367246. tra edited the summary of this revision. tra added a comment. Do not report the version if we don't know it. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D108247/new/ https://reviews.llvm.org/D108247 Files

[PATCH] D108247: [CUDA] Improve CUDA version detection and diagnostics.

2021-08-18 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/Driver/cuda-version-check.cu:75 -// UNKNOWN_VERSION_V: unknown CUDA version: version.txt:{{.*}}; assuming the latest supported version -// UNKNOWN_VERSION_H: unknown CUDA version: cuda.h: CUDA_VERSION={{.*}}; assuming the late

[PATCH] D108248: [CUDA] Bump the latest supported CUDA version to 11.4.

2021-08-18 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 367259. tra edited the summary of this revision. tra added a comment. Prepend space to version string. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D108248/new/ https://reviews.llvm.org/D108248 Files: clang/incl

[PATCH] D108248: [CUDA] Bump the latest supported CUDA version to 11.4.

2021-08-18 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 367260. tra added a comment. Undo unintentional change. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D108248/new/ https://reviews.llvm.org/D108248 Files: clang/include/clang/Basic/Cuda.h Index: clang/include/c

[PATCH] D108247: [CUDA] Improve CUDA version detection and diagnostics.

2021-08-18 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 367261. tra added a comment. Prepend space to the version string. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D108247/new/ https://reviews.llvm.org/D108247 Files: clang/include/clang/Basic/Cuda.h clang/includ

[PATCH] D108247: [CUDA] Improve CUDA version detection and diagnostics.

2021-08-18 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:103-104 +std::string VersionString = CudaVersionToString(Version); +if (!VersionString.empty()) + VersionString += " "; +D.Diag(diag::warn_drv_new_cuda_version) Hahnfeld
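The quoted lines append a separator to the version string only when it is non-empty, so that a message built around it does not end up with a doubled space when the version is unknown. A minimal standalone sketch of that idea follows; `newCudaVersionWarning` and the message wording are invented for illustration and are not clang's actual diagnostic text.

```cpp
#include <string>

// Hypothetical stand-in for the driver diagnostic. The wording below is made
// up for this sketch; only the "append a space when non-empty" step mirrors
// the quoted patch lines.
std::string newCudaVersionWarning(std::string versionString) {
  // Add the separator only when a version is known, so an empty string
  // leaves exactly one space between the surrounding words.
  if (!versionString.empty())
    versionString += " ";
  return "CUDA version " + versionString +
         "is newer than the latest supported version";
}
```

With a known version the string reads "CUDA version 11.5 is newer ..."; with an empty one it reads "CUDA version is newer ...", with no gap to clean up afterwards.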

[PATCH] D108239: [CUDA] Add support for CUDA-11.4.

2021-08-18 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 367267. tra added a comment. Fixed typos Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D108239/new/ https://reviews.llvm.org/D108239 Files: clang/include/clang/Basic/Cuda.h clang/lib/Basic/Cuda.cpp clang/lib/

[PATCH] D108239: [CUDA] Add support for CUDA-11.4.

2021-08-18 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Basic/Targets/NVPTX.cpp:48-49 PTXVersion = llvm::StringSwitch(Feature) + .Case("+ptx72", 74) + .Case("+ptx71", 73) .Case("+ptx72", 72) Hahnfeld wro
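In the quoted diff the label `"+ptx72"` appears in more than one `.Case`, and since `llvm::StringSwitch` keeps the first case that matches, a repeated label silently shadows the later entry. One way to avoid per-entry labels altogether is to parse the number out of the feature string. The sketch below is a design alternative for illustration only, not what the patch does; `parsePtxFeature` is a hypothetical name and the code has no LLVM dependency.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: turns a feature string such as "+ptx74" into 74.
// Returns std::nullopt for anything that is not "+ptx" followed by digits.
std::optional<unsigned> parsePtxFeature(const std::string &feature) {
  const std::string prefix = "+ptx";
  if (feature.size() <= prefix.size() ||
      feature.compare(0, prefix.size(), prefix) != 0)
    return std::nullopt;
  unsigned version = 0;
  for (std::size_t i = prefix.size(); i < feature.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(feature[i]);
    if (!std::isdigit(c))
      return std::nullopt;
    version = version * 10 + static_cast<unsigned>(c - '0');
  }
  return version;
}
```

The trade-off is that a parser accepts versions the compiler may not know about, whereas an explicit case list rejects them, so a real implementation would still need a range check.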

[PATCH] D108248: [CUDA] Bump the latest supported CUDA version to 11.4.

2021-08-18 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 367282. tra added a comment. Updated release notes. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D108248/new/ https://reviews.llvm.org/D108248 Files: clang/docs/ReleaseNotes.rst clang/include/clang/Basic/Cuda.

[PATCH] D108235: [CUDA] Bump default GPU architecture to sm_35.

2021-08-18 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D108235#2952714, @Hahnfeld wrote: > (might be good to have an entry in the release notes?) I've updated release notes in D108248. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTI

[PATCH] D108247: [CUDA] Improve CUDA version detection and diagnostics.

2021-08-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:209-211 + Version = FS.exists(LibDevicePath + "/libdevice.10.bc") +? Version = CudaVersion::NEW +: Version = CudaVersion::CUDA_70; Hahnfeld wr

[PATCH] D108247: [CUDA] Improve CUDA version detection and diagnostics.

2021-08-19 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 367532. tra added a comment. Fixed an error spotted by reviewer. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D108247/new/ https://reviews.llvm.org/D108247 Files: clang/include/clang/Basic/Cuda.h clang/include

[PATCH] D104847: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 WMMA and MMA instructions

2021-06-25 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. LGTM. Would you like me to land the patch on your behalf? Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16397 unsigned NumEltsD; std::array Variants; A comment here describing expected arrangement of the va

[PATCH] D104847: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 WMMA and MMA instructions

2021-06-29 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rG3644726a78e3: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 WMMA… (authored by steffenlarsen, committed by tra). Changed prior to commit: https://reviews.llvm.org/D104847?vs

[PATCH] D105226: [Clang] allow overriding -fbasic-block-sections

2021-06-30 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added reviewers: tmsriram, jlebar. Herald added subscribers: pengfei, bixia. tra requested review of this revision. Herald added a project: clang. We should not error out on non-x86 targets if `-fbasic-block-sections=none` is in effect. Also, filter it out for GPU-

[PATCH] D105226: [Clang] allow overriding -fbasic-block-sections

2021-06-30 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 355681. tra edited the summary of this revision. tra added a comment. comment typo fix. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D105226/new/ https://reviews.llvm.org/D105226 Files: clang/lib/Driver/ToolChai

[PATCH] D105226: [Clang] allow overriding -fbasic-block-sections

2021-06-30 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGcab5f89cfd9e: [Clang] allow overriding -fbasic-block-sections (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION

[PATCH] D105135: [Internalize] Preserve variables externally initialized.

2021-07-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/CodeGenCUDA/host-used-device-var.cu:21-24 -__device__ int v1; - -// DEV-NEG-NOT: @v2 -__constant__ int v2; These should be changed to positive checks to verify that they are emitted. Ditto for other tests. =

[PATCH] D105295: [CUDA] Only allow NVIDIA offload-arch during CUDA compilation.

2021-07-01 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: yaxunl. Herald added a subscriber: bixia. tra requested review of this revision. Herald added a project: clang. Otherwise, if someone specifies a valid AMD arch, we may end up triggering an assertion on unexpected arch later on. Current tests didn'

[PATCH] D104847: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 WMMA and MMA instructions

2021-07-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:727 TARGET_BUILTIN(__bmma_m8n8k128_ld_c, "vi*iC*UiIi", "", AND(SM_75,PTX63)) TARGET_BUILTIN(__bmma_m8n8k128_mma_xor_popc_b1, "vi*iC*iC*iC*Ii", "", AND(SM_75,PTX63)) TARGET_BUILTIN(__bmma_m8n8k1

[PATCH] D104847: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 WMMA and MMA instructions

2021-07-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/CodeGen/builtins-nvptx-mma.cu:781-786 + // CHECK_PTX70_SM80: call {{.*}} @llvm.nvvm.wmma.m16n16k8.load.c.col.stride.f32 + // expected-error-re@+1 {{'__mma_tf32_m16n16k8_ld_c' needs target feature (sm_80{{.*}},(ptx70{{.* +

[PATCH] D105384: [NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction.

2021-07-02 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: steffenlarsen. Herald added subscribers: bixia, hiraditya, yaxunl, jholewinski. tra requested review of this revision. Herald added a subscriber: jdoerfert. Herald added projects: clang, LLVM. Extends the changes in D104847

[PATCH] D105384: [NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction.

2021-07-12 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 358041. tra marked an inline comment as done. tra edited the summary of this revision. tra added a comment. Addressed review comments. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D105384/new/ https://reviews.llvm.

[PATCH] D105384: [NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction.

2021-07-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/CodeGen/builtins-nvptx-mma.py:35-38 +def make_mma_ops(geoms, types_a, types_b, types_c, types_d, b1ops=None): ops = [] + if b1ops is None: +b1ops = [""] steffenlarsen wrote: > Default initializers that us

[PATCH] D105295: [CUDA] Only allow NVIDIA offload-arch during CUDA compilation.

2021-07-13 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG01d3a3dcabaf: [CUDA] Only allow NVIDIA offload-arch during CUDA compilation. (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE

[PATCH] D105384: [NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction.

2021-07-13 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 358365. tra added a comment. Updated LD/ST generation. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D105384/new/ https://reviews.llvm.org/D105384 Files: clang/include/clang/Basic/BuiltinsNVPTX.def clang/lib/Co

[PATCH] D105295: [CUDA] Only allow NVIDIA offload-arch during CUDA compilation.

2021-07-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Ugh. I broke the cuda-bad-arch.cu test. Comment at: clang/test/Driver/cuda-bad-arch.cu:30 // RUN: | FileCheck -check-prefix OK %s // RUN: %clang -### -target x86_64-linux-gnu --cuda-gpu-arch=gfx90a -c %s 2>&1 \ // RUN: | FileCheck -check-prefix OK %s --

[PATCH] D105295: [CUDA] Only allow NVIDIA offload-arch during CUDA compilation.

2021-07-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D105295#2874935, @tra wrote: > Ugh. I broke the cuda-bad-arch.cu test. Should be fixed in 25629bb45f0a4b8c8e99dbde4f4a7e3d980b9fd7 Repository: rG LLVM Git

[PATCH] D105384: [NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction.

2021-07-14 Thread Artem Belevich via Phabricator via cfe-commits
tra marked an inline comment as done. tra added inline comments. Comment at: clang/test/CodeGen/builtins-nvptx-mma.py:84 + # It uses __mma_tf32_m16n16k8_ld_c but __mma_m16n16k8_st_c_f32. + make_ldst_ops(["m16n16k8"], ["a", "b", "c", "d"], ["tf32", "f32"])) ---

[PATCH] D112284: [Clang][NFC] Clang CUDA codegen clean-up

2021-10-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D112284#3086499, @bondhugula wrote: > @tra While on this, I also wanted to ask as to why clang cuda codegen is > using an argument on the global ctor and the dtor it's generating. It's a good question, and I don't have a good a

[PATCH] D112492: [HIP] Do not use kernel handle for MSVC target

2021-11-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. As phrased, the summary would likely be rather confusing for anyone other than you and me. > Currently Visual Studio 2019 has a linker issue which causes linking error > when a template kernel is instantiated in different compilation units. It's not clear what exactly is the

[PATCH] D112041: [InferAddressSpaces] Support assumed addrspaces from addrspace predicates.

2021-11-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. LGTM in general, modulo remaining nits. Comment at: llvm/lib/Transforms/Scalar/InferAddressSpaces.cpp:196 void inferAddressSpaces(ArrayRef Postorder, - ValueToAddrSpaceMapTy *InferredAddrSpace) const; + V

[PATCH] D112492: [HIP] Do not use kernel handle for MSVC target

2021-11-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > With these changes, we should have consistent name mangling for kernel stubs > and kernel launching mechanism on Linux and Windows. Nice! Thank you for figuring out the root causes. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D112492/new/ https://reviews.llvm.o

[PATCH] D113249: [CUDA] Bump CUDA version to 11.5

2021-11-05 Thread Artem Belevich via Phabricator via cfe-commits
tra requested changes to this revision. tra added a comment. This revision now requires changes to proceed. I think we're missing a few more changes here: - The driver needs to enable ptx75 when it constructs cc1 command line in clang/lib/Driver/ToolChains/Cuda.cpp - We also need to handle PTX75 i

[PATCH] D113249: [CUDA] Bump CUDA version to 11.5

2021-11-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D113249#3112279, @Hahnfeld wrote: > Experimental support for `__int128` is new in CUDA 11.5, not sure if Clang > enables this for CUDA. I think we've added support for i128 a while back: https://godbolt.org/z/18bEbhMYb > The r

[PATCH] D111443: [Driver] Fix ToolChain::getSanitizerArgs

2021-11-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I'll defer to @eugenis. Overall it looks OK to me. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D111443/new/ https://reviews.llvm.org/D111443

[PATCH] D113249: [CUDA] Bump CUDA version to 11.5

2021-11-08 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. In D113249#3113666, @carlosgalvezp wrote: >> - The driver needs to enable ptx75 when it constructs cc1 command line in >> clang/lib/Driver/ToolChains/Cuda

[PATCH] D113491: [HIP] Fix device stub name for Windows

2021-11-09 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM in general. Comment at: clang/lib/AST/MicrosoftMangle.cpp:975-976 + llvm::SmallString<128> Buf; + mangleSourceName((llvm::Twine("__device_stub__") + II->getN

[PATCH] D113490: [NFC] Let Microsoft mangler accept GlobalDecl

2021-11-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added a reviewer: rnk. tra added a subscriber: rnk. tra added a comment. + @rnk as it's a windows-specific change. LGTM in general. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D113490/new/ https://reviews.llvm.org/D113490

[PATCH] D112492: [CUDA][HIP] Allow comdat for kernels

2021-11-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added subscribers: kpyzhov, rnk. tra added inline comments. Comment at: clang/lib/CodeGen/CodeGenModule.cpp:4290-4293 - // Do not set COMDAT attribute for CUDA/HIP stub functions to prevent - // them being "merged" by the COMDAT Folding linker optimization. - if (D.hasAttr

[PATCH] D112492: [CUDA][HIP] Allow comdat for kernels

2021-11-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Yes, we do need to merge identical functions with **identical names** for templates. The comdat-folding issue is different. IIUIC, it allows merging two functions with identical code and **different names**, into one function with two names. That will break CUDA as we do n

[PATCH] D86376: [HIP] Emit kernel symbol

2021-03-01 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. So, to summarize how the patch changes the under-the-hood kernel launch machinery: - device-side is unchanged. Kernel function is generated with the real kernel name - host-side stub is still gener

[PATCH] D97708: [CUDA] Remove `noreturn` attribute from __assertfail().

2021-03-01 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: jlebar. Herald added subscribers: bixia, yaxunl. tra requested review of this revision. Herald added a project: clang. `noreturn` complicates control flow and tends to trigger a known bug in ptxas if the assert is used within loops in sufficiently c

[PATCH] D97708: [CUDA] Remove `noreturn` attribute from __assertfail().

2021-03-01 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 327197. tra edited the summary of this revision. tra added a comment. Added a comment. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D97708/new/ https://reviews.llvm.org/D97708 Files: clang/lib/Headers/__clang_cu

[PATCH] D97340: [HIP] Support Spack packages

2021-03-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:3535-3536 HelpText<"Print the registered targets">; +def print_rocm_search_dirs : Flag<["-", "--"], "print-rocm-search-dirs">, + HelpText<"Print the paths used for finding ROCm installation">; def p

[PATCH] D97708: [CUDA] Remove `noreturn` attribute from __assertfail().

2021-03-01 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG32e064527623: [CUDA] Remove `noreturn` attribute from __assertfail(). (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST A

[PATCH] D97340: [HIP] Support Spack packages

2021-03-02 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM modulo a couple of nits. Comment at: clang/include/clang/Driver/Options.td:3535-3536 HelpText<"Print the registered targets">; +def print_rocm_search_dirs : Flag<["-", "--"],

[PATCH] D98068: Remove asserts for LocalInstantiationScope

2021-03-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Godbolt appears to be OK with the code for both gcc and clang: https://godbolt.org/z/enec44 Debug build does assert here: clang++: /work/llvm/repo/clang/lib/Sema/SemaTemplateInstantiate.cpp:3630: void clang::LocalInstantiationScope::InstantiatedLocal(const clang::Decl *,

[PATCH] D98193: [CUDA][HIP] Allow non-ODR use of host var in device

2021-03-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/SemaCUDA/device-use-host-var.cu:41 *out = global_const_var; + *out = global_const_struct_var.x; I do not think it should be allowed. We end up instantiating the variable on device, even though the variable

[PATCH] D98193: [CUDA][HIP] Allow non-ODR use of host var in device

2021-03-10 Thread Artem Belevich via Phabricator via cfe-commits
tra added a reviewer: rsmith. tra added a subscriber: rsmith. tra added a comment. LGTM I've added @rsmith to double check that we're handling it correctly. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D98193/new/ https://reviews.llvm.org/D98193 ___

[PATCH] D107054: [Clang][CUDA] Add descriptors, mappings, and features for missing CUDA and PTX versions

2021-11-18 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. I think this patch has been obsoleted by https://reviews.llvm.org/D113249 which has already landed. My apologies for letting the patch slip through the cracks. CHANGES SINCE LAST ACTION https://

[PATCH] D114326: Update the list of CUDA versions up to 11.5

2021-11-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:131 + std::initializer_list<const char *> Versions = { + "11.5", "11.4", "11.3", "11.2", "11.1", "11.0", "10.2", "10.1", + "10.0", "9.2", "9.1", "9.0", "8.0", "7.5", "7.0"}; mojca wrote

[PATCH] D110618: [HIPSPV][2/4] Add HIPSPV tool chain

2021-11-22 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM in general, modulo push_back/append nits. Comment at: clang/include/clang/Driver/Options.td:3701 " do not include the default CUDA/HIP wrapper headers">; +def nohipwrapperi

[PATCH] D110549: [HIPSPV][1/4] Refactor HIP tool chain

2021-11-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: echristo. tra added inline comments. Comment at: clang/lib/Driver/ToolChains/HIPUtility.cpp:119-133 + // Add MC directives to embed target binaries. We ensure that each + // section and image is 16-byte aligned. This is not mandatory, but + // increases

[PATCH] D114326: Update the list of CUDA versions up to 11.5

2021-11-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:131 + std::initializer_list<const char *> Versions = { + "11.5", "11.4", "11.3", "11.2", "11.1", "11.0", "10.2", "10.1", + "10.0", "9.2", "9.1", "9.0", "8.0", "7.5", "7.0"}; tra wrote:

[PATCH] D114326: Update the list of CUDA versions up to 11.5

2021-11-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:131 + std::initializer_list<const char *> Versions = { + "11.5", "11.4", "11.3", "11.2", "11.1", "11.0", "10.2", "10.1", + "10.0", "9.2", "9.1", "9.0", "8.0", "7.5", "7.0"}; carlosgalve

[PATCH] D114326: Update the list of CUDA versions up to 11.5

2021-11-29 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D114326#3154228, @mojca wrote: > Somewhat off-topic from a discussion earlier in the thread. > What's the purpose of the following code then if users are supposed to > explicitly specify the `-L` flag anyway? Good point, it is i

[PATCH] D114601: Read path to CUDA from env. variable CUDA_PATH on Windows

2021-11-29 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:137 } else if (HostTriple.isOSWindows()) { -for (const char *Ver : Versions) - Candidates.emplace_back( Do we want to keep this as the fall-back for cases when `CUDA_PATH` is
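The fall-back ordering asked about above can be sketched as follows. This is an illustration only, not the code under review: `cudaCandidates`, its parameters, and the `<root>/v<version>` directory layout are assumptions made for the example.

```cpp
#include <string>
#include <vector>

// Builds an ordered list of directories to probe for a CUDA installation.
// `cudaPath` is the value of the CUDA_PATH environment variable (nullptr or
// empty when unset). `versionedRoot` and the "v<version>" naming are
// illustrative assumptions, not the driver's actual search paths.
std::vector<std::string>
cudaCandidates(const char *cudaPath, const std::string &versionedRoot,
               const std::vector<std::string> &versions) {
  std::vector<std::string> candidates;

  // An explicitly configured location takes priority when present.
  if (cudaPath && *cudaPath)
    candidates.emplace_back(cudaPath);

  // Fall back to well-known versioned directories, newest first as listed.
  for (const std::string &ver : versions)
    candidates.push_back(versionedRoot + "/v" + ver);

  return candidates;
}
```

A caller would typically pass `std::getenv("CUDA_PATH")` as the first argument. Keeping the versioned directories after the environment variable means detection still works when `CUDA_PATH` is unset.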

[PATCH] D114326: Update the list of CUDA versions up to 11.5

2021-11-29 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D114326#3159122, @mojca wrote: > @tra: this is not yet 100% ready since the unit tests are now failing > (expecting to find CUDA 8.0). > I can fix the unit test, but I suppose that someone needs to install > additional SDK somew

[PATCH] D114601: Read path to CUDA from env. variable CUDA_PATH on Windows

2021-11-29 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:137 } else if (HostTriple.isOSWindows()) { -for (const char *Ver : Versions) - Candidates.emplace_back( mojca wrote: > tra wrote: > > Do we want to keep this as the fall-back

[PATCH] D114326: Update the list of CUDA versions up to 11.5

2021-11-29 Thread Artem Belevich via Phabricator via cfe-commits
tra requested changes to this revision. tra added a comment. This revision now requires changes to proceed. With D114601, this patch would no longer be needed. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D114326/

[PATCH] D114812: [HIP] Add pre-defined macro `__HIPCC_RDC__`

2021-11-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D114812#3162282, @yaxunl wrote: > I am not sure whether we want to define a similar macro for cuda-clang. > > Maybe `__CLANG_RDC__` is better? I think it would make sense. For CUDA compatibility we can then define __CUDACC_RDC__

[PATCH] D136311: [CUDA] Propagate __bf16 type info from the host compilation.

2022-10-21 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. Herald added subscribers: mattd, gchakrabarti, asavonic, bixia, yaxunl. Herald added a project: All. tra updated this revision to Diff 469453. tra added a comment. Herald added a subscriber: hiraditya. tra updated this revision to Diff 469460. tra updated this revision to

[PATCH] D136311: [CUDA] Propagate __bf16 type info from the host compilation.

2022-10-21 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 469663. tra added a comment. Make __bf16 available regardless of its availability on the host. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D136311/new/ https://reviews.llvm.org/D136311 Files: clang/lib/Basic/Ta
