from:"Shangwu Yao via Phabricator via cfe\-commits"

[PATCH] D123049: Emit OpenCL metadata when targeting SPIR-V

2022-04-04 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao created this revision.
shangwuyao added reviewers: jlebar, yaxunl, tra.
Herald added subscribers: ldrumm, ThomasRaoux, Anastasia.
Herald added a project: All.
shangwuyao requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

This is required for converting function calls such as get_global_id()
into SPIR-V builtins.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D123049

Files:
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Frontend/CompilerInvocation.cpp
  clang/test/CodeGenCUDASPIRV/kernel-cc.cu


Index: clang/test/CodeGenCUDASPIRV/kernel-cc.cu
===
--- clang/test/CodeGenCUDASPIRV/kernel-cc.cu
+++ clang/test/CodeGenCUDASPIRV/kernel-cc.cu
@@ -7,3 +7,6 @@
 // CHECK: define spir_kernel void @_Z6kernelv()
 
 __attribute__((global)) void kernel() { return; }
+
+// CHECK: !opencl.ocl.version = !{[[OCL:![0-9]+]]}
+// CHECK: [[OCL]] = !{i32 2, i32 0}
Index: clang/lib/Frontend/CompilerInvocation.cpp
===
--- clang/lib/Frontend/CompilerInvocation.cpp
+++ clang/lib/Frontend/CompilerInvocation.cpp
@@ -3312,6 +3312,10 @@
 // whereas respecting contract flag in backend.
 Opts.setDefaultFPContractMode(LangOptions::FPM_FastHonorPragmas);
   } else if (Opts.CUDA) {
+if (T.isSPIRV()) {
+  // Emit OpenCL version metadata in LLVM IR when targeting SPIR-V.
+  Opts.OpenCLVersion = 200;
+}
 // Allow fuse across statements disregarding pragmas.
 Opts.setDefaultFPContractMode(LangOptions::FPM_Fast);
   }
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -784,7 +784,7 @@
   LangOpts.OpenMP);
 
   // Emit OpenCL specific module metadata: OpenCL/SPIR version.
-  if (LangOpts.OpenCL) {
+  if (LangOpts.OpenCL || (LangOpts.CUDAIsDevice && getTriple().isSPIRV())) {
 EmitOpenCLMetadata();
 // Emit SPIR version.
 if (getTriple().isSPIR()) {


Index: clang/test/CodeGenCUDASPIRV/kernel-cc.cu
===
--- clang/test/CodeGenCUDASPIRV/kernel-cc.cu
+++ clang/test/CodeGenCUDASPIRV/kernel-cc.cu
@@ -7,3 +7,6 @@
 // CHECK: define spir_kernel void @_Z6kernelv()
 
 __attribute__((global)) void kernel() { return; }
+
+// CHECK: !opencl.ocl.version = !{[[OCL:![0-9]+]]}
+// CHECK: [[OCL]] = !{i32 2, i32 0}
Index: clang/lib/Frontend/CompilerInvocation.cpp
===
--- clang/lib/Frontend/CompilerInvocation.cpp
+++ clang/lib/Frontend/CompilerInvocation.cpp
@@ -3312,6 +3312,10 @@
 // whereas respecting contract flag in backend.
 Opts.setDefaultFPContractMode(LangOptions::FPM_FastHonorPragmas);
   } else if (Opts.CUDA) {
+if (T.isSPIRV()) {
+  // Emit OpenCL version metadata in LLVM IR when targeting SPIR-V.
+  Opts.OpenCLVersion = 200;
+}
 // Allow fuse across statements disregarding pragmas.
 Opts.setDefaultFPContractMode(LangOptions::FPM_Fast);
   }
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -784,7 +784,7 @@
   LangOpts.OpenMP);
 
   // Emit OpenCL specific module metadata: OpenCL/SPIR version.
-  if (LangOpts.OpenCL) {
+  if (LangOpts.OpenCL || (LangOpts.CUDAIsDevice && getTriple().isSPIRV())) {
 EmitOpenCLMetadata();
 // Emit SPIR version.
 if (getTriple().isSPIR()) {
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D123049: Emit OpenCL metadata when targeting SPIR-V

2022-04-04 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao added a comment.

In D123049#3426849 , @yaxunl wrote:

> Is this because your HIP threadIdx etc are implemented using OpenCL builtins 
> so that the emitted LLVM IR contains calls of OpenCL builtins?

Yes, that's correct.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123049/new/

https://reviews.llvm.org/D123049

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D123049: Emit OpenCL metadata when targeting SPIR-V

2022-04-05 Thread Shangwu Yao via Phabricator via cfe-commits

This revision was not accepted when it landed; it landed in state "Needs 
Review".
This revision was automatically updated to reflect the committed changes.
Closed by commit rG15a1769631ff: Emit OpenCL metadata when targeting SPIR-V 
(authored by shangwuyao).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123049/new/

https://reviews.llvm.org/D123049

Files:
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Frontend/CompilerInvocation.cpp
  clang/test/CodeGenCUDASPIRV/kernel-cc.cu


Index: clang/test/CodeGenCUDASPIRV/kernel-cc.cu
===
--- clang/test/CodeGenCUDASPIRV/kernel-cc.cu
+++ clang/test/CodeGenCUDASPIRV/kernel-cc.cu
@@ -7,3 +7,6 @@
 // CHECK: define spir_kernel void @_Z6kernelv()
 
 __attribute__((global)) void kernel() { return; }
+
+// CHECK: !opencl.ocl.version = !{[[OCL:![0-9]+]]}
+// CHECK: [[OCL]] = !{i32 2, i32 0}
Index: clang/lib/Frontend/CompilerInvocation.cpp
===
--- clang/lib/Frontend/CompilerInvocation.cpp
+++ clang/lib/Frontend/CompilerInvocation.cpp
@@ -3328,6 +3328,10 @@
 // whereas respecting contract flag in backend.
 Opts.setDefaultFPContractMode(LangOptions::FPM_FastHonorPragmas);
   } else if (Opts.CUDA) {
+if (T.isSPIRV()) {
+  // Emit OpenCL version metadata in LLVM IR when targeting SPIR-V.
+  Opts.OpenCLVersion = 200;
+}
 // Allow fuse across statements disregarding pragmas.
 Opts.setDefaultFPContractMode(LangOptions::FPM_Fast);
   }
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -784,7 +784,7 @@
   LangOpts.OpenMP);
 
   // Emit OpenCL specific module metadata: OpenCL/SPIR version.
-  if (LangOpts.OpenCL) {
+  if (LangOpts.OpenCL || (LangOpts.CUDAIsDevice && getTriple().isSPIRV())) {
 EmitOpenCLMetadata();
 // Emit SPIR version.
 if (getTriple().isSPIR()) {


Index: clang/test/CodeGenCUDASPIRV/kernel-cc.cu
===
--- clang/test/CodeGenCUDASPIRV/kernel-cc.cu
+++ clang/test/CodeGenCUDASPIRV/kernel-cc.cu
@@ -7,3 +7,6 @@
 // CHECK: define spir_kernel void @_Z6kernelv()
 
 __attribute__((global)) void kernel() { return; }
+
+// CHECK: !opencl.ocl.version = !{[[OCL:![0-9]+]]}
+// CHECK: [[OCL]] = !{i32 2, i32 0}
Index: clang/lib/Frontend/CompilerInvocation.cpp
===
--- clang/lib/Frontend/CompilerInvocation.cpp
+++ clang/lib/Frontend/CompilerInvocation.cpp
@@ -3328,6 +3328,10 @@
 // whereas respecting contract flag in backend.
 Opts.setDefaultFPContractMode(LangOptions::FPM_FastHonorPragmas);
   } else if (Opts.CUDA) {
+if (T.isSPIRV()) {
+  // Emit OpenCL version metadata in LLVM IR when targeting SPIR-V.
+  Opts.OpenCLVersion = 200;
+}
 // Allow fuse across statements disregarding pragmas.
 Opts.setDefaultFPContractMode(LangOptions::FPM_Fast);
   }
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -784,7 +784,7 @@
   LangOpts.OpenMP);
 
   // Emit OpenCL specific module metadata: OpenCL/SPIR version.
-  if (LangOpts.OpenCL) {
+  if (LangOpts.OpenCL || (LangOpts.CUDAIsDevice && getTriple().isSPIRV())) {
 EmitOpenCLMetadata();
 // Emit SPIR version.
 if (getTriple().isSPIR()) {
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D119207: [CUDA][SPIRV] Convert CUDA kernels to SPIR-V kernels

2022-02-07 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao created this revision.
Herald added subscribers: carlosgalvezp, ThomasRaoux, Anastasia, yaxunl.
shangwuyao requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

This patch converts CUDA pointer kernel arguments with default address space to 
CrossWorkGroup address space (__global in OpenCL). This is because Generic or 
Function (OpenCL's private) is not supported as storage class for kernel 
pointer types.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D119207

Files:
  clang/lib/Basic/Targets/SPIR.h
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGenCUDASPIRV/kernel-argument.cu


Index: clang/test/CodeGenCUDASPIRV/kernel-argument.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDASPIRV/kernel-argument.cu
@@ -0,0 +1,17 @@
+// Tests CUDA kernel arguments get global address space when targetting SPIR-V.
+
+// REQUIRES: clang-driver
+
+// RUN: %clang -emit-llvm --cuda-device-only --offload=spirv32 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+// RUN: %clang -emit-llvm --cuda-device-only --offload=spirv64 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+// CHECK: define spir_kernel void @_Z6kernelPi(i32 addrspace(1)* noundef 
%output.coerce)
+
+__attribute__((global)) void kernel(int* output) { *output = 1; }
Index: clang/lib/CodeGen/TargetInfo.cpp
===
--- clang/lib/CodeGen/TargetInfo.cpp
+++ clang/lib/CodeGen/TargetInfo.cpp
@@ -10319,10 +10319,10 @@
 }
 
 ABIArgInfo SPIRVABIInfo::classifyKernelArgumentType(QualType Ty) const {
-  if (getContext().getLangOpts().HIP) {
+  if (getContext().getLangOpts().CUDAIsDevice) {
 // Coerce pointer arguments with default address space to CrossWorkGroup
-// pointers for HIPSPV. When the language mode is HIP, the SPIRTargetInfo
-// maps cuda_device to SPIR-V's CrossWorkGroup address space.
+// pointers for HIPSPV/CUDASPV. When the language mode is HIP/CUDA, the
+// SPIRTargetInfo maps cuda_device to SPIR-V's CrossWorkGroup address 
space.
 llvm::Type *LTy = CGT.ConvertType(Ty);
 auto DefaultAS = getContext().getTargetAddressSpace(LangAS::Default);
 auto GlobalAS = getContext().getTargetAddressSpace(LangAS::cuda_device);
Index: clang/lib/Basic/Targets/SPIR.h
===
--- clang/lib/Basic/Targets/SPIR.h
+++ clang/lib/Basic/Targets/SPIR.h
@@ -144,16 +144,16 @@
 // FIXME: SYCL specification considers unannotated pointers and references
 // to be pointing to the generic address space. See section 5.9.3 of
 // SYCL 2020 specification.
-// Currently, there is no way of representing SYCL's and HIP's default
+// Currently, there is no way of representing SYCL's and HIP/CUDA's default
 // address space language semantic along with the semantics of embedded C's
 // default address space in the same address space map. Hence the map needs
 // to be reset to allow mapping to the desired value of 'Default' entry for
-// SYCL and HIP.
+// SYCL and HIP/CUDA.
 setAddressSpaceMap(
 /*DefaultIsGeneric=*/Opts.SYCLIsDevice ||
-// The address mapping from HIP language for device code is only 
defined
-// for SPIR-V.
-(getTriple().isSPIRV() && Opts.HIP && Opts.CUDAIsDevice));
+// The address mapping from HIP/CUDA language for device code is only
+// defined for SPIR-V.
+(getTriple().isSPIRV() && Opts.CUDAIsDevice));
   }
 
   void setSupportedOpenCLOpts() override {


Index: clang/test/CodeGenCUDASPIRV/kernel-argument.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDASPIRV/kernel-argument.cu
@@ -0,0 +1,17 @@
+// Tests CUDA kernel arguments get global address space when targetting SPIR-V.
+
+// REQUIRES: clang-driver
+
+// RUN: %clang -emit-llvm --cuda-device-only --offload=spirv32 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+// RUN: %clang -emit-llvm --cuda-device-only --offload=spirv64 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+// CHECK: define spir_kernel void @_Z6kernelPi(i32 addrspace(1)* noundef %output.coerce)
+
+__attribute__((global)) void kernel(int* output) { *output = 1; }
Index: clang/lib/CodeGen/TargetInfo.cpp
===
--- clang/lib/CodeGen/TargetInfo.cpp
+++ clang/lib/CodeGen/TargetInfo.cpp
@@ -10319,10 +10319,10 @@
 }
 
 ABIArgInfo SPIRVABIInfo::classifyKernelArgumentType(QualType Ty) const {
-  if (getContext().getLangOpts().HIP) {
+

[PATCH] D119207: [CUDA][SPIRV] Assign global address space to CUDA kernel arguments

2022-02-08 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao added inline comments.

Comment at: clang/lib/CodeGen/TargetInfo.cpp:10322
 ABIArgInfo SPIRVABIInfo::classifyKernelArgumentType(QualType Ty) const {
-  if (getContext().getLangOpts().HIP) {
+  if (getContext().getLangOpts().CUDAIsDevice) {
 // Coerce pointer arguments with default address space to CrossWorkGroup

jlebar wrote:
> I am surprised by this change.  Is the language mode HIP only when compiling 
> for device?  Or are you intentionally changing the behavior in HIP mode?
> 
> Same in SPIR.h
We are targeting SPIRV so //I think// "compiling for device" is implied, I will 
let others comment on this to see if the assumption is correct. So this 
function can only be called when compiling for device, and won't be called when 
compiling for host. 

Also tried compiling for device and host separately to see where exactly does 
the code diverge (to make sure those two functions are not called when 
compiling for host):
1. This `classifyKernelArgumentType()` function is called from [[ 
https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/CGCall.cpp#L774-L777
 | here ]], which is only enabled when the calling convention is `SPIR_KERNEL`. 
And when compiling for host, the calling convention is `C`.

2. For the SPIR.h file, the `TargetInfo::adjust` function is called both when 
compiling for host and for device, see [[ 
https://github.com/llvm/llvm-project/blob/main/clang/lib/Basic/Targets/SPIR.h#L142-L157
 | here ]], while the `setAddressSpaceMap` function is only called when 
compiling for device (SPIRV).

In conclusion, those two functions won't be reached when compiling for host.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119207/new/

https://reviews.llvm.org/D119207

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D119207: [CUDA][SPIRV] Assign global address space to CUDA kernel arguments

2022-02-16 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao added a comment.

Thanks for the review, if it looks good, can we get this to land now? Otherwise 
more comments are welcome!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119207/new/

https://reviews.llvm.org/D119207

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D119207: [CUDA][SPIRV] Assign global address space to CUDA kernel arguments

2022-02-17 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao added a comment.

In D119207#3330385 , @dyung wrote:

> Hi, the test you added is failing on the PS4 Linux bot, can you take a look?
>
> https://lab.llvm.org/buildbot/#/builders/139/builds/17199

Looks like the compiled SPIR-V is slightly different for different build 
settings, for `llvm-clang-x86_64-sie-ubuntu-fast`, it is compiled to

  define hidden spir_kernel void @_Z6kernelPi(i32 addrspace(1)* noundef 
%output.coerce) #0 { 

so it is missing that extra `hidden` keyword. 
And for `clang-ve-ninja`, it is compiled to

  define spir_kernel void @_Z6kernelPi(i32 addrspace(1)* noundef %0) #0 { 

so the kernel argument identifier is slightly different (`%0` vs 
`%output.coerce`).

I could fix that, I wonder why it didn't trigger the same issue (for the 
`hidden` keyword) with this test 

 tho, it is basically the same.

And why does those build test run only after merging? For future reference, can 
I try to run those myself before submitting?

For this change, should we do a rollback and then re-land it after applying the 
fix?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119207/new/

https://reviews.llvm.org/D119207

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D130387: [CUDA/SPIR-V] Force passing aggregate type byval

2022-07-22 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao created this revision.
shangwuyao added reviewers: jlebar, mkuper, tra, yaxunl.
Herald added subscribers: mattd, ThomasRaoux.
Herald added a project: All.
shangwuyao requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

This patch forces copying aggregate type in kernel arguments by value when
compiling CUDA targeting SPIR-V. The original behavior is not passing by value
when there is any of destructor, copy constructor and move constructor defined
by user. This patch makes the behavior of SPIR-V generated from CUDA follow
the CUDA spec
(https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#global-function-argument-processing),
and matches the NVPTX
implementation (
https://github.com/llvm/llvm-project/blob/41958f76d8a2c47484fa176cba1de565cfe84de7/clang/lib/CodeGen/TargetInfo.cpp#L7241).


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D130387

Files:
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGenCUDASPIRV/copy-aggregate-byval.cu


Index: clang/test/CodeGenCUDASPIRV/copy-aggregate-byval.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDASPIRV/copy-aggregate-byval.cu
@@ -0,0 +1,25 @@
+// Tests CUDA kernel arguments get copied by value when targeting SPIR-V, even 
with
+// destructor, copy constructor or move constructor defined by user.
+
+// RUN: %clang -Xclang -no-opaque-pointers -emit-llvm --cuda-device-only 
--offload=spirv32 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+// RUN: %clang -Xclang -no-opaque-pointers -emit-llvm --cuda-device-only 
--offload=spirv64 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+class GpuData {
+ public:
+  __attribute__((host)) __attribute__((device)) GpuData(int* src) {}
+  __attribute__((host)) __attribute__((device)) ~GpuData() {}
+  __attribute__((host)) __attribute__((device)) GpuData(const GpuData& other) 
{}
+  __attribute__((host)) __attribute__((device)) GpuData(GpuData&& other) {}
+};
+
+// CHECK: define
+// CHECK-SAME: spir_kernel void @_Z6kernel7GpuData(%class.GpuData* noundef 
byval(%class.GpuData) align
+
+__attribute__((global)) void kernel(GpuData output) {}
Index: clang/lib/CodeGen/TargetInfo.cpp
===
--- clang/lib/CodeGen/TargetInfo.cpp
+++ clang/lib/CodeGen/TargetInfo.cpp
@@ -10446,6 +10446,10 @@
   LTy = llvm::PointerType::getWithSamePointeeType(PtrTy, GlobalAS);
   return ABIArgInfo::getDirect(LTy, 0, nullptr, false);
 }
+
+if (isAggregateTypeForABI(Ty)) {
+  return getNaturalAlignIndirect(Ty, /* byval */ true);
+}
   }
   return classifyArgumentType(Ty);
 }


Index: clang/test/CodeGenCUDASPIRV/copy-aggregate-byval.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDASPIRV/copy-aggregate-byval.cu
@@ -0,0 +1,25 @@
+// Tests CUDA kernel arguments get copied by value when targeting SPIR-V, even with
+// destructor, copy constructor or move constructor defined by user.
+
+// RUN: %clang -Xclang -no-opaque-pointers -emit-llvm --cuda-device-only --offload=spirv32 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+// RUN: %clang -Xclang -no-opaque-pointers -emit-llvm --cuda-device-only --offload=spirv64 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+class GpuData {
+ public:
+  __attribute__((host)) __attribute__((device)) GpuData(int* src) {}
+  __attribute__((host)) __attribute__((device)) ~GpuData() {}
+  __attribute__((host)) __attribute__((device)) GpuData(const GpuData& other) {}
+  __attribute__((host)) __attribute__((device)) GpuData(GpuData&& other) {}
+};
+
+// CHECK: define
+// CHECK-SAME: spir_kernel void @_Z6kernel7GpuData(%class.GpuData* noundef byval(%class.GpuData) align
+
+__attribute__((global)) void kernel(GpuData output) {}
Index: clang/lib/CodeGen/TargetInfo.cpp
===
--- clang/lib/CodeGen/TargetInfo.cpp
+++ clang/lib/CodeGen/TargetInfo.cpp
@@ -10446,6 +10446,10 @@
   LTy = llvm::PointerType::getWithSamePointeeType(PtrTy, GlobalAS);
   return ABIArgInfo::getDirect(LTy, 0, nullptr, false);
 }
+
+if (isAggregateTypeForABI(Ty)) {
+  return getNaturalAlignIndirect(Ty, /* byval */ true);
+}
   }
   return classifyArgumentType(Ty);
 }
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D130387: [CUDA/SPIR-V] Force passing aggregate type byval

2022-07-22 Thread Shangwu Yao via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG31d8dbd1e5b4: [CUDA/SPIR-V] Force passing aggregate type 
byval (authored by shangwuyao).

Changed prior to commit:
  https://reviews.llvm.org/D130387?vs=446934&id=446966#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130387/new/

https://reviews.llvm.org/D130387

Files:
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGenCUDASPIRV/copy-aggregate-byval.cu


Index: clang/test/CodeGenCUDASPIRV/copy-aggregate-byval.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDASPIRV/copy-aggregate-byval.cu
@@ -0,0 +1,25 @@
+// Tests CUDA kernel arguments get copied by value when targeting SPIR-V, even 
with
+// destructor, copy constructor or move constructor defined by user.
+
+// RUN: %clang -Xclang -no-opaque-pointers -emit-llvm --cuda-device-only 
--offload=spirv32 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+// RUN: %clang -Xclang -no-opaque-pointers -emit-llvm --cuda-device-only 
--offload=spirv64 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+class GpuData {
+ public:
+  __attribute__((host)) __attribute__((device)) GpuData(int* src) {}
+  __attribute__((host)) __attribute__((device)) ~GpuData() {}
+  __attribute__((host)) __attribute__((device)) GpuData(const GpuData& other) 
{}
+  __attribute__((host)) __attribute__((device)) GpuData(GpuData&& other) {}
+};
+
+// CHECK: define
+// CHECK-SAME: spir_kernel void @_Z6kernel7GpuData(%class.GpuData* noundef 
byval(%class.GpuData) align
+
+__attribute__((global)) void kernel(GpuData output) {}
Index: clang/lib/CodeGen/TargetInfo.cpp
===
--- clang/lib/CodeGen/TargetInfo.cpp
+++ clang/lib/CodeGen/TargetInfo.cpp
@@ -10449,6 +10449,15 @@
   LTy = llvm::PointerType::getWithSamePointeeType(PtrTy, GlobalAS);
   return ABIArgInfo::getDirect(LTy, 0, nullptr, false);
 }
+
+// Force copying aggregate type in kernel arguments by value when
+// compiling CUDA targeting SPIR-V. This is required for the object
+// copied to be valid on the device.
+// This behavior follows the CUDA spec
+// 
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#global-function-argument-processing,
+// and matches the NVPTX implementation.
+if (isAggregateTypeForABI(Ty))
+  return getNaturalAlignIndirect(Ty, /* byval */ true);
   }
   return classifyArgumentType(Ty);
 }


Index: clang/test/CodeGenCUDASPIRV/copy-aggregate-byval.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDASPIRV/copy-aggregate-byval.cu
@@ -0,0 +1,25 @@
+// Tests CUDA kernel arguments get copied by value when targeting SPIR-V, even with
+// destructor, copy constructor or move constructor defined by user.
+
+// RUN: %clang -Xclang -no-opaque-pointers -emit-llvm --cuda-device-only --offload=spirv32 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+// RUN: %clang -Xclang -no-opaque-pointers -emit-llvm --cuda-device-only --offload=spirv64 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+class GpuData {
+ public:
+  __attribute__((host)) __attribute__((device)) GpuData(int* src) {}
+  __attribute__((host)) __attribute__((device)) ~GpuData() {}
+  __attribute__((host)) __attribute__((device)) GpuData(const GpuData& other) {}
+  __attribute__((host)) __attribute__((device)) GpuData(GpuData&& other) {}
+};
+
+// CHECK: define
+// CHECK-SAME: spir_kernel void @_Z6kernel7GpuData(%class.GpuData* noundef byval(%class.GpuData) align
+
+__attribute__((global)) void kernel(GpuData output) {}
Index: clang/lib/CodeGen/TargetInfo.cpp
===
--- clang/lib/CodeGen/TargetInfo.cpp
+++ clang/lib/CodeGen/TargetInfo.cpp
@@ -10449,6 +10449,15 @@
   LTy = llvm::PointerType::getWithSamePointeeType(PtrTy, GlobalAS);
   return ABIArgInfo::getDirect(LTy, 0, nullptr, false);
 }
+
+// Force copying aggregate type in kernel arguments by value when
+// compiling CUDA targeting SPIR-V. This is required for the object
+// copied to be valid on the device.
+// This behavior follows the CUDA spec
+// https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#global-function-argument-processing,
+// and matches the NVPTX implementation.
+if (isAggregateTypeForABI(Ty))
+  return getNaturalAlignIndirect(Ty, /* byval */ true);
   }
   return classifyArgumentType(Ty);
 }
__

[PATCH] D130387: [CUDA/SPIR-V] Force passing aggregate type byval

2022-07-22 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao added a comment.

Accidentally submitted early...


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130387/new/

https://reviews.llvm.org/D130387

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D130387: [CUDA/SPIR-V] Force passing aggregate type byval

2022-07-22 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao added a comment.

In D130387#3672969 , @tra wrote:

> In D130387#3672961 , @shangwuyao 
> wrote:
>
>> Accidentally submitted early...
>
> The landed revision seems to have my comments addressed. Was there something 
> missing?

No, just want to wait a bit longer to see if someone else has comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130387/new/

https://reviews.llvm.org/D130387

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D140226: [NVPTX] Introduce attribute to mark kernels without a language mode

2022-12-19 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao added inline comments.



Comment at: clang/include/clang/Basic/Attr.td:1198
 
-def CUDAGlobal : InheritableAttr {
-  let Spellings = [GNU<"global">, Declspec<"__global__">];
+def CUDAGlobal : InheritableAttr, TargetSpecificAttr {
+  let Spellings = [GNU<"global">, Declspec<"__global__">, 
Clang<"nvptx_kernel">];

jhuber6 wrote:
> tra wrote:
> > Nice.
> > 
> > This reminded me that we have a project compiling CUDA, but targeting 
> > SPIR-V instead of NVPTX. It looks like this will likely break them. The 
> > project is out-of-tree, but I'd still need to figure out how to keep them 
> > working.  I guess it would be easy enough to expand TargetNVPTX to 
> > TargetNVPTXOrSpirV. I'm mostly concerned about logistics of making it 
> > happen without disruption.
> > 
> > 
> This might've broken more stuff after looking into it, I forgot that `AMDGPU` 
> still uses the same CUDA attributes, and the host portion of CUDA also checks 
> these. It would be nice if there was a way to say "CUDA" or "NVPTX", 
> wondering if that's possible in the tablegen here.
What's the plan here for keeping the SPIR-V and AMDGPU working? Would it work 
if we simply get rid of the `TargetSpecificAttr`?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140226/new/

https://reviews.llvm.org/D140226

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D144047: [CUDA][SPIRV] Match builtin types and __GCC_ATOMIC_XXX_LOCK_FREE macros on host/device

2023-02-14 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao created this revision.
shangwuyao added reviewers: jlebar, tra, yaxunl.
Herald added subscribers: mattd, carlosgalvezp, ThomasRaoux.
Herald added a project: All.
shangwuyao requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

This change matches the CUDA/SPIRV behavior with CUDA/NVPTX, and makes some 
builtin types
and __GCC_ATOMIC_XXX_LOCK_FREE macros the same between the host and device. 
This is only
done when host triple is provided and known, otherwise the behavior is 
unchanged.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D144047

Files:
  clang/lib/Basic/Targets/SPIR.h
  clang/test/CodeGenCUDASPIRV/cuda-types.cu

Index: clang/test/CodeGenCUDASPIRV/cuda-types.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDASPIRV/cuda-types.cu
@@ -0,0 +1,56 @@
+// Check that types, widths, __CLANG_ATOMIC* macros, etc. match on the host and
+// device sides of CUDA compilations. Note that we filter out long double and
+// maxwidth of _BitInt(), as this is intentionally different on host and device.
+//
+// Also ignore __CLANG_ATOMIC_LLONG_LOCK_FREE on i386. The default host CPU for
+// an i386 triple is typically at least an i586, which has cmpxchg8b (Clang
+// feature, "cx8"). Therefore, __CLANG_ATOMIC_LLONG_LOCK_FREE is 2 on the host,
+// but the value should be 1 for the device.
+//
+// Unlike CUDA, the width of SPIR-V POINTER type could differ between host and
+// device, because SPIR-V explicitly sets POINTER type width. So it is the
+// user's responsibility to choose the offload with the right POINTER size,
+// otherwise the values for __CLANG_ATOMIC_POINTER_LOCK_FREE could be different.
+
+// RUN: mkdir -p %t
+
+// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv32 -target i386-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/i386-host-defines-filtered
+// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv32 -target i386-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/i386-device-defines-filtered
+// RUN: diff %t/i386-host-defines-filtered %t/i386-device-defines-filtered
+
+// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv32 -target i386-windows-msvc -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/i386-msvc-host-defines-filtered
+// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv32 -target i386-windows-msvc -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/i386-msvc-device-defines-filtered
+// RUN: diff %t/i386-msvc-host-defines-filtered %t/i386-msvc-device-defines-filtered
+
+// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv64 -target x86_64-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/x86_64-host-defines-filtered
+// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv64 -target x86_64-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/x86_64-device-defines-filtered
+// RUN: diff %t/x86_64-host-defines-filtered %t/x86_64-device-defines-filtered
+
+// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv64 -target powerpc64-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/powerpc64-host-defines-filtered
+// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv64 -target powerpc64-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/powerpc64-device-defines-filtered
+// RUN: diff %t/powerpc64-host-defines-filtered %t/powerpc64-device-defines-filtered
+
+// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv64 -target x86_64-windows-msvc -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/x86_64-msvc-host-defines-filtered
+// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv64 -target x86_64-windows-msvc -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/x86_64-msvc-device-defines-filtered
+// RUN: diff %t/x86_64-msvc-host-defines-filtered %t/x86_64-msvc-device-defines-filtered
+
Index: clang/lib/Basic/Targets/SPIR.h
===

[PATCH] D144047: [CUDA][SPIRV] Match builtin types and __GCC_ATOMIC_XXX_LOCK_FREE macros on host/device

2023-02-15 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao added a comment.

In D144047#4129154 , @yaxunl wrote:

> Making the builtin types consistent is necessary to keep struct layout 
> consistent across host and device, but why do we need to make  
> __GCC_ATOMIC_XXX_LOCK_FREE macros the same between the host and device? Is 
> there any concrete issue if they are not the same?

The reason is the same as NVPTX, see 
https://github.com/llvm/llvm-project/blob/22882c39df71397cc6f9774d18e87d06e016c55f/clang/lib/Basic/Targets/NVPTX.cpp#L137-L141.
 Without it, we won't be able to use libraries that statically check the 
__atomic_always_lock_free. I could add the comments in the code if that makes 
things more clear.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144047/new/

https://reviews.llvm.org/D144047

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D144047: [CUDA][SPIRV] Match builtin types and __GCC_ATOMIC_XXX_LOCK_FREE macros on host/device

2023-02-15 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao updated this revision to Diff 497700.
shangwuyao added a comment.

Amend with comments


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144047/new/

https://reviews.llvm.org/D144047

Files:
  clang/lib/Basic/Targets/SPIR.h
  clang/test/CodeGenCUDASPIRV/cuda-types.cu

Index: clang/test/CodeGenCUDASPIRV/cuda-types.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDASPIRV/cuda-types.cu
@@ -0,0 +1,56 @@
+// Check that types, widths, __CLANG_ATOMIC* macros, etc. match on the host and
+// device sides of CUDA compilations. Note that we filter out long double and
+// maxwidth of _BitInt(), as this is intentionally different on host and device.
+//
+// Also ignore __CLANG_ATOMIC_LLONG_LOCK_FREE on i386. The default host CPU for
+// an i386 triple is typically at least an i586, which has cmpxchg8b (Clang
+// feature, "cx8"). Therefore, __CLANG_ATOMIC_LLONG_LOCK_FREE is 2 on the host,
+// but the value should be 1 for the device.
+//
+// Unlike CUDA, the width of SPIR-V POINTER type could differ between host and
+// device, because SPIR-V explicitly sets POINTER type width. So it is the
+// user's responsibility to choose the offload with the right POINTER size,
+// otherwise the values for __CLANG_ATOMIC_POINTER_LOCK_FREE could be different.
+
+// RUN: mkdir -p %t
+
+// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv32 -target i386-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/i386-host-defines-filtered
+// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv32 -target i386-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/i386-device-defines-filtered
+// RUN: diff %t/i386-host-defines-filtered %t/i386-device-defines-filtered
+
+// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv32 -target i386-windows-msvc -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/i386-msvc-host-defines-filtered
+// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv32 -target i386-windows-msvc -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/i386-msvc-device-defines-filtered
+// RUN: diff %t/i386-msvc-host-defines-filtered %t/i386-msvc-device-defines-filtered
+
+// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv64 -target x86_64-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/x86_64-host-defines-filtered
+// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv64 -target x86_64-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/x86_64-device-defines-filtered
+// RUN: diff %t/x86_64-host-defines-filtered %t/x86_64-device-defines-filtered
+
+// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv64 -target powerpc64-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/powerpc64-host-defines-filtered
+// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv64 -target powerpc64-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/powerpc64-device-defines-filtered
+// RUN: diff %t/powerpc64-host-defines-filtered %t/powerpc64-device-defines-filtered
+
+// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv64 -target x86_64-windows-msvc -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/x86_64-msvc-host-defines-filtered
+// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv64 -target x86_64-windows-msvc -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/x86_64-msvc-device-defines-filtered
+// RUN: diff %t/x86_64-msvc-host-defines-filtered %t/x86_64-msvc-device-defines-filtered
+
Index: clang/lib/Basic/Targets/SPIR.h
===
--- clang/lib/Basic/Targets/SPIR.h
+++ clang/lib/Basic/Targets/SPIR.h
@@ -13,6 +13,7 @@
 #ifndef LLVM_CLANG_LIB_BASIC_TARGETS_SPIR_H
 #define LLVM_CLANG_LIB_BASIC_TARGETS_SPIR_H
 
+#include "Targets.h"
 #include "clang/Basic/TargetInfo.h"
 #include "clang/Basic/TargetOptions.h"
 #include "llvm/Support/Compiler.h"
@@ -79,8 +80,10 @@
 
 // Base

[PATCH] D144047: [CUDA][SPIRV] Match builtin types and __GCC_ATOMIC_XXX_LOCK_FREE macros on host/device

2023-02-15 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao added a comment.

In D144047#4129247 , @yaxunl wrote:

> In D144047#4129182 , @shangwuyao 
> wrote:
>
>> In D144047#4129154 , @yaxunl wrote:
>>
>>> Making the builtin types consistent is necessary to keep struct layout 
>>> consistent across host and device, but why do we need to make  
>>> __GCC_ATOMIC_XXX_LOCK_FREE macros the same between the host and device? Is 
>>> there any concrete issue if they are not the same?
>>
>> The reason is the same as NVPTX, see 
>> https://github.com/llvm/llvm-project/blob/22882c39df71397cc6f9774d18e87d06e016c55f/clang/lib/Basic/Targets/NVPTX.cpp#L137-L141.
>>  Without it, we won't be able to use libraries that statically check the 
>> __atomic_always_lock_free. I could add the comments in the code if that 
>> makes things more clear.
>
> I see. Better add some comments about that.
>
> This also means backend needs to handle atomic operations not supported by 
> hardware.

Yeah. It is probably the application developer's responsibility to not request 
atomics that are not supported by the hardware?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144047/new/

https://reviews.llvm.org/D144047

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D144047: [CUDA][SPIRV] Match builtin types and __GCC_ATOMIC_XXX_LOCK_FREE macros on host/device

2023-02-16 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao updated this revision to Diff 498145.
shangwuyao added a comment.

Run clang-format.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144047/new/

https://reviews.llvm.org/D144047

Files:
  clang/lib/Basic/Targets/SPIR.h
  clang/test/CodeGenCUDASPIRV/cuda-types.cu

Index: clang/test/CodeGenCUDASPIRV/cuda-types.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDASPIRV/cuda-types.cu
@@ -0,0 +1,56 @@
+// Check that types, widths, __CLANG_ATOMIC* macros, etc. match on the host and
+// device sides of CUDA compilations. Note that we filter out long double and
+// maxwidth of _BitInt(), as this is intentionally different on host and device.
+//
+// Also ignore __CLANG_ATOMIC_LLONG_LOCK_FREE on i386. The default host CPU for
+// an i386 triple is typically at least an i586, which has cmpxchg8b (Clang
+// feature, "cx8"). Therefore, __CLANG_ATOMIC_LLONG_LOCK_FREE is 2 on the host,
+// but the value should be 1 for the device.
+//
+// Unlike CUDA, the width of SPIR-V POINTER type could differ between host and
+// device, because SPIR-V explicitly sets POINTER type width. So it is the
+// user's responsibility to choose the offload with the right POINTER size,
+// otherwise the values for __CLANG_ATOMIC_POINTER_LOCK_FREE could be different.
+
+// RUN: mkdir -p %t
+
+// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv32 -target i386-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/i386-host-defines-filtered
+// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv32 -target i386-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/i386-device-defines-filtered
+// RUN: diff %t/i386-host-defines-filtered %t/i386-device-defines-filtered
+
+// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv32 -target i386-windows-msvc -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/i386-msvc-host-defines-filtered
+// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv32 -target i386-windows-msvc -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/i386-msvc-device-defines-filtered
+// RUN: diff %t/i386-msvc-host-defines-filtered %t/i386-msvc-device-defines-filtered
+
+// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv64 -target x86_64-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/x86_64-host-defines-filtered
+// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv64 -target x86_64-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/x86_64-device-defines-filtered
+// RUN: diff %t/x86_64-host-defines-filtered %t/x86_64-device-defines-filtered
+
+// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv64 -target powerpc64-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/powerpc64-host-defines-filtered
+// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv64 -target powerpc64-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/powerpc64-device-defines-filtered
+// RUN: diff %t/powerpc64-host-defines-filtered %t/powerpc64-device-defines-filtered
+
+// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv64 -target x86_64-windows-msvc -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/x86_64-msvc-host-defines-filtered
+// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv64 -target x86_64-windows-msvc -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/x86_64-msvc-device-defines-filtered
+// RUN: diff %t/x86_64-msvc-host-defines-filtered %t/x86_64-msvc-device-defines-filtered
+
Index: clang/lib/Basic/Targets/SPIR.h
===
--- clang/lib/Basic/Targets/SPIR.h
+++ clang/lib/Basic/Targets/SPIR.h
@@ -13,6 +13,7 @@
 #ifndef LLVM_CLANG_LIB_BASIC_TARGETS_SPIR_H
 #define LLVM_CLANG_LIB_BASIC_TARGETS_SPIR_H
 
+#include "Targets.h"
 #include "clang/Basic/TargetInfo.h"
 #include "clang/Basic/TargetOptions.h"
 #include "llvm/Support/Compiler.h"
@@ -79,8 +80,10 @@
 
 // Base cl

[PATCH] D144047: [CUDA][SPIRV] Match builtin types and __GCC_ATOMIC_XXX_LOCK_FREE macros on host/device

2023-02-21 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao added a comment.

Friendly ping :-)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144047/new/

https://reviews.llvm.org/D144047

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D144047: [CUDA][SPIRV] Match builtin types and __GCC_ATOMIC_XXX_LOCK_FREE macros on host/device

2023-02-22 Thread Shangwu Yao via Phabricator via cfe-commits

This revision was automatically updated to reflect the committed changes.
Closed by commit rG8bd13ad6c537: [CUDA][SPIRV] Match builtin types and 
__GCC_ATOMIC_XXX_LOCK_FREE macros on… (authored by shangwuyao).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144047/new/

https://reviews.llvm.org/D144047

Files:
  clang/lib/Basic/Targets/SPIR.h
  clang/test/CodeGenCUDASPIRV/cuda-types.cu

Index: clang/test/CodeGenCUDASPIRV/cuda-types.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDASPIRV/cuda-types.cu
@@ -0,0 +1,56 @@
+// Check that types, widths, __CLANG_ATOMIC* macros, etc. match on the host and
+// device sides of CUDA compilations. Note that we filter out long double and
+// maxwidth of _BitInt(), as this is intentionally different on host and device.
+//
+// Also ignore __CLANG_ATOMIC_LLONG_LOCK_FREE on i386. The default host CPU for
+// an i386 triple is typically at least an i586, which has cmpxchg8b (Clang
+// feature, "cx8"). Therefore, __CLANG_ATOMIC_LLONG_LOCK_FREE is 2 on the host,
+// but the value should be 1 for the device.
+//
+// Unlike CUDA, the width of SPIR-V POINTER type could differ between host and
+// device, because SPIR-V explicitly sets POINTER type width. So it is the
+// user's responsibility to choose the offload with the right POINTER size,
+// otherwise the values for __CLANG_ATOMIC_POINTER_LOCK_FREE could be different.
+
+// RUN: mkdir -p %t
+
+// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv32 -target i386-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/i386-host-defines-filtered
+// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv32 -target i386-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/i386-device-defines-filtered
+// RUN: diff %t/i386-host-defines-filtered %t/i386-device-defines-filtered
+
+// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv32 -target i386-windows-msvc -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/i386-msvc-host-defines-filtered
+// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv32 -target i386-windows-msvc -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/i386-msvc-device-defines-filtered
+// RUN: diff %t/i386-msvc-host-defines-filtered %t/i386-msvc-device-defines-filtered
+
+// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv64 -target x86_64-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/x86_64-host-defines-filtered
+// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv64 -target x86_64-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/x86_64-device-defines-filtered
+// RUN: diff %t/x86_64-host-defines-filtered %t/x86_64-device-defines-filtered
+
+// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv64 -target powerpc64-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/powerpc64-host-defines-filtered
+// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv64 -target powerpc64-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/powerpc64-device-defines-filtered
+// RUN: diff %t/powerpc64-host-defines-filtered %t/powerpc64-device-defines-filtered
+
+// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv64 -target x86_64-windows-msvc -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/x86_64-msvc-host-defines-filtered
+// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv64 -target x86_64-windows-msvc -x cuda -emit-llvm -E -dM -o - /dev/null \
+// RUN:   | grep -E '__CLANG_ATOMIC' \
+// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/x86_64-msvc-device-defines-filtered
+// RUN: diff %t/x86_64-msvc-host-defines-filtered %t/x86_64-msvc-device-defines-filtered
+
Index: clang/lib/Basic/Targets/SPIR.h
===
--- clang/lib/Basic/Targets/SPIR.h
+++ clang/lib/Basic/Targets/SPIR.h
@@ -13,6 +13,7 @@
 #ifndef LLVM_CLANG_LIB_BASIC_TARGETS_SPIR_H
 #define LLVM_CLANG_LIB_BASIC_TARGETS_SPIR_H
 
+#include "Targets.h"
 #include "clang/Basic/TargetInf

[PATCH] D120366: [CUDA][SPIRV] Assign global address space to CUDA kernel arguments

2022-02-22 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao created this revision.
shangwuyao added reviewers: jlebar, tra, yaxunl.
Herald added subscribers: carlosgalvezp, ThomasRaoux, Anastasia.
shangwuyao requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

This patch converts CUDA pointer kernel arguments with default address space to
CrossWorkGroup address space (__global in OpenCL). This is because Generic or
Function (OpenCL's private) is not supported as storage class for kernel 
pointer types.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D120366

Files:
  clang/lib/Basic/Targets/SPIR.h
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGenCUDASPIRV/kernel-argument.cu


Index: clang/test/CodeGenCUDASPIRV/kernel-argument.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDASPIRV/kernel-argument.cu
@@ -0,0 +1,18 @@
+// Tests CUDA kernel arguments get global address space when targetting SPIR-V.
+
+// REQUIRES: clang-driver
+
+// RUN: %clang -emit-llvm --cuda-device-only --offload=spirv32 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+// RUN: %clang -emit-llvm --cuda-device-only --offload=spirv64 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+// CHECK: define
+// CHECK-SAME: spir_kernel void @_Z6kernelPi(i32 addrspace(1)* noundef
+
+__attribute__((global)) void kernel(int* output) { *output = 1; }
Index: clang/lib/CodeGen/TargetInfo.cpp
===
--- clang/lib/CodeGen/TargetInfo.cpp
+++ clang/lib/CodeGen/TargetInfo.cpp
@@ -10320,10 +10320,10 @@
 }
 
 ABIArgInfo SPIRVABIInfo::classifyKernelArgumentType(QualType Ty) const {
-  if (getContext().getLangOpts().HIP) {
+  if (getContext().getLangOpts().CUDAIsDevice) {
 // Coerce pointer arguments with default address space to CrossWorkGroup
-// pointers for HIPSPV. When the language mode is HIP, the SPIRTargetInfo
-// maps cuda_device to SPIR-V's CrossWorkGroup address space.
+// pointers for HIPSPV/CUDASPV. When the language mode is HIP/CUDA, the
+// SPIRTargetInfo maps cuda_device to SPIR-V's CrossWorkGroup address 
space.
 llvm::Type *LTy = CGT.ConvertType(Ty);
 auto DefaultAS = getContext().getTargetAddressSpace(LangAS::Default);
 auto GlobalAS = getContext().getTargetAddressSpace(LangAS::cuda_device);
Index: clang/lib/Basic/Targets/SPIR.h
===
--- clang/lib/Basic/Targets/SPIR.h
+++ clang/lib/Basic/Targets/SPIR.h
@@ -144,16 +144,16 @@
 // FIXME: SYCL specification considers unannotated pointers and references
 // to be pointing to the generic address space. See section 5.9.3 of
 // SYCL 2020 specification.
-// Currently, there is no way of representing SYCL's and HIP's default
+// Currently, there is no way of representing SYCL's and HIP/CUDA's default
 // address space language semantic along with the semantics of embedded C's
 // default address space in the same address space map. Hence the map needs
 // to be reset to allow mapping to the desired value of 'Default' entry for
-// SYCL and HIP.
+// SYCL and HIP/CUDA.
 setAddressSpaceMap(
 /*DefaultIsGeneric=*/Opts.SYCLIsDevice ||
-// The address mapping from HIP language for device code is only 
defined
-// for SPIR-V.
-(getTriple().isSPIRV() && Opts.HIP && Opts.CUDAIsDevice));
+// The address mapping from HIP/CUDA language for device code is only
+// defined for SPIR-V.
+(getTriple().isSPIRV() && Opts.CUDAIsDevice));
   }
 
   void setSupportedOpenCLOpts() override {


Index: clang/test/CodeGenCUDASPIRV/kernel-argument.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDASPIRV/kernel-argument.cu
@@ -0,0 +1,18 @@
+// Tests CUDA kernel arguments get global address space when targetting SPIR-V.
+
+// REQUIRES: clang-driver
+
+// RUN: %clang -emit-llvm --cuda-device-only --offload=spirv32 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+// RUN: %clang -emit-llvm --cuda-device-only --offload=spirv64 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+// CHECK: define
+// CHECK-SAME: spir_kernel void @_Z6kernelPi(i32 addrspace(1)* noundef
+
+__attribute__((global)) void kernel(int* output) { *output = 1; }
Index: clang/lib/CodeGen/TargetInfo.cpp
===
--- clang/lib/CodeGen/TargetInfo.cpp
+++ clang/lib/CodeGen/TargetInfo.cpp
@@ -10320,10 +10320,10 @@
 }
 
 ABIArgInfo SPIRVABIInfo::classifyKernelArgumentType(QualType Ty) const {
-  i

[PATCH] D120366: [CUDA][SPIRV] Assign global address space to CUDA kernel arguments

2022-02-22 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao added a comment.

Looking into the (new) build failure on Windows, since the change has already 
been reviewed, will try to commit after resolving the build failure.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120366/new/

https://reviews.llvm.org/D120366

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D120366: [CUDA][SPIRV] Assign global address space to CUDA kernel arguments

2022-02-24 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao updated this revision to Diff 411241.
shangwuyao added a comment.

Disabled a hip test on Windows that's breaking on head.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120366/new/

https://reviews.llvm.org/D120366

Files:
  clang/lib/Basic/Targets/SPIR.h
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGenCUDASPIRV/kernel-argument.cu
  clang/test/Driver/hip-link-bundle-archive.hip


Index: clang/test/Driver/hip-link-bundle-archive.hip
===
--- clang/test/Driver/hip-link-bundle-archive.hip
+++ clang/test/Driver/hip-link-bundle-archive.hip
@@ -1,4 +1,5 @@
 // REQUIRES: clang-driver, x86-registered-target, amdgpu-registered-target
+// UNSUPPORTED: system-windows
 
 // RUN: touch %T/libhipBundled.a
 
Index: clang/test/CodeGenCUDASPIRV/kernel-argument.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDASPIRV/kernel-argument.cu
@@ -0,0 +1,18 @@
+// Tests CUDA kernel arguments get global address space when targetting SPIR-V.
+
+// REQUIRES: clang-driver
+
+// RUN: %clang -emit-llvm --cuda-device-only --offload=spirv32 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+// RUN: %clang -emit-llvm --cuda-device-only --offload=spirv64 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+// CHECK: define
+// CHECK-SAME: spir_kernel void @_Z6kernelPi(i32 addrspace(1)* noundef
+
+__attribute__((global)) void kernel(int* output) { *output = 1; }
Index: clang/lib/CodeGen/TargetInfo.cpp
===
--- clang/lib/CodeGen/TargetInfo.cpp
+++ clang/lib/CodeGen/TargetInfo.cpp
@@ -10320,10 +10320,10 @@
 }
 
 ABIArgInfo SPIRVABIInfo::classifyKernelArgumentType(QualType Ty) const {
-  if (getContext().getLangOpts().HIP) {
+  if (getContext().getLangOpts().CUDAIsDevice) {
 // Coerce pointer arguments with default address space to CrossWorkGroup
-// pointers for HIPSPV. When the language mode is HIP, the SPIRTargetInfo
-// maps cuda_device to SPIR-V's CrossWorkGroup address space.
+// pointers for HIPSPV/CUDASPV. When the language mode is HIP/CUDA, the
+// SPIRTargetInfo maps cuda_device to SPIR-V's CrossWorkGroup address 
space.
 llvm::Type *LTy = CGT.ConvertType(Ty);
 auto DefaultAS = getContext().getTargetAddressSpace(LangAS::Default);
 auto GlobalAS = getContext().getTargetAddressSpace(LangAS::cuda_device);
Index: clang/lib/Basic/Targets/SPIR.h
===
--- clang/lib/Basic/Targets/SPIR.h
+++ clang/lib/Basic/Targets/SPIR.h
@@ -144,16 +144,16 @@
 // FIXME: SYCL specification considers unannotated pointers and references
 // to be pointing to the generic address space. See section 5.9.3 of
 // SYCL 2020 specification.
-// Currently, there is no way of representing SYCL's and HIP's default
+// Currently, there is no way of representing SYCL's and HIP/CUDA's default
 // address space language semantic along with the semantics of embedded C's
 // default address space in the same address space map. Hence the map needs
 // to be reset to allow mapping to the desired value of 'Default' entry for
-// SYCL and HIP.
+// SYCL and HIP/CUDA.
 setAddressSpaceMap(
 /*DefaultIsGeneric=*/Opts.SYCLIsDevice ||
-// The address mapping from HIP language for device code is only 
defined
-// for SPIR-V.
-(getTriple().isSPIRV() && Opts.HIP && Opts.CUDAIsDevice));
+// The address mapping from HIP/CUDA language for device code is only
+// defined for SPIR-V.
+(getTriple().isSPIRV() && Opts.CUDAIsDevice));
   }
 
   void setSupportedOpenCLOpts() override {


Index: clang/test/Driver/hip-link-bundle-archive.hip
===
--- clang/test/Driver/hip-link-bundle-archive.hip
+++ clang/test/Driver/hip-link-bundle-archive.hip
@@ -1,4 +1,5 @@
 // REQUIRES: clang-driver, x86-registered-target, amdgpu-registered-target
+// UNSUPPORTED: system-windows
 
 // RUN: touch %T/libhipBundled.a
 
Index: clang/test/CodeGenCUDASPIRV/kernel-argument.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDASPIRV/kernel-argument.cu
@@ -0,0 +1,18 @@
+// Tests CUDA kernel arguments get global address space when targetting SPIR-V.
+
+// REQUIRES: clang-driver
+
+// RUN: %clang -emit-llvm --cuda-device-only --offload=spirv32 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+// RUN: %clang -emit-llvm --cuda-device-only --offload=spirv64 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc

[PATCH] D120366: [CUDA][SPIRV] Assign global address space to CUDA kernel arguments

2022-02-24 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao added a comment.

In D120366#3344221 , @jlebar wrote:

> - What's different in this patch vs the previous one?

Previous patch broke at two different post-commit build configurations. The 
generated SPIR-V are:

  define hidden spir_kernel void @_Z6kernelPi(i32 addrspace(1)* noundef 
%output.coerce) #0 {

  define spir_kernel void @_Z6kernelPi(i32 addrspace(1)* noundef %0) #0 {

And the original test:

  // CHECK: define spir_kernel void @_Z6kernelPi(i32 addrspace(1)* noundef 
%output.coerce)

Changed that to below so that it could handle those two build configurations 
correctly.

  // CHECK: define
  // CHECK-SAME: spir_kernel void @_Z6kernelPi(i32 addrspace(1)* noundef

(The previous reverted patch could have been reopened so that the change is 
more clear, but didn't know such option exist until recently.)

> - *Disabled a hip test on Windows that's breaking on head.* Can you clarify: 
> Is this test broken at HEAD, or does it break with your patch?
>
>   If it's broken at HEAD, then it should be disabled in a separate patch.
>
>   If it breaks with your patch, can you explain why it should be disabled 
> rather than fixed?

It is broken at HEAD, will add another patch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120366/new/

https://reviews.llvm.org/D120366

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D120366: [CUDA][SPIRV] Assign global address space to CUDA kernel arguments

2022-02-24 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao updated this revision to Diff 411258.
shangwuyao added a comment.

Disabling the test failing at HEAD with a separate patch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120366/new/

https://reviews.llvm.org/D120366

Files:
  clang/lib/Basic/Targets/SPIR.h
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGenCUDASPIRV/kernel-argument.cu


Index: clang/test/CodeGenCUDASPIRV/kernel-argument.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDASPIRV/kernel-argument.cu
@@ -0,0 +1,18 @@
+// Tests CUDA kernel arguments get global address space when targetting SPIR-V.
+
+// REQUIRES: clang-driver
+
+// RUN: %clang -emit-llvm --cuda-device-only --offload=spirv32 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+// RUN: %clang -emit-llvm --cuda-device-only --offload=spirv64 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+// CHECK: define
+// CHECK-SAME: spir_kernel void @_Z6kernelPi(i32 addrspace(1)* noundef
+
+__attribute__((global)) void kernel(int* output) { *output = 1; }
Index: clang/lib/CodeGen/TargetInfo.cpp
===
--- clang/lib/CodeGen/TargetInfo.cpp
+++ clang/lib/CodeGen/TargetInfo.cpp
@@ -10320,10 +10320,10 @@
 }
 
 ABIArgInfo SPIRVABIInfo::classifyKernelArgumentType(QualType Ty) const {
-  if (getContext().getLangOpts().HIP) {
+  if (getContext().getLangOpts().CUDAIsDevice) {
 // Coerce pointer arguments with default address space to CrossWorkGroup
-// pointers for HIPSPV. When the language mode is HIP, the SPIRTargetInfo
-// maps cuda_device to SPIR-V's CrossWorkGroup address space.
+// pointers for HIPSPV/CUDASPV. When the language mode is HIP/CUDA, the
+// SPIRTargetInfo maps cuda_device to SPIR-V's CrossWorkGroup address 
space.
 llvm::Type *LTy = CGT.ConvertType(Ty);
 auto DefaultAS = getContext().getTargetAddressSpace(LangAS::Default);
 auto GlobalAS = getContext().getTargetAddressSpace(LangAS::cuda_device);
Index: clang/lib/Basic/Targets/SPIR.h
===
--- clang/lib/Basic/Targets/SPIR.h
+++ clang/lib/Basic/Targets/SPIR.h
@@ -144,16 +144,16 @@
 // FIXME: SYCL specification considers unannotated pointers and references
 // to be pointing to the generic address space. See section 5.9.3 of
 // SYCL 2020 specification.
-// Currently, there is no way of representing SYCL's and HIP's default
+// Currently, there is no way of representing SYCL's and HIP/CUDA's default
 // address space language semantic along with the semantics of embedded C's
 // default address space in the same address space map. Hence the map needs
 // to be reset to allow mapping to the desired value of 'Default' entry for
-// SYCL and HIP.
+// SYCL and HIP/CUDA.
 setAddressSpaceMap(
 /*DefaultIsGeneric=*/Opts.SYCLIsDevice ||
-// The address mapping from HIP language for device code is only 
defined
-// for SPIR-V.
-(getTriple().isSPIRV() && Opts.HIP && Opts.CUDAIsDevice));
+// The address mapping from HIP/CUDA language for device code is only
+// defined for SPIR-V.
+(getTriple().isSPIRV() && Opts.CUDAIsDevice));
   }
 
   void setSupportedOpenCLOpts() override {


Index: clang/test/CodeGenCUDASPIRV/kernel-argument.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDASPIRV/kernel-argument.cu
@@ -0,0 +1,18 @@
+// Tests CUDA kernel arguments get global address space when targetting SPIR-V.
+
+// REQUIRES: clang-driver
+
+// RUN: %clang -emit-llvm --cuda-device-only --offload=spirv32 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+// RUN: %clang -emit-llvm --cuda-device-only --offload=spirv64 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+// CHECK: define
+// CHECK-SAME: spir_kernel void @_Z6kernelPi(i32 addrspace(1)* noundef
+
+__attribute__((global)) void kernel(int* output) { *output = 1; }
Index: clang/lib/CodeGen/TargetInfo.cpp
===
--- clang/lib/CodeGen/TargetInfo.cpp
+++ clang/lib/CodeGen/TargetInfo.cpp
@@ -10320,10 +10320,10 @@
 }
 
 ABIArgInfo SPIRVABIInfo::classifyKernelArgumentType(QualType Ty) const {
-  if (getContext().getLangOpts().HIP) {
+  if (getContext().getLangOpts().CUDAIsDevice) {
 // Coerce pointer arguments with default address space to CrossWorkGroup
-// pointers for HIPSPV. When the language mode is HIP, the SPIRTargetInfo
-// maps cuda_device to SPIR-V's CrossWorkGroup address space.

[PATCH] D120366: [CUDA][SPIRV] Assign global address space to CUDA kernel arguments

2022-02-24 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao added a comment.

@yaxunl I saw that you added the test recently, could you provide some context? 
I think this test is broken at HEAD as I saw it is broken for other patches 
(see this build ) as well.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120366/new/

https://reviews.llvm.org/D120366

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D120529: Disable broken hip test on Windows

2022-02-24 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao created this revision.
shangwuyao added reviewers: tra, jlebar, yaxunl.
shangwuyao requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Disable a hip test that's broken only for Windows at HEAD.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D120529

Files:
  clang/test/Driver/hip-link-bundle-archive.hip


Index: clang/test/Driver/hip-link-bundle-archive.hip
===
--- clang/test/Driver/hip-link-bundle-archive.hip
+++ clang/test/Driver/hip-link-bundle-archive.hip
@@ -1,4 +1,5 @@
 // REQUIRES: clang-driver, x86-registered-target, amdgpu-registered-target
+// UNSUPPORTED: system-windows
 
 // RUN: touch %T/libhipBundled.a
 


Index: clang/test/Driver/hip-link-bundle-archive.hip
===
--- clang/test/Driver/hip-link-bundle-archive.hip
+++ clang/test/Driver/hip-link-bundle-archive.hip
@@ -1,4 +1,5 @@
 // REQUIRES: clang-driver, x86-registered-target, amdgpu-registered-target
+// UNSUPPORTED: system-windows
 
 // RUN: touch %T/libhipBundled.a
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D120529: Disable broken hip test on Windows

2022-02-24 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao added a comment.

See the comments in https://reviews.llvm.org/D120366, an example test failure 
is in https://reviews.llvm.org/harbormaster/build/224364/.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120529/new/

https://reviews.llvm.org/D120529

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D120529: Disable broken hip test on Windows

2022-02-24 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao added a comment.

Seems like

  
C:\\Users\\ContainerAdministrator\\AppData\\Local\\Temp\\lit-tmp-4x8dbzx6\\libbc-hipBundled-amdgcn-gfx1030-4b53df.a

somehow got interpreted by filecheck regex as:

  
C:UsersContainerAdministratorAppDataLocalTemplit-tmp-4x8dbzx6libbc-hipBundled-amdgcn-gfx1030-4b53df\\.a


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120529/new/

https://reviews.llvm.org/D120529

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D120563: [HIP] Fix test hip-link-bundled-archive.hip

2022-02-25 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao accepted this revision.
shangwuyao added a comment.
This revision is now accepted and ready to land.

LGTM, thanks for fixing this.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120563/new/

https://reviews.llvm.org/D120563

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D120529: Disable broken hip test on Windows

2022-02-25 Thread Shangwu Yao via Phabricator via cfe-commits

shangwuyao abandoned this revision.
shangwuyao added a comment.

Closing this since the fix landed with https://reviews.llvm.org/D120563.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120529/new/

https://reviews.llvm.org/D120529

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D123049: Emit OpenCL metadata when targeting SPIR-V

[PATCH] D123049: Emit OpenCL metadata when targeting SPIR-V

[PATCH] D123049: Emit OpenCL metadata when targeting SPIR-V

[PATCH] D119207: [CUDA][SPIRV] Convert CUDA kernels to SPIR-V kernels

[PATCH] D119207: [CUDA][SPIRV] Assign global address space to CUDA kernel arguments

[PATCH] D119207: [CUDA][SPIRV] Assign global address space to CUDA kernel arguments

[PATCH] D119207: [CUDA][SPIRV] Assign global address space to CUDA kernel arguments

[PATCH] D130387: [CUDA/SPIR-V] Force passing aggregate type byval

[PATCH] D130387: [CUDA/SPIR-V] Force passing aggregate type byval

[PATCH] D130387: [CUDA/SPIR-V] Force passing aggregate type byval

[PATCH] D130387: [CUDA/SPIR-V] Force passing aggregate type byval

[PATCH] D140226: [NVPTX] Introduce attribute to mark kernels without a language mode

[PATCH] D144047: [CUDA][SPIRV] Match builtin types and __GCC_ATOMIC_XXX_LOCK_FREE macros on host/device

[PATCH] D144047: [CUDA][SPIRV] Match builtin types and __GCC_ATOMIC_XXX_LOCK_FREE macros on host/device

[PATCH] D144047: [CUDA][SPIRV] Match builtin types and __GCC_ATOMIC_XXX_LOCK_FREE macros on host/device

[PATCH] D144047: [CUDA][SPIRV] Match builtin types and __GCC_ATOMIC_XXX_LOCK_FREE macros on host/device

[PATCH] D144047: [CUDA][SPIRV] Match builtin types and __GCC_ATOMIC_XXX_LOCK_FREE macros on host/device

[PATCH] D144047: [CUDA][SPIRV] Match builtin types and __GCC_ATOMIC_XXX_LOCK_FREE macros on host/device

[PATCH] D144047: [CUDA][SPIRV] Match builtin types and __GCC_ATOMIC_XXX_LOCK_FREE macros on host/device

[PATCH] D120366: [CUDA][SPIRV] Assign global address space to CUDA kernel arguments

[PATCH] D120366: [CUDA][SPIRV] Assign global address space to CUDA kernel arguments

[PATCH] D120366: [CUDA][SPIRV] Assign global address space to CUDA kernel arguments

[PATCH] D120366: [CUDA][SPIRV] Assign global address space to CUDA kernel arguments

[PATCH] D120366: [CUDA][SPIRV] Assign global address space to CUDA kernel arguments

[PATCH] D120366: [CUDA][SPIRV] Assign global address space to CUDA kernel arguments

[PATCH] D120529: Disable broken hip test on Windows

[PATCH] D120529: Disable broken hip test on Windows

[PATCH] D120529: Disable broken hip test on Windows

[PATCH] D120563: [HIP] Fix test hip-link-bundled-archive.hip

[PATCH] D120529: Disable broken hip test on Windows

30 matches

Site Navigation

Mail list logo

Footer information