[clang] cuda clang: Add support for CUDA surfaces (PR #132883)

2025-03-24 Thread Austin Schuh via cfe-commits

https://github.com/AustinSchuh created 
https://github.com/llvm/llvm-project/pull/132883

This adds support for all the surface read and write calls to clang. It extends 
the pattern used for textures to surfaces too.

I tested this by generating all the various permutations of the calls and 
argument types with a Python script, compiling them with both clang and nvcc, 
and comparing the generated PTX for equivalence.  They all agree, ignoring 
register allocation and some places where Clang picks different memory write 
instructions.  An example kernel is:

__global__ void testKernel(cudaSurfaceObject_t surfObj, int x, float2* result) {
  *result = surf1Dread(surfObj, x, cudaBoundaryModeZero);
}
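The testing approach described above can be sketched as a small generator script. This is a hypothetical reconstruction, not the actual script from the PR: the type list, function table, and kernel-naming scheme are illustrative assumptions, and the real script covered every surface read/write variant and vector type.

```python
import itertools

# Illustrative subsets; the real script covered all CUDA vector types
# and every surfXDread/surfXDwrite variant.
TYPES = ["char", "uchar2", "int4", "float", "float2", "float4"]
FUNCS = {"surf1Dread": "int x", "surf2Dread": "int x, int y"}
MODES = ["cudaBoundaryModeZero", "cudaBoundaryModeClamp", "cudaBoundaryModeTrap"]

def make_kernel(func, args, ty, mode):
    # Build one test kernel exercising a single (function, type, mode) combo.
    params = ", ".join(a.split()[-1] for a in args.split(", "))
    return (f"__global__ void k_{func}_{ty}_{mode}(cudaSurfaceObject_t s, "
            f"{args}, {ty} *out) {{\n"
            f"  *out = {func}<{ty}>(s, {params}, {mode});\n"
            f"}}\n")

kernels = [make_kernel(f, a, t, m)
           for (f, a), t, m in itertools.product(FUNCS.items(), TYPES, MODES)]
print(len(kernels))  # 2 funcs * 6 types * 3 modes = 36
```

Each generated kernel would then be compiled to PTX with both compilers and the outputs diffed after normalizing register names.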

From d8ffb5acbd79869476c91433f85488f3088e38fd Mon Sep 17 00:00:00 2001
From: Austin Schuh 
Date: Mon, 24 Mar 2025 21:42:35 -0700
Subject: [PATCH] Add support for CUDA surfaces

This adds support for all the surface read and write calls to clang.
It extends the pattern used for textures to surfaces too.

I tested this by generating all the various permutations of the calls
and argument types with a Python script, compiling them with both clang
and nvcc, and comparing the generated PTX for equivalence.  They all
agree, ignoring register allocation and some places where Clang does
different memory writes.  An example kernel is:

__global__ void testKernel(cudaSurfaceObject_t surfObj, int x, float2* result)
{
*result = surf1Dread(surfObj, x, cudaBoundaryModeZero);
}

Signed-off-by: Austin Schuh 
---
 .../Headers/__clang_cuda_runtime_wrapper.h|   1 +
 .../Headers/__clang_cuda_texture_intrinsics.h | 419 +-
 2 files changed, 418 insertions(+), 2 deletions(-)

diff --git a/clang/lib/Headers/__clang_cuda_runtime_wrapper.h b/clang/lib/Headers/__clang_cuda_runtime_wrapper.h
index d369c86fe1064..8182c961ec32f 100644
--- a/clang/lib/Headers/__clang_cuda_runtime_wrapper.h
+++ b/clang/lib/Headers/__clang_cuda_runtime_wrapper.h
@@ -386,6 +386,7 @@ __host__ __device__ void __nv_tex_surf_handler(const char *name, T *ptr,
 #endif // __cplusplus >= 201103L && CUDA_VERSION >= 9000
 #include "texture_fetch_functions.h"
 #include "texture_indirect_functions.h"
+#include "surface_indirect_functions.h"
 
 // Restore state of __CUDA_ARCH__ and __THROW we had on entry.
 #pragma pop_macro("__CUDA_ARCH__")
diff --git a/clang/lib/Headers/__clang_cuda_texture_intrinsics.h b/clang/lib/Headers/__clang_cuda_texture_intrinsics.h
index a71952211237b..2ea83f66036d4 100644
--- a/clang/lib/Headers/__clang_cuda_texture_intrinsics.h
+++ b/clang/lib/Headers/__clang_cuda_texture_intrinsics.h
@@ -28,6 +28,7 @@
 #pragma push_macro("__Args")
 #pragma push_macro("__ID")
 #pragma push_macro("__IDV")
+#pragma push_macro("__OP_TYPE_SURFACE")
 #pragma push_macro("__IMPL_2DGATHER")
 #pragma push_macro("__IMPL_ALIAS")
 #pragma push_macro("__IMPL_ALIASI")
@@ -45,6 +46,64 @@
 #pragma push_macro("__IMPL_SI")
 #pragma push_macro("__L")
 #pragma push_macro("__STRIP_PARENS")
+#pragma push_macro("__SURF_WRITE_V2")
+#pragma push_macro("__SW_ASM_ARGS")
+#pragma push_macro("__SW_ASM_ARGS1")
+#pragma push_macro("__SW_ASM_ARGS2")
+#pragma push_macro("__SW_ASM_ARGS4")
+#pragma push_macro("__SURF_WRITE_V2")
+#pragma push_macro("__SURF_READ_V2")
+#pragma push_macro("__SW_ASM_ARGS")
+#pragma push_macro("__SW_ASM_ARGS1")
+#pragma push_macro("__SW_ASM_ARGS2")
+#pragma push_macro("__SW_ASM_ARGS4")
+#pragma push_macro("__SURF_READ1D");
+#pragma push_macro("__SURF_READ2D");
+#pragma push_macro("__SURF_READ3D");
+#pragma push_macro("__SURF_READ1DLAYERED");
+#pragma push_macro("__SURF_READ2DLAYERED");
+#pragma push_macro("__SURF_READCUBEMAP");
+#pragma push_macro("__SURF_READCUBEMAPLAYERED");
+#pragma push_macro("__1DV1");
+#pragma push_macro("__1DV2");
+#pragma push_macro("__1DV4");
+#pragma push_macro("__2DV1");
+#pragma push_macro("__2DV2");
+#pragma push_macro("__2DV4");
+#pragma push_macro("__1DLAYERV1");
+#pragma push_macro("__1DLAYERV2");
+#pragma push_macro("__1DLAYERV4");
+#pragma push_macro("__3DV1");
+#pragma push_macro("__3DV2");
+#pragma push_macro("__3DV4");
+#pragma push_macro("__2DLAYERV1");
+#pragma push_macro("__2DLAYERV2");
+#pragma push_macro("__2DLAYERV4");
+#pragma push_macro("__CUBEMAPV1");
+#pragma push_macro("__CUBEMAPV2");
+#pragma push_macro("__CUBEMAPV4");
+#pragma push_macro("__CUBEMAPLAYERV1");
+#pragma push_macro("__CUBEMAPLAYERV2");
+#pragma push_macro("__CUBEMAPLAYERV4");
+#pragma push_macro("__SURF_READXD_ALL");
+#pragma push_macro("__SURF_WRITE1D_V2");
+#pragma push_macro("__SURF_WRITE1DLAYERED_V2");
+#pragma push_macro("__SURF_WRITE2D_V2");
+#pragma push_macro("__SURF_WRITE2DLAYERED_V2");
+#pragma push_macro("__SURF_WRITE3D_V2");
+#pragma push_macro("__SURF_CUBEMAPWRITE_V2");
+#pragma push_macro("__SURF_CUBEMAPLAYEREDWRITE_V2");
+#pragma push_macro("__SURF_WRITEXD_V2_ALL");
+#pragma push_macro("__1DV1");
+#pragma push_macro("__1DV2");
+#pragma push_macro("__1DV4");
+#pragma push_macro("__2DV1");
+#pragma push_macro("__2DV2");

[clang] cuda clang: Add support for CUDA surfaces (PR #132883)

2025-03-24 Thread Austin Schuh via cfe-commits

AustinSchuh wrote:

@Artem-B I think you added texture support originally.

Much of the wording in that file refers only to textures, not textures and 
surfaces.  I am happy to adjust that if desired.  I figured slightly ugly but 
working code, with early feedback, was preferable.

https://github.com/llvm/llvm-project/pull/132883
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] cuda clang: Add support for CUDA surfaces (PR #132883)

2025-03-29 Thread Austin Schuh via cfe-commits

AustinSchuh wrote:

> If the header is compilable without the CUDA SDK (maybe with some stub 
> headers in test Inputs), then a source file with a lot of builtin calls and 
> [autogenerated 
> checks](https://github.com/llvm/llvm-project/blob/d724bab8064685c98cdded88157b6e2245e0d7bc/llvm/utils/update_cc_test_checks.py#L2)
>  to verify produced instructions would do.

That wasn't too bad, done!

https://github.com/llvm/llvm-project/pull/132883


[clang] cuda clang: Add support for CUDA surfaces (PR #132883)

2025-04-03 Thread Austin Schuh via cfe-commits


@@ -0,0 +1,3329 @@
+// RUN: %clang_cc1 -triple nvptx-unknown-unknown -fcuda-is-device -O3 -o - %s -emit-llvm | FileCheck %s
+// RUN: %clang_cc1 -triple nvptx64-unknown-unknown -fcuda-is-device -O3 -o - %s -emit-llvm | FileCheck %s
+#include "../Headers/Inputs/include/cuda.h"

AustinSchuh wrote:

Any suggestions for how best to avoid duplicating it?  I could copy it over, 
but that feels worse.

https://github.com/llvm/llvm-project/pull/132883


[clang] cuda clang: Add support for CUDA surfaces (PR #132883)

2025-03-24 Thread Austin Schuh via cfe-commits

https://github.com/AustinSchuh edited 
https://github.com/llvm/llvm-project/pull/132883


[clang] cuda clang: Clean up test dependency for CUDA surfaces (PR #134459)

2025-04-04 Thread Austin Schuh via cfe-commits

https://github.com/AustinSchuh created 
https://github.com/llvm/llvm-project/pull/134459

https://github.com/llvm/llvm-project/pull/132883 added support for CUDA 
surfaces but reached into clang/test/Headers/ from clang/test/CodeGen/ to grab 
the minimal cuda.h.  Duplicate that file instead, as suggested in the review, 
to fix remote test runs.

From 52d4f06540761507c4d81cead1278bff783fc54d Mon Sep 17 00:00:00 2001
From: Austin Schuh 
Date: Fri, 4 Apr 2025 15:23:15 -0700
Subject: [PATCH] cuda clang: Clean up test dependency for CUDA surfaces

https://github.com/llvm/llvm-project/pull/132883 added support for CUDA
surfaces but reached into clang/test/Headers/ from clang/test/CodeGen/
to grab the minimal cuda.h.  Duplicate that file instead, as suggested
in the review, to fix remote test runs.

Signed-off-by: Austin Schuh 
---
 clang/test/CodeGen/include/cuda.h| 194 +++
 clang/test/CodeGen/nvptx-surface.cu  |   2 +-
 clang/test/Headers/Inputs/include/cuda.h |   4 +-
 3 files changed, 198 insertions(+), 2 deletions(-)
 create mode 100644 clang/test/CodeGen/include/cuda.h

diff --git a/clang/test/CodeGen/include/cuda.h b/clang/test/CodeGen/include/cuda.h
new file mode 100644
index 0..58202442e1f8c
--- /dev/null
+++ b/clang/test/CodeGen/include/cuda.h
@@ -0,0 +1,194 @@
+/* Minimal declarations for CUDA support.  Testing purposes only.
+ * This should stay in sync with clang/test/Headers/Inputs/include/cuda.h
+ */
+#pragma once
+
+// Make this file work with nvcc, for testing compatibility.
+
+#ifndef __NVCC__
+#define __constant__ __attribute__((constant))
+#define __device__ __attribute__((device))
+#define __global__ __attribute__((global))
+#define __host__ __attribute__((host))
+#define __shared__ __attribute__((shared))
+#define __managed__ __attribute__((managed))
+#define __launch_bounds__(...) __attribute__((launch_bounds(__VA_ARGS__)))
+
+struct dim3 {
+  unsigned x, y, z;
+  __host__ __device__ dim3(unsigned x, unsigned y = 1, unsigned z = 1) : x(x), y(y), z(z) {}
+};
+
+// Host- and device-side placement new overloads.
+void *operator new(__SIZE_TYPE__, void *p) { return p; }
+void *operator new[](__SIZE_TYPE__, void *p) { return p; }
+__device__ void *operator new(__SIZE_TYPE__, void *p) { return p; }
+__device__ void *operator new[](__SIZE_TYPE__, void *p) { return p; }
+
+#define CUDA_VERSION 10100
+
+struct char1 {
+  char x;
+  __host__ __device__ char1(char x = 0) : x(x) {}
+};
+struct char2 {
+  char x, y;
+  __host__ __device__ char2(char x = 0, char y = 0) : x(x), y(y) {}
+};
+struct char4 {
+  char x, y, z, w;
+  __host__ __device__ char4(char x = 0, char y = 0, char z = 0, char w = 0) : x(x), y(y), z(z), w(w) {}
+};
+
+struct uchar1 {
+  unsigned char x;
+  __host__ __device__ uchar1(unsigned char x = 0) : x(x) {}
+};
+struct uchar2 {
+  unsigned char x, y;
+  __host__ __device__ uchar2(unsigned char x = 0, unsigned char y = 0) : x(x), y(y) {}
+};
+struct uchar4 {
+  unsigned char x, y, z, w;
+  __host__ __device__ uchar4(unsigned char x = 0, unsigned char y = 0, unsigned char z = 0, unsigned char w = 0) : x(x), y(y), z(z), w(w) {}
+};
+
+struct short1 {
+  short x;
+  __host__ __device__ short1(short x = 0) : x(x) {}
+};
+struct short2 {
+  short x, y;
+  __host__ __device__ short2(short x = 0, short y = 0) : x(x), y(y) {}
+};
+struct short4 {
+  short x, y, z, w;
+  __host__ __device__ short4(short x = 0, short y = 0, short z = 0, short w = 0) : x(x), y(y), z(z), w(w) {}
+};
+
+struct ushort1 {
+  unsigned short x;
+  __host__ __device__ ushort1(unsigned short x = 0) : x(x) {}
+};
+struct ushort2 {
+  unsigned short x, y;
+  __host__ __device__ ushort2(unsigned short x = 0, unsigned short y = 0) : x(x), y(y) {}
+};
+struct ushort4 {
+  unsigned short x, y, z, w;
+  __host__ __device__ ushort4(unsigned short x = 0, unsigned short y = 0, unsigned short z = 0, unsigned short w = 0) : x(x), y(y), z(z), w(w) {}
+};
+
+struct int1 {
+  int x;
+  __host__ __device__ int1(int x = 0) : x(x) {}
+};
+struct int2 {
+  int x, y;
+  __host__ __device__ int2(int x = 0, int y = 0) : x(x), y(y) {}
+};
+struct int4 {
+  int x, y, z, w;
+  __host__ __device__ int4(int x = 0, int y = 0, int z = 0, int w = 0) : x(x), y(y), z(z), w(w) {}
+};
+
+struct uint1 {
+  unsigned x;
+  __host__ __device__ uint1(unsigned x = 0) : x(x) {}
+};
+struct uint2 {
+  unsigned x, y;
+  __host__ __device__ uint2(unsigned x = 0, unsigned y = 0) : x(x), y(y) {}
+};
+struct uint3 {
+  unsigned x, y, z;
+  __host__ __device__ uint3(unsigned x = 0, unsigned y = 0, unsigned z = 0) : x(x), y(y), z(z) {}
+};
+struct uint4 {
+  unsigned x, y, z, w;
+  __host__ __device__ uint4(unsigned x = 0, unsigned y = 0, unsigned z = 0, unsigned w = 0) : x(x), y(y), z(z), w(w) {}
+};
+
+struct longlong1 {
+  long long x;
+  __host__ __device__ longlong1(long long x = 0) : x(x) {}
+};
+struct longlong2 {
+  long long x, y;
+  __host__ __device__ longlong2(long long x = 0, long long y = 0) : x(x), y(y) {}
+};

[clang] cuda clang: Add support for CUDA surfaces (PR #132883)

2025-04-04 Thread Austin Schuh via cfe-commits


@@ -0,0 +1,3329 @@
+// RUN: %clang_cc1 -triple nvptx-unknown-unknown -fcuda-is-device -O3 -o - %s -emit-llvm | FileCheck %s
+// RUN: %clang_cc1 -triple nvptx64-unknown-unknown -fcuda-is-device -O3 -o - %s -emit-llvm | FileCheck %s
+#include "../Headers/Inputs/include/cuda.h"

AustinSchuh wrote:

https://github.com/llvm/llvm-project/pull/134459 has the fixup

https://github.com/llvm/llvm-project/pull/132883


[clang] cuda clang: Clean up test dependency for CUDA surfaces (PR #134459)

2025-04-04 Thread Austin Schuh via cfe-commits

AustinSchuh wrote:

@slackito and @Artem-B , hopefully this addresses your feedback!

https://github.com/llvm/llvm-project/pull/134459


[clang] cuda clang: Add support for CUDA surfaces (PR #132883)

2025-03-25 Thread Austin Schuh via cfe-commits

https://github.com/AustinSchuh updated 
https://github.com/llvm/llvm-project/pull/132883

From d8ffb5acbd79869476c91433f85488f3088e38fd Mon Sep 17 00:00:00 2001
From: Austin Schuh 
Date: Mon, 24 Mar 2025 21:42:35 -0700
Subject: [PATCH 1/4] Add support for CUDA surfaces

This adds support for all the surface read and write calls to clang.
It extends the pattern used for textures to surfaces too.

I tested this by generating all the various permutations of the calls
and argument types with a Python script, compiling them with both clang
and nvcc, and comparing the generated PTX for equivalence.  They all
agree, ignoring register allocation and some places where Clang does
different memory writes.  An example kernel is:

__global__ void testKernel(cudaSurfaceObject_t surfObj, int x, float2* result)
{
*result = surf1Dread(surfObj, x, cudaBoundaryModeZero);
}

Signed-off-by: Austin Schuh 
---
 .../Headers/__clang_cuda_runtime_wrapper.h|   1 +
 .../Headers/__clang_cuda_texture_intrinsics.h | 419 +-
 2 files changed, 418 insertions(+), 2 deletions(-)

diff --git a/clang/lib/Headers/__clang_cuda_runtime_wrapper.h b/clang/lib/Headers/__clang_cuda_runtime_wrapper.h
index d369c86fe1064..8182c961ec32f 100644
--- a/clang/lib/Headers/__clang_cuda_runtime_wrapper.h
+++ b/clang/lib/Headers/__clang_cuda_runtime_wrapper.h
@@ -386,6 +386,7 @@ __host__ __device__ void __nv_tex_surf_handler(const char *name, T *ptr,
 #endif // __cplusplus >= 201103L && CUDA_VERSION >= 9000
 #include "texture_fetch_functions.h"
 #include "texture_indirect_functions.h"
+#include "surface_indirect_functions.h"
 
 // Restore state of __CUDA_ARCH__ and __THROW we had on entry.
 #pragma pop_macro("__CUDA_ARCH__")
diff --git a/clang/lib/Headers/__clang_cuda_texture_intrinsics.h b/clang/lib/Headers/__clang_cuda_texture_intrinsics.h
index a71952211237b..2ea83f66036d4 100644
--- a/clang/lib/Headers/__clang_cuda_texture_intrinsics.h
+++ b/clang/lib/Headers/__clang_cuda_texture_intrinsics.h
@@ -28,6 +28,7 @@
 #pragma push_macro("__Args")
 #pragma push_macro("__ID")
 #pragma push_macro("__IDV")
+#pragma push_macro("__OP_TYPE_SURFACE")
 #pragma push_macro("__IMPL_2DGATHER")
 #pragma push_macro("__IMPL_ALIAS")
 #pragma push_macro("__IMPL_ALIASI")
@@ -45,6 +46,64 @@
 #pragma push_macro("__IMPL_SI")
 #pragma push_macro("__L")
 #pragma push_macro("__STRIP_PARENS")
+#pragma push_macro("__SURF_WRITE_V2")
+#pragma push_macro("__SW_ASM_ARGS")
+#pragma push_macro("__SW_ASM_ARGS1")
+#pragma push_macro("__SW_ASM_ARGS2")
+#pragma push_macro("__SW_ASM_ARGS4")
+#pragma push_macro("__SURF_WRITE_V2")
+#pragma push_macro("__SURF_READ_V2")
+#pragma push_macro("__SW_ASM_ARGS")
+#pragma push_macro("__SW_ASM_ARGS1")
+#pragma push_macro("__SW_ASM_ARGS2")
+#pragma push_macro("__SW_ASM_ARGS4")
+#pragma push_macro("__SURF_READ1D");
+#pragma push_macro("__SURF_READ2D");
+#pragma push_macro("__SURF_READ3D");
+#pragma push_macro("__SURF_READ1DLAYERED");
+#pragma push_macro("__SURF_READ2DLAYERED");
+#pragma push_macro("__SURF_READCUBEMAP");
+#pragma push_macro("__SURF_READCUBEMAPLAYERED");
+#pragma push_macro("__1DV1");
+#pragma push_macro("__1DV2");
+#pragma push_macro("__1DV4");
+#pragma push_macro("__2DV1");
+#pragma push_macro("__2DV2");
+#pragma push_macro("__2DV4");
+#pragma push_macro("__1DLAYERV1");
+#pragma push_macro("__1DLAYERV2");
+#pragma push_macro("__1DLAYERV4");
+#pragma push_macro("__3DV1");
+#pragma push_macro("__3DV2");
+#pragma push_macro("__3DV4");
+#pragma push_macro("__2DLAYERV1");
+#pragma push_macro("__2DLAYERV2");
+#pragma push_macro("__2DLAYERV4");
+#pragma push_macro("__CUBEMAPV1");
+#pragma push_macro("__CUBEMAPV2");
+#pragma push_macro("__CUBEMAPV4");
+#pragma push_macro("__CUBEMAPLAYERV1");
+#pragma push_macro("__CUBEMAPLAYERV2");
+#pragma push_macro("__CUBEMAPLAYERV4");
+#pragma push_macro("__SURF_READXD_ALL");
+#pragma push_macro("__SURF_WRITE1D_V2");
+#pragma push_macro("__SURF_WRITE1DLAYERED_V2");
+#pragma push_macro("__SURF_WRITE2D_V2");
+#pragma push_macro("__SURF_WRITE2DLAYERED_V2");
+#pragma push_macro("__SURF_WRITE3D_V2");
+#pragma push_macro("__SURF_CUBEMAPWRITE_V2");
+#pragma push_macro("__SURF_CUBEMAPLAYEREDWRITE_V2");
+#pragma push_macro("__SURF_WRITEXD_V2_ALL");
+#pragma push_macro("__1DV1");
+#pragma push_macro("__1DV2");
+#pragma push_macro("__1DV4");
+#pragma push_macro("__2DV1");
+#pragma push_macro("__2DV2");
+#pragma push_macro("__2DV4");
+#pragma push_macro("__3DV1");
+#pragma push_macro("__3DV2");
+#pragma push_macro("__3DV4");
+
 
 // Put all functions into anonymous namespace so they have internal linkage.
 // The device-only function here must be internal in order to avoid ODR
@@ -186,6 +245,20 @@ template <class __T> struct __TypeInfoT {
   using __fetch_t = typename __TypeInfoT<__base_t>::__fetch_t;
 };
 
+// Tag structs to distinguish operation types
+struct __texture_op_tag {};
+struct __surface_op_tag {};
+
+// Template specialization to determine operation type based on tag value
+template 
+struct __op_type

[clang] [llvm] cuda clang: Fix argument order for __reduce_max_sync (PR #132881)

2025-03-25 Thread Austin Schuh via cfe-commits

https://github.com/AustinSchuh updated 
https://github.com/llvm/llvm-project/pull/132881

From 381ba9f21b4c2a2c4028143b84e9f71c8a20692f Mon Sep 17 00:00:00 2001
From: Austin Schuh 
Date: Mon, 24 Mar 2025 21:42:45 -0700
Subject: [PATCH 1/2] cuda clang: Fix argument order for __reduce_max_sync

The following CUDA kernel would crash with an "an illegal instruction
was encountered" message.

__global__ void testcode(const float* data, unsigned *max_value) {
  unsigned r = static_cast<unsigned>(data[threadIdx.x]);

  const unsigned mask = __ballot_sync(0xffffffff, true);

  unsigned mx = __reduce_max_sync(mask, r);
  atomicMax(max_value, mx);
}

Digging into the PTX from both nvcc and clang, I discovered that the
arguments for the mask and value were swapped.  This swaps them back.

Fixes: https://github.com/llvm/llvm-project/issues/131415
Signed-off-by: Austin Schuh 
---
 llvm/lib/Target/NVPTX/NVPTXIntrinsics.td | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index b2e05a567b4fe..1943e94c3ee7a 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -315,7 +315,7 @@ defm MATCH_ALLP_SYNC_64 : MATCH_ALLP_SYNC {
   def : NVPTXInst<(outs Int32Regs:$dst), (ins Int32Regs:$src, Int32Regs:$mask),
   "redux.sync." # BinOp # "." # PTXType # " $dst, $src, $mask;",
-  [(set i32:$dst, (Intrin i32:$src, Int32Regs:$mask))]>,
+  [(set i32:$dst, (Intrin i32:$mask, Int32Regs:$src))]>,
 Requires<[hasPTX<70>, hasSM<80>]>;
 }
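The debugging step described in the commit message, comparing PTX from nvcc and clang modulo register allocation, can be approximated by renaming registers canonically before diffing. This is a hypothetical sketch, not the author's actual workflow; the PTX snippets and parameter names below are invented to illustrate why normalization still catches the swapped operands.

```python
import re

def normalize_ptx(ptx: str) -> str:
    """Rename virtual registers to canonical names in order of first use,
    so listings differing only in register allocation compare equal."""
    mapping = {}
    def rename(m):
        mapping.setdefault(m.group(0), f"%v{len(mapping)}")
        return mapping[m.group(0)]
    # PTX virtual registers look like %r1, %rd2, %f3, ...
    return re.sub(r"%[A-Za-z]+\d+", rename, ptx)

clang_ptx = """ld.param.u32 %r1, [mask_param];
ld.param.u32 %r2, [value_param];
redux.sync.max.u32 %r3, %r2, %r1;"""

nvcc_ptx = """ld.param.u32 %r7, [mask_param];
ld.param.u32 %r8, [value_param];
redux.sync.max.u32 %r9, %r8, %r7;"""

# Same code modulo register allocation -> normalized forms match.
assert normalize_ptx(clang_ptx) == normalize_ptx(nvcc_ptx)

# The bug: mask and value operands swapped.  Normalization pins each
# register's identity at its defining load, so the diff still shows it.
swapped_ptx = """ld.param.u32 %r1, [mask_param];
ld.param.u32 %r2, [value_param];
redux.sync.max.u32 %r3, %r1, %r2;"""
assert normalize_ptx(swapped_ptx) != normalize_ptx(nvcc_ptx)
```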
 

From aa9148ce6af19a799f37aea6d973087c8fb7c53c Mon Sep 17 00:00:00 2001
From: Austin Schuh 
Date: Tue, 25 Mar 2025 12:12:47 -0700
Subject: [PATCH 2/2] Swap in __clang_cuda_intrinsics.h instead

---
 clang/lib/Headers/__clang_cuda_intrinsics.h | 16 
 llvm/lib/Target/NVPTX/NVPTXIntrinsics.td|  2 +-
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/clang/lib/Headers/__clang_cuda_intrinsics.h b/clang/lib/Headers/__clang_cuda_intrinsics.h
index a04e8b6de44d0..8b230af6f6647 100644
--- a/clang/lib/Headers/__clang_cuda_intrinsics.h
+++ b/clang/lib/Headers/__clang_cuda_intrinsics.h
@@ -515,32 +515,32 @@ __device__ inline cuuint32_t __nvvm_get_smem_pointer(void *__ptr) {
 #if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 800
 __device__ inline unsigned __reduce_add_sync(unsigned __mask,
  unsigned __value) {
-  return __nvvm_redux_sync_add(__mask, __value);
+  return __nvvm_redux_sync_add(__value, __mask);
 }
 __device__ inline unsigned __reduce_min_sync(unsigned __mask,
  unsigned __value) {
-  return __nvvm_redux_sync_umin(__mask, __value);
+  return __nvvm_redux_sync_umin(__value, __mask);
 }
 __device__ inline unsigned __reduce_max_sync(unsigned __mask,
  unsigned __value) {
-  return __nvvm_redux_sync_umax(__mask, __value);
+  return __nvvm_redux_sync_umax(__value, __mask);
 }
 __device__ inline int __reduce_min_sync(unsigned __mask, int __value) {
-  return __nvvm_redux_sync_min(__mask, __value);
+  return __nvvm_redux_sync_min(__value, __mask);
 }
 __device__ inline int __reduce_max_sync(unsigned __mask, int __value) {
-  return __nvvm_redux_sync_max(__mask, __value);
+  return __nvvm_redux_sync_max(__value, __mask);
 }
 __device__ inline unsigned __reduce_or_sync(unsigned __mask, unsigned __value) {
-  return __nvvm_redux_sync_or(__mask, __value);
+  return __nvvm_redux_sync_or(__value, __mask);
 }
 __device__ inline unsigned __reduce_and_sync(unsigned __mask,
  unsigned __value) {
-  return __nvvm_redux_sync_and(__mask, __value);
+  return __nvvm_redux_sync_and(__value, __mask);
 }
 __device__ inline unsigned __reduce_xor_sync(unsigned __mask,
  unsigned __value) {
-  return __nvvm_redux_sync_xor(__mask, __value);
+  return __nvvm_redux_sync_xor(__value, __mask);
 }
 
 __device__ inline void __nv_memcpy_async_shared_global_4(void *__dst,
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index 1943e94c3ee7a..b2e05a567b4fe 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -315,7 +315,7 @@ defm MATCH_ALLP_SYNC_64 : MATCH_ALLP_SYNC {
   def : NVPTXInst<(outs Int32Regs:$dst), (ins Int32Regs:$src, Int32Regs:$mask),
   "redux.sync." # BinOp # "." # PTXType # " $dst, $src, $mask;",
-  [(set i32:$dst, (Intrin i32:$mask, Int32Regs:$src))]>,
+  [(set i32:$dst, (Intrin i32:$src, Int32Regs:$mask))]>,
 Requires<[hasPTX<70>, hasSM<80>]>;
 }
 



[clang] [llvm] cuda clang: Fix argument order for __reduce_max_sync (PR #132881)

2025-03-25 Thread Austin Schuh via cfe-commits


@@ -315,7 +315,7 @@ defm MATCH_ALLP_SYNC_64 : MATCH_ALLP_SYNC {
   def : NVPTXInst<(outs Int32Regs:$dst), (ins Int32Regs:$src, Int32Regs:$mask),
   "redux.sync." # BinOp # "." # PTXType # " $dst, $src, $mask;",
-  [(set i32:$dst, (Intrin i32:$src, Int32Regs:$mask))]>,
+  [(set i32:$dst, (Intrin i32:$mask, Int32Regs:$src))]>,

AustinSchuh wrote:

I swear I read the code multiple times and couldn't see the swap.  I agree 
with you now; that's the right place to swap it.  The argument order matches 
the docs in the other spots, but not in __clang_cuda_intrinsics.h.  That also 
keeps the tests working (and explains why the tests didn't catch it).

https://github.com/llvm/llvm-project/pull/132881


[clang] cuda clang: Add support for CUDA surfaces (PR #132883)

2025-03-25 Thread Austin Schuh via cfe-commits

https://github.com/AustinSchuh updated 
https://github.com/llvm/llvm-project/pull/132883

From d8ffb5acbd79869476c91433f85488f3088e38fd Mon Sep 17 00:00:00 2001
From: Austin Schuh 
Date: Mon, 24 Mar 2025 21:42:35 -0700
Subject: [PATCH 1/2] Add support for CUDA surfaces

This adds support for all the surface read and write calls to clang.
It extends the pattern used for textures to surfaces too.

I tested this by generating all the various permutations of the calls
and argument types with a Python script, compiling them with both clang
and nvcc, and comparing the generated PTX for equivalence.  They all
agree, ignoring register allocation and some places where Clang does
different memory writes.  An example kernel is:

__global__ void testKernel(cudaSurfaceObject_t surfObj, int x, float2* result)
{
*result = surf1Dread(surfObj, x, cudaBoundaryModeZero);
}

Signed-off-by: Austin Schuh 
---
 .../Headers/__clang_cuda_runtime_wrapper.h|   1 +
 .../Headers/__clang_cuda_texture_intrinsics.h | 419 +-
 2 files changed, 418 insertions(+), 2 deletions(-)

diff --git a/clang/lib/Headers/__clang_cuda_runtime_wrapper.h b/clang/lib/Headers/__clang_cuda_runtime_wrapper.h
index d369c86fe1064..8182c961ec32f 100644
--- a/clang/lib/Headers/__clang_cuda_runtime_wrapper.h
+++ b/clang/lib/Headers/__clang_cuda_runtime_wrapper.h
@@ -386,6 +386,7 @@ __host__ __device__ void __nv_tex_surf_handler(const char *name, T *ptr,
 #endif // __cplusplus >= 201103L && CUDA_VERSION >= 9000
 #include "texture_fetch_functions.h"
 #include "texture_indirect_functions.h"
+#include "surface_indirect_functions.h"
 
 // Restore state of __CUDA_ARCH__ and __THROW we had on entry.
 #pragma pop_macro("__CUDA_ARCH__")
diff --git a/clang/lib/Headers/__clang_cuda_texture_intrinsics.h b/clang/lib/Headers/__clang_cuda_texture_intrinsics.h
index a71952211237b..2ea83f66036d4 100644
--- a/clang/lib/Headers/__clang_cuda_texture_intrinsics.h
+++ b/clang/lib/Headers/__clang_cuda_texture_intrinsics.h
@@ -28,6 +28,7 @@
 #pragma push_macro("__Args")
 #pragma push_macro("__ID")
 #pragma push_macro("__IDV")
+#pragma push_macro("__OP_TYPE_SURFACE")
 #pragma push_macro("__IMPL_2DGATHER")
 #pragma push_macro("__IMPL_ALIAS")
 #pragma push_macro("__IMPL_ALIASI")
@@ -45,6 +46,64 @@
 #pragma push_macro("__IMPL_SI")
 #pragma push_macro("__L")
 #pragma push_macro("__STRIP_PARENS")
+#pragma push_macro("__SURF_WRITE_V2")
+#pragma push_macro("__SW_ASM_ARGS")
+#pragma push_macro("__SW_ASM_ARGS1")
+#pragma push_macro("__SW_ASM_ARGS2")
+#pragma push_macro("__SW_ASM_ARGS4")
+#pragma push_macro("__SURF_WRITE_V2")
+#pragma push_macro("__SURF_READ_V2")
+#pragma push_macro("__SW_ASM_ARGS")
+#pragma push_macro("__SW_ASM_ARGS1")
+#pragma push_macro("__SW_ASM_ARGS2")
+#pragma push_macro("__SW_ASM_ARGS4")
+#pragma push_macro("__SURF_READ1D");
+#pragma push_macro("__SURF_READ2D");
+#pragma push_macro("__SURF_READ3D");
+#pragma push_macro("__SURF_READ1DLAYERED");
+#pragma push_macro("__SURF_READ2DLAYERED");
+#pragma push_macro("__SURF_READCUBEMAP");
+#pragma push_macro("__SURF_READCUBEMAPLAYERED");
+#pragma push_macro("__1DV1");
+#pragma push_macro("__1DV2");
+#pragma push_macro("__1DV4");
+#pragma push_macro("__2DV1");
+#pragma push_macro("__2DV2");
+#pragma push_macro("__2DV4");
+#pragma push_macro("__1DLAYERV1");
+#pragma push_macro("__1DLAYERV2");
+#pragma push_macro("__1DLAYERV4");
+#pragma push_macro("__3DV1");
+#pragma push_macro("__3DV2");
+#pragma push_macro("__3DV4");
+#pragma push_macro("__2DLAYERV1");
+#pragma push_macro("__2DLAYERV2");
+#pragma push_macro("__2DLAYERV4");
+#pragma push_macro("__CUBEMAPV1");
+#pragma push_macro("__CUBEMAPV2");
+#pragma push_macro("__CUBEMAPV4");
+#pragma push_macro("__CUBEMAPLAYERV1");
+#pragma push_macro("__CUBEMAPLAYERV2");
+#pragma push_macro("__CUBEMAPLAYERV4");
+#pragma push_macro("__SURF_READXD_ALL");
+#pragma push_macro("__SURF_WRITE1D_V2");
+#pragma push_macro("__SURF_WRITE1DLAYERED_V2");
+#pragma push_macro("__SURF_WRITE2D_V2");
+#pragma push_macro("__SURF_WRITE2DLAYERED_V2");
+#pragma push_macro("__SURF_WRITE3D_V2");
+#pragma push_macro("__SURF_CUBEMAPWRITE_V2");
+#pragma push_macro("__SURF_CUBEMAPLAYEREDWRITE_V2");
+#pragma push_macro("__SURF_WRITEXD_V2_ALL");
+#pragma push_macro("__1DV1");
+#pragma push_macro("__1DV2");
+#pragma push_macro("__1DV4");
+#pragma push_macro("__2DV1");
+#pragma push_macro("__2DV2");
+#pragma push_macro("__2DV4");
+#pragma push_macro("__3DV1");
+#pragma push_macro("__3DV2");
+#pragma push_macro("__3DV4");
+
 
 // Put all functions into anonymous namespace so they have internal linkage.
 // The device-only function here must be internal in order to avoid ODR
@@ -186,6 +245,20 @@ template <class __T> struct __TypeInfoT {
   using __fetch_t = typename __TypeInfoT<__base_t>::__fetch_t;
 };
 
+// Tag structs to distinguish operation types
+struct __texture_op_tag {};
+struct __surface_op_tag {};
+
+// Template specialization to determine operation type based on tag value
+template 
+struct __op_type

[clang] [llvm] cuda clang: Fix argument order for __reduce_max_sync (PR #132881)

2025-03-26 Thread Austin Schuh via cfe-commits

AustinSchuh wrote:

> @AustinSchuh would you like me to merge the change for you, once the checks 
> are done?

That would be wonderful.  I don't know how to merge it (happy to learn, but I 
suspect I won't do it more than a couple of times).

https://github.com/llvm/llvm-project/pull/132881


[clang] [llvm] cuda clang: Fix argument order for __reduce_max_sync (PR #132881)

2025-04-05 Thread Austin Schuh via cfe-commits


@@ -315,7 +315,7 @@ defm MATCH_ALLP_SYNC_64 : MATCH_ALLP_SYNC {
   def : NVPTXInst<(outs Int32Regs:$dst), (ins Int32Regs:$src, Int32Regs:$mask),
   "redux.sync." # BinOp # "." # PTXType # " $dst, $src, $mask;",
-  [(set i32:$dst, (Intrin i32:$src, Int32Regs:$mask))]>,
+  [(set i32:$dst, (Intrin i32:$mask, Int32Regs:$src))]>,

AustinSchuh wrote:

Cool.  I pushed a second commit in this review to do that instead.  (I can 
squash if you want; I'm not sure of the best strategy to keep this readable.)

https://github.com/llvm/llvm-project/pull/132881


[clang] cuda clang: Add support for CUDA surfaces (PR #132883)

2025-04-05 Thread Austin Schuh via cfe-commits

https://github.com/AustinSchuh edited 
https://github.com/llvm/llvm-project/pull/132883


[clang] cuda clang: Add support for CUDA surfaces (PR #132883)

2025-04-05 Thread Austin Schuh via cfe-commits

https://github.com/AustinSchuh edited 
https://github.com/llvm/llvm-project/pull/132883