================
@@ -216,24 +214,23 @@ define dso_local ptx_kernel void @escape_ptr_store(ptr nocapture noundef writeon
 ;
 ; PTX-LABEL: escape_ptr_store(
 ; PTX:       {
-; PTX-NEXT:    .local .align 4 .b8 __local_depot4[8];
+; PTX-NEXT:    .local .align 8 .b8 __local_depot4[8];
----------------
thetheodor wrote:

Previously:
```
; PTX-NEXT:    add.u64 %rd4, %SPL, 0;
; PTX-NEXT:    ld.param.b32 %r1, [escape_ptr_store_param_1+4];
; PTX-NEXT:    st.local.b32 [%rd4+4], %r1;
; PTX-NEXT:    ld.param.b32 %r2, [escape_ptr_store_param_1];
; PTX-NEXT:    st.local.b32 [%rd4], %r2;
```
With this change:
```
; PTX-NEXT:    ld.param.b32 %rd5, [escape_ptr_store_param_1+4];
; PTX-NEXT:    shl.b64 %rd6, %rd5, 32;
; PTX-NEXT:    ld.param.b32 %rd7, [escape_ptr_store_param_1];
; PTX-NEXT:    or.b64 %rd8, %rd6, %rd7;
; PTX-NEXT:    st.local.b64 [%SPL], %rd8;
```

We replaced the two 32-bit stores with a single 64-bit store, which I guess
is what raises the alignment requirement of the local depot from 4 to 8.
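
To make the reasoning concrete, here is a rough Python model of the two sequences (a sketch only, not what the backend does internally). It assumes little-endian byte order for the local depot, and the helper names `merge_halves`, `two_b32_stores`, `one_b64_store`, and `is_naturally_aligned` are made up for illustration. It shows that the `shl.b64`/`or.b64` pair followed by one `st.local.b64` writes the same eight bytes as the two `st.local.b32` stores, and that a 4-byte-aligned address is fine for a 32-bit access but not for a 64-bit one.

```python
import struct

MASK32 = 0xFFFFFFFF


def merge_halves(lo: int, hi: int) -> int:
    """Model shl.b64 + or.b64: hi goes to bits 63..32, lo to bits 31..0."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


def two_b32_stores(lo: int, hi: int) -> bytes:
    """Model the old sequence: st.local.b32 at offset 4, then at offset 0."""
    depot = bytearray(8)
    struct.pack_into("<I", depot, 4, hi & MASK32)  # assumed little-endian
    struct.pack_into("<I", depot, 0, lo & MASK32)
    return bytes(depot)


def one_b64_store(lo: int, hi: int) -> bytes:
    """Model the new sequence: a single st.local.b64 at offset 0."""
    depot = bytearray(8)
    struct.pack_into("<Q", depot, 0, merge_halves(lo, hi))
    return bytes(depot)


def is_naturally_aligned(address: int, access_bytes: int) -> bool:
    """An access is naturally aligned if its address is a multiple of its size."""
    return address % access_bytes == 0
```

Under those assumptions both sequences leave identical bytes in the depot, so the only observable difference is the access width. A depot that happened to land at an address such as 4 would satisfy the 32-bit stores but not the 64-bit one, which is consistent with `.align 4` becoming `.align 8`.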

https://github.com/llvm/llvm-project/pull/154814
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits