================
@@ -216,24 +214,23 @@ define dso_local ptx_kernel void @escape_ptr_store(ptr
nocapture noundef writeon
;
; PTX-LABEL: escape_ptr_store(
; PTX: {
-; PTX-NEXT: .local .align 4 .b8 __local_depot4[8];
+; PTX-NEXT: .local .align 8 .b8 __local_depot4[8];
----------------
thetheodor wrote:
Previously:
```
; PTX-NEXT: add.u64 %rd4, %SPL, 0;
; PTX-NEXT: ld.param.b32 %r1, [escape_ptr_store_param_1+4];
; PTX-NEXT: st.local.b32 [%rd4+4], %r1;
; PTX-NEXT: ld.param.b32 %r2, [escape_ptr_store_param_1];
; PTX-NEXT: st.local.b32 [%rd4], %r2;
```
with this change:
```
; PTX-NEXT: ld.param.b32 %rd5, [escape_ptr_store_param_1+4];
; PTX-NEXT: shl.b64 %rd6, %rd5, 32;
; PTX-NEXT: ld.param.b32 %rd7, [escape_ptr_store_param_1];
; PTX-NEXT: or.b64 %rd8, %rd6, %rd7;
; PTX-NEXT: st.local.b64 [%SPL], %rd8;
```
We replaced two 32-bit stores with one 64-bit store, which I guess is what
increases the alignment requirement of the local depot from 4 to 8.
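
A rough way to sanity-check this reasoning outside the compiler: the Python sketch below models both lowerings on an 8-byte buffer. It is only an illustration, not what the backend does internally. The helper names (`store_two_b32`, `store_one_b64`, `required_alignment`) are made up for this example, and it assumes little-endian layout and that each access must be naturally aligned to its own size.

```python
import struct


def store_two_b32(lo: int, hi: int) -> bytes:
    """Model of the old lowering: two 4-byte stores at offsets 4 and 0."""
    buf = bytearray(8)
    struct.pack_into("<I", buf, 4, hi)  # st.local.b32 [%rd4+4], %r1
    struct.pack_into("<I", buf, 0, lo)  # st.local.b32 [%rd4], %r2
    return bytes(buf)


def store_one_b64(lo: int, hi: int) -> bytes:
    """Model of the new lowering: shl + or, then one 8-byte store."""
    merged = (hi << 32) | lo            # shl.b64 / or.b64
    buf = bytearray(8)
    struct.pack_into("<Q", buf, 0, merged)  # st.local.b64 [%SPL], %rd8
    return bytes(buf)


def required_alignment(store_widths_bytes) -> int:
    """Assumption: the buffer must satisfy the widest naturally aligned access."""
    return max(store_widths_bytes)


lo, hi = 0x11223344, 0xAABBCCDD
# Both lowerings leave the same bytes in memory...
same_bytes = store_two_b32(lo, hi) == store_one_b64(lo, hi)
# ...but the widest access differs, and with it the alignment needed.
old_align = required_alignment([4, 4])
new_align = required_alignment([8])
```

Under those assumptions the memory contents are identical, and only the widest access changes (4 bytes before, 8 bytes after), which would match the `.align 4` to `.align 8` change in the CHECK line.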
https://github.com/llvm/llvm-project/pull/154814