================
@@ -754,13 +754,21 @@ define amdgpu_kernel void @constant_load_v16i16_align2(ptr addrspace(4) %ptr0) #
 ; GFX12-NEXT:    global_load_u16 v6, v8, s[0:1] offset:8
 ; GFX12-NEXT:    global_load_u16 v5, v8, s[0:1] offset:4
 ; GFX12-NEXT:    global_load_u16 v4, v8, s[0:1]
+; GFX12-NEXT:    s_wait_loadcnt 0x7
----------------
jayfoad wrote:

This wait is required to ensure that the global_load_u16 on line 749 writes to
v3 before the global_load_d16_hi_b16 on line 758 does.

https://github.com/llvm/llvm-project/pull/105549
_______________________________________________
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
