================
@@ -669,6 +679,7 @@ define amdgpu_kernel void @global_volatile_store_1(
 ; GFX12-WGP-NEXT:    s_wait_kmcnt 0x0
 ; GFX12-WGP-NEXT:    s_wait_storecnt 0x0
 ; GFX12-WGP-NEXT:    global_store_b32 v0, v1, s[0:1] scope:SCOPE_SYS
+; GFX12-WGP-NEXT:    s_wait_loadcnt 0x3f
----------------
ssahasra wrote:

Not directly related to this discussion, but this line does exist:

```
// Merge consecutive waitcnt of the same type by erasing multiples.
if (WaitcntInstr || (!Wait.hasWaitExceptStoreCnt() && TrySimplify)) {
```

It is meant to preserve S_WAITCNT_soft even if there is no actual wait required.

@jayfoad, you had introduced `TrySimplify` ... do you think it is okay to relax its uses? The part between `**` is the proposed addition:

```
if (TrySimplify **|| (Opcode != II.getOpcode() && OldWait.hasValuesSetToMax())**)
  ScoreBrackets.simplifyWaitcnt(OldWait);
```

Here, `hasValuesSetToMax()` is a hypothetical function that checks the encoding of each count separately to see whether all of its bits are set to 1, rather than just checking for a ~0 in the data structure.

https://github.com/llvm/llvm-project/pull/147257
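
To make the `hasValuesSetToMax()` idea concrete, here is a rough standalone sketch of one possible reading: a count is "set to max" when its encoded field would be all ones, which covers both an explicit value such as `0x3f` in a 6-bit field and the `~0u` placeholder. Everything in it (`WaitSketch`, `FieldWidths`, the number of counters, the bit widths) is made up for illustration and is not the actual `AMDGPU::Waitcnt` layout or any existing LLVM API.

```cpp
#include <cstdint>

// Illustrative only: three counters, not the real set used by the pass.
constexpr unsigned NumCounters = 3;

// Hypothetical stand-in for a decoded wait: one value per counter,
// where ~0u means "no wait recorded" in the data structure.
struct WaitSketch {
  uint32_t Count[NumCounters];
};

// Hypothetical per-target field widths, in bits, for each counter's
// encoding. Widths are assumed to be smaller than 32.
struct FieldWidths {
  unsigned Bits[NumCounters];
};

// Largest value that fits in a field of the given width (all bits set).
constexpr uint32_t fieldMax(unsigned Bits) { return (1u << Bits) - 1u; }

// Returns true if every counter would be encoded as all ones. Values
// are assumed to saturate when encoded, so anything >= the field
// maximum (including ~0u) counts as "set to max".
bool hasValuesSetToMax(const WaitSketch &W, const FieldWidths &F) {
  for (unsigned I = 0; I < NumCounters; ++I)
    if (W.Count[I] < fieldMax(F.Bits[I]))
      return false;
  return true;
}
```

Under this reading, a wait decoded from something like `s_wait_loadcnt 0x3f` would pass the check when the field is 6 bits wide, while any counter holding a smaller value would fail it. Whether the real check should require all counters or only the ones the instruction encodes is left open here.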