[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

Jun Wang via cfe-commits Wed, 08 Nov 2023 11:29:14 -0800

================
@@ -1809,6 +1816,23 @@ bool SIInsertWaitcnts::shouldFlushVmCnt(MachineLoop *ML,
   return HasVMemLoad && UsesVgprLoadedOutside;
 }
 
+bool SIInsertWaitcnts::insertWaitcntAfterMemOp(MachineFunction &MF) {
+  bool Modified = false;
+
+  for (auto &MBB : MF) {
----------------
jwanggit86 wrote:


Although both insert s_waitcnt instructions, the new feature is quite 
different from the existing SIInsertWaitcnts pass. The new feature, controlled 
by a command-line option, inserts an "s_waitcnt 0" after each memory 
instruction, so its logic is very simple. The existing pass has more 
complicated logic: it is essentially a static analysis backed by its own data 
structures, none of which the new feature needs.
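
To illustrate how little logic is involved, here is a toy model of the idea. 
It is only a sketch, not the code in the PR: `Inst`, `Block`, `Function`, and 
the `Enabled` parameter are hypothetical stand-ins for the LLVM machine-IR 
types and the command-line option, and the opcode strings are illustrative.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for MachineInstr / MachineBasicBlock / MachineFunction.
struct Inst {
  std::string Opcode;
  bool IsMemOp;
};
using Block = std::vector<Inst>;
using Function = std::vector<Block>;

// Inserts an "s_waitcnt 0" after every memory instruction.
// Returns true if the function was modified.
bool insertWaitcntAfterMemOp(Function &F, bool Enabled) {
  // The option is checked once per function; nothing else runs when it is off.
  if (!Enabled)
    return false;

  bool Modified = false;
  for (Block &B : F) {
    for (std::size_t I = 0; I < B.size(); ++I) {
      if (!B[I].IsMemOp)
        continue;
      B.insert(B.begin() + static_cast<std::ptrdiff_t>(I + 1),
               Inst{"s_waitcnt 0", false});
      ++I; // Step over the wait that was just inserted.
      Modified = true;
    }
  }
  return Modified;
}
```

No per-instruction state is tracked here, which is the contrast with the 
analysis that the existing pass performs.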

From a performance point of view, note that this feature is off by default, 
so the extra overhead in the normal use case should be minimized. A separate 
pass achieves this because it costs only one extra check per compiled 
function. Integrating with the existing pass, on the other hand, would require 
many more checks of whether the feature is active, all of which are wasted in 
the normal case where it is not.

Given these two points, I think a separate pass is preferable to an 
integrated one. Please let me know your thoughts.

https://github.com/llvm/llvm-project/pull/68932
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
