================
@@ -1809,6 +1816,23 @@ bool SIInsertWaitcnts::shouldFlushVmCnt(MachineLoop *ML,
   return HasVMemLoad && UsesVgprLoadedOutside;
 }
+bool SIInsertWaitcnts::insertWaitcntAfterMemOp(MachineFunction &MF) {
+  bool Modified = false;
+
+  for (auto &MBB : MF) {
----------------
jwanggit86 wrote:
Although both insert s_waitcnt instructions, the new feature is quite different from the existing SIInsertWaitcnts pass. The new feature, controlled by a command-line option, inserts an "s_waitcnt 0" after each memory instruction, so its logic is very simple. The existing pass has more complicated logic: it is essentially a static analysis backed by its own data structures, none of which the new feature needs.

From a performance point of view, note that this feature is not activated by default, so the extra overhead in the normal use case should be minimized. A separate pass achieves this because it adds only one extra IF per compiled function. Integrating with the existing pass, on the other hand, would mean many more checks for feature activation, all of which are wasted in the normal case when the feature is off.

Given these two points, I think a separate pass is advantageous over an integrated one. Please let me know your thoughts.

https://github.com/llvm/llvm-project/pull/68932
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
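
To make the "one extra IF per function" point in the comment above concrete, here is a minimal, self-contained sketch. It is a toy model, not the code in the PR: ToyInstr, ToyBlock, ToyFunction, and insertWaitAfterMemOps are hypothetical names invented for illustration, the Enabled flag stands in for the command-line option, and no LLVM APIs are used.

```cpp
#include <iterator>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction (hypothetical; not llvm::MachineInstr).
struct ToyInstr {
  std::string Opcode;
  bool IsMemOp; // assumed to be known for every instruction
};

using ToyBlock = std::vector<ToyInstr>;
using ToyFunction = std::vector<ToyBlock>;

// Inserts an "s_waitcnt 0" after every memory instruction, but only when
// Enabled is true. Returns true if the function was modified.
bool insertWaitAfterMemOps(ToyFunction &F, bool Enabled) {
  // The single per-function check: when the feature is off, nothing else runs.
  if (!Enabled)
    return false;

  bool Modified = false;
  for (ToyBlock &BB : F) {
    for (auto It = BB.begin(); It != BB.end(); ++It) {
      if (!It->IsMemOp)
        continue;
      // insert() returns an iterator to the new element, so the loop's ++It
      // steps past the wait we just added.
      It = BB.insert(std::next(It), ToyInstr{"s_waitcnt 0", false});
      Modified = true;
    }
  }
  return Modified;
}
```

With the flag off, the function returns before touching any block, which is the overhead argument made above. With it on, a block such as global_load_dword, v_add_u32, global_store_dword (illustrative opcodes) gains a wait after the load and after the store.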