https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106772
--- Comment #22 from Thomas Rodgers <rodgertq at gcc dot gnu.org> ---
(In reply to Mkkt Bkkt from comment #20)
> My main concern with this optimization it's not zero-overhead.
>
> It's not necessary when we expect we have some waiters, in that case it just
> additional synchronization and contention in waiter pool (that have small
> fixed size, just imagine system with 100+ cores, if we have > 16 waiting
> threads some of them make fetch_add/sub on the same atomic, that can be
> expensive, especially on numa)
>
> And at the same time, I don't understand when I need to notify and cannot
> know notification not needed.
> I don't understand when it useful.

You are correct, it is not zero overhead. It also isn't clear what those
overheads are. As I noted in comment #21, there is no test over a variety of
workloads to inform this discussion. Your example of '100+ core' systems,
especially on NUMA, is certainly a valid one. I would ask: at what point do
those collisions and the resulting cache invalidation traffic swamp the cost
of just making the syscall?

I do plan to put these tests together, because there is another algorithm
that I am exploring that I believe will reduce the likelihood of spurious
wakeups and achieve the same result as this particular approach, without
generating the same invalidation traffic. At this point, I don't anticipate
doing that work until after GCC13 stage1 closes.
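
For readers following the thread, here is a minimal sketch of the trade-off under discussion: a notifier checks a shared waiter count and skips the wake call when nobody is registered, at the cost of every waiter doing a fetch_add/fetch_sub on a counter in a small fixed-size pool. This is not libstdc++'s implementation. The names WaiterSlot, kPoolSize, slot_for, wait_until_changed and notify_all_if_waiting are hypothetical, the pool size of 16 is taken only from the "> 16 waiting threads" remark in the quoted text, and a condition variable stands in for the futex syscall.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for one entry of a fixed-size waiter pool.
struct WaiterSlot {
    std::atomic<int> waiters{0};     // threads currently registered on this slot
    std::atomic<int> wake_calls{0};  // counts simulated wake "syscalls"
    std::mutex m;
    std::condition_variable cv;      // stands in for futex wait/wake
};

// Assumed pool size; unrelated atomics can hash to the same slot and then
// contend on the same `waiters` counter (the cost raised in comment #20).
constexpr std::size_t kPoolSize = 16;
static WaiterSlot g_pool[kPoolSize];

WaiterSlot& slot_for(const void* addr) {
    auto key = reinterpret_cast<std::uintptr_t>(addr);
    return g_pool[(key >> 2) % kPoolSize];
}

// Block until `a` no longer holds `old`.
void wait_until_changed(const std::atomic<int>& a, int old) {
    WaiterSlot& s = slot_for(&a);
    s.waiters.fetch_add(1);  // register: this is the extra shared write
    {
        std::unique_lock<std::mutex> lk(s.m);
        s.cv.wait(lk, [&] { return a.load() != old; });
    }
    s.waiters.fetch_sub(1);  // unregister
}

// Call after storing a new value to `a`. Returns true if a wake was issued.
bool notify_all_if_waiting(const std::atomic<int>& a) {
    WaiterSlot& s = slot_for(&a);
    if (s.waiters.load() == 0)
        return false;  // fast path: no registered waiter, skip the wake
    { std::lock_guard<std::mutex> lk(s.m); }  // order against a waiter's check
    s.wake_calls.fetch_add(1);
    s.cv.notify_all();
    return true;
}
```

In this sketch the fast path saves the wake only when no thread is registered on the slot. When waiters are expected, each wait adds two read-modify-write operations on a counter that may be shared with unrelated atomics, which is the invalidation traffic the question above is about. Whether that traffic outweighs the syscall is exactly what would need measuring.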