https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106772

--- Comment #22 from Thomas Rodgers <rodgertq at gcc dot gnu.org> ---
(In reply to Mkkt Bkkt from comment #20)
> My main concern with this optimization it's not zero-overhead.
> 
> It's not necessary when we expect we have some waiters, in that case it just
> additional synchronization and contention in waiter pool (that have small
> fixed size, just imagine system with 100+ cores, if we have > 16 waiting
> threads some of them make fetch_add/sub on the same atomic, that can be
> expensive, especially on numa)
> 
> And at the same time, I don't understand when I need to notify and cannot
> know notification not needed.
> I don't understand when it useful.

You are correct, it is not zero overhead. It also isn't clear what those
overheads are, either. As I noted in comment #21, there is no test over a
variety of workloads to inform this discussion, either.

Your example of '100+ core' systems especially on NUMA is certainly a valid
one. I would ask, at what point do those collisions and the resulting cache
invalidation traffic swamp the cost of just making the syscall? I do plan to
put these tests together, because there is another algorithm that I am
exploring, that I believe will reduce the likelihood of spurious wakeups, and
achieves the same result as this particular approach, without generating the
same invalidation traffic. At this point, I don't anticipate doing that work
until after GCC13 stage1 closes.

Reply via email to