https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106772

--- Comment #24 from Mkkt Bkkt <valera.mironow at gmail dot com> ---
(In reply to Thomas Rodgers from comment #22)
> Your example of '100+ core' systems especially on NUMA is certainly a valid
> one. I would ask, at what point do those collisions and the resulting cache
> invalidation traffic swamp the cost of just making the syscall? I do plan to
> put these tests together, because there is another algorithm that I am
> exploring, that I believe will reduce the likelihood of spurious wakeups,
> and achieves the same result as this particular approach, without generating
> the same invalidation traffic. At this point, I don't anticipate doing that
> work until after GCC13 stage1 closes.

Let me try to explain:

Syscall overhead is roughly a constant, commonly around 10-30ns (a futex syscall can be more expensive, around 100ns as in your example).

But core counts keep growing, and ARM is becoming more popular (fetch_add/sub costs more there compared to x86).
People have already run into situations where a fetch_add costs more than the syscall overhead:

https://pkolaczk.github.io/server-slower-than-a-laptop/
https://travisdowns.github.io/blog/2020/07/06/concurrency-costs.html

I don't think we will hit problems as severe as the ones in those links with
atomic::wait/notify in real code, but I'm pretty sure that in some cases the
atomic part can be more expensive than the syscall part of atomic::wait/notify.

Of course it would be better to prove it with measurements; maybe someday I will :(
