https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106772
--- Comment #21 from Thomas Rodgers <rodgertq at gcc dot gnu.org> ---
(In reply to Mkkt Bkkt from comment #16)
> > it with sarcasm
>
> I started with sarcasm because you restart this thread with some doubtful
> benchmarks without code for them.
>
> I think it's very disrespectfully.

I wasn't being sarcastic in what I posted. As I noted, this question has come up before in different contexts. Bugzilla is a useful historical archive, so updating this issue with my reasoning and a bit of data was primarily a capture task.

So, let's try this again. I did not post the original source because it required hacking on the libstdc++ headers. I have now posted a version that does not require that; the results are identical. The test is the example Jonathan cited in comment #14: incrementing an atomic int and calling notify.

This isn't about semaphore or any other synchronization primitive. Those types are free to make different choices that may be more appropriate to the constrained usage of the type, just as Lewis' lightweight manual-reset event does (I will also note that Lewis has reviewed this implementation and has written a paper to be discussed at the Fall meeting, p2616).

There are, as Jonathan has pointed out, use cases where notify can and will be called without the notifier having any way to determine whether it will wake a waiter. One example that I, as the person who is going to have to implement C++26 executors, care about is a wait-free work-stealing deque. Algorithmic correctness requires nothing more than spinning for work on an empty queue. But after spinning on an empty queue, making the rounds trying to steal work from other deques, and maybe spinning a bit more just to be sure, a dequeuing thread that has not been able to acquire more work will probably want to enter a wait until it knows it can do productive work. Another thread pushing work into that queue cannot determine whether the dequeuing thread is spinning for new work, work stealing, or has entered a wait; but atomic<int>::notify() does know, and can avoid penalizing the submitting thread with a syscall when there is no thread in a wait on the other end of the deque, which is the expected case for this algorithm.

p1135 was the paper that added atomic wait/notify. One of the co-authors of that paper wrote the libc++ implementation. That implementation, as with libstdc++'s, is not simply a wrapper over the underlying platform's wait primitive. It has two 'backends': an exponentially timed backoff and ulock wait/wake. libstdc++ currently has futex and condvar backends. Both implementations choose a short-term spinning strategy and a long-term waiting strategy (spinning, then futex/ulock or condvar).

I have confirmed with the libc++ implementation's author (who also chairs the C++ committee's Concurrency and Parallelism study group) that it was never the intention of p1135, or of the subsequently standardized language in C++20, to imply that wait/notify were direct, portable analogs of the platform's waiting primitives. There are users, such as yourself, who want exactly that; there are other users (like in my prior industry) for whom busy waiting is the desired strategy; and in between those two choices are people who want it to work as advertised in the standard, and to do so 'efficiently'. Both libc++ and libstdc++ take a balanced approach somewhere between always going to the OS and always spinning.
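To make the spin-then-wait and empty-waiter-check ideas concrete, here is a minimal sketch. The names (waiter_state, wait_until_changed, increment_and_notify) and the spin count are hypothetical, and std::atomic::wait stands in for the futex/condvar backend; this is an illustration of the technique, not the libstdc++ internals.

    #include <atomic>
    #include <cstdio>
    #include <thread>

    // Hypothetical type; this is not the libstdc++ internal structure.
    struct waiter_state {
        std::atomic<int> waiters{0};  // threads currently parked in the OS wait
    };

    void wait_until_changed(waiter_state& w, std::atomic<int>& a, int old) {
        // Phase 1: short-term spin; an arriving notify is often observed
        // here, with no syscall on either side.
        for (int i = 0; i < 64; ++i) {
            if (a.load() != old)
                return;
            std::this_thread::yield();
        }
        // Phase 2: long-term wait. Register as a waiter, then block;
        // std::atomic::wait stands in for the futex/condvar backend.
        w.waiters.fetch_add(1);
        while (a.load() == old)
            a.wait(old);
        w.waiters.fetch_sub(1);
    }

    void increment_and_notify(waiter_state& w, std::atomic<int>& a) {
        a.fetch_add(1);
        // The empty-waiter check: only pay for the wake syscall when some
        // thread has actually parked itself. Both sides use the default
        // seq_cst operations, which is what prevents a lost wakeup here.
        if (w.waiters.load() != 0)
            a.notify_all();
    }

    int main() {
        waiter_state w;
        std::atomic<int> a{0};
        std::thread t([&] {
            wait_until_changed(w, a, 0);
            std::puts("woke");
        });
        increment_and_notify(w, a);
        t.join();
    }

In the expected case for the deque algorithm above (no parked waiter), increment_and_notify is a single atomic RMW plus one atomic load, and never enters the kernel.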
There is an open question here that your original issue raises:

* At what point do collisions on the waiter pool, with the cache-invalidation traffic and spurious wakeups that result, swamp the gain of doing this empty-waiter check on notify?

I also made a comment about the 'size of the waiter pool notwithstanding'. I chose a smaller size than libc++ chose, in part because Jonathan and I did not want to make any sort of ABI commitment until this had been in a few GCC releases; this implementation is header-only at present and still considered experimental, whereas libc++ committed to an ABI early. In the sizing of the libc++ waiter pool there is the comment that 'there is no magic in this value'. Not only is there no magic, there has been no testing of any sort, by me or on libc++, to determine what effect different pool sizes have under different load scenarios. So all of it is a guess at this point. I will likely match libc++ when I do move this into the .so. (A sketch of this kind of pool follows at the end of this comment.)

Finally, in the most recent mailing there is p2643, which proposes additional changes to atomic waiting. One proposal is to add a 'hinted' wait that can allow the caller to steer the choices atomic wait/notify makes. I have conferred with the other authors of the paper; this latter option is not without controversy, and likely has some sharp edges for the user, but I plan to raise the discussion at the Fall WG21 meeting to see what the other members of SG1 think.
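To illustrate the collision question, here is a simplified sketch of an address-hashed waiter pool. The table size, hash function, and names are placeholders I made up for illustration, not the libstdc++ or libc++ ABI values.

    #include <atomic>
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    // A cache-line-padded waiter counter for one bucket.
    struct pool_entry {
        alignas(64) std::atomic<int> waiters{0};
    };

    constexpr std::size_t pool_size = 16;  // placeholder; a power of two
    pool_entry waiter_pool[pool_size];

    // Every atomic object maps to a bucket by hashing its address.
    pool_entry& entry_for(const void* addr) {
        auto h = reinterpret_cast<std::uintptr_t>(addr);
        return waiter_pool[(h >> 6) & (pool_size - 1)];
    }

    int main() {
        std::atomic<int> a{0}, b{0};
        // Unrelated atomics can hash to the same bucket. When they do, a
        // notify on 'a' reads (and a wake on 'a' wakes) the waiters of
        // 'b' as well: that is the cache-invalidation traffic and the
        // spurious wakeups, and a smaller table makes collisions more
        // likely under load.
        std::printf("same bucket: %s\n",
                    &entry_for(&a) == &entry_for(&b) ? "yes" : "no");
    }

The trade-off the open question asks about is exactly the table size: a larger table means fewer collisions but a bigger committed footprint once the pool moves into the .so and becomes ABI.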