On Mon, 11 Dec 2023, Nate Eldredge wrote:
To fix, we need something like `__args._M_old = __val;` inside the loop in
__atomic_wait_address(), so that we always wait on the exact value that the
predicate __pred() rejected. Again, there are similar instances in
atomic_timed_wait.h.
Thinking through this, there's another problem. The main loop in
__atomic_wait_address() would then be:
while (!__pred(__val))
{
__args._M_old = __val;
__detail::__wait_impl(__wait_addr, &__args);
__val = __vfn();
}
The idea being that we only call __wait_impl to wait on a value that the
predicate said was unacceptable. But looking for instance at its caller
__atomic_semaphore::_M_acquire() in bits/semaphore_base.h, the predicate
passed in is _S_do_try_acquire(), whose code is:
_S_do_try_acquire(__detail::__platform_wait_t* __counter,
__detail::__platform_wait_t __old) noexcept
{
if (__old == 0)
return false;
return __atomic_impl::compare_exchange_strong(__counter,
__old, __old - 1,
memory_order::acquire,
memory_order::relaxed);
}
It returns false if the value passed in was unacceptable (i.e. zero), *or*
if it was nonzero (let's say 1) but the compare_exchange failed because
another thread swooped in and modified the semaphore counter. In that
latter case, __atomic_wait_address() would pass 1 to __wait_impl(), which
is likewise bad. If the counter is externally changed back to 1 just
before we call __platform_wait (that's the futex call), we would go to
sleep waiting on a semaphore that is already available: deadlock.
I guess there's a couple ways to fix it.
You could have the "predicate" callback instead return a tri-state value:
"all done, stop waiting" (like current true), "value passed is not
acceptable" (like current false), and "value was acceptable but something
else went wrong". Only the second case should result in calling
__wait_impl(). In the third case, __atomic_wait_address() should
just reload the value (using __vfn()) and loop again.
Or, make the callback __pred() a pure predicate that only tests its input
value for acceptability, without any other side effects. Then have
__atomic_wait_address() simply return as soon as __pred(__val) returns
true. It would be up to the caller to actually decrement the semaphore or
whatever, and to call __atomic_wait_address() again if this fails. In
that case, __atomic_wait_address should probably return the final value
that was read, so the caller can immediately do something like a
compare-exchange using it, and not have to do an additional load and
predicate test.
Or, make __pred() a pure predicate as before, and give
__atomic_wait_address yet one more callback function argument, call it
__taker(), whose job is to acquire the semaphore etc, and have
__atomic_wait_address call it after __pred(__val) returns true.
--
Nate Eldredge
n...@thatsmathematics.com