https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104928

--- Comment #19 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jonathan Wakely <r...@gcc.gnu.org>:

https://gcc.gnu.org/g:7be4913b8f8b1e1474751656d45b301aa0c20790

commit r16-1102-g7be4913b8f8b1e1474751656d45b301aa0c20790
Author: Jonathan Wakely <jwak...@redhat.com>
Date:   Mon Jun 2 13:06:27 2025 +0100

    libstdc++: Fix std::counting_semaphore::acquire deadlock [PR104928]

    There's a deadlock in std::counting_semaphore that occurs when the
    semaphore is under contention. The bug happens when one thread tries to
    acquire the mutex, calling __semaphore_base::_S_do_try_acquire to
    atomically decrement the counter using compare_exchange_strong. If the
    counter is non-zero (and so should be possible to decrement) but another
    thread changes it (either incrementing or decrementing it) then the
    compare_exchange fails and _S_do_try_acquire returns false. Because that
    function is used by the predicate passed to __atomic_wait_address, when
    it returns false the thread does a futex wait until the value changes.
    However, when the predicate is false because the compare_exchange failed
    due to not matching the expected value, waiting for the value to change
    is incorrect. The correct behaviour would be to retry the
    compare_exchange using the new value (as long as it's still non-zero).
    Waiting for the value to change again means we can block forever,
    because it might never change again.

    The predicate should only test the value, not also attempt to alter it,
    and its return value should mean only one thing, not conflate a busy
    semaphore that cannot be acquired with a contended one that can be
    acquired by retrying.

    The correct behaviour of __semaphore_base::_M_acquire would be to
    attempt the decrement, and to retry immediately if it failed due to
    contention on the variable (i.e. due to the variable not having the
    expected value).  It should only wait for the value to change when the
    value is zero, because that's the only time we can't decrement it.

    This commit moves the _S_do_try_acquire call out of the predicate and
    loops while it is false, only doing an atomic wait when the counter's
    value is zero. The predicate used for the atomic wait now only checks
    whether the value is decrementable (non-zero), without also trying to
    perform that decrement.

    In order for the caller to tell whether it should retry a failed
    _S_do_try_acquire or should wait for the value to be non-zero, the value
    obtained by a failed compare_exchange needs to be passed back to the
    caller. _S_do_try_acquire is changed to take its parameter by reference,
    so that the caller gets the new value and can check whether it's zero.

    In order to avoid doing another atomic load after returning from an
    atomic wait, the predicate is also changed to capture the local __val by
    reference, and then assign to __val when it sees a non-zero value. That
    makes the new value available to _M_acquire, so it can be passed to
    _S_do_try_acquire as the expected value of the compare_exchange.
    Although this means that the predicate is modifying data again, not just
    checking a value, this modification is safe. It's not changing the
    semaphore's counter, only changing a local variable in the caller to
    avoid a redundant atomic load.

    Equivalent changes are made to _M_try_acquire_until and
    _M_try_acquire_for. They have the same bug, although they can escape the
    deadlock if the wait is interrupted by timing out. For _M_acquire
    there's no time out so it potentially waits forever.

    _M_try_acquire also has the same bug, but can be simplified to just
    calling _M_try_acquire_for(0ns). A timeout of zero results in calling
    __wait_impl with the __spin_only flag set, so that the value is loaded
    and checked in a spin loop but there is no futex wait. This means that
    _M_try_acquire can still succeed under light contention if the counter
    is being changed concurrently, at the cost of a little extra overhead.
    It would be possible to implement _M_try_acquire as nothing more than an
    atomic load and a compare_exchange, but it would fail when there is any
    contention.

    libstdc++-v3/ChangeLog:

            PR libstdc++/104928
            * include/bits/semaphore_base.h (_S_do_try_acquire): Take old
            value by reference.
            (_M_acquire): Move _S_do_try_acquire call out of the predicate
            and loop on its result. Make the predicate capture and update
            the local copy of the value.
            (_M_try_acquire_until, _M_try_acquire_for): Likewise.
            (_M_try_acquire): Just call _M_try_acquire_for.
            * testsuite/30_threads/semaphore/104928-2.cc: New test.
            * testsuite/30_threads/semaphore/104928.cc: New test.

    Reviewed-by: Tomasz KamiÅski <tkami...@redhat.com>

Reply via email to