https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104442
            Bug ID: 104442
           Summary: atomic<T>::wait incorrectly loops in case of spurious
                    notification when __waiter is shared
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: poulhies at adacore dot com
  Target Milestone: ---

Created attachment 52377
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52377&action=edit
patch fixing the issue

We are observing a deadlock in 100334.cc on vxworks. This is caused by:

    template<typename _Tp, typename _ValFn>
      void
      _M_do_wait_v(_Tp __old, _ValFn __vfn)
      {
        __platform_wait_t __val;
        if (__base_type::_M_do_spin_v(__old, __vfn, __val))
          return;

        do
          {
            __base_type::_M_w._M_do_wait(__base_type::_M_addr, __val);
          }
        while (__detail::__atomic_compare(__old, __vfn()));
      }

When several threads share the waiter (as in 100334.cc), notify_one() will
wake all the threads blocked in _M_do_wait() above. The thread whose data
changed exits the loop correctly, but the others loop back into _M_do_wait()
with the same arguments. Because the waiter's value has changed since the
previous iteration while __val has not, _M_do_wait() returns immediately (as
if it had detected a notification) and the loop degenerates into a busy-wait.

On GNU/Linux the test PASSes, because the main thread is still scheduled and
eventually does a .store(1) on all the atomics, unblocking every busy-waiting
thread (the busy-wait can still be observed with gdb). On vxworks the main
thread is never scheduled again (I think there is no preemption at the same
priority level) and the busy-wait starves the system.

The attached patch is a possible fix. It moves the spin() call inside the
loop so that __val is updated on every iteration. A better fix is probably
possible but may require some refactoring (a bit more than I'm comfortable
with).

I've checked the patch for regressions on gcc-master for x86_64. It also
fixes the test on gcc-11 for aarch64-vxworks7.
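
For context, here is a minimal standalone sketch of the scenario (an
illustration only, not the actual 100334.cc; whether the atomics really share
a waiter depends on how the waiter pool hashes their addresses):

    #include <atomic>
    #include <thread>
    #include <vector>

    int main()
    {
      // Adjacent atomics can hash to the same entry in libstdc++'s waiter
      // pool, so notify_one() on one of them may wake threads that are
      // waiting on the others (a spurious notification for those threads).
      constexpr int n = 8;
      std::atomic<unsigned> atoms[n] = {};

      std::vector<std::thread> threads;
      for (int i = 0; i < n; ++i)
        threads.emplace_back([&atoms, i] { atoms[i].wait(0); });

      for (int i = 0; i < n; ++i)
        {
          atoms[i].store(1);
          atoms[i].notify_one(); // may spuriously wake other waiters
        }

      for (auto& t : threads)
        t.join();
    }

With a shared waiter, each notify_one() can wake every blocked thread, and
any woken thread whose own atomic is still 0 goes around the loop again with
a stale __val.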
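
And here is a sketch of the loop shape the patch description implies (the
attached patch is authoritative; this just illustrates moving the spin inside
the loop so __val is refreshed before every _M_do_wait() call):

    template<typename _Tp, typename _ValFn>
      void
      _M_do_wait_v(_Tp __old, _ValFn __vfn)
      {
        do
          {
            // Re-run the spin each time around: it reloads the waiter's
            // current value into __val, so a stale __val can no longer
            // make _M_do_wait() return immediately.
            __platform_wait_t __val;
            if (__base_type::_M_do_spin_v(__old, __vfn, __val))
              return;
            __base_type::_M_w._M_do_wait(__base_type::_M_addr, __val);
          }
        while (__detail::__atomic_compare(__old, __vfn()));
      }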