https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104442

            Bug ID: 104442
           Summary: atomic<T>::wait incorrectly loops in case of spurious
                    notification when __waiter is shared
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: poulhies at adacore dot com
  Target Milestone: ---

Created attachment 52377
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52377&action=edit
patch fixing the issue

We are observing a deadlock in 100334.cc on vxworks.

This is caused by:

        template<typename _Tp, typename _ValFn>
          void
          _M_do_wait_v(_Tp __old, _ValFn __vfn)
          {
            __platform_wait_t __val;
            if (__base_type::_M_do_spin_v(__old, __vfn, __val))
               return;

            do
              {
                __base_type::_M_w._M_do_wait(__base_type::_M_addr, __val);
              }
            while (__detail::__atomic_compare(__old, __vfn()));
          }

When several threads share the waiter (as in 100334.cc), notify_one() wakes
all threads blocked in the _M_do_wait() call above. The thread whose data
changed exits the loop correctly, but the others loop back into
_M_do_wait() with the same arguments. Because the waiter's value has changed
since the previous iteration but __val has not, the call returns immediately
(as if it had detected a notification) and the loop continues.
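
To make the failure mode concrete, here is a minimal single-threaded model of
it. This is only a sketch, not libstdc++ code: ModelWaiter,
wait_returns_immediately() and immediate_returns_with_stale_val() are
hypothetical names, and the futex-style "block only while the word equals the
expected value" behaviour is an assumption about the underlying wait.

```cpp
#include <cassert>

// Hypothetical model, not libstdc++ code: one waiter word shared by
// several atomics.
struct ModelWaiter {
    unsigned word = 0;          // value the futex-style wait compares against
    void notify() { ++word; }   // any notification changes the shared word

    // Assumed wait semantics: block only while word == expected,
    // otherwise return at once.
    bool wait_returns_immediately(unsigned expected) const {
        return word != expected;
    }
};

// Shape of the reported loop: `val` is read once, before the loop.
// Returns how many iterations came back without blocking, up to `cap`.
int immediate_returns_with_stale_val(const ModelWaiter& w, unsigned val,
                                     int cap) {
    int n = 0;
    // The thread's own atomic is unchanged, so only the wait can stop the loop.
    while (n < cap && w.wait_returns_immediately(val))
        ++n;
    return n;
}

// Usage: a notification meant for another atomic turns the wait into a spin.
void demo_stale_val() {
    ModelWaiter w;
    unsigned val = w.word;                                    // captured once
    assert(immediate_returns_with_stale_val(w, val, 1000) == 0);   // would block
    w.notify();                                               // other atomic notified
    assert(immediate_returns_with_stale_val(w, val, 1000) == 1000); // busy-wait
}
```

In the model, once the shared word has moved on, every iteration with the
stale value returns without blocking until the cap is hit, which corresponds
to the busy-wait described above.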

On GNU/Linux, the test passes because the main thread is still scheduled and
eventually does a .store(1) on all atomics, unblocking all the busy-waiting
threads (but the busy-waiting threads can still be observed with gdb).

On vxworks, the main thread is never scheduled again (I think there is no
preemption at the same priority level) and the busy-wait starves the system.

The attached patch is a possible fix. It moves the spin() call inside the loop,
updating the __val at every iteration. A better fix is probably possible but
may require some refactoring (a bit more than I'm comfortable with).
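
As a rough illustration of the idea (not the attached patch itself), the
sketch below reloads the expected value on every iteration before waiting.
ModelWaiter and immediate_returns_with_fresh_val() are hypothetical names,
and the futex-style comparison is an assumption about the underlying wait.

```cpp
#include <cassert>

// Hypothetical model, not the attached patch: one waiter word shared by
// several atomics.
struct ModelWaiter {
    unsigned word = 0;          // value the futex-style wait compares against
    void notify() { ++word; }   // any notification changes the shared word
};

// Patched loop shape: `val` is reloaded on every iteration, which is the
// effect of moving the spin call inside the loop.
// Returns how many iterations came back without blocking, up to `cap`.
int immediate_returns_with_fresh_val(const ModelWaiter& w, int cap) {
    int n = 0;
    while (n < cap) {
        unsigned val = w.word;   // refreshed expected value
        if (w.word == val)       // assumed semantics: the wait would block here
            break;
        ++n;                     // only reached if the word changed in between
    }
    return n;
}

// Usage: earlier notifications no longer cause repeated immediate returns.
void demo_fresh_val() {
    ModelWaiter w;
    w.notify();                  // notification meant for another atomic
    assert(immediate_returns_with_fresh_val(w, 1000) == 0);
}
```

In this single-threaded model the comparison always holds, so the loop reaches
a blocking wait straight away. With real threads a notification can still
land between the reload and the wait, in which case the wait returns once and
the loop retries with a new value instead of spinning on a stale one.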

I've checked the patch for regressions on gcc-master for x86_64. It also fixes
the test on gcc-11 for aarch64-vxworks7.