https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110016

Rachel Mant <rachel at rachelmant dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rachel at rachelmant dot com

--- Comment #11 from Rachel Mant <rachel at rachelmant dot com> ---
(In reply to Andrew Pinski from comment #9)

Hi Andrew, original author of the code here.

> So I think this is a bug in your code:
> 
> Inside substrate::threadPool_t::finish,
> we have:
> 
>                         finished = true;
>                         haveWork.notify_all();
> 

`finished` here should be a `std::atomic<bool>` using strict sequentially
consistent ordering.

> If I change it to:
>                         {
>                           std::lock_guard<std::mutex> lock{workMutex};
>                           finished = true;
>                           haveWork.notify_all();
>                         }
> 

This will work, yes, except it will also cause mutex contention on the
condition variable, and it should not be necessary if the atomic is working
correctly, as the order of operations between the threads should be:

 Thread 1             | Thread 2
----------------------+-------------------
finished = true       | <in haveWork.wait()>
haveWork.notify_all() | <wake-up>
<joining begins>      | lambda evaluation
                      | <haveWork.wait() exits>

The only way the mutex can fix this is by adding a barrier that should
already be there by virtue of the atomic using `std::memory_order_seq_cst`,
unless there's also a TOCTOU issue in play.

> Then I don't get a deadlock at all.
> As I mentioned, I did think there was a race condition.
> Here is what I think happened:
> Thread26:                            thread 1
> checks finished, still false         sets finished to be true
> calls wait                           calls notify_all
> ...                                  notify_all happens
> finally gets into futex_wait syscall ....
> 
> And then thread26 never got the notification.
> 
> With my change the check for finished has to wait till thread1 lets go of
> the mutex (and the other way around).

Do please correct us if we're wrong in our understanding of how the atomic
should be working here. We're well aware that concurrency is /hard/ and that
getting things like this right is even harder. The thing that stands out, as
Amy said, is that GCC is the *only* compiler on which we get this deadlock,
which is why she and we think this is a stdlib or codegen bug rather than a
bug in our code, but if you can show how it's not, then we're happy to accept
that.
