https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110016
--- Comment #14 from Rachel Mant <rachel at rachelmant dot com> ---
(In reply to Andrew Pinski from comment #12)
> Let me try again to show the exact sequence of events behind why I think
> this is not a libstdc++/GCC bug:
>
> time  thread/core 1                     thread/core N
>  -1                                     grab the mutex
>   0   atomically load waitingThreads    atomically increment waitingThreads
>   1   compare waitingThreads            atomically load finished
>   2   atomically set finished to 1      atomically load work.empty() (queueLength)
>   3   start of notify_all               branch on finished/queueLength
>   4   ... (some code before ...)        start on haveWork.wait
>   5   notifies all threads finished     ... (some more before ...)
>   6   .....                             waiting now
>   7   start of joins                    still inside wait
>   8   joins hit thread N                still inside wait
> etc.
>
> You will notice the ordering of loading finished and the wait (and the
> setting of finished and notify_all) is exactly as you would expect with
> memory_order_seq_cst on each core; that is, there is no reordering going on
> in either thread/core. It is still strictly ordered, even.
>
> The reason libstdc++ perhaps exposes this is that the wait implementation
> checks the predicate before it enters the wait system call, or that the
> time between the start of the notify_all call and the notifications going
> out is shorter than the time between the atomic load of finished and the
> wait system call.
>
> Since on thread 1 updating finished to 1 and notify_all are not done
> atomically (together), a thread could read finished before the update and
> get into the wait loop after the notifications have gone out.
>
> It is very similar to a TOCTOU issue, because the real use of finished is
> the wait itself rather than the comparison: the check of finished plus the
> wait, and the setting of finished plus the notify, need to be done
> atomically (together); right now there is only an atomic ordering of the
> two.

Thank you for the clear run-through of the series of events you see leading
to the deadlock. That's very helpful. To properly understand this problem
space, why do you think locking the mutex before setting `finished` is
sufficient to fix this? It feels to us like it shouldn't, and would only mask
the bug, making it less likely to trigger.
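To make sure we're reading the timeline the same way, here is a minimal
sketch of the pattern as we understand it. The names (workMutex, haveWork,
work, waitingThreads, shutdownPool) are placeholders for illustration only,
not the actual code under discussion:

#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>

std::mutex workMutex;
std::condition_variable haveWork;
std::queue<int> work;                       // work items elided for brevity
std::atomic<bool> finished{false};
std::atomic<std::size_t> waitingThreads{0};

void worker()                               // thread/core N
{
    std::unique_lock lock{workMutex};       // time -1: grab the mutex
    while (true)
    {
        ++waitingThreads;                   // time 0: atomically increment waitingThreads
        // times 1-3: load finished, load work.empty(), branch on the result
        if (finished && work.empty())
            return;
        // If thread 1 sets finished and calls notify_all() in this window,
        // the notification has already gone out by the time we wait...
        haveWork.wait(lock);                // time 4 onwards: ...so this never wakes
        --waitingThreads;
        // (work processing elided)
    }
}

void shutdownPool()                         // thread/core 1
{
    // times 0-1: load and compare waitingThreads (elided; not needed for the race)
    finished = true;                        // time 2: atomically set finished to 1
    // times 3-5: notify_all() is issued without holding workMutex, so nothing
    // stops it landing before thread N reaches haveWork.wait() above
    haveWork.notify_all();
}

int main()
{
    std::thread pool{worker};
    shutdownPool();
    pool.join();                            // times 7-8: may block forever on the worker
}

If that sketch matches what you have in mind, the lost wakeup lives entirely
in the window between the worker's load of finished and its call to wait(),
which is the part we'd like to understand better.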