https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110016
--- Comment #13 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I suspect if you change the lambda/call to substrate::threadPool_t::waitWork
to be:

    inline std::pair<bool, std::tuple<args_t...>> waitWork() noexcept
    {
        std::unique_lock<std::mutex> lock{workMutex};
        ++waitingThreads;
        // wait, but protect ourselves from accidental wake-ups..
        auto b = [this]() noexcept -> bool {
            bool t = finished;
            for(volatile int i = 0 ; i < 10000;i ++);
            return t || !work.empty();
        };
        //if (!b())
        haveWork.wait(lock, b);
        --waitingThreads;
        if (!work.empty())
            return {true, work.pop()};
        return {false, {}};
    }

you might hit the issue with more C++ implementations. This should simulate
the issue I was mentioning by adding a slight delay between the load of
finished and the call to wait(lock).
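
To make that window concrete, here is a minimal standalone sketch. It rests on an assumption this comment does not spell out: that the issue is a lost wake-up, where finished is set and the condition variable notified without workMutex held. Demo, Result and simulateLostWakeup are hypothetical names for illustration, not substrate code, and the busy loop is replaced by running the signaller to completion inside the window so the interleaving is deterministic.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the pool state; not the substrate types.
struct Demo {
    std::mutex workMutex;
    std::condition_variable haveWork;
    std::atomic<bool> finished{false};
};

struct Result {
    bool staleValue;   // what the predicate loaded from `finished`
    bool actualValue;  // what `finished` held when the wait began
    bool timedOut;     // whether the bounded wait ran to its timeout
};

inline Result simulateLostWakeup()
{
    Demo d;
    std::unique_lock<std::mutex> lock{d.workMutex};

    // Step 1: the predicate loads `finished`, which is still false.
    const bool stale = d.finished.load();

    // Step 2: inside the window, another thread sets the flag and notifies
    // WITHOUT taking workMutex (the assumed bug). Nobody is waiting yet,
    // so the notification has no effect.
    std::thread signaller{[&d] {
        d.finished.store(true);
        d.haveWork.notify_all();
    }};
    signaller.join();

    const bool actual = d.finished.load();

    // Step 3: the waiter acts on the stale value and blocks. A bounded wait
    // is used here so the sketch terminates; an unbounded wait(lock) would
    // sleep until some later notification arrives.
    bool timedOut = false;
    if (!stale) {
        timedOut = d.haveWork.wait_for(lock, std::chrono::milliseconds(50))
                   == std::cv_status::timeout;
    }
    return {stale, actual, timedOut};
}
```

Calling simulateLostWakeup() should return staleValue == false and actualValue == true, meaning the waiter went to sleep on a condition that was already satisfied. timedOut is normally true as well, although a spurious wake-up could make it false. Under the same assumption, taking workMutex in the signaller before storing to finished would close the window, since the store could then no longer land between the predicate check and the wait.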