[Bug libstdc++/94702] std::unsequenced_policy should not be defined for C++17
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94702 Thomas Rodgers changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Ever confirmed|0 |1 Last reconfirmed||2020-04-21 Assignee|unassigned at gcc dot gnu.org |rodgertq at gcc dot gnu.org
[Bug libstdc++/88101] Implement P0528R3, C++20 cmpxchg and padding bits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88101 --- Comment #4 from Thomas Rodgers --- (In reply to andysem from comment #2) > Another use case is C++20 atomic_ref, which may be bound to an object whose > padding bits are in indeterminate state. An intrinsic to clear padding bits > without altering the object value could be useful. Having now implemented atomic::wait for libstdc++, I think the intrinsic to clear padding bits before calling __builtin_memcmp for generic (trivially copyable) T's is the right approach.
[Bug libstdc++/90252] PSTL test failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90252 --- Comment #2 from Thomas Rodgers --- Author: rodgertq Date: Tue May 21 04:37:45 2019 New Revision: 271451 URL: https://gcc.gnu.org/viewcvs?rev=271451&root=gcc&view=rev Log: tbb-backend effective target should check ability to link TBB PR libstdc++/90252 * testsuite/lib/libstdc++.exp (check_effective_target_tbb-backend): Changed v3_target_compile check from preprocess to executable. Added "-ltbb" to v3_target_compile flags. Modified: trunk/libstdc++-v3/ChangeLog trunk/libstdc++-v3/testsuite/lib/libstdc++.exp
[Bug libstdc++/97936] [11 Regression] 30_threads/latch/3.cc hangs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97936 --- Comment #16 from Thomas Rodgers --- I believe it is addressed in the most recent patch I have submitted for the atomic wait/notify, barriers, latches, semaphores functionality.
[Bug libstdc++/100164] [11 Regression] semaphore_impl not declared on AIX
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100164 Thomas Rodgers changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rodgertq at gcc dot gnu.org --- Comment #2 from Thomas Rodgers --- Investigating
[Bug libstdc++/100164] [11 Regression] semaphore_impl not declared on AIX
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100164 --- Comment #3 from Thomas Rodgers --- Created attachment 50643 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50643&action=edit Disable on unsupported targets This patch is probably not the most elegant way to do this, it probably should be a dg-effective-target check and it disables on platforms where there is a suitable Posix implementation, but this feature is experimental for GCC11.
[Bug libstdc++/100164] [11 Regression] semaphore_impl not declared on AIX
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100164 --- Comment #4 from Thomas Rodgers --- It would appear that I cannot log into either of the AIX machines in the compile farm.
[Bug libstdc++/100164] [11 Regression] semaphore_impl not declared on AIX
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100164 --- Comment #7 from Thomas Rodgers --- Created attachment 50645 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50645&action=edit Disable on unsupported targets Let's try this with the right patch attached this time
[Bug libstdc++/100164] [11 Regression] semaphore_impl not declared on AIX
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100164 Thomas Rodgers changed: What|Removed |Added Attachment #50643|0 |1 is obsolete|| Attachment #50645|0 |1 is obsolete|| --- Comment #10 from Thomas Rodgers --- Created attachment 50646 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50646&action=edit Work around broken macro name This patch works around the borked macro name that David pointed out on the mailing list. I left in the commented out _GLIBCXX_HAVE_POSIX_SEMAPHORE checks and have temporarily replaced them with _GLIBCXX__GLIBCXX_HAVE_POSIX_SEMAPHORE, and forced the posix semaphore test to always run if that macro is defined. I am not sufficiently versed in the arcane ways in which config.h is transformed to c++config.h but the borked macro transformation ideally should be fixed and the commented out checks restored.
[Bug libstdc++/100164] [11/12 Regression] semaphore_impl not declared on AIX
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100164 --- Comment #15 from Thomas Rodgers --- >From the most recent patch - +_GLIBCXX_ALWAYS_INLINE bool +_M_try_acquire() noexcept +{ + for (;;) + { + auto __err = sem_trywait(&_M_semaphore); + if (__err && (errno == EINTR)) + continue; + else if (__err && (errno == EAGAIN)) + return false; + else if (__err) + std::terminate(); + else + break; + } + return true; +} + You are correct it was never exercised. I saw then when I forced it in the test case and then added the above. I don't understand why you are seeing this error if you've applied the 0001-libstdc-Work-around-for-PR100164 patch. I don't see it locally.
[Bug libstdc++/100164] [11/12 Regression] semaphore_impl not declared on AIX
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100164 --- Comment #16 from Thomas Rodgers --- The _M_try_acquire() change should be on master and gcc-11 now.
[Bug libstdc++/97719] New: "Implement C++20 features for " changed behavior of istreambuf_iterator
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97719 Bug ID: 97719 Summary: "Implement C++20 features for " changed behavior of istreambuf_iterator Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: rodgertq at gcc dot gnu.org Target Milestone: --- from mailing list - On 07/10/2020 18:55, Thomas Rodgers wrote: From: Thomas Rodgers New ctors and ::view() accessor for - * basic_stingbuf * basic_istringstream * basic_ostringstream * basic_stringstreamm New ::get_allocator() accessor for basic_stringbuf. I found that this <https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a0e4d7b44c544c84cffc7ff9c64b6f1af14fb08d> "libstdc++: Implement C++20 features for " changed the behavior of $ cat test.cc #include #include #include int main() { std::stringstream s("a"); std::istreambuf_iterator i(s); if (i != std::istreambuf_iterator()) std::cout << *i << '\n'; } $ g++ -std=c++20 test.cc $ ./a.out from printing "a" to printing nothing. (The `i != ...` comparison appears to change i from pointing at "a" to pointing to null, and returns false.) I ran into this when building LibreOffice, and I hope test.cc is a faithfully minimized reproducer. However, I know little about std::istreambuf_iterator, so it may well be that the code isn't even valid.
[Bug libstdc++/98034] std::atomic_signed_lock_free and std::atomic_unsigned_lock_free not defined
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98034 Thomas Rodgers changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2020-11-30
[Bug libstdc++/100334] atomic::notify_one() sometimes wakes wrong thread
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100334 --- Comment #4 from Thomas Rodgers --- This analysis is likely correct, except for - "- protect from spurious wakeups in __waiter_pool::_M_do_wait by rechecking if the value has changed from old, if not then wait again" An earlier version of this code did this, but was subject to ABA problems, and the standard doesn't require that we do this - "(23.3) — Blocks until it is unblocked by an atomic notifying operation or is unblocked spuriously."
[Bug libstdc++/100334] atomic::notify_one() sometimes wakes wrong thread
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100334 Thomas Rodgers changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2021-05-01 Status|UNCONFIRMED |ASSIGNED
[Bug libstdc++/100334] atomic::notify_one() sometimes wakes wrong thread
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100334 --- Comment #6 from Thomas Rodgers --- Created attachment 50728 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50728&action=edit Fix wrong thread getting notification I am testing this patch
[Bug libstdc++/100334] atomic::notify_one() sometimes wakes wrong thread
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100334 Thomas Rodgers changed: What|Removed |Added Attachment #50728|0 |1 is obsolete|| --- Comment #7 from Thomas Rodgers --- Created attachment 50740 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50740&action=edit Fix wrong thread getting notification, take 2 This should fix the issue, submitting patch to mailing list.
[Bug libstdc++/100334] atomic::notify_one() sometimes wakes wrong thread
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100334 Thomas Rodgers changed: What|Removed |Added Status|ASSIGNED|WAITING
[Bug libstdc++/100334] atomic::notify_one() sometimes wakes wrong thread
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100334 Thomas Rodgers changed: What|Removed |Added Status|WAITING |ASSIGNED
[Bug libstdc++/100334] atomic::notify_one() sometimes wakes wrong thread
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100334 Thomas Rodgers changed: What|Removed |Added Keywords||patch --- Comment #8 from Thomas Rodgers --- Candidate patch submitted to mailing list - https://gcc.gnu.org/pipermail/libstdc++/2021-May/052464.html
[Bug libstdc++/100334] atomic::notify_one() sometimes wakes wrong thread
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100334 Thomas Rodgers changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #9 from Thomas Rodgers --- Fixed on master, backported to releases/gcc-11
[Bug libstdc++/100806] deadlock in std::counting_semaphore
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100806 Thomas Rodgers changed: What|Removed |Added Last reconfirmed||2021-05-28 Assignee|unassigned at gcc dot gnu.org |rodgertq at gcc dot gnu.org Status|UNCONFIRMED |ASSIGNED Ever confirmed|0 |1 --- Comment #1 from Thomas Rodgers --- This is almost certainly a duplicate of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100334 Which is fixed on master with a backport to releases/gcc-11 but was broken in GCC 11.1 Will confirm, thanks!
[Bug libstdc++/100889] Wrong param type for std::atomic_ref<_Tp*>::wait
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100889 Thomas Rodgers changed: What|Removed |Added Status|NEW |ASSIGNED
[Bug libstdc++/100889] Wrong param type for std::atomic_ref<_Tp*>::wait
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100889 Thomas Rodgers changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #4 from Thomas Rodgers --- Fixed in master, backported to releases/gcc-11
[Bug libstdc++/100806] deadlock in std::counting_semaphore
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100806 --- Comment #2 from Thomas Rodgers --- I have confirmed that this is new issue, not related to PR100334. When there are waiting threads and the semaphore attempts to release one of the waiting threads, the FUTEX_WAKE syscall's return indicates that 1 thread is woken, as expected, but it is unclear to me why there is no forward progress at that point. I have replaced the algorithm with a simplified version (similar to what is in libc++) and observe the same result. Further investigation is required. I have submitted an interim patch that forces wake all which appears to address the issue.
[Bug libstdc++/101228] tbb/task.h is Deprecated in newer TBB.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101228 Thomas Rodgers changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rodgertq at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #9 from Thomas Rodgers --- Yes (I already reviewed it when it arrived upstream). It is probably also safe to backport to GCC-11 since there's no ABI or ABI stability concerns. I will have a go at applying *just* this patch (I don't want to commit to trying a full rebase against upstream at this point).
[Bug libstdc++/108974] std::barrier except completion function which is not manifestly noexcept
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108974 --- Comment #4 from Thomas Rodgers --- (In reply to Jiang An from comment #3) > > is_nothrow_invocable_v shall be true. > > If implementation divergence is not intendedly permitted, I don't think it > makes much sense to introduce UB in this way. > > I guess we should either turn it into a mandating requirement: > > Instantiation of barrier is ill-formed > > if is_nothrow_invocable_v is not true. > > Or relax the preconditions: > > If any invocation to the completion function throws an exception, > > the behavior is undefined. > > I've mailed to LWG Chair for this... LWG Chair has already weighed in on this thread. libstdc++ and libc++ have essentially the same implementation of std::barrier (libstdc++'s implementation is derived from libc++'s). So that they behave consistently and MSVC is different is not surprising. I don't have a strong opinion on whether this should be UB ore mandated. Raising an LWG issue seems reasonable.
[Bug libstdc++/108974] std::barrier except completion function which is not manifestly noexcept
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108974 Thomas Rodgers changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rodgertq at gcc dot gnu.org
[Bug libstdc++/103934] std::atomic_flag: multiple C++20 functions missing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103934 Thomas Rodgers changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #8 from Thomas Rodgers --- Fixed on master, backported to releases/gcc-12 and releases/gcc-12
[Bug libstdc++/88322] Implement C++20 library features.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88322 Bug 88322 depends on bug 103934, which changed state. Bug 103934 Summary: std::atomic_flag: multiple C++20 functions missing https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103934 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug libstdc++/106772] atomic::wait shouldn't touch waiter pool if used platform wait
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106772 Thomas Rodgers changed: What|Removed |Added Resolution|--- |INVALID Status|UNCONFIRMED |RESOLVED --- Comment #2 from Thomas Rodgers --- This is not a bug. The size of the waiter pool, not withstanding, the point is you want to determine if a notify is likely to notify anything, to avoid the very expensive (by comparison) syscall to wake the futex.
[Bug libstdc++/106772] atomic::wait shouldn't touch waiter pool if used platform wait
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106772 --- Comment #3 from Thomas Rodgers --- Since this latter point has come up before, I want to additionally note that the optimization to use an atomic count of waiters per-waiter pool bucket means that a call to notify_one/notify_all is roughly 25x faster based on my testing than naively issuing a syscall to FUTEX_WAKE when there is no possibility of the wake being issued to a waiter. 2022-09-19T20:34:28-07:00 Running ./benchmark Run on (20 X 4800 MHz CPU s) CPU Caches: L1 Data 48 KiB (x10) L1 Instruction 32 KiB (x10) L2 Unified 1280 KiB (x10) L3 Unified 24576 KiB (x1) Load Average: 0.69, 0.61, 1.30 ***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead. -- BenchmarkTime CPU Iterations -- BM_empty_notify_checked 3.79 ns 3.79 ns179929051 BM_empty_notify_syscall 94.1 ns 93.9 ns 7477997 For types that can use a FUTEX directly (e.g. int) there is no place to put that extra atomic to perform this check, so we can either have the type that is directly usable by the underlying platform be significantly more expensive to call, or we can use the waiter count in the waiter_pool.
[Bug libstdc++/106772] atomic::wait shouldn't touch waiter pool if used platform wait
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106772 --- Comment #6 from Thomas Rodgers --- (In reply to Mkkt Bkkt from comment #5) > Single reason why I want to use atomic::wait/notify is cross platform api > for futex, wait/wake on address, ulock, etc > Not because I need YOU decide instead of me how many spins, or other > optimization I need. > > boost::atomic already do this. > > Why do you cannot make same? (In reply to Mkkt Bkkt from comment #4) > Nice, thanks for benchmarks without code. > > Also, why do I need call notify when don't have wait? > > Common usage from my point of view looks like: > > atomic.wait(value, mo): > > while (atomic.load(mo) == value) { > futex_wait(); > } > > Notify want to looks like: > > atomic.store(1, mo) > atomic.notify_one(); > > See: > > https://github.com/lewissbaker/cppcoro/blob/master/lib/ > lightweight_manual_reset_event.cpp > I have every confidence that Lewis knows how to bring a paper for a 'lightweight manual reset event' to SG1, I suspect it will be well received when he does. > https://github.com/YACLib/YACLib/blob/main/src/util/atomic_event.cpp > > and others > > So your optimization is pretty useless. > > Also if for some reason possible few notification, or notify before wait, > then possible to have code can looks like: > > if (atomic.exchange(non_wait_value, mo) == wait_value) { > atomic.notify_one(); > } I
[Bug libstdc++/106772] atomic::wait shouldn't touch waiter pool if used platform wait
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106772 --- Comment #11 from Thomas Rodgers --- (In reply to Mkkt Bkkt from comment #9) > Why do you think you smarter than msvc stl, libc++, boost::atomic developers? > > Maybe it's about your "I"? I should ignore this (see jwakely's response), but - I very much don't think I am smarter than the person (ogir...@apple.com) who implemented libc++'s implementation. There are minor difference between libstc++'s details and libc++'s but in this regard,in particular, they behave the same.
[Bug libstdc++/106772] atomic::wait shouldn't touch waiter pool if used platform wait
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106772 --- Comment #17 from Thomas Rodgers --- Created attachment 53638 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53638&action=edit benchmark main.cc
[Bug libstdc++/106772] atomic::wait shouldn't touch waiter pool if used platform wait
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106772 --- Comment #21 from Thomas Rodgers --- (In reply to Mkkt Bkkt from comment #16) > > it with sarcasm > > I started with sarcasm because you restart this thread with some doubtful > benchmarks without code for them. > > I think it's very disrespectfully. I wasn't sarcastic in what I posted. As I noted, this question has come up before in different contexts, Bugzilla is a useful historical archive, so updating this issue with my reasoning and a bit of data was primarily a capture task. So, let's try this again. I did not post the original source because it required hacking on the libstdc++ headers. I have posted a version that does not require that, the results are identical. In this test, it is the example Jonathan cited in #14; incrementing an atomic int and calling notify. This isn't about semaphore or any other synchronization primitive. Those types are free to make different choices that may be more appropriate to the constrained usage of the type just like Lewis' lightweight manual reset event does (I will also note that Lewis has reviewed this implementation, and has written a paper to be discussed at the Fall meeting, p2616). There are, as Jonathan has pointed out, use cases where notify can and will be called without a notifier having any way to determine it will wake a waiter. One example I, as the person that is going to have to implement C++26 executors care about is a wait free work-stealing deque, it certainly doesn't require anything more than spinning for work on an empty queue to be algorithmically correct, but after spinning on an empty queue, making the rounds trying to steal work from other deques, maybe spinning a bit more, just to be sure, the de-queuing thread which hasn't been able to acquire more work is probably going to want to enter a wait until such time as it knows it can do productive work. Another thread pushing work into that queue won't be able to determine if the dequeue-ing thread is spinning for new work, work stealing, or has entered a wait, but atomic::notify() does know that, and can avoid penalizing the thread submitting work with a syscall, if there is no thread in a wait on the other end of the deque, which is the expected case for this algorithm. p1135 was the paper that added atomic wait/notify. One of the co-authors of that paper wrote the libc++ implementation. That implementation, as with libstdc++, is not simply a wrapper over the underlying platform's wait primitive. It has two 'backends', an exponentially timed backoff and ulock wait/wake. libstdc++ currently has a Futex and condvar backend. Both implementations make the choice of having a short-term spinning strategy and long term waiting strategy (spinning, futex/ulock, condvar). I have confirmed with the libc++ implementation's author, (who also chairs the C++ committee's Concurrency and Parallelism study group), that it was never the intention of p1135 or the subsequently standardized language in C++20 to imply that wait/notify were direct portable analogs to the platform's waiting primitives. There are users, such as yourself that want exactly that, there are other users (like in my prior industry) where busy waiting is the desired strategy, and in between those two choices are people who want it to work as advertised in the standard, and to do so 'efficiently'. Both libc++ and libstdc++ take a balanced approach somewhere between always going to OS and always spinning. There is an open question here that your original issue raises - * At what point do collisions on the waiter pool, with the cache invalidation traffic and spurious wakeups that result, swamp the gain of doing this empty waiter check on notify? I also, made a comment about the 'size of the waiter pool not withstanding'. I chose a smaller size than libc++ chose, in part because Jonathan and I did not want to make any sort of ABI commitment until this had been in a few GCC releases. This implementation is header only at present, and still considered experimental. libc++ committed to an ABI early. In the sizing of the libc++ waiter pool there is the comment that 'there is no magic in this value'. Not only is there no magic, there is no test of any sort that I have done, or that has been done on libc++ to determine what effect different size pools have under different load scenarios. So, all of it is a guess at this point. I will likely match libc++ when I do move this into the .so. Finally, in the most recent mailing there is p2643 which proposes additional changes to atomic waiting. One proposal is to add a 'hinted' wait that can allow the caller to steer the choices atomic wait/notify uses. I have conferred with the other authors of the paper and this latter option is not without controversy, and likely some sharp edges for the user, but I plan to raise the discussion at the Fall WG21 meeting to see what the other members of SG1 think.
[Bug libstdc++/106772] atomic::wait shouldn't touch waiter pool if used platform wait
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106772 --- Comment #22 from Thomas Rodgers --- (In reply to Mkkt Bkkt from comment #20) > My main concern with this optimization it's not zero-overhead. > > It's not necessary when we expect we have some waiters, in that case it just > additional synchronization and contention in waiter pool (that have small > fixed size, just imagine system with 100+ cores, if we have > 16 waiting > threads some of them make fetch_add/sub on the same atomic, that can be > expensive, especially on numa) > > And at the same time, I don't understand when I need to notify and cannot > know notification not needed. > I don't understand when it useful. You are correct, it is not zero overhead. It also isn't clear what those overheads are, either. As I noted in comment #21, there is no test over a variety of workloads to inform this discussion, either. Your example of '100+ core' systems especially on NUMA is certainly a valid one. I would ask, at what point do those collisions and the resulting cache invalidation traffic swamp the cost of just making the syscall? I do plan to put these tests together, because there is another algorithm that I am exploring, that I believe will reduce the likelihood of spurious wakeups, and achieves the same result as this particular approach, without generating the same invalidation traffic. At this point, I don't anticipate doing that work until after GCC13 stage1 closes.
[Bug libstdc++/106772] atomic::wait shouldn't touch waiter pool if used platform wait
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106772 --- Comment #25 from Thomas Rodgers --- (In reply to Mkkt Bkkt from comment #24) > (In reply to Thomas Rodgers from comment #22) > > Your example of '100+ core' systems especially on NUMA is certainly a valid > > one. I would ask, at what point do those collisions and the resulting cache > > invalidation traffic swamp the cost of just making the syscall? I do plan to > > put these tests together, because there is another algorithm that I am > > exploring, that I believe will reduce the likelihood of spurious wakeups, > > and achieves the same result as this particular approach, without generating > > the same invalidation traffic. At this point, I don't anticipate doing that > > work until after GCC13 stage1 closes. > > I try to explain: > > syscall overhead is some constant commonly like 10-30ns (futex syscall can > be more expensive like 100ns in your example) > > But numbers of cores are grow, arm also makes more popular (fetch_add/sub > have more cost on it compares to x86) > And people already faced with situation where fetch_add have a bigger cost > than syscall overhead: > > https://pkolaczk.github.io/server-slower-than-a-laptop/ > https://travisdowns.github.io/blog/2020/07/06/concurrency-costs.html > > I don't think we will faced with problem like in these links in > atomic::wait/notify in real code, but I'm pretty sure in some cases it can > be more expansive than syscall part of atomic::wait/notify > > Of course better to prove it, maybe someday I will do it :( So to your previous comment, I don't the discussion is at all pointless. i plan to raise some of these issues at the next SG1 meeting in November. Sure, that doesn't help *you* or any developer with your specific intent until C++26, and maybe Boost's implementation is a better choice, I also get how unsatisfying of an aswer that is. I'm well aware of the potential scalability problems, and I have a longer term plan to get concrete data on how different implementation choices impact scalability. The barrier implementation (which is the same algorithm as in libc++), for example spreads this traffic over 64 individual atomic_refs, for this very reason, and that implementation has been shown to scale quite well on ORNL's Summit. But not all users of libstdc++ have those sorts of problems either.
[Bug libstdc++/102074] include/bits/atomic_timed_wait.h:215: possible missing return ?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102074 Thomas Rodgers changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #4 from Thomas Rodgers --- Fixed on master, see - https://gcc.gnu.org/g:763eb1f19239ebb19c0f87590a4f02300c02c52b
[Bug other/89863] [meta-bug] Issues in gcc that other static analyzers (cppcheck, clang-static-analyzer, PVS-studio) find that gcc misses
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89863 Bug 89863 depends on bug 102074, which changed state. Bug 102074 Summary: include/bits/atomic_timed_wait.h:215: possible missing return ? https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102074 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug libstdc++/102994] std::atomic::wait is not marked const
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102994 --- Comment #7 from Thomas Rodgers --- (In reply to Óscar Fuentes from comment #6) > (In reply to Jonathan Wakely from comment #5) > > (In reply to Óscar Fuentes from comment #4) > > > The fix is wrong. It changes atomic_notify_one and atomic_notify_all > > > instead > > > of atomic<>::wait. > > > > It changed both. > > > > > So right now atomic<>::wait remains unfixed > > > > Are you sure? > > Sigh. Sorry. It would be nice if the commit message mentioned the change to > atomic_notify_* and its motivation, though. > > > > and atomic_notify_(one|all) arg > > > is wrongly marked as const. > > > > This will be the subject of a library issue, potentially fixing the > > standard. The notify functions should be const too. > > So IIUC you are applying modifications to libstdc++ that deviate from the > published standard expecting that the committee will accept those changes. > As a user, this is troublesome, because right now I need to special-case gcc > version >11.2 and maybe version not accepted and is reverted. There is an ongoing discussion between myself and the SG1,LWG, and LEWG chairs (two of which were authors of p1135 which proposes atomic wait/notify) as to whether there is a wording issue with the standard. None of the three major standard library implementations require (as a matter of implementation detail) notify_one/notify_all to be non-const, and indeed the early wording of p1135 had them marked const. Between r2 and r3 of p1135 this was changed, it'cites the minutes of an LEWG discussion as part of the change rationale, but the minutes of that discussion do not give the motivation for the change. One argument is that you would typically wait in a const context and notify in a non-const context, but by that rationale, the constness of atomic_ref::notify is somewhat weird.
[Bug libstdc++/102994] std::atomic::wait is not marked const
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102994 Thomas Rodgers changed: What|Removed |Added Status|REOPENED|SUSPENDED --- Comment #11 from Thomas Rodgers --- (In reply to Jonathan Wakely from comment #10) > N.B. [member.functions] in the standard says > > "For a non-virtual member function described in the C++ standard library, an > implementation may declare a different set of member function signatures, > provided that any call to the member function that would select an overload > from the set of declarations described in this document behaves as if that > overload were selected." > > In general, being declared with a different signature is permitted. > > Do you have an example where a call to std::atomic::notify_one() that > should be valid according to the standard either fails to compile or > misbehaves, as a result of being const qualified? Pending the outcome of whether there is an LWG issue with the wording, and given this, I am going to mark this issue SUSPENDED.
[Bug libstdc++/104442] atomic::wait incorrectly loops in case of spurious notification when __waiter is shared
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104442 Thomas Rodgers changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2022-02-09 CC||rodgertq at gcc dot gnu.org Status|UNCONFIRMED |ASSIGNED --- Comment #1 from Thomas Rodgers --- I took a look at whether or not it made sense to do a different refactoring with this version of the implementation and I don't believe that it does (the implementation detail here is likely to change substantially when we commit to an ABI stable version) and indeed, the predicate version does exactly what this patch proposes.
[Bug libstdc++/104442] atomic::wait incorrectly loops in case of spurious notification when __waiter is shared
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104442 --- Comment #4 from Thomas Rodgers --- (In reply to Jonathan Wakely from comment #3) > Tom submitted > https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590050.html already, > which is slightly different. The primary difference was changing the memory order of the load in _M_do_spin. I may revert that part of the change and submit it separately, as I've noticed one other case where I'd like to change it from RELAXED to SEQ_CST.
[Bug libstdc++/104442] atomic::wait incorrectly loops in case of spurious notification when __waiter is shared
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104442 Thomas Rodgers changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #8 from Thomas Rodgers --- (In reply to Marc Poulhiès from comment #7) > Thanks ! Since the patch committed is effectively the same as your proposed patch, I am going to assume this resolves the issue on vxworks, but I have no way to independently confirm that.
[Bug tree-optimization/102586] [12 Regression] ICE in clear_padding_type, at gimple-fold.c:4798 since r12-3433-ga25e0b5e6ac8a77a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102586 --- Comment #20 from Thomas Rodgers --- (In reply to Jakub Jelinek from comment #17) ... > I don't remember the std::bit_cast case right now, OpenMP atomics are about Not sure if this is what you are talking about (frankly most of this is well above my grokability) but, for std::bit_cast, the standard says - "Constraints: —(1.1) sizeof(To) == sizeof(From) is true; —(1.2) is_trivially_copyable_v is true; and —(1.3) is_trivially_copyable_v is true" So same as for
[Bug libstdc++/102994] std::atomic::wait is not marked const
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102994 --- Comment #14 from Thomas Rodgers --- Created attachment 52420 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52420&action=edit Make notify_one/notify_all non-const I submitted this patch to the list.
[Bug libstdc++/100806] deadlock in std::counting_semaphore
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100806 Thomas Rodgers changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #5 from Thomas Rodgers --- I believe this has been fully resolved.
[Bug target/101761] Random hang with 29_atomics/atomic_ref/wait_notify.cc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101761 --- Comment #3 from Thomas Rodgers --- This appears to be the test case itself.
[Bug target/101761] Random hang with 29_atomics/atomic_ref/wait_notify.cc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101761 --- Comment #5 from Thomas Rodgers --- (In reply to Florian Weimer from comment #4) > I think it's a test bug: > > std::atomic_ref a{ aa }; > > std::thread t([&] > { > a.store(bb); > a.notify_one(); > }); > a.wait(aa); > > Due to the use of std::atomic_ref, store() overwrites aa with the value of > bb. If the notify_one() call completes before the wait() call, wait() blocks > because aa == bb (due to the previous store()), and the wakeup never happens > because wakes are not queued. This was my initial suspicion. It applies to all of the 29_atomics/**/wait_notify.cc tests. The libc++ tests for wait/notify have the same problem (see https://github.com/llvm/llvm-project/blob/main/libcxx/test/std/atomics/atomics.types.operations/atomics.types.operations.wait/atomic_wait.pass.cpp) I have, as part of this investigation, extensively re-reviewed the implementation detail in bits/atomic_wait.h and I think there is a race condition lurking in the implementation which also needs to be address. Specifically tracking waiters to try to void making the syscall for FUTEX_WAKE_PRIVATE if there are no current waiters. In order for this to work the increment of the waiter count would have to be atomic with syscall. I believe that libc++ also has this same issue.
[Bug target/101761] Random hang with 29_atomics/atomic_ref/wait_notify.cc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101761 --- Comment #11 from Thomas Rodgers --- (In reply to Jonathan Wakely from comment #10) > http://eel.is/c++draft/atomics#ref.generic.general-3.sentence-2 > > "While any atomic_ref instances exist that reference the *ptr object, all > accesses to that object shall exclusively occur through those atomic_ref > instances." Yes. I will submit a patch for this test shortly. Having said that, the atomic integral tests also spuriously deadlock, they don't have this UB issue.
[Bug libstdc++/102377] FAIL: 29_atomics/atomic_flag/cons/56012.cc with -std=gnu++20
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102377 --- Comment #2 from Thomas Rodgers --- (In reply to Jonathan Wakely from comment #1) > The reason we don't see it in every test is that this one uses: > > // { dg-options "-Wsystem-headers -Wnarrowing" } > > But I think the warning is pointing out a real issue. Since the interference > sizes vary with -mtune options, we shouldn't use them in , since > that's defining a public ABI (or will be, once our C++20 support is > non-experimental). Agreed.
[Bug libstdc++/102994] std::atomic::wait is not marked const
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102994 Thomas Rodgers changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #18 from Thomas Rodgers --- Committed to GCC12 and backported to GCC11
[Bug libstdc++/106183] std::atomic::wait might fail to be unblocked by notify_one/all on platforms without platform_wait()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106183 Thomas Rodgers changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2022-07-06 Status|UNCONFIRMED |ASSIGNED
[Bug libstdc++/97936] [11/12 Regression] 30_threads/latch/3.cc hangs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97936 Thomas Rodgers changed: What|Removed |Added Status|RESOLVED|REOPENED Resolution|FIXED |--- --- Comment #20 from Thomas Rodgers --- Solaris uses the non-futex wait/notify path. There has been a recent PR opened indicating a likely algorithmic issue with the non-futex implementation. See - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106183 I am going to re-open this issue while I investigate the new report.
[Bug libstdc++/101274] [11/12 Regression] std::execution::seq has incorrect behaviour under GCC 11.1.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101274 Thomas Rodgers changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |INVALID --- Comment #3 from Thomas Rodgers --- The standard only requires that the the parallel algorithm's execution, when invoked with std::sequenced_policy, be invoked on the calling thread, and (as with the other execution policies) allows for the invocations of element access functions to be indeterminately sequenced on the execution agent. This is also spelled at - https://en.cppreference.com/w/cpp/algorithm/execution_policy_tag_t "1) The execution policy type used as a unique type to disambiguate parallel algorithm overloading and require that a parallel algorithm's execution may not be parallelized. The invocations of element access functions in parallel algorithms invoked with this policy (usually specified as std::execution::seq) are indeterminately sequenced in the calling thread."
[Bug libstdc++/101274] [11/12 Regression] std::execution::seq has incorrect behaviour under GCC 11.1.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101274 --- Comment #4 from Thomas Rodgers --- I did some more reading of http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/n4878.pdf and it would seem that you are not even guaranteed a deterministic ordering of the application of binary_op on the non-execution policy accepting overloads of std::reduce(). See [reduce] the note at bullet 9 - "[Note 1 : The difference between reduce and accumulate is that reduce applies binary_op in an unspecified order, which yields a nondeterministic result for non-associative or non-commutative binary_op such as floating-point addition. — end note]"
[Bug libstdc++/101274] [11/12 Regression] std::execution::seq has incorrect behaviour under GCC 11.1.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101274 --- Comment #6 from Thomas Rodgers --- It does raise an issue which I'm going to follow up on separately on the SG1 (concurrency and parallelism study group) mailing list. While it is indeed the case that standard says you can't count on deterministic sequencing for std::reduce(), I can't find any wording that directly supports the wording on cppreference.com std::execution::sequenced_policy, though that wording is consistent with my recollection of SG1's discussions regarding execution policies. There is wording that says 'Unless otherwise specified, the semantics of ExecutionPolicy algorithm overloads are identical to their overloads without." The wording for std::for_each() for the non-execution policy overload the standard says - Effects: Applies f to the result of dereferencing every iterator in the range [first, last), starting from first and proceeding to last - 1. And for one that takes an execution policy - Effects: Applies f to the result of dereferencing every iterator in the range [first, last). So it omits the ordering constraint, which makes sense, but I wonder if we shouldn't be more explicit in the documentation of the policies themselves.
[Bug libstdc++/99277] C++2a synchronisation is inefficient in GCC 11
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99277 --- Comment #16 from Thomas Rodgers --- (In reply to Thiago Macieira from comment #15) > > > 5) std::barrier implementation also uses a type that futex(2) can't > > > handle > > > barrier still uses a 1-byte enum for the atomic waits. > > That can only now be fixed for libstdc++.so.7, then. The original implementation came from Olvier Giroux and is part of libc++. The libc++ implementation also does not use a type that futex or ulock_wait/wake (uint64_t) can handle. I have discussed this in the past with Olivier, the choice of char was deliberate on his part. The implementation has been tested on a number of platforms (including time on ORNL's Summit). The following comment, preserved from libc++ should be considered carefully before any change here - " 2. A great deal of attention has been paid to avoid cache line thrashing by flattening the tree structure into cache-line sized arrays, that are indexed in an efficient way." It is my opinion that the bar for making a change here is high. I would need to see benchmark numbers that illustrate the performance differences under various contention scenarios vs impact on caches by being able to fit the entire tree in a single cache line using char, vs four or eight cache lines using the type favored by futex or ulock_wait/wake.
[Bug libstdc++/99277] C++2a synchronisation is inefficient in GCC 11
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99277 --- Comment #23 from Thomas Rodgers --- (In reply to Thiago Macieira from comment #21) > I understand that. I don't think it's a reason to repeat the policy, though. > > Anyway, I don't have any new arguments than when we discussed this two years > ago, so I won't pursue this matter further. (Un)fortunately, the entire implementation detail in the headers will change, most likely with GCC14 (ideally moving most of the implementation into the .so). As Jonathan points out, this hardly the only experimental feature area within libstdc++ that has undergone such changes.