On Tue, Oct 14, 2025 at 12:38 PM Jonathan Wakely <[email protected]> wrote:

> On Thu, 21 Aug 2025 at 08:22 +0200, Tomasz Kamiński wrote:
> >Previously, atomic<shared_ptr<T>>::wait (and the weak_ptr version) was
> >equivalent to waiting directly on _M_val, which corresponds to the
> >pointer to the control block (_M_pi). Consequently, wakeups were not
> >triggered if the stored pointer value was changed to a pointer that uses
> >the same control block but stores a pointer to a different object. Such
> >a pointer can be constructed using an aliasing constructor.
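
For reference, the failing scenario looks roughly like the sketch below; it
is a condensed version of the new pr118757.cc test, with the sleep and
barrier machinery left out.

  #include <atomic>
  #include <memory>
  #include <thread>

  int main()
  {
    std::shared_ptr<int> q = std::make_shared<int>(42);
    std::atomic<std::shared_ptr<int>> p = q;

    std::thread waiter([&] {
      // Blocks while the value of p is equivalent to q, i.e. shares the
      // same control block and stores the same pointer.  Before the fix,
      // the wait only watched the control-block pointer, so it kept
      // re-blocking even after the store and notify_one() below.
      p.wait(q);
    });

    // Aliasing constructor: shares q's control block but stores nullptr
    // instead of q.get(), so p stops being equivalent to q.
    p.store(std::shared_ptr<int>(q, nullptr));
    p.notify_one();

    waiter.join();
  }
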
> >
> >To address this, wait now uses the generic proxy wait function
> >std::__atomic_wait_address, which supports waiting until any predicate
> >is satisfied. The provided predicate now compares both the control block
> >(_M_pi) and the stored pointer (_M_ptr). Comparing the latter requires
> >locking the pointer.
> >
> >Since this function operates on raw pointers, the type of _M_val was
> >changed from __atomic_base<uintptr_t> to uninptr. Invocations of the corresponding
>
> s/uninptr/uintptr_t/
>
> >member functions are now replaced with direct use of __atomic builtins.
>
> The text above doesn't fit in 80 columns, and will look even worse in
> the output of 'git log' where it's indented by another 4 columns.
>
> See https://cbea.ms/git-commit/#wrap-72 for recommendations on commit
> message formatting.
>
>
> >
> >       PR libstdc++/118757
> >
> >libstdc++-v3/ChangeLog:
> >
> >       * include/bits/shared_ptr_atomic.h (_Atomic_count::_M_wait_unlock):
> >       Add parameter capturing reference to _M_ptr. Reimplement in terms
> >       of __atomic_wait_address.
> >       (_Atomic_count::~_Atomic_count, _Atomic_count::lock)
> >       (_Atomic_count::unlock, _Atomic_count::_M_swap_unlock): Replace
> >       invocations of atomic member functions with __atomic builtins.
> >       (_Atomic_count::notify_one, _Atomic_count::notify_all):
> >       Use __atomic_notify_address.
> >       (_Sp_atomic::element_type): Define.
> >       (_Sp_atomic::_M_val): Change type to uintptr_t.
> >       (_Sp_atomic::wait): Pass _M_ptr to _M_wait_unlock.
> >       * python/libstdcxx/v6/printers.py (SharedPointerPrinter._get_refcounts):
> >       Load _M_val directly instead of its _M_i member.
> >       * testsuite/20_util/shared_ptr/atomic/pr118757.cc: New test.
> >       * testsuite/20_util/weak_ptr/pr118757.cc: New test.
> >
> >Signed-off-by: Tomasz Kamiński <[email protected]>
> >---
> >Changes in v2:
> > * update pretty printers
> > * use __atomic_sub_fetch in _M_wait_unlock
> >
> >OK for trunk?
> >As the atomic wait fixes are in v16, I am not planning to backport this one.
> >
> > libstdc++-v3/include/bits/shared_ptr_atomic.h | 60 +++++++++++++------
> > libstdc++-v3/python/libstdcxx/v6/printers.py  |  2 +-
> > .../20_util/shared_ptr/atomic/pr118757.cc     | 46 ++++++++++++++
> > .../testsuite/20_util/weak_ptr/pr118757.cc    | 47 +++++++++++++++
> > 4 files changed, 137 insertions(+), 18 deletions(-)
> > create mode 100644 libstdc++-v3/testsuite/20_util/shared_ptr/atomic/pr118757.cc
> > create mode 100644 libstdc++-v3/testsuite/20_util/weak_ptr/pr118757.cc
> >
> >diff --git a/libstdc++-v3/include/bits/shared_ptr_atomic.h b/libstdc++-v3/include/bits/shared_ptr_atomic.h
> >index cc7841a8775..5f1e1b50514 100644
> >--- a/libstdc++-v3/include/bits/shared_ptr_atomic.h
> >+++ b/libstdc++-v3/include/bits/shared_ptr_atomic.h
> >@@ -392,6 +392,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >     class _Sp_atomic
> >     {
> >       using value_type = _Tp;
> >+      using element_type = typename _Tp::element_type;
> >
> >       friend struct atomic<_Tp>;
> >
> >@@ -420,7 +421,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >
> >       ~_Atomic_count()
> >       {
> >-        auto __val = _M_val.load(memory_order_relaxed);
> >+        auto __val = __atomic_load_n(&_M_val, int(memory_order_relaxed));
>
> Would it be cleaner to use the __atomic_impl::xxx wrappers (or just
> use std::atomic_ref) instead of the raw __atomic_xxx built-ins?
>
> You'd still need to use __atomic_wait_address directly, but everything
> else could use the nicer API, and not need to cast memory_order to
> int.
>
I used __atomic_ref, which is defined in bits/atomic_base.h.
I also added alignas(_AtomicRef::required_alignment) to _M_val.
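
Roughly the shape it takes (a sketch using std::atomic_ref as the public
analogue of the internal __atomic_ref, not the literal patch text):

  #include <atomic>
  #include <cstdint>

  struct _Atomic_count_sketch
  {
    using _AtomicRef = std::atomic_ref<std::uintptr_t>;

    std::uintptr_t
    load(std::memory_order __o) const noexcept
    { return _AtomicRef(_M_val).load(__o); }  // wrap on each access

    // Plain integer now, but aligned so that forming an atomic reference
    // to it is always valid; __atomic_wait_address can still take &_M_val.
    alignas(_AtomicRef::required_alignment)
      mutable std::uintptr_t _M_val{0};
  };

The required_alignment guarantee is what makes it valid to form an atomic
reference to a plain uintptr_t member.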


>
> >         _GLIBCXX_TSAN_MUTEX_DESTROY(&_M_val);
> >         __glibcxx_assert(!(__val & _S_lock_bit));
> >         if (auto __pi = reinterpret_cast<pointer>(__val))
>
> Unrelated to this patch, but I wonder if we should do:
>
>    __val = __val & ~_S_lock_bit;
>
> here, so that if the destructor _does_ get run while it's locked (and
> assertions are disabled), we don't get a segfault or bus error for the
> unaligned pointer access.
>
I am not sure; here I think I would prefer to get a crash close to the
problem, at the point where the data race occurred.

>
> That would mean potentially decrementing the refcount while another
> thread thinks it has it locked, which is still bad, and still a memory
> safety issue.
>
In particular, that thread could make a copy of it and return it.

>
> The safest thing to do here would be to acquire the lock in the
> destructor, but that would add overhead for correct programs.
>
>
> >@@ -442,21 +443,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >       {
> >         // To acquire the lock we flip the LSB from 0 to 1.
> >
> >-        auto __current = _M_val.load(memory_order_relaxed);
> >+        auto __current = __atomic_load_n(&_M_val, int(memory_order_relaxed));
> >         while (__current & _S_lock_bit)
> >           {
> > #if __glibcxx_atomic_wait
> >             __detail::__thread_relax();
> > #endif
> >-            __current = _M_val.load(memory_order_relaxed);
> >+            __current = __atomic_load_n(&_M_val, int(memory_order_relaxed));
> >           }
> >
> >         _GLIBCXX_TSAN_MUTEX_TRY_LOCK(&_M_val);
> >
> >-        while (!_M_val.compare_exchange_strong(__current,
> >-                                               __current | _S_lock_bit,
> >-                                               __o,
> >-                                               memory_order_relaxed))
> >+        while (!__atomic_compare_exchange_n(
> >+                  &_M_val, &__current, __current | _S_lock_bit, 0,
> >+                  int(__o), int(memory_order_relaxed)))
> >           {
> >             _GLIBCXX_TSAN_MUTEX_TRY_LOCK_FAILED(&_M_val);
> > #if __glibcxx_atomic_wait
> >@@ -474,7 +474,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >       unlock(memory_order __o) const noexcept
> >       {
> >         _GLIBCXX_TSAN_MUTEX_PRE_UNLOCK(&_M_val);
> >-        _M_val.fetch_sub(1, __o);
> >+        __atomic_fetch_sub(&_M_val, 1, int(__o));
> >         _GLIBCXX_TSAN_MUTEX_POST_UNLOCK(&_M_val);
> >       }
> >
> >@@ -487,7 +487,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >           __o = memory_order_release;
> >         auto __x = reinterpret_cast<uintptr_t>(__c._M_pi);
> >         _GLIBCXX_TSAN_MUTEX_PRE_UNLOCK(&_M_val);
> >-        __x = _M_val.exchange(__x, __o);
> >+        __x = __atomic_exchange_n(&_M_val, __x, int(__o));
> >         _GLIBCXX_TSAN_MUTEX_POST_UNLOCK(&_M_val);
> >         __c._M_pi = reinterpret_cast<pointer>(__x & ~_S_lock_bit);
> >       }
> >@@ -495,19 +495,45 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > #if __glibcxx_atomic_wait
> >       // Precondition: caller holds lock!
> >       void
> >-      _M_wait_unlock(memory_order __o) const noexcept
> >+      _M_wait_unlock(const element_type* const& __ptr, memory_order __o) const noexcept
> >       {
> >+        auto __old_ptr = __ptr;
> >         _GLIBCXX_TSAN_MUTEX_PRE_UNLOCK(&_M_val);
> >-        auto __v = _M_val.fetch_sub(1, memory_order_relaxed);
> >+        auto __old_pi
> >+          = __atomic_sub_fetch(&_M_val, 1, int(memory_order_relaxed));
> >         _GLIBCXX_TSAN_MUTEX_POST_UNLOCK(&_M_val);
> >-        _M_val.wait(__v & ~_S_lock_bit, __o);
> >+
> >+        // Ensure that the correct value of _M_ptr is visible after locking,
> >+        // by upgrading relaxed or consume to acquire.
> >+        auto __lo = __o;
> >+        if (__o != memory_order_seq_cst)
> >+          __lo = memory_order_acquire;
>
> I noticed that if the user calls wait(memory_order::acq_rel) then this
> will downgrade it to just acquire. But wait has a precondition that
> the order is not release or acq_rel, so that's fine.
>
> >+
> >+        std::__atomic_wait_address(
> >+          &_M_val,
> >+          [=, &__ptr, this](uintptr_t __new_pi)
> >+            {
> >+              if (__old_pi != (__new_pi & ~_S_lock_bit))
> >+                // control block changed, we can wake up
> >+                return true;
> >+
> >+              // control block is same, we need to check if ptr changed,
> >+              // the lock needs to be taken first, the value of pi may have
> >+              // also been updated in meantime, so reload it
> >+              __new_pi = reinterpret_cast<uintptr_t>(this->lock(__lo));
> >+              auto __new_ptr = __ptr;
> >+              this->unlock(memory_order_relaxed);
> >+              // wake up if either of the values changed
> >+              return __new_pi != __old_pi || __new_ptr != __old_ptr;
> >+            },
> >+          [__o, this] { return __atomic_load_n(&_M_val, int(__o)); });
> >       }
> >
> >       void
> >       notify_one() noexcept
> >       {
> >         _GLIBCXX_TSAN_MUTEX_PRE_SIGNAL(&_M_val);
> >-        _M_val.notify_one();
> >+        std::__atomic_notify_address(&_M_val, false);
> >         _GLIBCXX_TSAN_MUTEX_POST_SIGNAL(&_M_val);
> >       }
> >
> >@@ -515,17 +541,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >       notify_all() noexcept
> >       {
> >         _GLIBCXX_TSAN_MUTEX_PRE_SIGNAL(&_M_val);
> >-        _M_val.notify_all();
> >+        std::__atomic_notify_address(&_M_val, true);
> >         _GLIBCXX_TSAN_MUTEX_POST_SIGNAL(&_M_val);
> >       }
> > #endif
> >
> >       private:
> >-      mutable __atomic_base<uintptr_t> _M_val{0};
> >+      mutable uintptr_t _M_val{0};
> >       static constexpr uintptr_t _S_lock_bit{1};
> >       };
> >
> >-      typename _Tp::element_type* _M_ptr = nullptr;
> >+      element_type* _M_ptr = nullptr;
> >       _Atomic_count _M_refcount;
> >
> >       static typename _Atomic_count::pointer
> >@@ -608,7 +634,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >       {
> >       auto __pi = _M_refcount.lock(memory_order_acquire);
> >       if (_M_ptr == __old._M_ptr && __pi == __old._M_refcount._M_pi)
> >-        _M_refcount._M_wait_unlock(__o);
> >+        _M_refcount._M_wait_unlock(_M_ptr, __o);
> >       else
> >         _M_refcount.unlock(memory_order_relaxed);
> >       }
> >diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
> >index 5f5963cb595..51defbce78b 100644
> >--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
> >+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
> >@@ -287,7 +287,7 @@ class SharedPointerPrinter(printer_base):
> >     def _get_refcounts(self):
> >         if self._typename == 'std::atomic':
> >             # A tagged pointer is stored as uintptr_t.
> >-            ptr_val = self._val['_M_refcount']['_M_val']['_M_i']
> >+            ptr_val = self._val['_M_refcount']['_M_val']
>
> This will make the new printer unable to debug the old version of
> std::atomic<std::shared_ptr<T>>, which can happen if somebody compiles
> a C++20 TU with GCC 15 and links it to a C++17 TU compiled with GCC
> 16, and runs the resulting executable with the libstdc++.so (and GDB
> printers) from GCC 16. That's a valid use case, which doesn't violate
> the ODR by having two ABI-incompatible versions of
> std::atomic<std::shared_ptr<T>> in the program (there's only one TU
> that uses the type, and it uses the old definition).
>
> The simplest way to handle that is probably:
>
>          if self._typename == 'std::atomic':
>              # A tagged pointer is stored as uintptr_t.
>              val = self._val['_M_refcount']['_M_val']
>              if val.type.is_scalar: # GCC 16 stores uintptr_t
>                  ptr_val = val
>              else: # GCC 12-15 stores std::atomic<uintptr_t>
>                  ptr_val = val['_M_i']
>
> That should work correctly for all versions.
>
Used snippet above.

>
> >             ptr_val = ptr_val - (ptr_val % 2)  # clear lock bit
> >             ptr_type = find_type(self._val['_M_refcount'].type, 'pointer')
> >             return ptr_val.cast(ptr_type)
> >diff --git a/libstdc++-v3/testsuite/20_util/shared_ptr/atomic/pr118757.cc b/libstdc++-v3/testsuite/20_util/shared_ptr/atomic/pr118757.cc
> >new file mode 100644
> >index 00000000000..9c3dc01255c
> >--- /dev/null
> >+++ b/libstdc++-v3/testsuite/20_util/shared_ptr/atomic/pr118757.cc
> >@@ -0,0 +1,46 @@
> >+// Copyright (C) 2014-2025 Free Software Foundation, Inc.
>
> I don't think this test is based on anything from 2014, is it?
>
> And if you're contributing under the DCO not under the Red Hat
> corporate assignment then this isn't copyright FSF anyway, so
> shouldn't have any of this text.
>
Yes, it is not necessary. I must have copy-pasted it along with the dg-do
run and the other comments.

> >+// This file is part of the GNU ISO C++ Library.  This library is free
> >+// software; you can redistribute it and/or modify it under the
> >+// terms of the GNU General Public License as published by the
> >+// Free Software Foundation; either version 3, or (at your option)
> >+// any later version.
> >+
> >+// This library is distributed in the hope that it will be useful,
> >+// but WITHOUT ANY WARRANTY; without even the implied warranty of
> >+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >+// GNU General Public License for more details.
> >+
> >+// You should have received a copy of the GNU General Public License along
> >+// with this library; see the file COPYING3.  If not see
> >+// <http://www.gnu.org/licenses/>.
> >+
> >+// { dg-do run { target c++20 } }
> >+// { dg-require-gthreads "" }
> >+// { dg-require-effective-target hosted }
> >+
> >+#include <memory>
> >+#include <chrono>
> >+#include <thread>
> >+#include <barrier>
> >+
> >+std::shared_ptr<int> q = std::make_shared<int>(42);
> >+std::atomic<std::shared_ptr<int>> p = q;
> >+
> >+std::barrier bar(2);
> >+
> >+void signaler()
>
> "signaller" with two Ls
>
> Same comments for the weak_ptr test.
>
Fixed.

>
> >+{
> >+  std::this_thread::sleep_for(std::chrono::seconds(1));
> >+  p.store(std::shared_ptr<int>(q, nullptr));
> >+  p.notify_one();
> >+  bar.arrive_and_wait();
> >+}
> >+
> >+int main(int, char**)
> >+{
> >+  std::thread thr(signaler);
> >+  p.wait(q);
> >+  bar.arrive_and_wait();
> >+  thr.join();
> >+}
> >diff --git a/libstdc++-v3/testsuite/20_util/weak_ptr/pr118757.cc b/libstdc++-v3/testsuite/20_util/weak_ptr/pr118757.cc
> >new file mode 100644
> >index 00000000000..c1110241f09
> >--- /dev/null
> >+++ b/libstdc++-v3/testsuite/20_util/weak_ptr/pr118757.cc
> >@@ -0,0 +1,47 @@
> >+// Copyright (C) 2014-2025 Free Software Foundation, Inc.
> >+//
> >+// This file is part of the GNU ISO C++ Library.  This library is free
> >+// software; you can redistribute it and/or modify it under the
> >+// terms of the GNU General Public License as published by the
> >+// Free Software Foundation; either version 3, or (at your option)
> >+// any later version.
> >+
> >+// This library is distributed in the hope that it will be useful,
> >+// but WITHOUT ANY WARRANTY; without even the implied warranty of
> >+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >+// GNU General Public License for more details.
> >+
> >+// You should have received a copy of the GNU General Public License along
> >+// with this library; see the file COPYING3.  If not see
> >+// <http://www.gnu.org/licenses/>.
> >+
> >+// { dg-do run { target c++20 } }
> >+// { dg-require-gthreads "" }
> >+// { dg-require-effective-target hosted }
> >+
> >+#include <memory>
> >+#include <chrono>
> >+#include <thread>
> >+#include <barrier>
> >+
> >+std::shared_ptr<int> s = std::make_shared<int>(42);
> >+std::weak_ptr<int> q = s;
> >+std::atomic<std::weak_ptr<int>> p = q;
> >+
> >+std::barrier bar(2);
> >+
> >+void signaler()
> >+{
> >+  std::this_thread::sleep_for(std::chrono::seconds(1));
> >+  p.store(std::shared_ptr<int>(s, nullptr));
> >+  p.notify_one();
> >+  bar.arrive_and_wait();
> >+}
> >+
> >+int main(int, char**)
> >+{
> >+  std::thread thr(signaler);
> >+  p.wait(q);
> >+  bar.arrive_and_wait();
> >+  thr.join();
> >+}
> >--
> >2.50.1
> >
> >
>
>
