Re: [PATCH] libstdc++: Future-proof C++20 atomic wait/notify

Tomasz Kaminski Mon, 17 Nov 2025 04:14:17 -0800

On Mon, Nov 17, 2025 at 10:10 AM Tomasz Kaminski <[email protected]>
wrote:


>
>
> On Sun, Nov 16, 2025 at 1:56 AM Jonathan Wakely <[email protected]>
> wrote:
>
>> This will allow us to extend atomic waiting functions to support a
>> possible future 64-bit version of futex, as well as supporting
>> futex-like wait/wake primitives on other targets (e.g. macOS has
>> os_sync_wait_on_address and FreeBSD has _umtx_op).
>>
>> Before this change, the decision of whether to do a proxy wait or to
>> wait on the atomic variable itself was made in the header at
>> compile-time, which makes it an ABI property that would not have been
>> possible to change later.  That would have meant that
>> std::atomic<uint64_t> would always have to do a proxy wait even if Linux
>> gains support for 64-bit futex2(2) calls at some point in the future.
>> The disadvantage of proxy waits is that several distinct atomic objects
>> can share the same proxy state, leading to contention between threads
>> even when they are not waiting on the same atomic object, similar to
>> false sharing. It also result in spurious wake-ups because doing a
>> notify on an atomic object that uses a proxy wait will wake all waiters
>> sharing the proxy.
>>
>> For types that are known to definitely not need a proxy wait (e.g. int
>> on Linux) the header can still choose a more efficient path at
>> compile-time. But for other types, the decision of whether to do a proxy
>> wait is deferred to runtime, inside the library internals. This will
>> make it possible for future versions of libstdc++.so to extend the set
>> of types which don't need to use proxy waits, without ABI changes.
>>
>> The way the change works is to stop using the __proxy_wait flag that was
>> set by the inline code in the headers. Instead the __wait_args struct
>> has an extra pointer member which the library internals populate with
>> either the address of the atomic object or the _M_ver counter in the
>> proxy state. There is also a new _M_obj_size member which stores the
>> size of the atomic object, so that the library can decide whether a
>> proxy is needed. So for example if linux gains 64-bit futex support then
>> the library can decide not to use a proxy when _M_obj_size == 8.
>> Finally, the _M_old member of the __wait_args struct is changed to
>> uint64_t so that it has room to store 64-bit values, not just whatever
>> size the __platform_wait_t type is (which is a 32-bit int on Linux).
>> Similarly, the _M_val member of __wait_result_type changes to uint64_t
>> too.
>>
>> libstdc++-v3/ChangeLog:
>>
>>         * config/abi/pre/gnu.ver: Adjust exports.
>>         * include/bits/atomic_timed_wait.h
>> (_GLIBCXX_HAVE_PLATFORM_TIMED_WAIT):
>>         Do not define this macro.
>>         (__atomic_wait_address_until_v, __atomic_wait_address_for_v):
>>         Guard assertions with #ifdef _GLIBCXX_UNKNOWN_PLATFORM_WAIT.
>>         * include/bits/atomic_wait.h (__platform_wait_uses_type):
>>         Different separately for platforms with and without platform
>>         wait.
>>         (_GLIBCXX_HAVE_PLATFORM_WAIT): Do not define this macro.
>>         (_GLIBCXX_UNKNOWN_PLATFORM_WAIT): Define new macro.
>>         (__wait_value_type): New typedef.
>>         (__wait_result_type): Change _M_val to __wait_value_type.
>>         (__wait_args_base::_M_old): Change to __wait_args_base.
>>         (__wait_args_base::_M_obg, __wait_args_base::_M_obj_size): New
>>         data members.
>>         (__wait_args::__wait_args): Set _M_obj and _M_obj_size on
>>         construction.
>>         (__wait_args::_M_setup_wait): Change void* parameter to deduced
>>         type. Use _S_bit_cast instead of __builtin_bit_cast.
>>         (__wait_args::_M_load_proxy_wait_val): Remove function, replace
>>         with ...
>>         (__wait_args::_M_setup_wait_impl): New function.
>>         (__wait_args::_S_bit_cast): Wrapper for __builtin_bit_cast which
>>         also supports conversion from 32-bit values.
>>         (__wait_args::_S_flags_for): Do not set __proxy_wait flag.
>>         (__atomic_wait_address_v): Guard assertions with #ifdef
>>         _GLIBCXX_UNKNOWN_PLATFORM_WAIT.
>>         * src/c++20/atomic.cc (_GLIBCXX_HAVE_PLATFORM_WAIT): Define here
>>         instead of in header. Check _GLIBCXX_HAVE_PLATFORM_WAIT instead
>>         of _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT.
>>         (__spin_impl): Adjust for 64-bit __wait_args_base::_M_old.
>>         (use_proxy_wait): New function.
>>         (__wait_args::_M_load_proxy_wait_val): Replace with ...
>>         (__wait_args::_M_setup_wait_impl): New function. Call
>>         use_proxy_wait to decide at runtime whether to wait on the
>>         pointer directly instead of using a proxy. If a proxy is needed,
>>         set _M_obj to point to its _M_ver member. Adjust for change to
>>         type of _M_old.
>>         (__wait_impl): Wait on _M_obj unconditionally.
>>         (__notify_impl): Call use_proxy_wait to decide whether to notify
>>         on the address parameter or a proxy
>>         (__spin_until_impl): Adjust for change to type of _M_val.
>>         (__wait_until_impl): Wait on _M_obj unconditionally.
>> ---
>>
>> Tested x86_64-linux, powerpc64le-linux, sparc-solaris.
>>
> A lot of comments below.
>
>>
>> I think this is an imporant change which I unfortunately didn't think of
>> until recently.
>>
>> This changes the exports from the shared library, but we're still in
>> stage 1 so I think that should be allowed (albeit unfortunate). Nobody
>> should be expecting GCC 16 to be stable yet.
>>
>> The __proxy_wait enumerator is now unused and could be removed. The
>> __abi_version enumerator could also be bumped to indicate the
>> incompatibility with earlier snapshots of GCC 16, but I don't think that
>> is needed. We could in theory keep the old symbol export
>> (__wait_args::_M_load_proxy_wait) and make it trap/abort if called, but
>> I'd prefer to just remove it and cause dynamic linker errors instead.
>>
>> There's a TODO in the header about which types should be allowed to take
>> the optimized paths (see the __waitable concept). For types where that's
>> true, if the size matches a futex then we'll use a futex, even if it's
>> actually an enum or floating-point type (or pointer on 32-bit targets).
>> I'm not sure if that's safe.
>>
>>
>>  libstdc++-v3/config/abi/pre/gnu.ver           |   3 +-
>>  libstdc++-v3/include/bits/atomic_timed_wait.h |  12 +-
>>  libstdc++-v3/include/bits/atomic_wait.h       | 109 +++++++++-----
>>  libstdc++-v3/src/c++20/atomic.cc              | 140 +++++++++++-------
>>  4 files changed, 166 insertions(+), 98 deletions(-)
>>
>> diff --git a/libstdc++-v3/config/abi/pre/gnu.ver
>> b/libstdc++-v3/config/abi/pre/gnu.ver
>> index 2e48241d51f9..3c2bd4921730 100644
>> --- a/libstdc++-v3/config/abi/pre/gnu.ver
>> +++ b/libstdc++-v3/config/abi/pre/gnu.ver
>> @@ -2553,7 +2553,8 @@ GLIBCXX_3.4.35 {
>>      _ZNSt8__detail11__wait_implEPKvRNS_16__wait_args_baseE;
>>      _ZNSt8__detail13__notify_implEPKvbRKNS_16__wait_args_baseE;
>>
>>  
>> _ZNSt8__detail17__wait_until_implEPKvRNS_16__wait_args_baseERKNSt6chrono8durationI[lx]St5ratioIL[lx]1EL[lx]1000000000EEEE;
>> -    _ZNSt8__detail11__wait_args22_M_load_proxy_wait_valEPKv;
>> +    _ZNSt8__detail11__wait_args18_M_setup_wait_implEPKv;
>> +    _ZNSt8__detail11__wait_args20_M_setup_notify_implEPKv;
>>
>>      # std::chrono::gps_clock::now, tai_clock::now
>>      _ZNSt6chrono9gps_clock3nowEv;
>> diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h
>> b/libstdc++-v3/include/bits/atomic_timed_wait.h
>> index 30f7ff616840..918a267d10eb 100644
>> --- a/libstdc++-v3/include/bits/atomic_timed_wait.h
>> +++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
>> @@ -75,14 +75,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>           return chrono::ceil<__w_dur>(__atime);
>>        }
>>
>> -#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
>> -#define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
>> -#else
>> -// define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT and implement
>> __platform_wait_until
>> -// if there is a more efficient primitive supported by the platform
>> -// (e.g. __ulock_wait) which is better than pthread_cond_clockwait.
>> -#endif // ! HAVE_LINUX_FUTEX
>> -
>>      __wait_result_type
>>      __wait_until_impl(const void* __addr, __wait_args_base& __args,
>>                       const __wait_clock_t::duration& __atime);
>> @@ -156,7 +148,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>                                   const chrono::time_point<_Clock, _Dur>&
>> __atime,
>>                                   bool __bare_wait = false) noexcept
>>      {
>> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
>> +#ifdef _GLIBCXX_UNKNOWN_PLATFORM_WAIT
>>        __glibcxx_assert(false); // This function can't be used for proxy
>> wait.
>>  #endif
>>        __detail::__wait_args __args{ __addr, __old, __order, __bare_wait
>> };
>> @@ -208,7 +200,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>                                 const chrono::duration<_Rep, _Period>&
>> __rtime,
>>                                 bool __bare_wait = false) noexcept
>>      {
>> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
>> +#ifdef _GLIBCXX_UNKNOWN_PLATFORM_WAIT
>>
> This name really reads strange, and sounds like something with "TODO".
> I think  _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT was just OK name, even
> if it was not used directly.
>
>>        __glibcxx_assert(false); // This function can't be used for proxy
>> wait.
>>  #endif
>>        __detail::__wait_args __args{ __addr, __old, __order, __bare_wait
>> };
>> diff --git a/libstdc++-v3/include/bits/atomic_wait.h
>> b/libstdc++-v3/include/bits/atomic_wait.h
>> index 95151479c120..49369419d6a6 100644
>> --- a/libstdc++-v3/include/bits/atomic_wait.h
>> +++ b/libstdc++-v3/include/bits/atomic_wait.h
>> @@ -45,35 +45,34 @@
>>  namespace std _GLIBCXX_VISIBILITY(default)
>>  {
>>  _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> +#if defined _GLIBCXX_HAVE_LINUX_FUTEX
>>    namespace __detail
>>    {
>> -#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
>> -#define _GLIBCXX_HAVE_PLATFORM_WAIT 1
>>      using __platform_wait_t = int;
>>      inline constexpr size_t __platform_wait_alignment = 4;
>> +  }
>> +  template<typename _Tp>
>> +    inline constexpr bool __platform_wait_uses_type
>> +      = is_scalar_v<_Tp> && sizeof(_Tp) == sizeof(int) && alignof(_Tp)
>> >= 4;
>>  #else
>> +# define _GLIBCXX_UNKNOWN_PLATFORM_WAIT 1
>>  // define _GLIBCX_HAVE_PLATFORM_WAIT and implement __platform_wait()
>>  // and __platform_notify() if there is a more efficient primitive
>> supported
>>  // by the platform (e.g. __ulock_wait()/__ulock_wake()) which is better
>> than
>>  // a mutex/condvar based wait.
>> +  namespace __detail
>> +  {
>>  # if ATOMIC_LONG_LOCK_FREE == 2
>>      using __platform_wait_t = unsigned long;
>>  # else
>>      using __platform_wait_t = unsigned int;
>>  # endif
>>      inline constexpr size_t __platform_wait_alignment
>> -      = __alignof__(__platform_wait_t);
>> -#endif
>> +      = sizeof(__platform_wait_t) < __alignof__(__platform_wait_t)
>> +         ? __alignof__(__platform_wait_t) : sizeof(__platform_wait_t);
>>    } // namespace __detail
>> -
>> -  template<typename _Tp>
>> -    inline constexpr bool __platform_wait_uses_type
>> -#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
>> -      = is_scalar_v<_Tp>
>> -       && ((sizeof(_Tp) == sizeof(__detail::__platform_wait_t))
>> -       && (alignof(_Tp) >= __detail::__platform_wait_alignment));
>> -#else
>> -      = false;
>> +  template<typename>
>> +    inline constexpr bool __platform_wait_uses_type = false;
>>  #endif
>>
>>    namespace __detail
>> @@ -105,10 +104,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>         return __builtin_memcmp(&__a, &__b, sizeof(_Tp)) == 0;
>>        }
>>
>> -    // lightweight std::optional<__platform_wait_t>
>> +    // TODO: this needs to be false for types with padding, e.g. __int20.
>>
> I do not understand why this needs to be required. This funciton is used
> only via atomic
> or atomic_ref. For atomic, we can guarantee that the type has padding
> bytes cleared..
>
>> +    // TODO: should this be true only for integral, enum, and pointer
>> types?
>>
> What I think is missing here is alignment. I assume that any platform wait
> may use
> bits that are clear due the alignment of platform wait type for some
> internal state.
> Or we are going to check is_sufficiently_aligment in cc file, and use
> different kind of
> wait depending on the object?
>
> But I think, we can later safely extend or change what is waitable (except
> extending it past 8 bytes),
> as if we start putting _M_obj_size to non zero, impl may use platform wait.
>
> So, I will go with safe option is integral, pointer or enum, This would
> also give
> us no padding guarantee, I assume?
>
>> +    template<typename _Tp>
>> +      concept __waitable
>> +       = is_scalar_v<_Tp> && (sizeof(_Tp) <= sizeof(__UINT64_TYPE__));
>> +
>> +    // Storage for up to 64 bits of value, should be considered opaque
>> bits.
>> +    using __wait_value_type = __UINT64_TYPE__;
>> +
>> +    // lightweight std::optional<__wait_value_type>
>>      struct __wait_result_type
>>      {
>> -      __platform_wait_t _M_val;
>> +      __wait_value_type _M_val;
>>        unsigned char _M_has_val : 1; // _M_val value was loaded before
>> return.
>>        unsigned char _M_timeout : 1; // Waiting function ended with
>> timeout.
>>        unsigned char _M_unused : 6;  // padding
>> @@ -143,8 +151,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>      {
>>        __wait_flags _M_flags;
>>        int _M_order = __ATOMIC_ACQUIRE;
>> -      __platform_wait_t _M_old = 0;
>> +      __wait_value_type _M_old{};
>>        void* _M_wait_state = nullptr;
>> +      const void* _M_obj = nullptr;  // The address of the object to
>> wait on.
>> +      unsigned char _M_obj_size = 0; // The size of that object.
>>
>>        // Test whether _M_flags & __flags is non-zero.
>>        bool
>> @@ -162,36 +172,48 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>         explicit
>>         __wait_args(const _Tp* __addr, bool __bare_wait = false) noexcept
>>         : __wait_args_base{ _S_flags_for(__addr, __bare_wait) }
>> -       { }
>> +       {
>> +         _M_obj = __addr; // Might be replaced by _M_setup_wait
>> +         if constexpr (__waitable<_Tp>)
>> +           // __wait_impl might be able to wait directly on __addr
>> +           // instead of using a proxy, depending on its size.
>> +           _M_obj_size = sizeof(_Tp);
>> +       }
>>
>>        __wait_args(const __platform_wait_t* __addr, __platform_wait_t
>> __old,
>>                   int __order, bool __bare_wait = false) noexcept
>> -      : __wait_args_base{ _S_flags_for(__addr, __bare_wait), __order,
>> __old }
>> -      { }
>> +      : __wait_args(__addr, __bare_wait)
>> +      {
>> +       _M_order = __order;
>> +       _M_old = __old;
>> +      }
>>
>>        __wait_args(const __wait_args&) noexcept = default;
>>        __wait_args& operator=(const __wait_args&) noexcept = default;
>>
>> -      template<typename _ValFn,
>> -              typename _Tp =
>> decay_t<decltype(std::declval<_ValFn&>()())>>
>> +      template<typename _Tp, typename _ValFn>
>>         _Tp
>> -       _M_setup_wait(const void* __addr, _ValFn __vfn,
>> +       _M_setup_wait(const _Tp* __addr, _ValFn __vfn,
>>                       __wait_result_type __res = {})
>>         {
>> +         static_assert(is_same_v<_Tp, decay_t<decltype(__vfn())>>);
>> +
>>           if constexpr (__platform_wait_uses_type<_Tp>)
>>             {
>> -             // If the wait is not proxied, the value we check when
>> waiting
>> -             // is the value of the atomic variable itself.
>> +             // If we know for certain that this type can be waited on
>> +             // efficiently using something like a futex syscall,
>> +             // then we can avoid the overhead of _M_setup_wait_impl
>> +             // and just load the value and store it into _M_old.
>>
>> -             if (__res._M_has_val) // The previous wait loaded a recent
>> value.
>> +             if (__res._M_has_val) // A previous wait loaded a recent
>> value.
>>                 {
>>                   _M_old = __res._M_val;
>> -                 return __builtin_bit_cast(_Tp, __res._M_val);
>> +                 return _S_bit_cast<_Tp>(_M_old);
>>
> I am not sure if I understand which branch of _S_bit_cast this would use,
> we have neither (sizeof(To) == sizeof(From) i.e. 4 vs 8) and
> neither sizeof(_From) == sizeof(__UINT32_TYPE__).
> I would much more prefer if this would be dome as:
> return __builtin_bit_cast(_Tp, static_cast<__UINT32_TYPE__>(_M_old));
>
>>                 }
>>               else // Load the value from __vfn
>>                 {
>> -                 _Tp __val = __vfn();
>> -                 _M_old = __builtin_bit_cast(__platform_wait_t, __val);
>> +                 auto __val = __vfn();
>> +                 _M_old = _S_bit_cast<__wait_value_type>(__val);
>>
> And here:
>       _M_ old = __builtin_bit_cast(__UINT32_TYPE__, __val);
> However, instead of __UINT32_TYPE__, we should use
> make_unsinged<__platform_wait_t>.
>
>>                   return __val;
>>                 }
>>             }
>> @@ -199,16 +221,33 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>             {
>>               if (__res._M_has_val) // The previous wait loaded a recent
>> value.
>>                 _M_old = __res._M_val;
>> -             else // Load _M_ver from the proxy (must happen before
>> __vfn()).
>> -               _M_load_proxy_wait_val(__addr);
>> +             else // Let the library decide how to setup the wait.
>> +               {
>> +                 // Set _M_obj to the address to be waited on (either
>> __addr
>> +                 // or a proxy) and load its current value into _M_old.
>> +                 _M_setup_wait_impl(__addr);
>> +               }
>>               return __vfn();
>>             }
>>         }
>>
>>      private:
>> -      // Populates _M_wait_state and _M_old from the proxy for __addr.
>> +      template<typename _To, typename _From>
>> +       [[__gnu__::__always_inline__]]
>> +       static _To
>> +       _S_bit_cast(_From __from)
>> +       {
>> +         if constexpr (sizeof(_To) == sizeof(_From))
>> +           return __builtin_bit_cast(_To, __from);
>> +         else if constexpr (sizeof(_From) == sizeof(__UINT32_TYPE__))
>> +           return __builtin_bit_cast(__UINT32_TYPE__, __from);
>> +         else
>> +           __builtin_unreachable();
>> +       }
>> +
>> +      // Populates _M_wait_state and _M_old appropriately for __addr.
>>        void
>> -      _M_load_proxy_wait_val(const void* __addr);
>> +      _M_setup_wait_impl(const void* __addr);
>>
>>        template<typename _Tp>
>>         static constexpr __wait_flags
>> @@ -218,8 +257,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>           __wait_flags __res = __abi_version | __do_spin;
>>           if (!__bare_wait)
>>             __res |= __track_contention;
>> -         if constexpr (!__platform_wait_uses_type<_Tp>)
>> -           __res |= __proxy_wait;
>> +         // if constexpr (!__platform_wait_uses_type<_Tp>)
>> +         // __res |= __proxy_wait;
>>
> Remove this simply.
>
>>           return __res;
>>         }
>>      };
>> @@ -255,7 +294,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>                           __detail::__platform_wait_t __old,
>>                           int __order, bool __bare_wait = false)
>>    {
>> -#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
>> +#ifdef _GLIBCXX_UNKNOWN_PLATFORM_WAIT
>>      __glibcxx_assert(false); // This function can't be used for proxy
>> wait.
>>  #endif
>>      __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
>> diff --git a/libstdc++-v3/src/c++20/atomic.cc
>> b/libstdc++-v3/src/c++20/atomic.cc
>> index e280045b619d..d5fbd10ed11f 100644
>> --- a/libstdc++-v3/src/c++20/atomic.cc
>> +++ b/libstdc++-v3/src/c++20/atomic.cc
>> @@ -38,14 +38,7 @@
>>  # include <syscall.h>
>>  # include <bits/functexcept.h>
>>  # include <sys/time.h>
>> -#endif
>> -
>> -#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
>> -# ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
>> -// __waitable_state assumes that we consistently use the same
>> implementation
>> -// (i.e. futex vs mutex+condvar) for timed and untimed waiting.
>> -#  error "This configuration is not currently supported"
>> -# endif
>> +# define _GLIBCXX_HAVE_PLATFORM_WAIT 1
>>  #endif
>>
>>  #pragma GCC diagnostic ignored "-Wmissing-field-initializers"
>> @@ -107,7 +100,7 @@ namespace
>>      // Count of threads blocked waiting on this state.
>>      alignas(_S_align) __platform_wait_t _M_waiters = 0;
>>
>> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
>> +#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
>>      mutex _M_mtx;
>>
>>      // This type meets the Cpp17BasicLockable requirements.
>> @@ -123,7 +116,7 @@ namespace
>>      // use this for waiting and notifying functions instead.
>>      alignas(_S_align) __platform_wait_t _M_ver = 0;
>>
>> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
>> +#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
>>      __condvar _M_cv;
>>  #endif
>>
>> @@ -216,17 +209,19 @@ namespace
>>    __spin_impl(const __platform_wait_t* __addr, const __wait_args_base&
>> __args)
>>    {
>>      __platform_wait_t __val{};
>> +    __wait_value_type wval;
>>      for (auto __i = 0; __i < __atomic_spin_count; ++__i)
>>        {
>>         __atomic_load(__addr, &__val, __args._M_order);
>> -       if (__val != __args._M_old)
>> -         return { ._M_val = __val, ._M_has_val = true, ._M_timeout =
>> false };
>> +       wval = static_cast<__wait_value_type>(__val);
>> +       if (wval != __args._M_old)
>> +         return { ._M_val = wval, ._M_has_val = true, ._M_timeout =
>> false };
>>         if (__i < __atomic_spin_count_relax)
>>           __thread_relax();
>>         else
>>           __thread_yield();
>>        }
>> -    return { ._M_val = __val, ._M_has_val = true, ._M_timeout = true };
>> +    return { ._M_val = wval, ._M_has_val = true, ._M_timeout = true };
>>    }
>>
>>    inline __waitable_state*
>> @@ -237,32 +232,58 @@ namespace
>>      return static_cast<__waitable_state*>(args._M_wait_state);
>>    }
>>
>> +  [[gnu::always_inline]]
>> +  inline bool
>> +  use_proxy_wait(const __wait_args_base& args)
>>
> This needs to take the address that we are waiting on and check if it's
> sufficiently aligned
> to __platform_wait_aligment.
> However, I do not think we should check unit64_t at all, as we do not have
> such platform
> wait yet, and we will not change __platform_wait_uses_type in that case.
>>
>> +  {
>> +    if constexpr (__platform_wait_uses_type<uint32_t>)
>> +      if (args._M_obj_size == sizeof(uint32_t))
>> +       return false;
>> +
>> +    if constexpr (__platform_wait_uses_type<uint64_t>)
>> +      if (args._M_obj_size == sizeof(uint64_t))
>> +       return false;
>> +
>> +    // Use proxy wait for everything else:
>> +    return true;
>> +  }
>> +
>>  } // namespace
>>
>> -// Called for a proxy wait
>> +// Set _M_wait_state if using proxy wait, or caller wants contention
>> tracking.
>> +// Set _M_obj to &_M_wait_state->_M_ver if using proxy wait.
>> +// Load the current value from _M_obj and store in _M_val.
>>  void
>> -__wait_args::_M_load_proxy_wait_val(const void* addr)
>> +__wait_args::_M_setup_wait_impl(const void* addr)
>>
> I would use _M_obj_size to indicate if proxy wait is used for the actual
> await.
>
This would not work well with notify, so I think it is fine as it, except
that we need
to check aligment.

>  {
>> -  // __glibcxx_assert( *this & __wait_flags::__proxy_wait );
>> +  if (!use_proxy_wait(*this))
>
> Do; _M_obj_size > 0 &&  !use_proxy_wait(*this)
>
>> +    {
>> +      // We can wait on this address directly.
>> +      __glibcxx_assert(_M_obj == addr);
>>
> This funciton is one that is expected to set _M_obj, so it seems incorrect
> to assert it here.
>
I missed the fact that this is set in the constructor.

>
>> -  // We always need a waitable state for proxy waits.
>> +      int val;
>> +      __atomic_load(static_cast<const int*>(addr), &val,
>> __ATOMIC_ACQUIRE);
>> +      _M_old = val;
>> +
>> +      return;
>> +    }
>> +
>> +  // This will be a proxy wait, so get a waitable state.
>>    auto state = set_wait_state(addr, *this);
>>
>> +  // The address we will wait on is the version count of the waitable
>> state:
>> +  _M_obj = &state->_M_ver;
>>
> +
>>    // Read the value of the _M_ver counter.
>> -  __atomic_load(&state->_M_ver, &_M_old, __ATOMIC_ACQUIRE);
>> +  __platform_wait_t __old;
>> +  __atomic_load(&state->_M_ver, &__old, __ATOMIC_ACQUIRE);
>>
> Nit pick, but I think this could be just:
>     __platform_wait_t __old = __atomic_load_n(&state->_M_ver,
> __ATOMIC_ACQUIRE);
>
>> +  _M_old = __old;
>>
> Or just:
>   _M_old = __atomic_load_n(&state->_M_ver, __ATOMIC_ACQUIRE);
>
>>  }
>>
>>  __wait_result_type
>> -__wait_impl(const void* __addr, __wait_args_base& __args)
>> +__wait_impl([[maybe_unused]] const void* __addr, __wait_args_base&
>> __args)
>>  {
>> -  auto __state = static_cast<__waitable_state*>(__args._M_wait_state);
>> -
>> -  const __platform_wait_t* __wait_addr;
>> -
>> -  if (__args & __wait_flags::__proxy_wait)
>> -    __wait_addr = &__state->_M_ver;
>> -  else
>> -    __wait_addr = static_cast<const __platform_wait_t*>(__addr);
>> +  auto* __wait_addr = static_cast<const
>> __platform_wait_t*>(__args._M_obj);
>>
>>    if (__args & __wait_flags::__do_spin)
>>      {
>> @@ -286,6 +307,7 @@ __wait_impl(const void* __addr, __wait_args_base&
>> __args)
>>    __atomic_load(__wait_addr, &__val, __args._M_order);
>>    if (__val == __args._M_old)
>>      {
>> +      auto __state =
>> static_cast<__waitable_state*>(__args._M_wait_state);
>>        __state->_M_cv.wait(__state->_M_mtx);
>>        return { ._M_val = __val, ._M_has_val = false, ._M_timeout = false
>> };
>>      }
>> @@ -294,24 +316,40 @@ __wait_impl(const void* __addr, __wait_args_base&
>> __args)
>>  }
>>
>>  void
>> -__notify_impl(const void* __addr, [[maybe_unused]] bool __all,
>> +__notify_impl([[maybe_unused]] const void* __addr, [[maybe_unused]] bool
>> __all,
>>               const __wait_args_base& __args)
>>  {
>> -  auto __state = static_cast<__waitable_state*>(__args._M_wait_state);
>> -  if (!__state)
>> -    __state = &__waitable_state::_S_state_for(__addr);
>> +  const bool __track_contention = __args &
>> __wait_flags::__track_contention;
>> +  const bool proxy_wait = use_proxy_wait(__args);
>>
> And here we could just check if _M_obj_size == 0, as this would indication
> of proxy wait.
>
>>
>> -  [[maybe_unused]] const __platform_wait_t* __wait_addr;
>> +  [[maybe_unused]] auto* __wait_addr
>> +    = static_cast<const __platform_wait_t*>(__addr);
>> +
>> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
>> +  // Check whether it would be a non-proxy wait for this object.
>> +  // This condition must match the one in _M_setup_wait_impl to ensure
>> that
>> +  // the address used for the notify matches the one used for the wait.
>> +  if (!proxy_wait)
>> +    {
>> +      if (__track_contention)
>> +       if (!__waitable_state::_S_state_for(__addr)._M_waiting())
>> +         return;
>> +
>> +      __platform_notify(__wait_addr, __all);
>> +      return;
>> +    }
>> +#endif
>> +
>> +  // Either a proxy wait or we don't have platform wait/wake primitives.
>> +
>> +  auto __state = &__waitable_state::_S_state_for(__addr);
>>
>>    // Lock mutex so that proxied waiters cannot race with incrementing
>> _M_ver
>>    // and see the old value, then sleep after the increment and
>> notify_all().
>>    lock_guard __l{ *__state };
>>
>> -  if (__args & __wait_flags::__proxy_wait)
>> +  if (proxy_wait)
>>      {
>> -      // Waiting for *__addr is actually done on the proxy's _M_ver.
>> -      __wait_addr = &__state->_M_ver;
>> -
>>        // Increment _M_ver so that waiting threads see something changed.
>>        // This has to be atomic because the load in _M_load_proxy_wait_val
>>        // is done without the mutex locked.
>> @@ -322,11 +360,11 @@ __notify_impl(const void* __addr, [[maybe_unused]]
>> bool __all,
>>        // they can re-evaluate their conditions to see if they should
>>        // stop waiting or should wait again.
>>        __all = true;
>> -    }
>> -  else // Use the atomic variable's own address.
>> -    __wait_addr = static_cast<const __platform_wait_t*>(__addr);
>>
>> -  if (__args & __wait_flags::__track_contention)
>> +      __wait_addr = &__state->_M_ver;
>> +    }
>> +
>> +  if (__track_contention)
>>      {
>>        if (!__state->_M_waiting())
>>         return;
>> @@ -366,7 +404,7 @@ __platform_wait_until(const __platform_wait_t* __addr,
>>  }
>>  #endif // HAVE_LINUX_FUTEX
>>
>> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
>> +#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
>>  bool
>>  __cond_wait_until(__condvar& __cv, mutex& __mx,
>>                   const __wait_clock_t::time_point& __atime)
>> @@ -381,7 +419,7 @@ __cond_wait_until(__condvar& __cv, mutex& __mx,
>>      __cv.wait_until(__mx, __ts);
>>    return __wait_clock_t::now() < __atime;
>>  }
>> -#endif // ! HAVE_PLATFORM_TIMED_WAIT
>> +#endif // ! HAVE_PLATFORM_WAIT
>>
>>  // Unlike __spin_impl, does not always return _M_has_val == true.
>>  // If the deadline has already passed then no fresh value is loaded.
>> @@ -414,7 +452,9 @@ __spin_until_impl(const __platform_wait_t* __addr,
>>             return __res;
>>         }
>>
>> -      __atomic_load(__addr, &__res._M_val, __args._M_order);
>> +      __platform_wait_t val;
>> +      __atomic_load(__addr, &val, __args._M_order);
>>
> Same comment on __atomic_load_n here.
>
>> +      __res._M_val = val;
>>        __res._M_has_val = true;
>>        if (__res._M_val != __args._M_old)
>>         {
>> @@ -428,16 +468,11 @@ __spin_until_impl(const __platform_wait_t* __addr,
>>  } // namespace
>>
>>  __wait_result_type
>> -__wait_until_impl(const void* __addr, __wait_args_base& __args,
>> +__wait_until_impl([[maybe_unused]] const void* __addr, __wait_args_base&
>> __args,
>>                   const __wait_clock_t::duration& __time)
>>  {
>>    const __wait_clock_t::time_point __atime(__time);
>> -  auto __state = static_cast<__waitable_state*>(__args._M_wait_state);
>> -  const __platform_wait_t* __wait_addr;
>> -  if (__args & __wait_flags::__proxy_wait)
>> -    __wait_addr = &__state->_M_ver;
>> -  else
>> -    __wait_addr = static_cast<const __platform_wait_t*>(__addr);
>> +  auto* __wait_addr = static_cast<const
>> __platform_wait_t*>(__args._M_obj);
>>
>>    if (__args & __wait_flags::__do_spin)
>>      {
>> @@ -448,9 +483,9 @@ __wait_until_impl(const void* __addr,
>> __wait_args_base& __args,
>>         return __res;
>>      }
>>
>> -#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
>> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
>>    if (__args & __wait_flags::__track_contention)
>> -    set_wait_state(__addr, __args);
>> +    set_wait_state(__addr, __args); // scoped_wait needs a
>> __waitable_state
>>    scoped_wait s(__args);
>>    bool timeout = !__platform_wait_until(__wait_addr, __args._M_old,
>> __atime);
>>    return { ._M_val = __args._M_old, ._M_has_val = false, ._M_timeout =
>> timeout };
>> @@ -460,6 +495,7 @@ __wait_until_impl(const void* __addr,
>> __wait_args_base& __args,
>>    __atomic_load(__wait_addr, &__val, __args._M_order);
>>    if (__val == __args._M_old)
>>      {
>> +      auto __state =
>> static_cast<__waitable_state*>(__args._M_wait_state);
>>        bool timeout = !__cond_wait_until(__state->_M_cv, __state->_M_mtx,
>> __atime);
>>        return { ._M_val = __val, ._M_has_val = false, ._M_timeout =
>> timeout };
>>      }
>> --
>> 2.51.1
>>
>>

Re: [PATCH] libstdc++: Future-proof C++20 atomic wait/notify

Reply via email to