I'm finally getting round to merging this series!

On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:
If std::future::wait_until is passed a time point measured against a clock
that is neither std::chrono::steady_clock nor std::chrono::system_clock
then the generic implementation of
__atomic_futex_unsigned::_M_load_when_equal_until is called which
calculates the timeout based on __clock_t and calls the
_M_load_when_equal_until method for that clock to perform the actual wait.

There's no guarantee that __clock_t is running at the same speed as the
caller's clock, so if the underlying wait times out timeout we need to
check the timeout against the caller's clock again before potentially
looping.

Also add two extra tests to the testsuite's async.cc:

* run test03 with steady_clock_copy, which behaves identically to
 std::chrono::steady_clock, but isn't std::chrono::steady_clock. This
 causes the overload of __atomic_futex_unsigned::_M_load_when_equal_until
 that takes an arbitrary clock to be called.

* invent test04 which uses a deliberately slow running clock in order to
 exercise the looping behaviour o
 __atomic_futex_unsigned::_M_load_when_equal_until described above.

        * libstdc++-v3/include/bits/atomic_futex.h:
        (__atomic_futex_unsigned) Add loop to _M_load_when_equal_until on
        generic _Clock to check the timeout against _Clock again after
        _M_load_when_equal_until returns indicating a timeout.

        * libstdc++-v3/testsuite/30_threads/async/async.cc: Invent
        slow_clock that runs at an eleventh of steady_clock's speed. Use it
        to test the user-supplied-clock variant of
        __atomic_futex_unsigned::_M_load_when_equal_until works generally
        with test03 and loops correctly when the timeout time hasn't been
        reached in test04.
---
libstdc++-v3/include/bits/atomic_futex.h         | 15 ++--
libstdc++-v3/testsuite/30_threads/async/async.cc | 70 +++++++++++++++++-
2 files changed, 80 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_futex.h 
b/libstdc++-v3/include/bits/atomic_futex.h
index 4375129..5f95ade 100644
--- a/libstdc++-v3/include/bits/atomic_futex.h
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -229,11 +229,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
      _M_load_when_equal_until(unsigned __val, memory_order __mo,
          const chrono::time_point<_Clock, _Duration>& __atime)
      {
-       const typename _Clock::time_point __c_entry = _Clock::now();
-       const __clock_t::time_point __s_entry = __clock_t::now();
-       const auto __delta = __atime - __c_entry;
-       const auto __s_atime = __s_entry + __delta;
-       return _M_load_when_equal_until(__val, __mo, __s_atime);
+       typename _Clock::time_point __c_entry = _Clock::now();
+       do {
+         const __clock_t::time_point __s_entry = __clock_t::now();
+         const auto __delta = __atime - __c_entry;
+         const auto __s_atime = __s_entry + __delta;
+         if (_M_load_when_equal_until(__val, __mo, __s_atime))
+           return true;

I wonder if we can avoid looping if _Clock::is_steady is true i.e.

          if _GLIBCXX_CONSTEXPR (_Clock::is_steady)
            return false

If _Clock is steady then it can't be adjusted to move backwards. I am
not 100% sure, but I don't think a steady clock can run "slow" either.
But I'm checking with LWG about that. We should keep the loop for now.

The reason for wanting to return early is that _Clock::now() can be
quite expensive, so avoiding the second call if it's not needed would
be significant.

Related to this, I've been experimenting with a change to wait_for
that checks __rtime <= 0 and if so, uses __clock_t::time_point::min()
for the wait_until call, so that we don't bother to call
__clock_t::now(). For a non-positive duration we don't need to
determine the precise time_point in the past, because it's in the past
anyway. Using time_point::min() is as good as __clock_t::now() - rtime
(as long as we don't overflow anything when converting it to a struct
timespec, or course!)

Using future.wait_for(0s) to poll for a future becoming ready takes a
long time currently: https://wandbox.org/permlink/Z1arsDE7eHW9JrqJ
Using wait_until(C::time_point::min()) is faster, because it doesn't
call C::now().

This probably isn't important for most timed waiting functions in the
library, because I don't see any reason to use wait_for(0s) to poll a
mutex, condition_variable or atomic. But it's reasonable to do for
futures.

I already added (__rtime <= __rtime.zero()) to this_thread::sleep_for
in d1a74a287ee1a84b90e5675904dac7f945cffca1.

The extra branch to check rtime <= rtime.zero() or atime < C::now() is
probably insignificant compared to the cost of unnecessary calls to
now() on one or more clocks.

+         __c_entry = _Clock::now();
+       } while (__c_entry < __atime);
+       return false;
      }


Reply via email to