On 08/11/23 13:10 -0300, Alexandre Oliva wrote:
The representation of bit iterators, a pointer into an array of words
plus an unsigned bit offset into the pointed-to word, poses some
optimization challenges: the compiler doesn't know that the offset
always lies in a narrow range, from zero up to but excluding the word
bitwidth, so when a function loads an offset that it hasn't normalized
itself, it may fail to derive certain reasonable conclusions, even to
the point of retaining useless calls that elicit incorrect warnings.

Case at hand: The 110807.cc testcase for bit vectors assigns a 1-bit
list to a global bit vector variable.  Based on the compile-time
constant length of the list, we decide in _M_insert_range whether to
use the existing storage or to allocate new storage for the vector.
After allocation, we decide in _M_copy_aligned how to copy any
preexisting portions of the vector to the newly-allocated storage.
When copying two or more words, we use __builtin_memmove.

However, because we compute the available room using bit offsets that
carry no range information, even when comparing them with constants,
we fail to infer the word-size-dependent ranges for the preexisting
vector, and may thus retain the memmove call despite knowing we've
only allocated one word.

Other parts of the compiler then detect the mismatch between the
constant allocation size and the much larger range that could
theoretically be copied into the newly-allocated storage if we could
reach the call.

Ensuring the compiler is aware of the constraints on the offset range
enables it to do a much better job of optimizing.  The challenge is to
do so without runtime overhead: the point is not to check that the
offset is in range, only to tell the compiler that it is.

This patch introduces a __GLIBCXX_BUILTIN_ASSUME macro that, when
optimizing, expands to code that invokes undefined behavior if the
expression doesn't hold, so that the compiler optimizes out both the
test and the branch containing it, while retaining enough information
about the paths that must not be taken to optimize the remaining paths
under the assumption.
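
To illustrate the mechanism with a reduced sketch (not code from the
patch; the function and names below are made up):

    // Once the dead branch tells the optimizer that __offset < 64, the
    // multi-word copy below is provably unreachable and gets dropped,
    // along with the bogus warning it would otherwise elicit.
    void
    copy_tail (unsigned long *__dst, const unsigned long *__src,
               unsigned __offset)
    {
      if (__offset >= 64)          // never true at runtime...
        __builtin_unreachable ();  // ...so the branch vanishes, but the
                                   // range information is retained
      unsigned __words = __offset / 64;  // now known to be zero
      if (__words >= 2)
        __builtin_memmove (__dst, __src,
                           __words * sizeof (unsigned long));
    }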

I also introduce a member function in the bit iterators that conveys
to the compiler that the assumption is supposed to hold, and calls to
it throughout the iterator member functions; without these, the
compiler doesn't know the offsets have to be in range, makes
pessimistic decisions, and fails to optimize out cases that it could.

With the explicit assumptions, the compiler can correlate the test for
available storage in the vector with the test for how much storage
might need to be copied, and determine that, if we're not asking for
enough room for two or more words, we can omit entirely the code to
copy two or more words, without any runtime overhead whatsoever: no
traces remain of the undefined behavior or of the tests that inform
the compiler about the assumptions that must hold.

Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and
x86_64-.  Ok to install?

(It was later found to fix 23_containers/vector/bool/allocator/copy_cc
on x86_64-linux-gnu as well, which fails on gcc-13 with the same
warning.)

(The constant_evaluated/static_assert bit is disabled because expr is
not a constant according to some libstdc++ build errors, but there
doesn't seem to be a problem with the other bits.  I haven't really
thought that bit through; it was something I started out thinking
would be desirable, but it turned out not to be required.  I suppose I
could just drop it.)

(I suppose __GLIBCXX_BUILTIN_ASSUME could be moved to a more general
place and put to more general uses, but I didn't feel that bold ;-)


for  libstdc++-v3/ChangeLog

        PR libstdc++/110807
        * include/bits/stl_bvector.h (__GLIBCXX_BUILTIN_ASSUME): New.
        (_Bit_iterator_base): Add _M_normalized_p and
        _M_assume_normalized.  Use them in _M_bump_up, _M_bump_down,
        _M_incr, operator==, operator<=>, operator<, and operator-.
        (_Bit_iterator): Also use them in operator*.
        (_Bit_const_iterator): Likewise.
---
libstdc++-v3/include/bits/stl_bvector.h |   75 ++++++++++++++++++++++++++++++-
1 file changed, 72 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h
index 8d18bcaffd434..81b316846454b 100644
--- a/libstdc++-v3/include/bits/stl_bvector.h
+++ b/libstdc++-v3/include/bits/stl_bvector.h
@@ -177,6 +177,55 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
    _Bit_type * _M_p;
    unsigned int _M_offset;

+#if __OPTIMIZE__ && !__GLIBCXX_DISABLE_ASSUMPTIONS
+// If the assumption (EXPR) fails, invoke undefined behavior, so that
+// the test and the failure block get optimized out, but the compiler
+// still recalls that (expr) can be taken for granted.  Use this only
+// for expressions that are simple and explicit enough that the
+// compiler can optimize based on them.  When not optimizing, the
+// expression is still compiled, but it's never executed.
+#if 0 /* ??? */ && __cpp_lib_is_constant_evaluated
+#define __GLIBCXX_BUILTIN_ASSUME(expr)         \

A single underscore prefix on __GLIBCXX_BUILTIN_ASSUME and
__GLIBCXX_DISABLE_ASSUMPTIONS please.

+    do                                         \
+      if (std::is_constant_evaluated ())       \
+       static_assert(expr);                    \

This can never be valid. The true branch of the
if (is_constant_evaluated) statement must always compile, which means
expr must be a valid constant expression to be usable in a
static_assert. It can't be a runtime condition like
"_M_offset < _S_word_bit".


+      else if (!(expr))                                \
+       {                                       \
+         void **__assert_failed = 0;           \
+         *__assert_failed = 0;                 \
+         __builtin_unreachable ();             \

This already works fine in constant evaluation anyway. If a
__builtin_unreachable() is reached during constant evaluation, the
compilation fails. So this already serves as static_assert(expr),
except it actually works :-)
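
E.g. reworking the same reduced sketch (illustrative only):

    constexpr bool
    in_range (unsigned __off)
    {
      if (__off >= 64)
        __builtin_unreachable ();  // hitting this during constant
                                   // evaluation makes the evaluation
                                   // non-constant, so it's diagnosed
      return true;
    }

    static_assert (in_range (3));     // OK
    // static_assert (in_range (99)); // would fail to compile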

But what's the null dereference for?

+       }                                       \
+    while (0)
+#else
+#define __GLIBCXX_BUILTIN_ASSUME(expr)         \
+    do                                         \
+      if (!(expr))                             \
+       {                                       \
+         void **__assert_failed = 0;           \
+         *__assert_failed = 0;                 \
+         __builtin_unreachable ();             \
+       }                                       \
+    while (0)
+#endif
+#else
+#define __GLIBCXX_BUILTIN_ASSUME(expr)         \
+    (void)(false && (expr))

What's the point of this, just to verify that (expr) is contextually
convertible to bool?

+#endif
+
+    _GLIBCXX20_CONSTEXPR
+    bool
+    _M_normalized_p() const

We don't use the _p suffix for predicates in the library.
Please use just _M_normalized or _M_is_normalized.

But do we even need this function? It's not used anywhere else, can we
just inline the condition into _M_assume_normalized() ?


+    {
+      return (_M_offset < unsigned(_S_word_bit));
+    }
+
+    _GLIBCXX20_CONSTEXPR
+    void
+    _M_assume_normalized() const

I think this should use _GLIBCXX_ALWAYS_INLINE

+    {
+      __GLIBCXX_BUILTIN_ASSUME (_M_normalized_p ());

Is there even any benefit to this macro?

Can we just define this function as:

        if (_M_offset >= unsigned(_S_word_bit))
          __builtin_unreachable();

Or better still:

       __attribute__((__assume__(_M_offset < unsigned(_S_word_bit))));

Maybe even get rid of _M_assume_normalized() as a function and just
put that attribute everywhere you currently use _M_assume_normalized.
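
If we do keep the function, something along these lines seems like it
would be enough (untested sketch):

    _GLIBCXX_ALWAYS_INLINE _GLIBCXX20_CONSTEXPR
    void
    _M_assume_normalized() const
    {
      // Roughly equivalent to the attribute form for the optimizer,
      // and needs no new macro; reaching this during constant
      // evaluation is diagnosed, covering the static_assert intent too.
      if (_M_offset >= unsigned(_S_word_bit))
        __builtin_unreachable();
    }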


+    }
+
    _GLIBCXX20_CONSTEXPR
    _Bit_iterator_base(_Bit_type * __x, unsigned int __y)
    : _M_p(__x), _M_offset(__y) { }
@@ -185,6 +234,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
    void
    _M_bump_up()
    {
+      _M_assume_normalized();
      if (_M_offset++ == int(_S_word_bit) - 1)
        {
          _M_offset = 0;
@@ -196,6 +246,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
    void
    _M_bump_down()
    {
+      _M_assume_normalized();
      if (_M_offset-- == 0)
        {
          _M_offset = int(_S_word_bit) - 1;
@@ -207,6 +258,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
    void
    _M_incr(ptrdiff_t __i)
    {
+      _M_assume_normalized();
      difference_type __n = __i + _M_offset;
      _M_p += __n / int(_S_word_bit);
      __n = __n % int(_S_word_bit);
@@ -221,7 +273,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
    _GLIBCXX_NODISCARD
    friend _GLIBCXX20_CONSTEXPR bool
    operator==(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
-    { return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset; }
+    {
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
+      return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset;
+    }

#if __cpp_lib_three_way_comparison
    [[nodiscard]]
@@ -229,6 +285,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
    operator<=>(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
    noexcept
    {
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
      if (const auto __cmp = __x._M_p <=> __y._M_p; __cmp != 0)
        return __cmp;
      return __x._M_offset <=> __y._M_offset;
@@ -238,6 +296,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
    friend bool
    operator<(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
    {
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
      return __x._M_p < __y._M_p
            || (__x._M_p == __y._M_p && __x._M_offset < __y._M_offset);
    }
@@ -266,6 +326,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
    friend _GLIBCXX20_CONSTEXPR ptrdiff_t
    operator-(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
    {
+      // Make _M_offset's range visible to optimizers.
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
      return (int(_S_word_bit) * (__x._M_p - __y._M_p)
              + __x._M_offset - __y._M_offset);
    }
@@ -297,7 +360,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
    _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
    reference
    operator*() const
-    { return reference(_M_p, 1UL << _M_offset); }
+    {
+      _M_assume_normalized();
+      return reference(_M_p, 1UL << _M_offset);
+    }

    _GLIBCXX20_CONSTEXPR
    iterator&
@@ -408,7 +474,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
    _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
    const_reference
    operator*() const
-    { return _Bit_reference(_M_p, 1UL << _M_offset); }
+    {
+      _M_assume_normalized();
+      return _Bit_reference(_M_p, 1UL << _M_offset);
+    }

    _GLIBCXX20_CONSTEXPR
    const_iterator&
