Re: [PATCH][_Hashtable] Use RAII to restore Rehash state

Jonathan Wakely Thu, 09 Nov 2023 00:11:56 -0800

On Thu, 26 Oct 2023 at 21:52, François Dumont <frs.dum...@gmail.com> wrote:
>
>
> On 26/10/2023 12:43, Jonathan Wakely wrote:
> > On 26/10/23 07:18 +0200, François Dumont wrote:
> >>     libstdc++: [_Hashtable] Use RAII type to manage rehash functor state
> >>
> >>     Replace usage of __try/__catch with a RAII type to restore rehash
> >> functor
> >>     state when needed.
> >
> > Generally I really like replacing try-catch with RAII but I have some
> > questions below.
> >
> >>     libstdc++-v3/ChangeLog:
> >>
> >>             * include/bits/hashtable_policy.h (_RehashStateGuard): New.
> >>             (_Insert_base<>::_M_insert_range(_IIt, _IIt, const
> >> _NodeGet&, false_type)):
> >>             Adapt.
> >>             * include/bits/hashtable.h (__rehash_guard_t): New.
> >>             (__rehash_state): Remove.
> >>             (_M_rehash): Remove.
> >>             (_M_rehash_aux): Rename into _M_rehash.
> >>             (_M_assign_elements, _M_insert_unique_node,
> >> _M_insert_multi_node): Adapt.
> >>             (rehash): Adapt.
> >>
> >>
> >> Tested under Linux x64.
> >>
> >> Ok to commit ?
> >>
> >> François
> >
> >> diff --git a/libstdc++-v3/include/bits/hashtable.h
> >> b/libstdc++-v3/include/bits/hashtable.h
> >> index 0857448f7ed..64071ac1fb2 100644
> >> --- a/libstdc++-v3/include/bits/hashtable.h
> >> +++ b/libstdc++-v3/include/bits/hashtable.h
> >> @@ -234,6 +234,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>                           _RehashPolicy, _Traits>;
> >>       using __enable_default_ctor
> >>     = _Hashtable_enable_default_ctor<_Equal, _Hash, _Alloc>;
> >> +      using __rehash_guard_t
> >> +    = __detail::_RehashStateGuard<_RehashPolicy>;
> >>
> >>     public:
> >>       typedef _Key                        key_type;
> >> @@ -264,7 +266,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>
> >>     private:
> >>       using __rehash_type = _RehashPolicy;
> >> -      using __rehash_state = typename __rehash_type::_State;
> >>
> >>       using __unique_keys = typename __traits_type::__unique_keys;
> >>
> >> @@ -1200,14 +1201,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>
> >>     private:
> >>       // Helper rehash method used when keys are unique.
> >> -      void _M_rehash_aux(size_type __bkt_count, true_type __uks);
> >> +      void _M_rehash(size_type __bkt_count, true_type __uks);
> >>
> >>       // Helper rehash method used when keys can be non-unique.
> >> -      void _M_rehash_aux(size_type __bkt_count, false_type __uks);
> >> -
> >> -      // Unconditionally change size of bucket array to n, restore
> >> -      // hash policy state to __state on exception.
> >> -      void _M_rehash(size_type __bkt_count, const __rehash_state&
> >> __state);
> >> +      void _M_rehash(size_type __bkt_count, false_type __uks);
> >>     };
> >>
> >>   // Definitions of class template _Hashtable's out-of-line member
> >> functions.
> >> @@ -1337,7 +1334,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>       {
> >>     __buckets_ptr __former_buckets = nullptr;
> >>     std::size_t __former_bucket_count = _M_bucket_count;
> >> -    const __rehash_state& __former_state = _M_rehash_policy._M_state();
> >> +    __rehash_guard_t __rehash_guard(_M_rehash_policy);
> >>
> >>     if (_M_bucket_count != __ht._M_bucket_count)
> >>       {
> >> @@ -1359,6 +1356,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>         _M_assign(std::forward<_Ht>(__ht), __roan);
> >>         if (__former_buckets)
> >>           _M_deallocate_buckets(__former_buckets,
> >> __former_bucket_count);
> >> +        __rehash_guard._M_reset = false;
> >
> > I find this confusing. Usually "reset" means that something is
> > cleared, so won't take an action in the destructor. e.g. if you use
> > std::unique_ptr::reset() then the object is destroyed immediately, and
> > then nothing happens in the destructor. Here it's the opposite,
> > _M_reset=true means that it _sould_ do something in the destructor.
> >
> > The problem is the ambiguity between "reset the state in the
> > destructor later" and "reset the object to an empty state now".
> >
> > If the member was called _M_guarded then it might be clearer.
> > _M_guarded=true means the guard is active, and will restore the state
> > later. Or _M_active, or _M_armed, or even _M_reset_in_dtor. Any of
> > those names avoids the confusion with the semantics of
> > std::unique_ptr::reset() and similar well-known APIs.
> >
> > Or what I usually do is store a pointer to the guarded object in the
> > RAII guard type, and then just null the pointer to disarm the guard.
> > That means you don't need a separate bool member variable. If
> > _RehashStateGuard::_M_rehash_policy was called _M_guarded_obj and was
> > a _RehashPolicy* instead of _RehashPolicy& then disarming it would be:
> >
> >        __rehash_guard._M_guarded_obj = nullptr;
> >
> > This seems clear to me, as it says that the guard no longer has
> > anything to guard, so won't do anything in the destructor.
> >
> Looks clearer to me too.
>
> I started with a _RehashStateScope and a _M_commit() method, it could
> have been worst :-)
>
>
> >
> >>       }
> >>     __catch(...)
> >>       {
> >> @@ -1366,7 +1364,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>           {
> >>         // Restore previous buckets.
> >>         _M_deallocate_buckets();
> >> -        _M_rehash_policy._M_reset(__former_state);
> >>         _M_buckets = __former_buckets;
> >>         _M_bucket_count = __former_bucket_count;
> >>           }
> >> @@ -2142,17 +2139,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>               __node_ptr __node, size_type __n_elt)
> >>     -> iterator
> >>     {
> >> -      const __rehash_state& __saved_state =
> >> _M_rehash_policy._M_state();
> >> +      __rehash_guard_t __rehash_guard(_M_rehash_policy);
> >>       std::pair<bool, std::size_t> __do_rehash
> >>     = _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count,
> >>                       __n_elt);
> >>
> >>       if (__do_rehash.first)
> >>     {
> >> -      _M_rehash(__do_rehash.second, __saved_state);
> >> +      _M_rehash(__do_rehash.second, true_type{});
> >>       __bkt = _M_bucket_index(__code);
> >>     }
> >>
> >> +      __rehash_guard._M_reset = false;
> >>       this->_M_store_code(*__node, __code);
> >
> > This changes behaviour. Previously the try-catch was inside the call to
> > _M_rehash. Now we're starting to guard before calling _M_need_rehash,
>
> We were capturing _M_rehash_policy state before the call to
> _M_need_rehash so it's not different.
>
> We really need to instantiate the guard before we call _M_need_rehash
> cause it is the method changing the _M_rehash_policy state.
>
> > and we run the destructor unconditionally even if we never call
> > _M_rehash.
>
> Here we only want to reset rehash state on exception.
>
> Note that with the 2 rehash policy implementations we provide it doesn't
> matter if we reset or not when we do not call _M_rehash because their
> state changes only when rehash is needed. But some user's fancy rehash
> policy might update their state even if no rehash is requested and in
> this case it is better to reset this state only on exception.
>
>
> > I think that's OK, because _M_need_rehash is noexcept (but
> > it's confusing because we have a const overload of _M_need_rehash
> > which is noexcept(false) and a non-const overload which is
> > noexcept(true) ... should they both be noexcept?)
>
> That's another subject but yes, _M_need_rehash should be non-const and
> noexcept on both rehash policy implementations. I considered doing such
> a thing but won't it impact abi as _Prime_rehash_policy._M_need_rehash
> symbols is coming from the lib ?
>
>
> >
> > Can we do this instead:
> >
> >
> > -      const __rehash_state& __saved_state = _M_rehash_policy._M_state();
> >        std::pair<bool, std::size_t> __do_rehash
> >      = _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count,
> >                        __n_elt);
> >
> >        if (__do_rehash.first)
> >      {
> > +      __rehash_guard_t __rehash_guard(_M_rehash_policy);
>
> Clearly no, we need to capture state before _M_need_rehash. Like it was
> done with the __saved_state above.
>
>
> > -      _M_rehash(__do_rehash.second, __saved_state);
> > +      _M_rehash(__do_rehash.second, true_type{});
> > +      __rehash_guard._M_reset = false;
> >        __bkt = _M_bucket_index(__code);
> >      }
> >
> >        this->_M_store_code(*__node, __code);
> >
> > This only creates a guard object if we're actually going to rehash, so
> > we don't run its destructor (and check if we need t restore anything)
> > when no rehash is needed.
> >
> > N.B. _M_bucket_index is also confusing. There are two overloads of it,
> > one is noexcept and one isn't. The one that we call here is
> > noexcept(false), so that call could throw ... is it OK that we don't
> > restore the old state if that happens? Should that be guarded too? (I
> > have no idea).
> >
> > Also, I see that the noexcept(true) overload of _M_bucket_index calls
> > __hash_code_base::_M_bucket_index, which is noexcept(/* condition */),
> > so the caller says it cannot throw, but it calls something that
> > sometimes throws. Is that correct?
>
> It can be noexcept because usage of hash code cache depends on either
> hash functor is noexcept or not.
>
> >
> >>
> >>       // Always insert at the beginning of the bucket.
> >> @@ -2172,13 +2170,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>              __hash_code __code, __node_ptr __node)
> >>     -> iterator
> >>     {
> >> -      const __rehash_state& __saved_state =
> >> _M_rehash_policy._M_state();
> >> +      __rehash_guard_t __rehash_guard(_M_rehash_policy);
> >>       std::pair<bool, std::size_t> __do_rehash
> >>     = _M_rehash_policy._M_need_rehash(_M_bucket_count,
> >> _M_element_count, 1);
> >>
> >>       if (__do_rehash.first)
> >> -    _M_rehash(__do_rehash.second, __saved_state);
> >> +    _M_rehash(__do_rehash.second, false_type{});
> >>
> >> +      __rehash_guard._M_reset = false;
> >
> > Again, we're creating the guard object in a larger scope than needed.
> >
> >>       this->_M_store_code(*__node, __code);
> >>       const key_type& __k = _ExtractKey{}(__node->_M_v());
> >>       size_type __bkt = _M_bucket_index(__code);
> >> @@ -2509,39 +2508,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>            _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
> >>     rehash(size_type __bkt_count)
> >>     {
> >> -      const __rehash_state& __saved_state =
> >> _M_rehash_policy._M_state();
> >> +      __rehash_guard_t __rehash_guard(_M_rehash_policy);
> >>       __bkt_count
> >>     = std::max(_M_rehash_policy._M_bkt_for_elements(_M_element_count
> >> + 1),
> >>            __bkt_count);
> >>       __bkt_count = _M_rehash_policy._M_next_bkt(__bkt_count);
> >>
> >>       if (__bkt_count != _M_bucket_count)
> >> -    _M_rehash(__bkt_count, __saved_state);
> >> -      else
> >> -    // No rehash, restore previous state to keep it consistent with
> >> -    // container state.
> >> -    _M_rehash_policy._M_reset(__saved_state);
> >> -    }
> >> -
> >> -  template<typename _Key, typename _Value, typename _Alloc,
> >> -       typename _ExtractKey, typename _Equal,
> >> -       typename _Hash, typename _RangeHash, typename _Unused,
> >> -       typename _RehashPolicy, typename _Traits>
> >> -    void
> >> -    _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
> >> -           _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
> >> -    _M_rehash(size_type __bkt_count, const __rehash_state& __state)
> >> -    {
> >> -      __try
> >> -    {
> >> -      _M_rehash_aux(__bkt_count, __unique_keys{});
> >> -    }
> >> -      __catch(...)
> >>     {
> >> -      // A failure here means that buckets allocation failed. We only
> >> -      // have to restore hash policy previous state.
> >> -      _M_rehash_policy._M_reset(__state);
> >> -      __throw_exception_again;
> >> +      _M_rehash(__bkt_count, __unique_keys{});
> >> +      __rehash_guard._M_reset = false;
> >
> > And a larger scope again.
> >
> >>     }
> >>     }
> >>
> >> @@ -2553,7 +2529,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>     void
> >>     _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
> >>            _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
> >> -    _M_rehash_aux(size_type __bkt_count, true_type /* __uks */)
> >> +    _M_rehash(size_type __bkt_count, true_type /* __uks */)
> >>     {
> >>       __buckets_ptr __new_buckets = _M_allocate_buckets(__bkt_count);
> >>       __node_ptr __p = _M_begin();
> >> @@ -2596,7 +2572,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>     void
> >>     _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
> >>            _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
> >> -    _M_rehash_aux(size_type __bkt_count, false_type /* __uks */)
> >> +    _M_rehash(size_type __bkt_count, false_type /* __uks */)
> >>     {
> >>       __buckets_ptr __new_buckets = _M_allocate_buckets(__bkt_count);
> >>       __node_ptr __p = _M_begin();
> >> diff --git a/libstdc++-v3/include/bits/hashtable_policy.h
> >> b/libstdc++-v3/include/bits/hashtable_policy.h
> >> index 5d162463dc3..8b9626b1575 100644
> >> --- a/libstdc++-v3/include/bits/hashtable_policy.h
> >> +++ b/libstdc++-v3/include/bits/hashtable_policy.h
> >> @@ -715,6 +715,26 @@ namespace __detail
> >>     std::size_t    _M_next_resize;
> >>   };
> >>
> >> +  template<typename _RehashPolicy>
> >> +    struct _RehashStateGuard
> >> +    {
> >> +      _RehashPolicy& _M_rehash_policy;
> >> +      typename _RehashPolicy::_State _M_prev_state;
> >> +      bool _M_reset = true;
> >
> > This could be:
> >
> > +      _RehashPolicy* _M_guarded_obj;
> > +      typename _RehashPolicy::_State _M_prev_state;
> >
> >> +
> >> +      _RehashStateGuard(_RehashPolicy& __policy)
> >> +      : _M_rehash_policy(__policy)
> >> +      , _M_prev_state(__policy._M_state())
> >
> > +      _RehashStateGuard(_RehashPolicy& __policy)
> > +      : _M_guarded_obj(std::__addressof(__policy))
> > +      , _M_prev_state(__policy._M_state())
> >
> >> +      { }
> >> +      _RehashStateGuard(const _RehashStateGuard&) = delete;
> >> +
> >> +      ~_RehashStateGuard()
> >> +      {
> >> +    if (_M_reset)
> >> +      _M_rehash_policy._M_reset(_M_prev_state);
> >
> > +    if (_M_guarded_obj)
> > +      _M_guarded_obj->_M_reset(_M_prev_state);
> >
> >> +      }
> >> +    };
> >> +
> >>   // Base classes for std::_Hashtable.  We define these base classes
> >>   // because in some cases we want to do different things depending on
> >>   // the value of a policy class.  In some cases the policy class
> >> @@ -1007,7 +1027,7 @@ namespace __detail
> >>               const _NodeGetter& __node_gen, false_type __uks)
> >>       {
> >>     using __rehash_type = typename __hashtable::__rehash_type;
> >> -    using __rehash_state = typename __hashtable::__rehash_state;
> >> +    using __rehash_guard_t = typename __hashtable::__rehash_guard_t;
> >>     using pair_type = std::pair<bool, std::size_t>;
> >>
> >>     size_type __n_elt = __detail::__distance_fw(__first, __last);
> >> @@ -1016,14 +1036,15 @@ namespace __detail
> >>
> >>     __hashtable& __h = _M_conjure_hashtable();
> >>     __rehash_type& __rehash = __h._M_rehash_policy;
> >> -    const __rehash_state& __saved_state = __rehash._M_state();
> >> +    __rehash_guard_t __rehash_guard(__rehash);
> >>     pair_type __do_rehash = __rehash._M_need_rehash(__h._M_bucket_count,
> >>                             __h._M_element_count,
> >>                             __n_elt);
> >>
> >>     if (__do_rehash.first)
> >> -      __h._M_rehash(__do_rehash.second, __saved_state);
> >> +      __h._M_rehash(__do_rehash.second, __uks);
> >>
> >> +    __rehash_guard._M_reset = false;
> >
> > A larger scope again.
>
> Note that there is only one place where we reset rehash policy state
> even if no exception took place. It is in the std rehash method when the
> user requests a rehash that has no effect.
>
> Here is the updated patch, ok to commit ?


Thanks for the explanations, OK for trunk.

Re: [PATCH][_Hashtable] Use RAII to restore Rehash state

Reply via email to