Re: [PATCH] libstdc++: make hash container clear() complexity O(size()) (DR2550) [PR67922]

Tomasz Kaminski Thu, 02 Jul 2026 00:54:25 -0700

On Thu, Jul 2, 2026 at 9:32 AM Tomasz Kaminski <[email protected]> wrote:


>
>
> On Thu, Jul 2, 2026 at 2:21 AM Nathan Myers <[email protected]> wrote:
>
>> The hash tables std::unordered_set, _map, _multiset, _multimap
>> as implemented take time linear in the size of the bucket array
>> (because memset), rather than in the number of elements stored,
>> in violation of Standard requirements. With this patch it
>> performs work only for the elements present, but only for a very
>> sparsely-populated table. With a bucket loading above 0.5%, the
>> status-quo method tests faster, so the new method is used only
>> when less.
>>
> Could you post your test results?
>
Clearing at the bucket level requires us to find the bucket index for the
node,
which requires computing a hash. This may highly depend on type used, so
I would measure for keys
  * int - so fash hash is enabled
  * string_view - we explicitly disable fast hash, so we get hash cached
  * string_view with custom hash specialization - so is_fast_hash is enabled

If you will use the implementation suggested below (instead of erase), I
think
the expected load factor may be much higher if is_fast_hash was specialized
to false (and we do not need to recompute the hash).


>
>> It turns out libc++ also zeroes its bucket table on clear(), like
>> libstdc++, but uses a regular loop. Performance differences seem
>> to depend on memory organization.
>>
>> libstdc++-v3/Changelog:
>>         PR libstdc++/67922
>>         * include/bits/hashtable.h (clear): Special-case minimal
>> population.
>> ---
>>  libstdc++-v3/include/bits/hashtable.h | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/libstdc++-v3/include/bits/hashtable.h
>> b/libstdc++-v3/include/bits/hashtable.h
>> index eff6c31d827..d9eb0fe45e3 100644
>> --- a/libstdc++-v3/include/bits/hashtable.h
>> +++ b/libstdc++-v3/include/bits/hashtable.h
>> @@ -2778,6 +2778,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>                _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
>>      clear() noexcept
>>      {
>> +      if (this->size() < this->bucket_count() / 200)
>> +       {
>> +         this->erase(this->begin(), this->end());
>>
> The erase implementation here is not optimal, as it requires some extra
work
to detect moving between the buckets, that is unnecessary here,
We could do something like this instead, were we just clear the buckets
nodes
as we go.
      if (load_factor_check)
      {
          // Avoids computing the hash
          this->_M_deallocate_nodes(_M_begin());
          std::fill_n(_M_buckets, _M_bucket_count, nullptr);
      }
      else while (__n)
        {
          __node_ptr __tmp = __n;
          __n = __n->_M_next();
          _M_buckets[M_bucket_index(*__n)] = nullptr;
          _M_deallocate_node(__tmp);
        }
      _M_element_count = 0;
      _M_before_begin._M_nxt = nullptr;


> +         return;
>> +       }
>>        this->_M_deallocate_nodes(_M_begin());
>>        std::fill_n(_M_buckets, _M_bucket_count, nullptr);
>>        _M_element_count = 0;
>> --
>> 2.54.0
>>
>>

Re: [PATCH] libstdc++: make hash container clear() complexity O(size()) (DR2550) [PR67922]

Reply via email to