On 23/04/16 10:27 +0200, François Dumont wrote:
Hi

Here is a patch introducing a new power-of-2-based rehash policy. It improves performance by avoiding the modulo and floating-point operations. I have updated the performance benchmarks; here are the results:

54075.cc tr1 benches   455r 446u 8s 0mem 0pf
54075.cc std benches   466r 460u 6s 0mem 0pf
54075.cc std2 benches  375r 369u 6s 0mem 0pf

The std2 benchmark is the one using the power-of-2 bucket count.

Note that I made use of __detected_or_t to avoid duplicating all the code of _Rehash_base<>.

It also brings a simplification of _Insert<>, which no longer takes a _Unique_keys template parameter; this allowed a specialization to be removed.

It also improves behavior when we reach the maximum number of buckets: we no longer keep trying to increase the bucket count, as that is impossible.

Lastly, it fixes a small problem in the 54075.cc benchmark: we were using __uset_traits rather than __umset_traits in the definition of __umset, so the results were not the expected ones.

Thanks. Now that we're back in stage 1 we can make this change.



2016-04-22  François Dumont <fdum...@gcc.gnu.org>

   * include/bits/hashtable_policy.h
   (_Prime_rehash_policy::__has_load_factor): New. Mark rehash policy
   having load factor management.
   (_Mask_range_hashing): New.
   (_NextPower2<size_t>): New.
   (_Power2_rehash_policy): New.
   (_Insert<>): Remove last template parameter _Unique_keys. Use the same
   implementation when keys are unique no matter if iterators are constant
   or not.

I found this change description confusing, because it's really using
the same implementation whether keys are unique or not, but this says
"Use the same implementation whether iterators are constant or not".

Shouldn't it be "Use the same implementation when iterators are
constant, no matter if keys are unique or not" ?

Maybe this would be clearer:

   (_Insert<>): Remove last template parameter, _Unique_keys, so
   that partial specializations only depend on whether iterators are
   constant or not.


   * src/c++11/hashtable_c++0x.cc (_Prime_rehash_policy::_M_next_bkt):
   Consider when last prime number has been reach.

s/reach/reached/

   * testsuite/23_containers/unordered_set/hash_policy/power2_rehash.cc:
   New.
   * testsuite/performance/23_containers/insert/54075.cc: Add bench using

s/bench/benchmark/

   the new hash policy.
   * testsuite/performance/23_containers/insert_erase/41975.cc: Likewise.

Tested under Linux x86_64, OK to commit?

François



Index: include/bits/hashtable_policy.h
===================================================================
--- include/bits/hashtable_policy.h     (révision 235348)
+++ include/bits/hashtable_policy.h     (copie de travail)
@@ -457,6 +457,8 @@
  /// smallest prime that keeps the load factor small enough.
  struct _Prime_rehash_policy
  {
+    using __has_load_factor = std::true_type;

    _Prime_rehash_policy(float __z = 1.0) noexcept
    : _M_max_load_factor(__z), _M_next_resize(0) { }

@@ -501,6 +503,136 @@
    mutable std::size_t _M_next_resize;
  };

+  /// Range hashing function assuming that second args is a power of 2.

s/args/arg/

+  struct _Mask_range_hashing
+  {
+    typedef std::size_t first_argument_type;
+    typedef std::size_t second_argument_type;
+    typedef std::size_t result_type;
+
+    result_type
+    operator()(first_argument_type __num,
+              second_argument_type __den) const noexcept
+    { return __num & (__den - 1); }
+  };
+
+
+  /// Helper type to compute next power of 2.
+  template<typename _SizeT,
+          std::size_t _N = sizeof(_SizeT) << 2, bool _Increment = true>
+    struct _NextPower2
+    {
+      static _SizeT
+      _Get(_SizeT __n)
+      {
+       _SizeT __next = _NextPower2<_SizeT, (_N >> 1), false>::_Get(__n);
+       __next |= __next >> _N;
+       if (_Increment)
+         ++__next;
+
+       return __next;
+      }
+    };
+
+  template<typename _SizeT>
+    struct _NextPower2<_SizeT, 1, false>
+    {
+      static _SizeT
+      _Get(_SizeT __n)
+      {
+       --__n;
+       return __n |= __n >> 1;
+      }
+    };

What's the reason to keep this recursive template instead of using a
simple function like the clp2() we discussed? A simple function (which
could be _GLIBCXX14_CONSTEXPR) compiles faster, and produces similar
object code for the default -std=gnu++14 mode. And it doesn't require
six calls to _NextPower2::_Get to calculate the result.

If you're worried about the final shift being unnecessary on 32-bit
you can use the preprocessor, something like:

 _GLIBCXX14_CONSTEXPR
 std::size_t
 __clp2(std::size_t n)
 {
#if __SIZEOF_SIZE_T__ >= 8
   std::uint_fast64_t x = n;
#else
   std::uint_fast32_t x = n;
#endif
   // Algorithm from Hacker's Delight, Figure 3-3.
   x = x - 1;
   x = x | (x >> 1);
   x = x | (x >> 2);
   x = x | (x >> 4);
   x = x | (x >> 8);
   x = x | (x >>16);
#if __SIZEOF_SIZE_T__ >= 8
   x = x | (x >>32);
#endif
   return x + 1;
 }

I don't think we need to worry about 128-bit integers, because that
would be a ridiculous number of buckets anyway. We can just limit
__max_bkt so we don't have to deal with more than 2^63 buckets.

+  /// Rehash policy providing power of 2 bucket numbers. Ease modulo

s/Ease/Avoids/

+  /// operations.
+  struct _Power2_rehash_policy
+  {
+    using __has_load_factor = std::true_type;
+
+    _Power2_rehash_policy(float __z = 1.0) noexcept
+    : _M_max_load_factor(__z), _M_next_resize(0) { }
+
+    float
+    max_load_factor() const noexcept
+    { return _M_max_load_factor; }
+
+    // Return a bucket size no smaller than n (as long as n is not above the
+    // highest power of 2).
+    std::size_t
+    _M_next_bkt(std::size_t __n) const
+    {
+      constexpr auto __max_bkt
+       = std::size_t(1) << ((sizeof(std::size_t) << 3) - 1);

Should this be sizeof(size_t) * __CHAR_BIT__ instead of << 3 ?
Otherwise targets with char wider than 8 bits would get a smaller
maximum number of buckets, even if they have 64-bit size_t.

So, including the suggestion above to limit it to 2^63, it would be:

    constexpr std::size_t __max_width
     = std::min(sizeof(std::size_t), std::size_t(8));
    constexpr auto __max_bkt
     = std::size_t(1) << (__max_width * __CHAR_BIT__ - 1);

+
+      std::size_t __res = _NextPower2<std::size_t>::_Get(__n);
+
+      if (__res == 0)
+       __res = __max_bkt;
+
+      if (__res == __max_bkt)
+       // Set next resize to the max value so that we never try to rehash again
+       // as we already reach the biggest possible bucket number.
+       // Note that it might result in max_load_factor not being respected.
+       _M_next_resize = std::size_t(0) - 1;

This can be simplified to size_t(-1), there's no need for arithmetic.

+      else
+       _M_next_resize
+         = __builtin_floor(__res * (long double)_M_max_load_factor);

Isn't floor() on a positive value identical to the truncation we'd get
by converting to an integer anyway? I suppose it makes it explicit,
and is symmetrical with the ceil() calls, so is OK.


@@ -775,8 +907,7 @@
           typename _ExtractKey, typename _Equal,
           typename _H1, typename _H2, typename _Hash,
           typename _RehashPolicy, typename _Traits,
-          bool _Constant_iterators = _Traits::__constant_iterators::value,
-          bool _Unique_keys = _Traits::__unique_keys::value>
+          bool _Constant_iterators = _Traits::__constant_iterators::value>
    struct _Insert;

  /// Specialization.

(I'm sure I've said it before, but having 10+ template parameters here
makes me sad).




Index: src/c++11/hashtable_c++0x.cc
===================================================================
--- src/c++11/hashtable_c++0x.cc        (révision 235348)
+++ src/c++11/hashtable_c++0x.cc        (copie de travail)
@@ -60,8 +60,16 @@
      = sizeof(__prime_list) / sizeof(unsigned long) - 1;
    const unsigned long* __next_bkt =
      std::lower_bound(__prime_list + 5, __prime_list + __n_primes, __n);
-    _M_next_resize =
-      __builtin_ceil(*__next_bkt * (long double)_M_max_load_factor);
+
+    if (__next_bkt == __prime_list + __n_primes)
+      // Set next resize to the max value so that we never try to rehash again
+      // as we already reach the biggest possible bucket number.
+      // Note that it might result in max_load_factor not being respected.
+      _M_next_resize = std::size_t(0) - 1;

This can be simplified as above:

     _M_next_resize = size_t(-1);

+    else
+      _M_next_resize =
+       __builtin_ceil(*__next_bkt * (long double)_M_max_load_factor);
+
    return *__next_bkt;
  }
