I'm investigating an issue I ran into when using changes of size ()
as an indicator of hash-table re-allocation through expand ().  This
doesn't work when ::expand computes the same size as before, since
it then still happily re-allocates.  To complicate things, ::expand
has its own heuristic for when to re-calculate the allocation size:

  /* Resize only when table after removal of unused elements is either
     too full or too empty.  */
  unsigned int nindex;
  size_t nsize;
  if (elts * 2 > osize || too_empty_p (elts))
    {
      nindex = hash_table_higher_prime_index (elts * 2);
      nsize = prime_tab[nindex].prime;
    }
  else
    {
      nindex = oindex;
      nsize = osize;
    }

which does not match that of the caller in find_slot_with_hash:

  if (insert == INSERT && m_size * 3 <= m_n_elements * 4)
    expand ();

One could argue that expand () gets rid of deleted entries even
when re-allocating to the same size.  But then it's odd to trigger
this from INSERT, which is perfectly capable of handling deleted
entries optimally.  In fact, removing elements never triggers this,
so removing all elements and then re-inserting them will trigger
re-allocations without good reason.

Note the INSERT heuristic computes the load factor including
deleted elements while expand () computes it excluding them.
(The only other caller of expand () is traverse, which
re-allocates when too_empty_p ().)

You could argue checking for re-allocation this way is bogus; even
checking m_entries can hide two re-allocations when the allocator
returns the same address.  Checking both m_entries and m_size would
work, I guess; adding a separate counter for re-allocations would,
too.

Still, what I say above holds.  I'd argue ::expand () should not
repeat the heuristics of its callers, and if we want to shrink
because of the cost of deleted entries then we should do that when
we do lookups (or, if we'd better not re-allocate on mere lookups,
at clear_slot time instead?).  I'm seeing a lot of re-allocation
churn when inserting into and removing from a very small hashtable
with size == 7.

Maybe at least separate the load factor from the too-empty
case at insertion, so

  if (insert == INSERT 
      && (size () * 3 <= elements () * 4 || too_empty_p (elements ())))
    expand ();

and then elide the heuristics from ::expand ().  (With this
mismatch left in place there's the chance we re-allocate over and
over to the same size, once for each INSERT.)

Note that ::empty () also oddly uses too_empty_p (m_n_elements).

(Maybe we should rename m_n_elements to m_n_elements_with_deleted.)

Thanks,
Richard.

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend