Re: hash resizing bug

Jim Meyering Wed, 17 Jun 2009 14:25:26 -0700

Eric Blake wrote:
> According to Eric Blake on 6/17/2009 12:21 PM:
>> Before the rehash, about 20% of the buckets were empty.  But after changing 
>> the
>> prime factor, the hasher function scatters better, and _every single bucket_ 
>> is
>> occupied.  Which means ALL subsequent calls to hash_insert will trigger an
>> overflow, rather than inserting into a bucket head.  And since the resize
>> checking logic in hash_rehash is only triggered after insertion into a bucket
>> head, the hash table has effectively been locked into a fixed size that will
>> never grow again, no matter how many hash_insert calls are made, even though
>> the table is certainly exceeding the requested threshold of 20% empty 
>> buckets.
>> Ouch.
>
> I'm now playing with this simple code motion patch, also available at:
> http://repo.or.cz/w/gnulib/ericb.git
> $ git pull git://repo.or.cz/gnulib/ericb.git master
>
> --
> Don't work too hard, make some time for fun as well!
>
> Eric Blake             e...@byu.net
>>From d3edc0d1e302af613e8b56396f1f9d0bec15a264 Mon Sep 17 00:00:00 2001
> From: Eric Blake <e...@byu.net>
> Date: Wed, 17 Jun 2009 13:01:41 -0600
> Subject: [PATCH] hash: check for resize before insertion
>
> * lib/hash.c (hash_insert): Check whether bucket usage exceeds
> threshold before insertion, so that a pathological hash_rehash
> that fills every bucket can still trigger another rehash.
>
> Signed-off-by: Eric Blake <e...@byu.net>
> ---
>  ChangeLog  |    5 +++++
>  lib/hash.c |   52 ++++++++++++++++++++++++++--------------------------
>  2 files changed, 31 insertions(+), 26 deletions(-)
>
> diff --git a/ChangeLog b/ChangeLog
> index 1f57b7e..38c1d45 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,5 +1,10 @@
>  2009-06-17  Eric Blake  <e...@byu.net>
>
> +     hash: check for resize before insertion
> +     * lib/hash.c (hash_insert): Check whether bucket usage exceeds
> +     threshold before insertion, so that a pathological hash_rehash
> +     that fills every bucket can still trigger another rehash.
> +
>       hash: manage bucket overflows more efficiently
>       * lib/hash.c [USE_OBSTACK]: Delete; the optimizations provided by
>       an obstack have been folded in to mainline code.
> diff --git a/lib/hash.c b/lib/hash.c
> index 9d78990..e4310b5 100644
> --- a/lib/hash.c
> +++ b/lib/hash.c
> @@ -999,32 +999,6 @@ hash_insert (Hash_table *table, const void *entry)
>    if ((data = hash_find_entry (table, entry, &bucket, false)) != NULL)
>      return data;
>
> -  /* ENTRY is not matched, it should be inserted.  */
> -
> -  if (bucket->data)
> -    {
> -      struct hash_entry *new_entry = allocate_entry (table);
> -
> -      if (new_entry == NULL)
> -     return NULL;
> -      if (table->overflow_size <= new_entry - table->overflow)
> -     abort ();
> -
> -      /* Add ENTRY in the overflow of the bucket.  */
> -
> -      new_entry->data = (void *) entry;
> -      new_entry->next = bucket->next;
> -      bucket->next = new_entry - table->overflow;
> -      table->n_entries++;
> -      return (void *) entry;
> -    }
> -
> -  /* Add ENTRY right in the bucket head.  */
> -
> -  bucket->data = (void *) entry;
> -  table->n_entries++;
> -  table->n_buckets_used++;
> -
>    /* If the growth threshold of the buckets in use has been reached, increase
>       the table size and rehash.  There's no point in checking the number of
>       entries:  if the hashing function is ill-conditioned, rehashing is not
> @@ -1057,6 +1031,32 @@ hash_insert (Hash_table *table, const void *entry)
>       }
>      }


You can't use a pre-rehash "bucket" here (after rehash),
since it is no longer valid.

Considering the cost of rehashing (and its relative infrequency),
one more hash_find_entry won't even show up on the radar.

> +  /* ENTRY is not matched, it should be inserted.  */
> +
> +  if (bucket->data)
> +    {
> +      struct hash_entry *new_entry = allocate_entry (table);
> +
> +      if (new_entry == NULL)
> +     return NULL;
> +      if (table->overflow_size <= new_entry - table->overflow)
> +     abort ();
> +
> +      /* Add ENTRY in the overflow of the bucket.  */
> +
> +      new_entry->data = (void *) entry;
> +      new_entry->next = bucket->next;
> +      bucket->next = new_entry - table->overflow;
> +      table->n_entries++;
> +      return (void *) entry;
> +    }
> +
> +  /* Add ENTRY right in the bucket head.  */
> +
> +  bucket->data = (void *) entry;
> +  table->n_entries++;
> +  table->n_buckets_used++;
> +
>    return (void *) entry;
>  }

Re: hash resizing bug

Reply via email to