Re: hash resizing bug

Eric Blake Wed, 17 Jun 2009 14:46:17 -0700

Jim Meyering <jim <at> meyering.net> writes:

> >    /* If the growth threshold of the buckets in use has been reached, 
increase
> >       the table size and rehash.  There's no point in checking the number of
> >       entries:  if the hashing function is ill-conditioned, rehashing is not
> > @@ -1057,6 +1031,32 @@ hash_insert (Hash_table *table, const void *entry)
> >     }
> >      }
> 
> You can't use a pre-rehash "bucket" here (after rehash),
> since it is no longer valid.
> 
> Considering the cost of rehashing (and its relative infrequency),
> one more hash_find_entry won't even show up on the radar.


Good catch.  So the simple code motion was a bit too simple; here's the updated 
patch (and I've now run it through the same m4 test that found the original 
problem):


From: Eric Blake <e...@byu.net>
Date: Wed, 17 Jun 2009 13:01:41 -0600
Subject: [PATCH] hash: check for resize before insertion

* lib/hash.c (hash_insert): Check whether bucket usage exceeds
threshold before insertion, so that a pathological hash_rehash
that fills every bucket can still trigger another rehash.

Signed-off-by: Eric Blake <e...@byu.net>
---
 lib/hash.c |   58 +++++++++++++++++++++++++++++++---------------------------
 1 files changed, 31 insertions(+), 27 deletions(-)

diff --git a/lib/hash.c b/lib/hash.c
index 7d76d45..ed171eb 100644
--- a/lib/hash.c
+++ b/lib/hash.c
@@ -1,7 +1,7 @@
 /* hash - hashing table processing.

-   Copyright (C) 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2006, 2007 Free
-   Software Foundation, Inc.
+   Copyright (C) 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2006, 2007,
+   2009 Free Software Foundation, Inc.

    Written by Jim Meyering, 1992.

@@ -917,30 +917,6 @@ hash_insert (Hash_table *table, const void *entry)
   if ((data = hash_find_entry (table, entry, &bucket, false)) != NULL)
     return data;

-  /* ENTRY is not matched, it should be inserted.  */
-
-  if (bucket->data)
-    {
-      struct hash_entry *new_entry = allocate_entry (table);
-
-      if (new_entry == NULL)
-       return NULL;
-
-      /* Add ENTRY in the overflow of the bucket.  */
-
-      new_entry->data = (void *) entry;
-      new_entry->next = bucket->next;
-      bucket->next = new_entry;
-      table->n_entries++;
-      return (void *) entry;
-    }
-
-  /* Add ENTRY right in the bucket head.  */
-
-  bucket->data = (void *) entry;
-  table->n_entries++;
-  table->n_buckets_used++;
-
   /* If the growth threshold of the buckets in use has been reached, increase
      the table size and rehash.  There's no point in checking the number of
      entries:  if the hashing function is ill-conditioned, rehashing is not
@@ -967,10 +943,38 @@ hash_insert (Hash_table *table, const void *entry)

          /* If the rehash fails, arrange to return NULL.  */
          if (!hash_rehash (table, candidate))
-           entry = NULL;
+           return NULL;
+
+         /* Update the bucket we are interested in.  */
+         if (hash_find_entry (table, entry, &bucket, false) != NULL)
+           abort ();
        }
     }

+  /* ENTRY is not matched, it should be inserted.  */
+
+  if (bucket->data)
+    {
+      struct hash_entry *new_entry = allocate_entry (table);
+
+      if (new_entry == NULL)
+       return NULL;
+
+      /* Add ENTRY in the overflow of the bucket.  */
+
+      new_entry->data = (void *) entry;
+      new_entry->next = bucket->next;
+      bucket->next = new_entry;
+      table->n_entries++;
+      return (void *) entry;
+    }
+
+  /* Add ENTRY right in the bucket head.  */
+
+  bucket->data = (void *) entry;
+  table->n_entries++;
+  table->n_buckets_used++;
+
   return (void *) entry;
 }

-- 
1.6.3.2

Re: hash resizing bug

Reply via email to