-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 According to Eric Blake on 6/17/2009 12:21 PM: > Before the rehash, about 20% of the buckets were empty. But after changing > the > prime factor, the hasher function scatters better, and _every single bucket_ > is > occupied. Which means ALL subsequent calls to hash_insert will trigger an > overflow, rather than inserting into a bucket head. And since the resize > checking logic in hash_rehash is only triggered after insertion into a bucket > head, the hash table has effectively been locked into a fixed size that will > never grow again, no matter how many hash_insert calls are made, even though > the table is certainly exceeding the requested threshold of 20% empty > buckets. > Ouch.
I'm now playing with this simple code motion patch, also available at: http://repo.or.cz/w/gnulib/ericb.git $ git pull git://repo.or.cz/gnulib/ericb.git master - -- Don't work too hard, make some time for fun as well! Eric Blake e...@byu.net -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Cygwin) Comment: Public key at home.comcast.net/~ericblake/eblake.gpg Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAko5PdIACgkQ84KuGfSFAYBqZgCeOqttX/POCKvJXLBVubjaee9M Tr4AoIMsLpf+3LELwthAN6QK3cUpo/9y =I5iP -----END PGP SIGNATURE-----
>From d3edc0d1e302af613e8b56396f1f9d0bec15a264 Mon Sep 17 00:00:00 2001 From: Eric Blake <e...@byu.net> Date: Wed, 17 Jun 2009 13:01:41 -0600 Subject: [PATCH] hash: check for resize before insertion * lib/hash.c (hash_insert): Check whether bucket usage exceeds threshold before insertion, so that a pathological hash_rehash that fills every bucket can still trigger another rehash. Signed-off-by: Eric Blake <e...@byu.net> --- ChangeLog | 5 +++++ lib/hash.c | 52 ++++++++++++++++++++++++++-------------------------- 2 files changed, 31 insertions(+), 26 deletions(-) diff --git a/ChangeLog b/ChangeLog index 1f57b7e..38c1d45 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,5 +1,10 @@ 2009-06-17 Eric Blake <e...@byu.net> + hash: check for resize before insertion + * lib/hash.c (hash_insert): Check whether bucket usage exceeds + threshold before insertion, so that a pathological hash_rehash + that fills every bucket can still trigger another rehash. + hash: manage bucket overflows more efficiently * lib/hash.c [USE_OBSTACK]: Delete; the optimizations provided by an obstack have been folded in to mainline code. diff --git a/lib/hash.c b/lib/hash.c index 9d78990..e4310b5 100644 --- a/lib/hash.c +++ b/lib/hash.c @@ -999,32 +999,6 @@ hash_insert (Hash_table *table, const void *entry) if ((data = hash_find_entry (table, entry, &bucket, false)) != NULL) return data; - /* ENTRY is not matched, it should be inserted. */ - - if (bucket->data) - { - struct hash_entry *new_entry = allocate_entry (table); - - if (new_entry == NULL) - return NULL; - if (table->overflow_size <= new_entry - table->overflow) - abort (); - - /* Add ENTRY in the overflow of the bucket. */ - - new_entry->data = (void *) entry; - new_entry->next = bucket->next; - bucket->next = new_entry - table->overflow; - table->n_entries++; - return (void *) entry; - } - - /* Add ENTRY right in the bucket head. */ - - bucket->data = (void *) entry; - table->n_entries++; - table->n_buckets_used++; - /* If the growth threshold of the buckets in use has been reached, increase the table size and rehash. There's no point in checking the number of entries: if the hashing function is ill-conditioned, rehashing is not @@ -1057,6 +1031,32 @@ hash_insert (Hash_table *table, const void *entry) } } + /* ENTRY is not matched, it should be inserted. */ + + if (bucket->data) + { + struct hash_entry *new_entry = allocate_entry (table); + + if (new_entry == NULL) + return NULL; + if (table->overflow_size <= new_entry - table->overflow) + abort (); + + /* Add ENTRY in the overflow of the bucket. */ + + new_entry->data = (void *) entry; + new_entry->next = bucket->next; + bucket->next = new_entry - table->overflow; + table->n_entries++; + return (void *) entry; + } + + /* Add ENTRY right in the bucket head. */ + + bucket->data = (void *) entry; + table->n_entries++; + table->n_buckets_used++; + return (void *) entry; } -- 1.6.3.rc3.2.g4b51