Eric Blake wrote: > According to Eric Blake on 6/17/2009 12:21 PM: >> Before the rehash, about 20% of the buckets were empty. But after changing >> the >> prime factor, the hasher function scatters better, and _every single bucket_ >> is >> occupied. Which means ALL subsequent calls to hash_insert will trigger an >> overflow, rather than inserting into a bucket head. And since the resize >> checking logic in hash_rehash is only triggered after insertion into a bucket >> head, the hash table has effectively been locked into a fixed size that will >> never grow again, no matter how many hash_insert calls are made, even though >> the table is certainly exceeding the requested threshold of 20% empty >> buckets. >> Ouch. > > I'm now playing with this simple code motion patch, also available at: > http://repo.or.cz/w/gnulib/ericb.git > $ git pull git://repo.or.cz/gnulib/ericb.git master > > -- > Don't work too hard, make some time for fun as well! > > Eric Blake e...@byu.net >>From d3edc0d1e302af613e8b56396f1f9d0bec15a264 Mon Sep 17 00:00:00 2001 > From: Eric Blake <e...@byu.net> > Date: Wed, 17 Jun 2009 13:01:41 -0600 > Subject: [PATCH] hash: check for resize before insertion > > * lib/hash.c (hash_insert): Check whether bucket usage exceeds > threshold before insertion, so that a pathological hash_rehash > that fills every bucket can still trigger another rehash. > > Signed-off-by: Eric Blake <e...@byu.net> > --- > ChangeLog | 5 +++++ > lib/hash.c | 52 ++++++++++++++++++++++++++-------------------------- > 2 files changed, 31 insertions(+), 26 deletions(-) > > diff --git a/ChangeLog b/ChangeLog > index 1f57b7e..38c1d45 100644 > --- a/ChangeLog > +++ b/ChangeLog > @@ -1,5 +1,10 @@ > 2009-06-17 Eric Blake <e...@byu.net> > > + hash: check for resize before insertion > + * lib/hash.c (hash_insert): Check whether bucket usage exceeds > + threshold before insertion, so that a pathological hash_rehash > + that fills every bucket can still trigger another rehash. > + > hash: manage bucket overflows more efficiently > * lib/hash.c [USE_OBSTACK]: Delete; the optimizations provided by > an obstack have been folded in to mainline code. > diff --git a/lib/hash.c b/lib/hash.c > index 9d78990..e4310b5 100644 > --- a/lib/hash.c > +++ b/lib/hash.c > @@ -999,32 +999,6 @@ hash_insert (Hash_table *table, const void *entry) > if ((data = hash_find_entry (table, entry, &bucket, false)) != NULL) > return data; > > - /* ENTRY is not matched, it should be inserted. */ > - > - if (bucket->data) > - { > - struct hash_entry *new_entry = allocate_entry (table); > - > - if (new_entry == NULL) > - return NULL; > - if (table->overflow_size <= new_entry - table->overflow) > - abort (); > - > - /* Add ENTRY in the overflow of the bucket. */ > - > - new_entry->data = (void *) entry; > - new_entry->next = bucket->next; > - bucket->next = new_entry - table->overflow; > - table->n_entries++; > - return (void *) entry; > - } > - > - /* Add ENTRY right in the bucket head. */ > - > - bucket->data = (void *) entry; > - table->n_entries++; > - table->n_buckets_used++; > - > /* If the growth threshold of the buckets in use has been reached, increase > the table size and rehash. There's no point in checking the number of > entries: if the hashing function is ill-conditioned, rehashing is not > @@ -1057,6 +1031,32 @@ hash_insert (Hash_table *table, const void *entry) > } > }
You can't use a pre-rehash "bucket" here (after rehash), since it is no longer valid. Considering the cost of rehashing (and its relative infrequency), one more hash_find_entry won't even show up on the radar. > + /* ENTRY is not matched, it should be inserted. */ > + > + if (bucket->data) > + { > + struct hash_entry *new_entry = allocate_entry (table); > + > + if (new_entry == NULL) > + return NULL; > + if (table->overflow_size <= new_entry - table->overflow) > + abort (); > + > + /* Add ENTRY in the overflow of the bucket. */ > + > + new_entry->data = (void *) entry; > + new_entry->next = bucket->next; > + bucket->next = new_entry - table->overflow; > + table->n_entries++; > + return (void *) entry; > + } > + > + /* Add ENTRY right in the bucket head. */ > + > + bucket->data = (void *) entry; > + table->n_entries++; > + table->n_buckets_used++; > + > return (void *) entry; > }