On Tue, Jul 05, 2005 at 08:13:41PM +0200, Paolo Carlini wrote: > Hi Joe and Gaby and thanks for your messages, > > >Close, but not quite. > > > >Hash functions are, by nature, many-to-one. A good hash function has > >few collisions for values that frequently appear; the program will preform > >very poorly if many inputs hash to the same value. The existing function > >will make all values over max(size_t) collide (assuming the cast clips). > >Using only the mantissa is better, but if doubles are used to > >represent fixed-point, then common values like 0.25, 0.5, 1, 2, 4 might > >all be in the hash table, and they will collide. > > > >You could do frexp, scale the mantissa to form an integer (e.g. multiply > >by a large integer), then add it (modulo 2**n) to some prime number > >multiplied by the exponent. This should give good spreading for both > >large values and exactly representable values. > > > Indeed, I can find around also more sophisticated solutions similar to > your proposal, perhaps in the python library?!? I'm tempted to go along > this way, but Gaby's idea also seems nice, opinions about it? We have to > make a choice... or we can provide both and the user chooses... still we > have to select a default ;)
If I were you I'd be tempted to crib from Python. Because of the centrality of good hashing for Python performance, I'm sure that they've done a good job.