On Tue, Jul 05, 2005 at 08:13:41PM +0200, Paolo Carlini wrote:
> Hi Joe and Gaby and thanks for your messages,
> 
> >Close, but not quite.
> >
> >Hash functions are, by nature, many-to-one.  A good hash function has
> >few collisions for values that frequently appear; the program will preform
> >very poorly if many inputs hash to the same value.  The existing function
> >will make all values over max(size_t) collide (assuming the cast clips).
> >Using only the mantissa is better, but if doubles are used to
> >represent fixed-point, then common values like 0.25, 0.5, 1, 2, 4 might
> >all be in the hash table, and they will collide.
> >
> >You could do frexp, scale the mantissa to form an integer (e.g. multiply
> >by a large integer), then add it (modulo 2**n) to some prime number
> >multiplied by the exponent.  This should give good spreading for both
> >large values and exactly representable values.
> >
> Indeed, I can find around also more sophisticated solutions similar to
> your proposal, perhaps in the python library?!? I'm tempted to go along
> this way, but Gaby's idea also seems nice, opinions about it? We have to
> make a choice... or we can provide both and the user chooses... still we
> have to select a default ;)

If I were you I'd be tempted to crib from Python.  Because of the
centrality of good hashing for Python performance, I'm sure that they've
done a good job.

Reply via email to