On Fri, Oct 25, 2013 at 9:11 AM, Stuart Halloway <[email protected]>wrote:
> To save others from having to read this whole thread, is the following > summary correct? > > 1. We believe that the performance issue Paul first encountered is > substantially / entirely caused by hash collisions. > Yes. > > 2. The hash collisions could be improved by spreading the hashes of > numbers (and not needing to mess with hashes of collections). > Yes. Increasing the spread of the number hashes wouldn't solve all the types of hash collisions we've discussed, but it would definitely help. However, as David pointed out, there's a tradeoff between increasing the complexity of the hashing function and the time saved by improving hash collisions. My intuition from playing around with this is that increasing the complexity of hashing numbers from the current, dead-simple scheme would be an overhead that would really be felt. Two advantages of messing with hashes at the collection level is that: a. Collections are already doing reasonably complex things during hashing, like traversing their elements. b. The hash, once computed, can be cached. Those two factors mean that adding a tiny amount of complexity to the hash function of a collection is more likely to help than to hurt performance. > > If the above summary is correct, then I have the following questions: > > a. Is the approach Scala took substantially #2 above, or something > different / more? > I don't believe that Scala altered the hashing of numbers, but I am not certain. All I know about Scala's hashing is what I've seen from following some of the links that David Nolen has posted here. It appears to me that they made a variety of small improvements for different kinds of hashing. For example, choosing different magic primes for strings versus collections (Clojure mimics Java and uses 31 everywhere) helped with certain kinds of collisions. The link David posted is specifically about how they improved case classes, which are similar to Clojure's records. It looks like they generate a custom hash function at compile time using knowledge of the types of the element and an algorithm that munges together the individual bytes of the items. So I'd say that it looks like they did something in the different / more category. > b. Is changing the hash codes of numbers going to break expectations on > the Java side of things? > I don't know. -- -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to [email protected] Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups "Clojure" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/groups/opt_out.
