On Fri, Oct 25, 2013 at 9:11 AM, Stuart Halloway
<[email protected]>wrote:

> To save others from having to read this whole thread, is the following
> summary correct?
>
> 1. We believe that the performance issue Paul first encountered is
> substantially / entirely caused by hash collisions.
>

Yes.


>
> 2. The hash collisions could be improved by spreading the hashes of
> numbers (and not needing to mess with hashes of collections).
>

Yes.  Increasing the spread of the number hashes wouldn't solve all the
types of hash collisions we've discussed, but it would definitely help.
However, as David pointed out, there's a tradeoff between increasing the
complexity of the hashing function and the time saved by improving hash
collisions.  My intuition from playing around with this is that increasing
the complexity of hashing numbers from the current, dead-simple scheme
would be an overhead that would really be felt.  Two advantages of messing
with hashes at the collection level is that:
a. Collections are already doing reasonably complex things during hashing,
like traversing their elements.
b. The hash, once computed, can be cached.
Those two factors mean that adding a tiny amount of complexity to the hash
function of a collection is more likely to help than to hurt performance.



>
> If the above summary is correct, then I have the following questions:
>
> a. Is the approach Scala took substantially #2 above, or something
> different / more?
>

I don't believe that Scala altered the hashing of numbers, but I am not
certain.  All I know about Scala's hashing is what I've seen from following
some of the links that David Nolen has posted here.  It appears to me that
they made a variety of small improvements for different kinds of hashing.
For example, choosing different magic primes for strings versus collections
(Clojure mimics Java and uses 31 everywhere) helped with certain kinds of
collisions.  The link David posted is specifically about how they improved
case classes, which are similar to Clojure's records.  It looks like they
generate a custom hash function at compile time using knowledge of the
types of the element and an algorithm that munges together the individual
bytes of the items.  So I'd say that it looks like they did something in
the different / more category.


> b. Is changing the hash codes of numbers going to break expectations on
> the Java side of things?
>

I don't know.

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to