Re: Dubious performance characteristics of hash-maps

Edward Z. Yang Tue, 23 Feb 2010 10:56:51 -0800

Hello!

Excerpts from Rich Hickey's message of Tue Feb 23 09:41:28 -0500 2010:
> Yes, the methodology is wrong, at least on the Clojure/JVM side.
> First, benchmarking anything without -server is a waste of time.
> Second, the JVM does dynamic profile-driven compilation - performance
> on single cold runs is not indicative of optimal performance. Third,
> putting the random stream gen call in the let doesn't preallocate the
> sequence, nor move its execution there (and thus out of the timing),
> as Clojure's repeatedly/take etc are lazy. Fourth, it's possible to
> structure the code more like your Haskell IntMap code [1] (i.e. no
> interleave). Fifth, AOT compilation makes no difference in performance
> - Clojure code is always compiled.


Great, I was hoping that was the case.  The HotSpot JVM is definitely
a big part of the mix, and I hadn't realized that it does dynamic
profile-driven compilation.  I went off and read Cliff Click's excellent
presentation on the subject.

Re laziness, on closer inspection it looks like the equivalent Haskell
is also generating the lazy values inside the loop, so actually they're
roughly equivalent.  I've rewritten the Haskell code (not uploaded yet)
to force the random generation before timing.
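
For the Clojure side, a minimal sketch of the difference between timing a
lazy sequence and timing a forced one.  The mk-random-stream below is a
hypothetical stand-in built on java.util.Random (the real definition isn't
quoted in this message), and build-map is just an illustrative helper:

```clojure
;; Hypothetical stand-in for mk-random-stream; seeded so runs are repeatable.
(defn mk-random-stream []
  (let [^java.util.Random r (java.util.Random. 42)]
    (repeatedly #(.nextInt r))))

;; Illustrative helper: insert each value as both key and value.
(defn build-map [xs]
  (reduce (fn [m x] (assoc m x x)) {} xs))

;; Lazy: take returns an unrealized sequence, so the random numbers
;; are generated inside the timed region.
(let [xs (take 100000 (mk-random-stream))]
  (time (count (build-map xs))))

;; Forced: vec realizes every element before the clock starts,
;; so only the map construction is timed.
(let [xs (vec (take 100000 (mk-random-stream)))]
  (time (count (build-map xs))))
```

Wrapping the sequence in vec (or doall) in the let is what moves the
generation cost out of the measurement.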

> Try this:
> 
> ;run with java -server -Xmx1024m

Good catch.  I will bump up the default heap size in the Haskell RTS
to 1G as well.

> (defn main [i]
>   (let [vals (vec (take (* i 2) (mk-random-stream)))]
>     (dotimes [_ 10]
>       (time
>        (let [m (apply hash-map (take i vals))]

If I understand this code correctly, hash-map will get called with
i arguments, which it reads as alternating keys and values, so the map
will be populated with only i/2 entries... probably not the intended
result? (I will admit, I've done a bit of hacking on MIT Scheme but my
Clojure is not quite up to speed, so I might be missing a subtlety here).
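
A small sketch of what I mean, using ten distinct integers rather than the
random values from the benchmark (the names sample, half and full are only
for illustration):

```clojure
;; Illustrative data only: ten distinct integers instead of random values.
(def sample (vec (range 10)))

;; hash-map consumes its arguments as alternating key/value pairs,
;; so 10 arguments produce 5 entries.
(def half (apply hash-map sample))
(println (count half))

;; zipmap pairs each element with itself, giving one entry per element.
(def full (zipmap sample sample))
(println (count full))
```

If i entries are wanted, one option would be to pass 2i arguments to
hash-map; another is to build the map with zipmap as above.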

From David Nolen:
> Here's another version that's almost as fast as Rich's and it even includes
> the time for computing values.
> 
> (defn mk-random-map-fast [n]
>   (let [m (transient {})
>         r (new ec.util.MersenneTwisterFast)
>         ni (. r (nextInt))]
>     (loop [i 0 result m ni ni]
>       (if (= i n)
>         (persistent! result)
>         (recur (inc i) (assoc! result ni ni) (. r nextInt))))))

Aww, that's cheating! :-)  More seriously, the operation I'd like to test
here is the functional (non-destructive) update, so marking the structure
as transient sort of defeats the purpose of the benchmark.  It's true that
in this particular case, the transient variant would be much faster.  Your
version also appears to insert the full sample size (n entries rather than
i/2), which might be why you observed it to be slightly slower than Rich's.
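
For reference, a rough sketch of the non-destructive variant I have in
mind.  It uses java.util.Random in place of ec.util.MersenneTwisterFast
purely so the snippet has no external dependency, and the function name is
made up for this example:

```clojure
;; Persistent (non-destructive) counterpart of the transient loop.
;; Assumption: java.util.Random stands in for MersenneTwisterFast.
(defn mk-random-map-persistent [n]
  (let [^java.util.Random r (java.util.Random.)]
    (loop [i 0 result {}]
      (if (= i n)
        result
        (let [k (.nextInt r)]
          ;; assoc returns a new map; the previous one is left untouched.
          (recur (inc i) (assoc result k k)))))))

;; The old version stays valid after an update, which is the property
;; the benchmark is meant to exercise.
(def before {1 1})
(def after (assoc before 2 2))
(println (count before) (count after))
```

Each iteration here pays for a fresh persistent map, which is exactly the
cost that the transient version avoids.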

I'll set up my tests again and try this out.  Thanks for the comments!

Cheers,
Edward

