Re: abysmal multicore performance, especially on AMD processors

Andy Fingerhut Tue, 11 Dec 2012 12:00:28 -0800

Marshall:

I'm not practiced in recognizing megamorphic call sites, so I could be missing 
some in the example code below, modified from Lee's original code.  It doesn't 
use reverse or conj, and as far as I can tell doesn't use PersistentList, 
either, only Cons.


(defn burn-cons [size]
  (let [size (long size)]
    (loop [i (long 0)
           value nil]
      (if (>= i size)
        (last value)
        (recur (inc i) (clojure.lang.Cons.
                        (* (int i)
                           (+ (float i)
                              (- (int i)
                                 (/ (float i)
                                    (inc (int i))))))
                        value))))))

(a) invoke (burn-cons 2000000) sequentially 64 times in a single JVM

(b) invoke (burn-cons 2000000) 64 times using a modified version of pmap that 
limits the number of active threads to 2 (see below), in a single JVM.  I would 
hope that this would take about half the elapsed time of (a), but it actually 
takes longer than (a).

(c) start up two JVMs simultaneously and invoke (burn-cons 2000000) 
sequentially 32 times in each.  The elapsed time here is less than (a), as I 
would expect.

(Clojure 1.4, Oracle/Apple JDK 1.6.0_37, Mac OS X 10.6.8, running on a machine 
with an Intel Core i7 with 4 physical cores, which OS X reports as 8, I think 
because of 2 hyperthreads per core -- more details available on request.)

Can you try to reproduce to see if you get similar results?  If so, do you know 
why we get bad parallelism in a single JVM for this code?  If there are no 
megamorphic call sites, then it is examples like this that lead me to wonder 
about locking in memory allocation and/or GC.
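
One rough way to probe the GC half of that suspicion is to compare GC counts and 
times around a run, using the JVM's standard GarbageCollectorMXBean interface.  
The sketch below is only an illustration, not something measured in this 
thread: the helper names gc-totals and with-gc-stats are made up here, and the 
workload at the end is a small stand-in rather than burn-cons.  A large GC share 
of the elapsed time would point toward the collector, though it would not by 
itself show contention in allocation.

```clojure
(import '(java.lang.management ManagementFactory GarbageCollectorMXBean))

;; Hypothetical helper: sums GC activity over every collector in this JVM.
;; The MXBean getters return -1 when a value is undefined, hence (max 0 ...).
(defn gc-totals []
  (let [beans (ManagementFactory/getGarbageCollectorMXBeans)]
    {:count   (reduce + (map #(max 0 (.getCollectionCount ^GarbageCollectorMXBean %))
                             beans))
     :time-ms (reduce + (map #(max 0 (.getCollectionTime ^GarbageCollectorMXBean %))
                             beans))}))

;; Hypothetical helper: calls thunk and reports its result, the wall-clock
;; time in ms, and how much the GC counters moved while it ran.
(defn with-gc-stats [thunk]
  (let [gc-before (gc-totals)
        t0        (System/nanoTime)
        result    (thunk)
        elapsed   (/ (- (System/nanoTime) t0) 1e6)
        gc-after  (gc-totals)]
    {:result     result
     :elapsed-ms elapsed
     :gc-count   (- (:count gc-after) (:count gc-before))
     :gc-time-ms (- (:time-ms gc-after) (:time-ms gc-before))}))

;; Stand-in allocation-heavy workload, small enough to run quickly.
(with-gc-stats #(count (doall (map vector (range 100000)))))
```

Wrapping each of the runs (a), (b) and (c) this way would give GC deltas to 
set beside the elapsed times.  Note that the counters are per JVM, so with 
several threads active the deltas include GC triggered by all of them.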


With the functions below, my part (b) was measured by doing:

(time (doall (nthreads-pmap 2 (fn [_] (burn-cons 2000000)) (unchunk (range 64)))))
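
For comparison, the sequential baseline of part (a) might be timed along the 
following lines.  This is a minimal sketch assuming a plain dotimes loop on the 
calling thread, which may not be the exact form used for the numbers above; 
run-sequential is a name introduced here for illustration, and burn-cons is 
repeated from earlier in this message so the snippet stands alone.

```clojure
;; burn-cons as defined earlier in this message.
(defn burn-cons [size]
  (let [size (long size)]
    (loop [i (long 0)
           value nil]
      (if (>= i size)
        (last value)
        (recur (inc i) (clojure.lang.Cons.
                        (* (int i)
                           (+ (float i)
                              (- (int i)
                                 (/ (float i)
                                    (inc (int i))))))
                        value))))))

;; Hypothetical helper: n back-to-back calls on the calling thread,
;; discarding the results.
(defn run-sequential [n size]
  (dotimes [_ n]
    (burn-cons size)))

;; Small sizes here only to keep the example quick.
(time (run-sequential 4 100000))
```

The full-size equivalent of part (a) would be (time (run-sequential 64 2000000)).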

Andy



(defn unchunk [s]
  (when (seq s)
    (lazy-seq
     (cons (first s)
           (unchunk (next s))))))

(defn nthreads-pmap
  "Like pmap, except can take an argument nthreads to control the
  maximum number of parallel threads used."
  ([f coll]
     (let [n (+ 2 (.. Runtime getRuntime availableProcessors))]
       (nthreads-pmap n f coll)))
  ([nthreads f coll]
     (if (= nthreads 1)
       (map f coll)
       (let [n (dec nthreads)
             rets (map #(future (f %)) coll)
             step (fn step [[x & xs :as vs] fs]
                    (lazy-seq
                     (if-let [s (seq fs)]
                       (cons (deref x) (step xs (rest s)))
                       (map deref vs))))]
         (step rets (drop n rets)))))
  ([nthreads f coll & colls]
   (let [step (fn step [cs]
                (lazy-seq
                 (let [ss (map seq cs)]
                   (when (every? identity ss)
                     (cons (map first ss) (step (map rest ss)))))))]
     (nthreads-pmap nthreads #(apply f %) (step (cons coll colls))))))


On Dec 11, 2012, at 10:06 AM, Marshall Bockrath-Vandegrift wrote:

> Lee Spector <[email protected]> writes:
> 
>> If the application does lots of "list processing" but does so with a
>> mix of Clojure list and sequence manipulation functions, then one
>> would have to write private, list/cons-only versions of all of these
>> things? That is -- overstating it a bit, to be sure, but perhaps not
>> entirely unfairly -- re-implement Clojure's Lisp?
> 
> I just did a quick look over clojure/core.clj, and `reverse` is the only
> function which stood out to me as hitting the most pathological case.
> Every other `conj` loop over a user-provided datastructure is `conj`ing
> into an explicit non-list/`Cons` type.
> 
> So I think if you replace your calls to `reverse` and any `conj` loops
> you have in your own code, you should see a perfectly reasonable
> speedup.
> 
> -Marshall
> 
> -- 
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to [email protected]
> Note that posts from new members are moderated - please be patient with your 
> first post.
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en

