Poor parallelization performance across 18 cores (but not 4)

David Iba Mon, 16 Nov 2015 21:38:51 -0800

I have functions f1 and f2 below; say they take T1 and T2, respectively, 
when run as a single instance/thread.  The issue I'm facing is that running 
18 instances of f2 in parallel across 18 cores takes anywhere from 2 to 5 
times T2, and more complex functions take absurdly long.



   (defn f1 []
     (apply + (range 2e9)))

   ;; Note: each call to (f2) makes its own x* atom, so the 'swap!'
   ;; should never retry.
   (defn f2 []
     (let [x* (atom {})]
       (loop [i 1e9]
         (when-not (zero? i)
           (swap! x* assoc :k i)
           (recur (dec i))))))
   

Of note:
- On a 4-core machine, both f1 and f2 parallelize well (roughly T1 and T2 
for 4 runs in parallel).
- Running 18 f1's in parallel on the 18-core machine also parallelizes well.
- Disabling hyperthreading doesn't help.
- Based on jvisualvm monitoring, it doesn't seem to be GC-related.
- I also tried a dedicated 18-core EC2 instance and saw the same issues, so 
it's not shared-tenancy-related.
- If I make a jar that runs a single f2 and launch 18 of them in parallel, it 
parallelizes well (so I don't think it's machine/AWS-related).
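
A minimal sketch of how such a comparison could be timed, assuming each run 
is launched with `future` (the helper name `time-parallel` is hypothetical, 
not part of the original code, and other launch mechanisms may behave 
differently):

```clojure
;; Hypothetical timing helper; an illustration only.
(defn time-parallel
  "Starts n concurrent calls of the zero-argument function f, waits for
  all of them to finish, and returns the elapsed wall-clock time in
  milliseconds."
  [n f]
  (let [start (System/nanoTime)
        ;; doall forces the lazy seq so every future is started
        ;; before we begin waiting on any of them.
        futs  (doall (repeatedly n #(future (f))))]
    ;; Block until every call has completed.
    (doseq [fut futs]
      @fut)
    (/ (- (System/nanoTime) start) 1e6)))
```

With this, (time-parallel 1 f2) approximates T2, and (time-parallel 18 f2) 
would ideally come out close to the same value on an 18-core machine.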

Could it be that running 18 f2's in parallel on a single JVM instance is 
overworking the STM with all the swap! calls?  Any other theories?

Thanks!

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
