On Mon, Jan 24, 2011 at 10:29 AM, Michael Gardner <[email protected]> wrote:
> Thanks for the work, Ken.
You're welcome.
> Clojure's multi-threaded performance can be mysterious indeed
I think that may be generally true of multi-processing of every kind. :)
> Anyway, for now I will probably just stick with pmap + shuffle, since it's
> dead-simple
> and should be enough my purposes.
Probably a good idea, if keeping the input in order is not important.
If you do need to keep the ordering, you can still do the items in a
somewhat perturbed order, if you're willing to give up some laziness:
(defn permute
"Applies a permutation to a vector. Permutation should be a vector
of same length with (range n) in some shuffled order. If v is shorter
than perm, just returns v."
[perm v]
(if (< (count v) (count perm))
v
(vec (map #(nth v %) perm))))
(defn i-perm
"Inverts a permutation."
[perm]
(let [r (range (count perm))]
(persistent!
(reduce
(fn [out [pi ix]] (assoc! out pi ix))
(transient (vec r))
(map vector perm r)))))
(defn shuffle
"Shuffles a vector"
[v]
(let [s (count v)]
(loop [out (transient v) n (dec s)]
(if (zero? n)
(persistent! out)
(let [i (rand-int (inc n))
t (out n)]
(recur
(assoc!
(assoc! out n (out i))
i t)
(dec n)))))))
Test that shuffle works:
user=> (frequencies (map shuffle (take 500 (repeat [1 2 3]))))
{[3 1 2] 88,
[2 3 1] 91,
[1 2 3] 83,
[2 1 3] 84,
[3 2 1] 82,
[1 3 2] 72}
All six possible permutations occur, at reasonably similar rates.
(defn jumbleize
"Returns a pair of functions one of which will jumble and one of
which will unjumble a seq; the seq is chunked into parts of
size n and those parts are jumbled. The functions are lazy in
that only one chunk is realized at a time."
[n]
(let [rn (vec (range n))
perms (repeatedly #(shuffle rn))
ips (map i-perm perms)
jumble (fn [s perm-seq]
(mapcat permute
perm-seq
(map vec (partition n n [] s))))]
[#(jumble % perms) #(jumble % ips)]))
(defn jumbled-pmap
"Like pmap, only the order in which it sees sequence elements is jumbled
somewhat, which may improve efficiency if easy and difficult jobs have a
regular arrangement in the input. Chunks the input by chunk-size and
jumbles each chunk internally before calling pmap; puts the output back
in order."
[chunk-size f & colls]
(let [[jumbler unjumbler] (jumbleize chunk-size)]
(unjumbler (apply pmap f (map jumbler colls)))))
user=> (drop 97 (jumbled-pmap 16 #(* % %) (range 100)))
(9409 9604 9801)
user=> (take 3 (drop 50 (jumbled-pmap 16 #(* % %) (range 100))))
(2500 2601 2704)
user=> (take 3 (drop 50 (jumbled-pmap 16 #(+ (* 1000 %1) %2) (range
100) (range 100 180))))
(50150 51151 52152)
That last, BTW, shows it working properly even with multiple input
colls of different lengths.
user=> (take 3 (drop 50 (jumbled-pmap 16 #(+ (* 1000 %1) %2) (iterate
inc 0) (drop 100 (iterate inc 0)))))
(50150 51151 52152)
And this shows that it's still lazy enough not to blow up on infinite inputs. :)
The internal sequences of random permutations and their inverses, as
well as the input sequences, lose their heads:
user=> (take 3 (drop 50000000 (jumbled-pmap 16 #(+ (* 1000 %1) %2)
(iterate inc 0) (drop 100 (iterate inc 0)))))
(50050000100
50050001101
50050002102)
Memory use stayed constant at a bit more than 64MB, which was probably
the -Xms parameter to the JVM instance Enclojure ran its REPL in.
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en