I needed a random sampling function for work and wrote this in
Clojure.
(defn random-sample
"Take a random sample of size `sample-size' from the `items'
sequence.
This uses Algorithm R -- random sampling with a reservoir. It
requires O(`sample-size') space and does not need to know the size of
`items' ahead of time. Described in Knuth's The Art of Computer
Programming, volume 2."
[sample-size items]
(let [maybe-insert-item
(fn [[num current] item]
(if (< num sample-size)
[(inc num) (conj current item)]
(let [index (rand-int num)]
(if (< index sample-size)
[(inc num) (assoc current index item)]
[(inc num) current]))))]
(second (reduce maybe-insert-item [0 []] items))))
It seems to work fine, but it strikes me as a little busy with ()[],
even for Lisp. Does anyone have any pointers? Will the internal
function "maybe-insert-item" be evaluated every time "random-sample"
is called, and if so, is that worth worrying about wrt. program
optimization?
Does this sort of thing belong in clojure-contrib or somewhere else?
It seems awfully short but it's a nice function to have, at least for
me.
Thanks,
Steve
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Clojure" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---