On Feb 3, 12:22 am, Andy Fingerhut <[email protected]> wrote:
> I've done a pass through most of the Clojure programs on the shootout
> web site recently, making some of them faster, and choosing -Xmx
> command line arguments when running them to keep the memory usage down
> to a reasonable level -- not always the smallest heap size that works,
> mind you -- just one that avoids exorbitantly large memory usage.
>
> http://shootout.alioth.debian.org
>
> The Clojure program for the "fasta" problem, with source code, AOT
> compilation command, and execution command given on this web page:
>
> http://shootout.alioth.debian.org/u32/program.php?test=fasta&lang=clo...
>
> still takes about 6x to 8x more time than the best Java 6 -server
> program here, depending upon which of the four machines it is run on:
>
> http://shootout.alioth.debian.org/u32/program.php?test=fasta&lang=jav...
>
> I'm sure the Clojure program can be made faster, e.g. by doing fewer
> calls to write to the output file, with more bytes per call. Most of
> the time seems to be file writing and generating random numbers in
> gen-random!, at least on my systems where I've done testing and
> profiling. I'm also seeing a fair amount of time spent in calls to
> java.lang.Double.valueOf, according to the built-in profiler that
> comes with the Hotspot JVM.
>
> Note: The web site is Clojure 1.2 only right now, so don't expect a
> tweaked-out program using things that only work in Clojure 1.3 to work
> there yet.
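
The quoted suggestion about fewer write calls with more bytes per call can be sketched roughly as below. This is only an illustration under assumptions: write-lines-batched is a hypothetical helper, not part of either program, and a ByteArrayOutputStream stands in for the real output so the result can be inspected.

```clojure
(import '(java.io ByteArrayOutputStream OutputStream))

;; Hypothetical helper (not from the original post): copy `n-lines` lines of
;; `width` bytes from `src` into one buffer, each followed by a newline, and
;; hand the whole buffer to the stream with a single write call.
(defn write-lines-batched [^bytes src width n-lines ^OutputStream out]
  (let [width  (int width)
        stride (inc width)
        ;; Pre-fill with newline (byte 10) so only the payload needs copying.
        ^bytes buf (byte-array (* stride n-lines) (byte 10))]
    (dotimes [k n-lines]
      (System/arraycopy src (* k width) buf (* k stride) width))
    (.write out buf 0 (alength buf))))

(def out (ByteArrayOutputStream.))
(write-lines-batched (.getBytes "ACGTACGTACGT") 4 3 out)
(print (str out))
```

The buffer is filled with newline bytes up front, so each line costs one System/arraycopy and the stream sees one write for the whole batch.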
This program is considerably faster on my computer:
(set! *warn-on-reflection* true)
(def *width* 60)
(def *lookup-size* 222000)
(def *alu* (str "GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGG"
                "GAGGCCGAGGCGGGCGGATCACCTGAGGTCAGGAGTTCGAGA"
                "CCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAAT"
                "ACAAAAATTAGCCGGGCGTGGTGGCGCGCGCCTGTAATCCCA"
                "GCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG"
                "AGGCGGAGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCC"
                "AGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAA"))
(def *codes* "acgtBDHKMNRSVWY")
(def *iub* [0.27 0.12 0.12 0.27 0.02 0.02 0.02 0.02
            0.02 0.02 0.02 0.02 0.02 0.02 0.02])
(def *homosapiens* [0.3029549426680 0.1979883004921
                    0.1975473066391 0.3015094502008])
(defn find-index [f coll]
  (first (keep-indexed #(if (f %2) %1) coll)))
(def random-seed (int-array [42]))
(let [IM (int 139968)
      IA (int 3877)
      IC (int 29573)
      scale (double (/ *lookup-size* IM))]
  (defn gen-random-fast []
    (let [new-seed (unchecked-remainder
                     (unchecked-add
                       (unchecked-multiply (aget (ints random-seed) 0) IA)
                       IC)
                     IM)]
      (aset (ints random-seed) 0 new-seed)
      (int (* new-seed scale)))))
;; Takes a vector of probabilities.
(defn make-cumulative [v]
  (vec (map #(reduce + (subvec v 0 %)) (range 1 (inc (count v))))))
;; Takes a vector of cumulative probabilities.
(defn make-lookup-table [v]
  (let [lookup-scale (- *lookup-size* 0.0001)
        tmp (map
              (fn [n] (find-index #(<= (/ n lookup-scale) %) v))
              (range *lookup-size*))]
    (int-array tmp)))
(defn cycle-bytes [source source-size n ^java.io.BufferedOutputStream ostream]
  (let [source-size (int source-size)
        width (int *width*)
        width+1 (int (inc width))
        buffer-size (int (* width+1 4096))
        buffer (byte-array buffer-size (byte 10))
        next-i (fn [i]
                 (unchecked-remainder (unchecked-add (int i) width) source-size))
        next-j (fn [j]
                 (let [j (+ j width+1)]
                   (if (= j buffer-size)
                     (do (.write ostream buffer) 0)
                     j)))]
    (loop [i (int 0) j (int 0) n (int n)]
      (System/arraycopy source i buffer j width)
      (if (> n width)
        (recur (int (next-i i)) (int (next-j j)) (- n width))
        (do
          (aset buffer (+ j n) (byte 10))
          (.write ostream buffer 0 (+ j n 1)))))))
(defn fasta-repeat [n ^java.io.BufferedOutputStream ostream]
  (let [source (.getBytes (str *alu* *alu*))]
    (cycle-bytes source (count *alu*) n ostream)))
(defn fasta-random [probs n ^java.io.BufferedOutputStream ostream]
  (let [codes (.getBytes (str *codes*))
        lookup-table (ints (make-lookup-table (make-cumulative probs)))
        width (int *width*)
        buffer (byte-array 222000)
        seeds (int-array 222000)
        first-seed (aget (ints random-seed) 0)]
    (loop [i (int 0)]
      (aset seeds i (aget (ints random-seed) 0))
      (aset buffer i
            (aget codes
                  (aget lookup-table
                        (gen-random-fast))))
      (if (= (aget (ints random-seed) 0) first-seed)
        (do
          (System/arraycopy buffer 0 buffer (inc i) *width*)
          (cycle-bytes buffer (inc i) n ostream)
          (aset (ints random-seed) 0 (aget seeds (mod n (inc i)))))
        (recur (unchecked-inc i))))))
(defn write-line [s ^java.io.BufferedOutputStream stream]
  (.write stream (.getBytes (str s "\n"))))
(let [ostream (java.io.BufferedOutputStream. System/out)
      arg (first *command-line-args*)
      n (read-string (or arg "25000000"))
      start-time (System/currentTimeMillis)]
  (write-line ">ONE Homo sapiens alu" ostream)
  (fasta-repeat (* n 2) ostream)
  (write-line ">TWO IUB ambiguity codes" ostream)
  (fasta-random *iub* (* n 3) ostream)
  (write-line ">THREE Homo sapiens frequency" ostream)
  (fasta-random *homosapiens* (* n 5) ostream)
  (.flush ostream)
  (binding [*out* *err*]
    (prn n)
    (print (/ (- (System/currentTimeMillis) start-time) 1000.0))
    (println " seconds")))
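
fasta-random stops filling its buffer as soon as the seed returns to its starting value, then replays the buffered cycle through cycle-bytes. That only works if the generator's period fits in the buffer and seeds arrays. A minimal sketch for checking this, using the same constants but with hypothetical names (next-seed and lcg-period are not part of the program above):

```clojure
;; Same constants as gen-random-fast; lower-case names chosen for this sketch.
(def im 139968)
(def ia 3877)
(def ic 29573)

;; One step of the linear congruential generator: seed' = (seed*ia + ic) mod im.
(defn next-seed [seed]
  (rem (+ (* seed ia) ic) im))

;; Count how many steps it takes for the seed to come back to `start`.
(defn lcg-period [start]
  (loop [seed (next-seed start) steps 1]
    (if (= seed start)
      steps
      (recur (next-seed seed) (inc steps)))))

(println (lcg-period 42))
```

With these constants the period equals IM (139968), so one full cycle plus the extra *width* bytes copied to its end fits in the 222000-element arrays.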