Yet another approach that might work for you, depending on your
requirements, is to use a lazy sequence to access your data. I did that
for a load of Twitter data that would have been too large to hold in memory
at any one time.
Here's the relevant bit (I think), copied and pasted:
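;; Assumes jio aliases clojure.java.io and json aliases a JSON library
;; that provides read-json; the gist has the actual ns declaration.

;; Sorted paths of the files in dir-name whose names start with "out".
(defn out-files [dir-name]
  (let [dir (jio/file dir-name)]
    (map #(str (jio/file dir %))
         (sort (filter #(.startsWith % "out") (.list dir))))))

;; Lazy seq of parsed tweets. Each file is read in full (doall) before its
;; reader closes, but files are read as the seq advances, not all up front.
(defn tweet-seq [dir-name]
  (map json/read-json
       (mapcat #(with-open [r (jio/reader %)] (doall (line-seq r)))
               (out-files dir-name))))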
In context: https://gist.github.com/2357604
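
For a single large file like the one in the original question, the same idea
might look like the sketch below. It is not from the gist: parse-line,
reduce-lines and summarize are names made up for illustration, and it assumes
every line holds exactly two integers separated by whitespace. The point is to
consume the lines while the reader is still open and keep only the result you
need, rather than the whole parsed file.

```clojure
(require '[clojure.java.io :as jio]
         '[clojure.string :as string])

;; Hypothetical helper: turn a line such as "1 7" into [1 7].
(defn parse-line [line]
  (mapv #(Long/parseLong %) (string/split (string/trim line) #"\s+")))

;; Reduce over the parsed lines while the reader is still open, so only
;; the accumulated value is retained, not the whole file.
(defn reduce-lines [f init file-name]
  (with-open [r (jio/reader file-name)]
    (reduce f init (map parse-line (line-seq r)))))

;; Example use: count the lines and track the largest number seen.
(defn summarize [file-name]
  (reduce-lines (fn [acc [a b]]
                  (-> acc
                      (update-in [:lines] inc)
                      (update-in [:max] max a b)))
                {:lines 0 :max 0}
                file-name))
```

Whether this helps depends on whether the later processing really needs all
the pairs in memory at once; if it does, a bigger heap or a more compact data
structure is the way to go.
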
Ali
On Wednesday, 11 April 2012 07:08:11 UTC+1, Andy Fingerhut wrote:
>
> On Apr 9, 2012, at 10:05 PM, Andy Wu wrote:
>
> > Hi there,
> >
> > I'm studying algo-class.org, and one of its programming assignments
> > gives you a file containing contents like below:
> > 1 2
> > 1 7
> > 2 100
> > ...
> >
> > There are over 5 million lines, and I want to first construct a
> > vector of vectors of integers for further processing:
> > [[1 2][1 7][2 100]...]
> >
> > Below is what the code looks like:
> >
> > (def int-vec (with-open [rdr (clojure.java.io/reader "<file name>")]
> >                (doall (map convert (line-seq rdr)))))
> >
> > and this leads to an OutOfMemoryError. I tried generating a vector of
> > random integers, and that doesn't cause the error. So I guess it is
> > the temp objects in convert (it breaks a line down into a list of strings
> > and then converts them to integers) that are causing the memory issue.
> >
> > Can someone advise me on what would be an ideal way to handle this case in
> > Clojure?
>
>
> Most likely any way you want to do it will require more memory than the
> default heap size that your JVM has. You can increase the heap size using
> the -Xmx<num_megabytes>m option, e.g.
>
> java -Xmx2048m -cp clojure.jar clojure.main
>
> Replace the other command line arguments with what you need if they don't
> match what I wrote. That might be all you need, assuming you have enough
> RAM for the job.
>
> If that isn't all you need, you can consider using different data
> structures that require less memory. One possibility is mutable Java
> arrays. Another more Clojure-y method is (vector-of :int 1 2), which
> creates an immutable Clojure vector that can only hold Java primitive ints.
>
> Andy
>
>
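
On the vector-of suggestion quoted above, a minimal sketch of how it might be
applied to the pairs file. The names read-flat-ints and pair-at are made up
for illustration, and the sketch assumes every line holds exactly two
whitespace-separated integers that fit in a Java int. Instead of a vector of
small vectors, it keeps one flat primitive-int vector and rebuilds a pair only
when asked for it.

```clojure
(require '[clojure.java.io :as jio]
         '[clojure.string :as string])

;; Hypothetical helper: read every number into one flat (vector-of :int),
;; so the pair on line i (0-based) sits at indices 2i and 2i+1.
(defn read-flat-ints [file-name]
  (with-open [r (jio/reader file-name)]
    (reduce (fn [v line]
              (reduce #(conj %1 (Integer/parseInt %2))
                      v
                      (string/split (string/trim line) #"\s+")))
            (vector-of :int)
            (line-seq r))))

;; Hypothetical helper: rebuild the i-th pair as an ordinary vector.
(defn pair-at [v i]
  [(nth v (* 2 i)) (nth v (inc (* 2 i)))])
```

With the three sample lines from the question, (pair-at v 2) on the resulting
vector gives [2 100].
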