On Tuesday, May 5, 2015 at 11:18:56 PM UTC-4, Sam Raker wrote:
>
> I've got two really big CSV files that I need to compare. Stay tuned for
> the semi-inevitable "how do I optimize over this M x N space?" question,
> but for now I'm still trying to get the data into a reasonable format--I'm
> planning on converting each line into a map, with keys coming from either
> the first line of the file, or a separate list I was given. Non-lazy
> approaches run into memory limitations; lazy approaches run into "Stream
> closed" exceptions while trying to coordinate `with-open` and `line-seq`.
> Given that memory is already tight, I'd like to avoid leaving open
> files/file descriptors/readers/whatever-the-term-in-clojure-is lying
> around. I've tried writing a macro, I've tried transducers, I've tried
> passing around the open reader along with the lazy seq, none successfully,
> albeit none necessarily particularly well. Any suggestions on streaming
> such big files?
>
Something like this didn't work?
  (with-open [rdr1 ...
              rdr2 ...]
    (let [l1 (line-seq rdr1)
          l2 (line-seq rdr2)]
      (->> (map something l1 l2)
           (filter whatever)
           (first))))
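For instance, to check whether two text files are the same, `something` would
be `not=` and `whatever` would be `identity`; the result would be nil if they
were the same, and something truthy otherwise. (One caveat: `map` stops at the
end of the shorter seq, so a file that is a strict prefix of the other would
also compare as the same.) The call to `first` short-circuits as soon as the
result is known, and the head of neither line-seq should be held, so lines
that have already been consumed can be garbage collected. The call to `first`
also ensures the `with-open` scope is not left until as much of both line-seqs
has been consumed as will be needed, which is what avoids the "Stream closed"
exception. `reduce`, or transducers/reducers that stop early via `reduced`,
would have the same effect.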
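
To make that concrete, here is a minimal sketch of the same pattern filled in
for the "are these two files the same?" case. The function name
`first-difference`, the helper `padded-lines`, and the `:eof` padding marker
are my own inventions for illustration, not anything from your code; the
padding is just one way to make files of different lengths show up as
different.

```clojure
(require '[clojure.java.io :as io])

(defn- padded-lines
  "Lazy seq of the reader's lines followed by an endless run of :eof
  markers, so that a shorter file does not silently end the comparison.
  (Hypothetical helper, named for this example only.)"
  [rdr]
  (concat (line-seq rdr) (repeat :eof)))

(defn first-difference
  "Returns [line-number line-a line-b] for the first position at which the
  two files differ, or nil if they are identical. All reading happens
  inside with-open, and only a fully realized vector escapes the scope."
  [path-a path-b]
  (with-open [rdr-a (io/reader path-a)
              rdr-b (io/reader path-b)]
    (->> (map vector (iterate inc 1) (padded-lines rdr-a) (padded-lines rdr-b))
         ;; stop once both files are exhausted
         (take-while (fn [[_ a b]] (not (and (= a :eof) (= b :eof)))))
         ;; keep only the positions where the lines differ
         (filter (fn [[_ a b]] (not= a b)))
         ;; forces just enough of both seqs before the readers close
         (first))))
```

Because `first` is called inside the `with-open` body, the lazy pipeline is
forced while both readers are still open, and what gets returned is plain data
rather than an unrealized seq. Returning the lazy seq itself from `with-open`
is what typically produces "Stream closed".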
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en