I need to process a huge XML file with 300,000 records. The
structure is like this:
<root>
  <header>
    ...
  </header>
  <body>
    <record>...</record>
    <record>...</record>
    ... 299,998 more
  </body>
</root>
Obviously, it is crucial not to allocate memory for all the
records at once. If I do this:
(use '[clojure.contrib.lazy-xml :only [parse-trim]])
(use '[clojure.java.io :only [reader]])

(-> (parse-trim (reader "huge.xml"))
    :content   ; (header body)
    second     ; the <body> element
    :tag)      ; just read its tag
This should only have to parse as far as the start tag <body>, but it
parses all the way down to </body> -- or at least it tries to, failing
with an OutOfMemoryError.
Am I wrong to expect that the entire contents of <body> will not be
parsed? :content is supposed to be a lazy seq, so even if I access its
head, it should not parse more than the first <record> element, right?
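
To make my expectation concrete, this is the kind of access I assumed
would realize only the first record (just a sketch of what I think the
laziness should allow, not verified behavior):

(-> (parse-trim (reader "huge.xml"))
    :content   ; (header body)
    second     ; the <body> element
    :content   ; supposedly a lazy seq of <record> elements
    first)     ; should force parsing of only the first <record>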