Good question. Every library that came to mind when I saw
clojure.data.xml/parse's tree of Elements {:tag _, :attrs _, :content _}
only works on zippers, which apparently sit in memory.
One option is to use `clojure.data.xml/source-seq` to get back a lazy
sequence of Events {:type _, :name _, :attrs _, :str _}, where the event
:type is one of :start-element, :end-element, or :characters.
For example, "<strong>Hello</strong>" would parse into the events
[:start-element "strong"], [:characters "Hello"], [:end-element "strong"].
You could use loop/recur to manage state as you consume the sequence.
That's actually how I'm used to working with SAX parsers anyway. Here are
some naive Ruby examples if it's new to you:
https://gist.github.com/danneu/3977120.
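
To make the loop/recur idea concrete, here is a minimal sketch rather than
a tested solution. It walks a flat sequence of events and collects the text
found inside elements with a given name, keeping only a depth counter and
the results as state. `text-inside` and `sample-events` are names I made up
for illustration. The sample events are plain maps, and I'm assuming element
names come back as keywords; if you get strings instead, pass a string.

```clojure
;; Sketch: collect the text inside every element named `tag` while walking
;; a flat event sequence once. Events are read with keyword lookups
;; (:type, :name, :str), so plain maps or records with those fields work.
(defn text-inside
  [tag events]
  (loop [events (seq events)
         depth  0     ; how many matching elements are currently open
         found  []]   ; text collected so far
    (if-not events
      found
      (let [event       (first events)
            rest-events (next events)]
        (case (:type event)
          :start-element
          (recur rest-events
                 (if (= tag (:name event)) (inc depth) depth)
                 found)

          :end-element
          (recur rest-events
                 (if (= tag (:name event)) (dec depth) depth)
                 found)

          :characters
          (recur rest-events
                 depth
                 (if (pos? depth) (conj found (:str event)) found))

          ;; any other event type is skipped
          (recur rest-events depth found))))))

;; Hand-written events for <p>Say <strong>Hello</strong> world</p>
;; (hypothetical data, shaped like the Events described above).
(def sample-events
  [{:type :start-element :name :p      :attrs {} :str nil}
   {:type :characters    :name nil     :attrs {} :str "Say "}
   {:type :start-element :name :strong :attrs {} :str nil}
   {:type :characters    :name nil     :attrs {} :str "Hello"}
   {:type :end-element   :name :strong :attrs {} :str nil}
   {:type :characters    :name nil     :attrs {} :str " world"}
   {:type :end-element   :name :p      :attrs {} :str nil}])

(text-inside :strong sample-events)

;; Against the real file it might look like this (assumes a data.xml
;; version that provides source-seq):
;; (with-open [in (clojure.java.io/input-stream "/path-to-large-file")]
;;   (text-inside :strong (clojure.data.xml/source-seq in)))
```

Since the loop only ever looks at the current event, memory use should
depend on how much text you collect rather than on the size of the file,
as long as nothing else holds on to the start of the sequence.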
Of course, I imagine the ideal solution would involve some way to express
selectors on the Element tree, like I'm used to doing with raynes/laser on
zippers:
https://github.com/Raynes/laser/blob/master/docs/guide.md#screen-scraping.
On Tuesday, December 17, 2013 4:57:32 AM UTC-6, Peter Ullah wrote:
>
>
> Hi all,
>
> I'm attempting to parse a large (500MB) XML, specifically I am trying to
> extract various parts using XPath. I've been using the examples presented
> here:
> http://clojure-doc.org/articles/tutorials/parsing_xml_with_zippers.html
> and all was going well when tested against small files; however, now that I am
> using the larger file Fireplace/Vim just hangs and my laptop gets hot then
> I get a memory exception.
>
> I've been playing around with various other libraries such as
> clojure.data.xml and found that the following works perfectly well for
> parsing... but when I come to search inside root, things start to snarl up
> again.
>
> (ns example.core
>   (:require [clojure.java.io :as java.io]
>             [clojure.data.xml :as data.xml]))
>
> (def large-file "/path-to-large-file")
>
> ;; using clojure.data.xml returns quickly with no problems, whereas
> ;; clojure.xml/parse from the link above causes problems.
> (def root
>   (-> large-file
>       java.io/input-stream
>       data.xml/parse))
>
> (class root) ;clojure.data.xml.Element
>
> Does anyone know a way of searching within root that won't consume the
> heap?
>
> Forgive me, I'm new to Clojure and these forums, I've searched through
> previous posts but not managed to answer my own question.
>
> Thanks in advance.
>