I've been struggling to create a monadic parser. I'm parsing XML documents into a DOM tree. Step one is to use a SAX parser to create a stream of XML tokens; each a struct map. The token types are :start-element, :attribute, :text and :end-element.
Next I'm feeding this list of tokens into a monadic parser I've created following http://intensivesystems.net/tutorials/monads_101.html. The output should be a DOM tree structure; element nodes will contain an :attributes key (containing attribute tokens) and a :body key containing child text and elements nodes. I've had to make some minor changes to the tutorial's parser-m monad, to allow for the fact that I'm parsing tokens, not character strings. As I parse :attribute elements, I take the original element node and add the attribute token to the :attributes list inside the element node struct, creating the vector as necessary. I have this working for the simple case, where I only allow for at-most one attribute token. (defmonad parser-m [m-result (fn [x] (fn [tokens] (list x tokens))) m-bind (fn [parser action] (fn [tokens] (let [result (parser tokens)] (when-not (nil? result) ((action (first result)) (second result)))))) m-zero (fn [tokens] nil) m-plus (fn [& parsers] (fn [tokens] (first (drop-while nil? (map #(% tokens) parsers)))))]) ; And some actions and parser generators (defn any-token "Fundamental parser action: returns [first, rest] if tokens is not empty, nil otherwise." [tokens] (if (empty? tokens) nil ; This is what actually "consumes" the tokens seq (list (first tokens) (rest tokens)))) ; Utilities that will likely move elsewhere (defn add-to-key-list "Updates the map adding the value to the list stored in the indicated key." [map key value] (update-in map [key] #(conj (or % []) value))) (with-monad parser-m (defn token-test "Parser factory using a predicate. When a token matches the predicate, it becomes the new result." [pred] (domonad [t any-token :when (pred t)] ; return the matched token t)) (defn match-type "Parser factory that matches a particular token type (making the matched token the result), or returns nil." [type] (token-test #(= (% :type) type))) (defn optional [parser] (m-plus parser (m-result nil))) (declare element one-or-more) (defn none-or-more [parser] (optional (one-or-more parser))) (defn one-or-more [parser] (domonad [a parser as (none-or-more parser)] (cons a as))) (defn attribute "Parser generator that matches attribute tokens, adding them to :attributes key of the element-node, and returning the new element-node." [element-node] (domonad [attribute-token (match-type :attribute)] (add-to-key-list element-node :attributes attribute-token))) (defn text "Parser generator that matches text tokens, adding them to the :body key of the element-node, and returning the new element-node." [element-node] (domonad [text-token (match-type :text)] (add-to-key-list element-node :body text-token))) (def match-first m-plus) (defn body-token [element-node] (match-first (attribute element-node) (text element-node) (element element-node))) (defn process-body [element-node] (domonad [modified-element (optional (body-token element-node))] (or modified-element element-node))) (defn element "Parses an element token into an element-node, and then adds the fully constructed element-node to the :body of the containing element node." [container-element-node] (domonad [token (match-type :start-element) element-node (m-result (struct element-node :element token)) assembled-node (process-body element-node) _ (match-type :end-element)] (add-to-key-list container-element-node :body assembled-node))) (def template ; Creates a psuedo-element (an empty map) and parses the root element into it, then extracts the root element. ; Will eventually have to expand to support text and comments, etc., around the root element." (domonad [root (element {})] (first (root :body)))) ) Here's the problem: I need to be able to handle multiple attribute tokens, as well as :text and sub-elements. To me, that means that process-body should be recursive, but we need to pass each modification of the element on each iteration (to accumlate additions to the :attributes and :body keys). However, when I do the following: (defn process-body [element-node] (domonad [modified-element (optional (body-token element-node)) :when modified-element final-element (process-body modified-element)] (or final-element modified-element element-node))) Which, to me, says "parse a body token and apply changes to the original element-node, then recurse; if the current token is not a body token, then abort, returning the initial element unchanged." Unfortunately, this breaks the parser entirely, with nothing parsing, not even a trivial document of just a root element with no attributes. I've tried variations with (none-or-more), but what happens is the initial element-node appears to be passed to the (body-token) parser repeatedly with the end result being that only the final change (the last attribute node in my current test) is recorded, and the prior changes are lost. Perhaps I'm misunderstanding what the :when guard does? I think I'm missing something subtle here (well, obviously, it doesn't work!) but my head keeps spinning a little. Within a parser monad, it's easy to get tripped up by parsers (monadic functions: accept state and return actions), actions (monadic values: functions that accept state and return result and new state) and parser generators (that return parsers). Thanks! -- Howard M. Lewis Ship Creator of Apache Tapestry Director of Open Source Technology at Formos --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~----------~----~----~----~------~----~------~--~---
