Hi Pratt --

ppatel3026 <[EMAIL PROTECTED]> writes:
> Could someone provide a link or examples of parsing XML document in R? Few
> specific questions below:

Always helpful to know what software you're using; here's mine

    > library(XML)
    > sessionInfo()
    R version 2.8.0 Under development (unstable) (2008-06-09 r45889)
    x86_64-unknown-linux-gnu

    locale:
    LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics utils datasets grDevices methods base

    other attached packages:
    [1] XML_1.95-2

    loaded via a namespace (and not attached):
    [1] tools_2.8.0

> For instance I can retrieve specific nodes using this:
> node <- xpathApply(xml, "//" %+% xtag, xmlValue)
>
> 1) I want to be able to retrieve parent node for this node, how can I do
> this? getParentNode() does not seem to cut it.

I've found it easier to use xpath and the 'internal' representation

    > library(XML)
    > f <- system.file("exampleData", "mtcars.xml", package="XML")
    > xml <- xmlTreeParse(f, useInternalNodes=TRUE)
    > q <- "//[EMAIL PROTECTED]'AMC Javelin']"
    > nodes <- xpathApply(xml, q)

nodes is a list of length one. Here's the parent of the first (and only)
element

    > parent <- xmlParent(nodes[[1]])

or

    > xpathApply(xml, paste(q, "/.."))[[1]]

> 2) How can I retrieve children nodes for a particular node?

    > xmlChildren(parent)

or, for a parent identified by path,

    > pq <- "dataset"
    > xpathApply(xml, paste(pq, "/*"))

> 3) How can I create an iterator to iterate through the whole tree?

For true event parsing I think you want xmlEventParse, which traverses
the document and invokes the functions in its 'handlers' argument on each
node. 'handlers' is a named list of functions, each name signifying either
a general type of position in the tree (e.g., 'startElement') or the name
of a node (e.g., 'record'). So

    > handler <- list(startElement=function(name, atts, ...) {
    +     cat("starting", name, "\n")
    + })
    > xmlEventParse(f, handler)
    starting dataset
    starting variables
    starting variable
    [etc]

The usual 'trick' is to use R's lexical scope to provide a context where
results can be stored, e.g., defining a factory to produce handlers

    handlerFactory <- function() {
        ## 'local' store visible to functions defined inside
        ## handlerFactory
        counts <- new.env(parent=emptyenv())
        ## return value -- list of functions
        list(startElement=function(name, atts, ...) {
            ## an environment is modified in place, so <- is enough
            ## here; updating an ordinary variable of handlerFactory
            ## would require <<- rather than <-
            if (!exists(name, counts))
                counts[[name]] <- 1
            else
                counts[[name]] <- counts[[name]] + 1
        }, getCounts=function() {
            ## for retrieving results
            as.list(counts)
        })
    }

Then invoke xmlEventParse with an instance of the handler. xmlEventParse
actually returns the handler, which by the end of xmlEventParse has
'counts' modified appropriately. We access the results by invoking our
getCounts function.

    > xmlEventParse(f, handlerFactory())$getCounts()
    $record
    [1] 32

    $variable
    [1] 11

    [etc]

If the use of lexical scope is a bit mysterious, there is a 'bank
account' example in the An Introduction to R manual (section 10.7) and a
paper by Ross Ihaka and Robert Gentleman on lexical scope (referenced at
http://www.r-project.org/doc/bib/R-other.html) that might help.

I don't usually use event parsing, so the above may not be accurate.

Martin

> Thank you,
> Pratt
> --
> View this message in context:
> http://www.nabble.com/Parse-XML-tp17757373p17757373.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024
Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793
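
The lexical-scope 'trick' from the message can be tried without any XML at all. What follows is only a minimal sketch in base R: makeCounter, tally, getTotal and getCounts are hypothetical names chosen for illustration, not part of the XML package. It contrasts the two ways a function returned from a factory can keep state: an ordinary variable, which needs <<-, and an environment, which is modified in place so that plain <- suffices.

```r
## Hypothetical factory (illustration only, not from the XML package):
## the returned functions share private state through lexical scope.
makeCounter <- function() {
    ## ordinary variable: inner functions must update it with <<-
    total <- 0
    ## environment: modified in place, so plain <- is enough
    counts <- new.env(parent=emptyenv())
    ## return value -- list of functions
    list(tally=function(name) {
        total <<- total + 1
        if (!exists(name, envir=counts, inherits=FALSE))
            counts[[name]] <- 1
        else
            counts[[name]] <- counts[[name]] + 1
        invisible(NULL)
    }, getTotal=function() {
        total
    }, getCounts=function() {
        as.list(counts)
    })
}

counter <- makeCounter()
for (tag in c("record", "variable", "record"))
    counter$tally(tag)
counter$getTotal()            # 3
counter$getCounts()$record    # 2
```

Each call to makeCounter() creates a fresh 'total' and 'counts', so two counters never interfere with one another; that is the same reason a new handlerFactory() instance starts counting from zero.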