Wacek Kusnierczyk wrote:
Duncan Temple Lang wrote:
Wacek Kusnierczyk wrote:
Don MacQueen wrote:
I have an XML file that has within it the coordinates of some polygons
that I would like to extract and use in R. The polygons are nested
rather deeply. For example, I found by trial and error that I can
extract the coordinates of one of them using functions from the XML
package:
doc <- xmlInternalTreeParse('doc.kml')
docroot <- xmlRoot(doc)
pgon <-
try:
lapply(
    xpathSApply(doc, '//Polygon',
        xpathSApply, '//coordinates', function(node)
            strsplit(xmlValue(node), split=',|\\s+')),
    as.numeric)
Just for the record, I think the XPath expression in the
second xpathSApply would need to be
".//coordinates"
to start searching from the previously matched Polygon node.
Otherwise, the search starts from the top of the document again.
Not really: the XPath pattern '//coordinates' does say 'find all
coordinates nodes, searching from the root', but the root here is not the
root of the whole document; it is each Polygon node in turn.
try:
root = xmlInternalTreeParse('
<root>
    <foo>
        <bar>1</bar>
    </foo>
    <foo>
        <bar>2</bar>
    </foo>
</root>')
xpathApply(root, '//foo', xpathSApply, '//bar', xmlValue)
# equals list("1", "2"), not list(c("1", "2"), c("1", "2"))
Just for the record and to avoid confusion for anyone reading the
archives in the future, the behaviour displayed above is from an old
version of the XML package (mid 2008). Subsequent versions yield
the second result, because //bar searches from the root of the document.
Using .//bar instead searches only the sub-tree below each foo node.
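To check which behaviour a given installation shows, the toy example from this thread can be rerun with both path styles. This is a minimal sketch assuming a current version of the XML package from CRAN; it uses `xmlParse(..., asText = TRUE)`, which also returns an internal document, and the names `absolute` and `relative` are illustrative only.

```r
library(XML)  # assumes the CRAN XML package is installed

# The same toy document as in the thread.
doc <- xmlParse('<root>
  <foo><bar>1</bar></foo>
  <foo><bar>2</bar></foo>
</root>', asText = TRUE)

# Absolute path: '//bar' starts at the document root even though the
# context node is a foo, so each foo "sees" both bar elements.
absolute <- xpathApply(doc, "//foo", xpathSApply, "//bar", xmlValue)
# expected with current XML versions: list(c("1", "2"), c("1", "2"))

# Relative path: './/bar' searches only the sub-tree below the current foo.
relative <- xpathApply(doc, "//foo", xpathSApply, ".//bar", xmlValue)
# expected: list("1", "2")
```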
The reason for this is that, having used XPath to get a node (e.g. foo),
we often want to go back up the XML tree from that current node, e.g. with
../
./ancestor::foo
and so on, so the node stays part of the full document rather than
becoming the root of a new one.
D.
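The upward navigation mentioned in the note (`..` and `ancestor::`) can be tried on a variant of the toy document. A minimal sketch, assuming the CRAN XML package; the `id` attributes and the names `parentNames` and `fooIds` are hypothetical additions for illustration.

```r
library(XML)  # assumes the CRAN XML package is installed

doc <- xmlParse('<root>
  <foo id="a"><bar>1</bar></foo>
  <foo id="b"><bar>2</bar></foo>
</root>', asText = TRUE)

# Select the bar nodes first, then evaluate a second XPath expression with
# each bar as the context node.
# '..' selects the parent element of the current bar.
parentNames <- xpathSApply(doc, "//bar", function(b)
  xpathSApply(b, "..", xmlName))

# 'ancestor::foo' selects the enclosing foo element; read its id attribute.
fooIds <- xpathSApply(doc, "//bar", function(b)
  xpathSApply(b, "ancestor::foo", xmlGetAttr, "id"))
```

Both expressions only make sense because a matched node keeps its link to the rest of the document.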
This is not equivalent to
xpathApply(root, '//foo', function(foo) xpathSApply(root, '//bar', xmlValue))
but to
xpathApply(root, '//foo', function(foo) xpathSApply(foo, '//bar', xmlValue))
As the author of the XML package, you should know ;)
However, it would seem that
xpathSApply(doc, "//Polygon//coordinates",
    function(node) strsplit(.....))
would be more direct, i.e. it fetches the coordinates nodes with a single
XPath expression.
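A complete version of the single-expression approach might look like the sketch below. It assumes a small hypothetical KML-like document without a namespace (real KML files usually declare a default namespace, which in XPath 1.0 has to be bound to a prefix before the element names can be matched), coordinates written as whitespace-separated `lon,lat,alt` tuples, and a hypothetical helper `parseCoords()` that is not necessarily what the original `strsplit(.....)` call did. It uses `xpathApply()` rather than `xpathSApply()` so that each polygon stays a separate matrix instead of being simplified into one array.

```r
library(XML)  # assumes the CRAN XML package is installed

# Hypothetical KML-like document with two polygons and no namespace.
kml <- '<Document>
  <Placemark><Polygon><outerBoundaryIs><LinearRing>
    <coordinates>0,0,0 1,0,0 1,1,0 0,0,0</coordinates>
  </LinearRing></outerBoundaryIs></Polygon></Placemark>
  <Placemark><Polygon><outerBoundaryIs><LinearRing>
    <coordinates>5,5,0 6,5,0 6,6,0 5,5,0</coordinates>
  </LinearRing></outerBoundaryIs></Polygon></Placemark>
</Document>'
doc <- xmlParse(kml, asText = TRUE)

# Hypothetical helper: split "lon,lat,alt lon,lat,alt ..." on commas and
# whitespace (trimming first avoids an empty leading field), then reshape
# into one row per point.
parseCoords <- function(node) {
  values <- as.numeric(strsplit(trimws(xmlValue(node)), ",|\\s+")[[1]])
  matrix(values, ncol = 3, byrow = TRUE,
         dimnames = list(NULL, c("lon", "lat", "alt")))
}

# One XPath expression selects every coordinates node below any Polygon.
polys <- xpathApply(doc, "//Polygon//coordinates", parseCoords)
```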
Yes, in this case it would; I was not sure about the concrete schema.
I copied the code from my solution to another problem, where a polygon
could have multiple coordinates nodes that had to be merged in some way
for each polygon separately. Your solution returns the content of each
coordinates node separately, regardless of whether it is unique within
its polygon (which may well be the case here, and so your solution is
undeniably more elegant).
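The situation described here, where one polygon holds several coordinates nodes (for example an outer boundary and a hole), can be sketched as follows. The document and the helper `coordValues()` are hypothetical, and simply concatenating each polygon's values with `unlist()` is only one possible way of merging them. The inner search uses `.//coordinates` so that, with current versions of the XML package, it stays within the current polygon.

```r
library(XML)  # assumes the CRAN XML package is installed

# Hypothetical document: the first polygon has a hole, so it holds two
# coordinates nodes; the second polygon holds one.
kml <- '<Document>
  <Polygon>
    <outerBoundaryIs><LinearRing>
      <coordinates>0,0 4,0 4,4 0,0</coordinates>
    </LinearRing></outerBoundaryIs>
    <innerBoundaryIs><LinearRing>
      <coordinates>1,1 2,1 2,2 1,1</coordinates>
    </LinearRing></innerBoundaryIs>
  </Polygon>
  <Polygon>
    <outerBoundaryIs><LinearRing>
      <coordinates>5,5 6,5 6,6 5,5</coordinates>
    </LinearRing></outerBoundaryIs>
  </Polygon>
</Document>'
doc <- xmlParse(kml, asText = TRUE)

# Hypothetical helper: numeric values of one coordinates node.
coordValues <- function(node)
  as.numeric(strsplit(trimws(xmlValue(node)), ",|\\s+")[[1]])

# Single expression: one list element per coordinates node.
flat <- xpathApply(doc, "//Polygon//coordinates", coordValues)

# Per polygon: search only the current Polygon's sub-tree (leading '.')
# and concatenate the values of all its coordinates nodes.
perPolygon <- xpathApply(doc, "//Polygon", function(p)
  unlist(xpathApply(p, ".//coordinates", coordValues)))
```

Here `flat` has one element per coordinates node (three), while `perPolygon` has one element per polygon (two).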
vQ
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.