Wacek Kusnierczyk wrote:
Don MacQueen wrote:
I have an XML file that has within it the coordinates of some polygons
that I would like to extract and use in R. The polygons are nested
rather deeply. For example, I found by trial and error that I can
extract the coordinates of one of them using functions from the XML
package:
doc <- xmlInternalTreeParse('doc.kml')
docroot <- xmlRoot(doc)
pgon <-
try
lapply(
   xpathSApply(doc, '//Polygon',
      xpathSApply, '//coordinates', function(node)
         strsplit(xmlValue(node), split=',|\\s+')),
   as.numeric)
Just for the record, the XPath expression in the
second xpathSApply would need to be
".//coordinates"
to start searching from the previously matched Polygon node.
Otherwise, the search starts from the top of the document again.
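To make the difference concrete, here is a small sketch on made-up data. The document text and the names txt, n_abs and n_rel are invented for illustration and are not from the original file; the dummy document also has no namespace, which a real KML file may well have.

```r
library(XML)

# Hypothetical two-polygon document (no namespace), made up for illustration
txt <- '<doc>
  <Polygon id="a"><coordinates>1,2,0 3,4,0</coordinates></Polygon>
  <Polygon id="b"><coordinates>5,6,0 7,8,0</coordinates></Polygon>
</doc>'
doc <- xmlParse(txt, asText = TRUE)

polygons <- getNodeSet(doc, "//Polygon")

# Absolute path: "//" starts at the top of the document,
# regardless of which Polygon node the query is issued from
n_abs <- sapply(polygons, function(p) length(getNodeSet(p, "//coordinates")))

# Relative path: ".//" searches only below the current Polygon node
n_rel <- sapply(polygons, function(p) length(getNodeSet(p, ".//coordinates")))
```

With the absolute path every polygon sees every coordinates node in the document; with the relative path each polygon sees only its own.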
However, it would seem that
   xpathSApply(doc, "//Polygon//coordinates",
               function(node) strsplit(.....))
would be more direct, i.e. fetch the coordinates nodes in a single
XPath expression.
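A minimal sketch of that single-expression approach, again on invented dummy data with no namespace. The handler body is one possible way to fill in the function, not necessarily what was intended above; it uses xpathApply rather than xpathSApply so that the result is always a list, and trims the text first so that leading whitespace does not produce an empty token.

```r
library(XML)

# Hypothetical dummy document, made up for illustration
txt <- '<doc>
  <Polygon id="a"><coordinates>
     1,2,0
     3,4,0 </coordinates></Polygon>
  <Polygon id="b"><coordinates>5,6,0 7,8,0</coordinates></Polygon>
</doc>'
doc <- xmlParse(txt, asText = TRUE)

# One XPath query fetches every coordinates node that sits below a Polygon
coords <- xpathApply(doc, "//Polygon//coordinates", function(node) {
  # split the trimmed text on commas or runs of whitespace
  tokens <- strsplit(trimws(xmlValue(node)), ",|\\s+")[[1]]
  as.numeric(tokens)
})
```

The result is a list with one numeric vector per coordinates node. If a polygon can hold more than one coordinates node (inner boundaries, say), this returns one element per node rather than one per polygon.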
D.
which should find all polygon nodes, extract the coordinates node for
each polygon separately, split the coordinates string by comma and
convert to a numeric vector, and then report a list of such vectors, one
vector per polygon.
i've tried it on some dummy data made up from your example below. the
xpath patterns may need to be adjusted, depending on the actual
structure of your xml file, as may the strsplit pattern.
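one adjustment that may be needed: kml files commonly declare a default namespace on the root element, and then an unprefixed pattern such as '//Polygon' matches nothing. whether your file does this is an assumption on my part; the sketch below uses made-up data and an arbitrary prefix k bound to the kml 2.2 namespace uri.

```r
library(XML)

# Hypothetical document with a default namespace, made up for illustration
txt <- '<kml xmlns="http://www.opengis.net/kml/2.2">
  <Polygon id="a"><coordinates>1,2,0 3,4,0</coordinates></Polygon>
  <Polygon id="b"><coordinates>5,6,0 7,8,0</coordinates></Polygon>
</kml>'
doc <- xmlParse(txt, asText = TRUE)

# Unprefixed names refer to "no namespace", so nothing matches here
plain <- getNodeSet(doc, "//Polygon")

# Bind a prefix (the name "k" is arbitrary) to the namespace URI
# and use it on every element name in the path
ns <- c(k = "http://www.opengis.net/kml/2.2")
prefixed <- getNodeSet(doc, "//k:Polygon//k:coordinates", namespaces = ns)
```

if the root element of your file carries a different xmlns value, that uri is the one to put in ns.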
vQ
but this is hardly general!
I'm hoping there is some relatively straightforward way to use
functions from the XML package to recursively descend the structure
and return the text strings representing the polygons into, say, a
list with as many elements as there are polygons. I've been looking at
several XML documentation files downloaded from
http://www.omegahat.org/RSXML/ , but since my understanding of XML is
weak at best, I'm having trouble. I can deal with converting the text
strings to an R object suitable for plotting etc.
Here's a look at the structure of this file
graphics[5]% grep Polygon doc.kml
<Polygon id="15342">
</Polygon>
<Polygon id="1073">
</Polygon>
<Polygon id="16508">
</Polygon>
<Polygon id="18665">
</Polygon>
<Polygon id="32903">
</Polygon>
<Polygon id="5232">
</Polygon>
And each of the <Polygon> </Polygon> pairs has <coordinates> as per
this example:
<Polygon id="15342">
<outerBoundaryIs>
<LinearRing id="11467">
<coordinates>
-23.679835352296,30.263840290388,5.000000000000001
-23.68138782285701,30.264740875186,5.000000000000001
[snip]
-23.679835352296,30.263840290388,5.000000000000001
-23.679835352296,30.263840290388,5.000000000000001 </coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
Thanks!
-Don
p.s.
There is a lot of other stuff in this file, e.g., some points, and
attributes of the points such as color, as well as a legend describing
what the polygons mean, but I can get by without all that stuff, at
least for now.
Note also that readOGR() would in principle work, but the underlying
OGR libraries have some limitations that this file exceeds, per the info
at http://www.gdal.org/ogr/drv_kml.html.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.