On Wed, Jun 29, 2011 at 8:17 AM, Kai Serschmarn <serschm...@googlemail.com> wrote:
> Hi all,
>
> this is my first post in this mailing group. I hope that anyboby could help
> me parsing a xml file.
> I found this website http://www.omegahat.org/RSXML/gettingStarted.html but
> unfortunately my XML file is not as easy as the one in the example.
>
> Example:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <?xml-stylesheet
> href="http://werdis.dwd.de/css/UNIDART/climateTimeseriesOrderByStation.xsl"
> type="text/xsl"?>
> <data xmlns="http://www.unidart.eu/xsd"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:schemaLocation="http://www.unidart.eu/xsd
> http://werdis.dwd.de/conf/timeseriesExchangeType.xsd">
> <stationname value="Aachen">
> <v date="2011-04-01" qualityLevel="high" latitude="50.7839"
> longitude="6.0947" altitude="202" unitA="m" geoQualityLevel="certain"
> unitV="degree C">14.1</v>
> <v date="2011-04-02">17.6</v>
> <v date="2011-04-03">11.5</v>
> <v date="2011-04-04">10.0</v>
> <v date="2011-04-05" qualityLevel="low">9.6</v>
> <v date="2011-04-06">16.0</v>
> </stationname>
> <stationname value="Ahaus">
> <v date="2011-04-01" qualityLevel="high" latitude="52.0828"
> longitude="6.9417" altitude="45.5" unitA="m" geoQualityLevel="certain"
> unitV="degree C">12.5</v>
> <v date="2011-04-02">15.9</v>
> <v date="2011-04-03">12.0</v>
> <v date="2011-04-04">10.1</v>
> <v date="2011-04-05">8.8</v>
> <v date="2011-04-06">13.5</v>
> </stationname>
> </data>
>
>
> I would like to get a table in R like this:
>
> stationname date value
> Aachen 2011-04-01 14.1
> Aachen 2011-04-01 17.6
> .
> .
> .
> Ahaus 2011-04-06 13.5
>
> I tried to do this:
>
> doc = xmlRoot(xmlTreeParse("de.dwd.klis.TADM.xml"))
> tmp = xmlSApply(doc, function(x) xmlSApply(x, xmlValue))

You can loop over the doc to get the <stationname> elements, then loop
over each of those to get its <v> elements, and extract the node values
and attributes with some assorted selectors:

dumpData <- function(doc){
  # each child of the root is a <stationname> node
  for(i in seq_len(length(doc))){
    stns = doc[[i]]
    # each child of a <stationname> is a <v> node; its first child
    # is the text node holding the measurement
    for(j in seq_len(length(stns))){
      cat(stns$attributes['value'],
          stns[[j]][[1]]$value,
          stns[[j]]$attributes['date'], "\n")
    }
  }
}

Run that on your doc to see it printed out. Save to a data frame if
that's what you need.

This is not the perfect way to do it, since if you have other (non
<stationname> or <v>) elements it'll try and handle those too, and
fail. There's probably a way of looping over all <stationname> elements
but XML makes me feel sick when I try and remember how to parse it in R
at this time of the morning. It's probably in the docs, but this should
get you started.

Barry
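
One possible way to select only the <stationname>/<v> elements and get a data frame directly is an XPath query. This is a rough sketch, not a tested solution for the real file: it assumes the file has the same structure as the posted example, it uses a shortened inline copy of that example instead of de.dwd.klis.TADM.xml, and the namespace prefix "u" and the variable names are arbitrary choices made here.

```r
library(XML)

# Shortened inline copy of the posted example (assumption: the real
# file follows the same layout).
xml_text <- '<?xml version="1.0" encoding="UTF-8"?>
<data xmlns="http://www.unidart.eu/xsd">
  <stationname value="Aachen">
    <v date="2011-04-01" qualityLevel="high">14.1</v>
    <v date="2011-04-02">17.6</v>
  </stationname>
  <stationname value="Ahaus">
    <v date="2011-04-01" qualityLevel="high">12.5</v>
    <v date="2011-04-02">15.9</v>
  </stationname>
</data>'

# Parse into an internal (C-level) tree so that XPath can be used.
doc <- xmlParse(xml_text, asText = TRUE)

# The document declares a default namespace, so the XPath expression
# needs a prefix bound to it. "u" is an arbitrary prefix.
ns <- c(u = "http://www.unidart.eu/xsd")

# Select only <v> nodes that are children of a <stationname>.
v_nodes <- getNodeSet(doc, "//u:stationname/u:v", namespaces = ns)

# One row per <v>: station name from the parent, date from the
# attribute, value from the node text.
stations <- data.frame(
  stationname = sapply(v_nodes, function(v) xmlGetAttr(xmlParent(v), "value")),
  date        = as.Date(sapply(v_nodes, xmlGetAttr, "date")),
  value       = as.numeric(sapply(v_nodes, xmlValue)),
  stringsAsFactors = FALSE
)

print(stations)
```

For the real file, replacing the xmlParse() call with xmlParse("de.dwd.klis.TADM.xml") should be the only change needed, provided the namespace URI matches the one in the file. Elements other than <stationname> and <v> are never matched by the XPath expression, so they are skipped rather than causing a failure.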