Harold -- you'll really want to check out the XML package. xmlTreeParse + xpathApply provides a very flexible solution. As a recent example, parsing 189 XML files to extract 4 attributes from deeply nested elements into a data frame:
library(XML)
## collect the XML files; the pattern matches any name containing "xml"
fls <- list.files('~/runBrowser', pattern=".*xml", full.names=TRUE)
f <- function(fl) {
    ## run an XPath query and flatten the matching values to a vector;
    ## "xsi" is a namespace prefix declared in the files themselves
    xq <- function(xml, q)
        unlist(xpathApply(xml, q, xmlValue, namespaces="xsi"))
    ## useInternalNodes=TRUE keeps the parsed tree in C (libxml2)
    xml <- xmlTreeParse(fl, useInternalNodes=TRUE)
    ## each tile is assumed to carry 4 image/sgnInt entries (one per base),
    ## so the tile attributes are repeated 4 times to line up with them
    data.frame(idx=rep(as.numeric(xq(xml, "//xsi:tile/@idx")), each=4),
               lane=rep(as.numeric(xq(xml, "//xsi:tile/@lane")), each=4),
               base=xq(xml, "//xsi:image/@base"),
               medSigInt=as.numeric(xq(xml, "//xsi:sgnInt/@median")))
}
## parse every file and stack the per-file data frames
res <- do.call(rbind, lapply(fls, f))

'res' has 54800 rows and 4 columns. The parsed XML stays in C and the
XPath queries run there, so this is fast. The data can be visualized
effectively (your mileage may vary) with lattice, e.g.,

library(lattice)
xyplot(log(medSigInt) ~ idx | lane*base, res, strip=FALSE, pch=".", cex=2)

Martin

Doran, Harold wrote:
> I'm not sure it is possible to parse an XML file in R directly. Well, I
> guess it's *possible*, but it may not be the best way to do it. ElementTree
> in Python is an easy-to-use parser that you might use to first parse
> your XML file (or other hierarchically structured data), organize it
> any way you want, and then bring those data into R for subsequent
> analysis.
>
> In fact, I have recently done just this. I have another statistical
> program that outputs data as an XML file. So, I wrote a Python program
> that parses that XML file, pulls out the data of interest into a text
> file, and then I bring those data into R for analysis.
>
>> -----Original Message-----
>> From: [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] On Behalf Of Keith Alan
>> Chamberlain
>> Sent: Thursday, April 10, 2008 4:14 PM
>> To: r-help@r-project.org
>> Subject: [R] Relational Databases or XML?
>>
>> Dear R-Help,
>>
>> I am working on a paper in an R course for large file support
>> in R using scan(), relational databases, and XML. I have
>> never used SQL or hierarchical document formats such as XML
>> (except where it occurs without user interaction), and
>> knowledge in RDBs and XML is lacking in my program.
>> I have tried finding a working example for the novices-novice on the
>> topic, read many postings, the r-data I/O manual several
>> times, and descriptions of packages RODBC, DBI, XML, among
>> others. I understand that RDBs are (assumed at least) used
>> widely among the R community. I have not been able to put all
>> of the pieces together, but assuming that RDB use is actually
>> quite widespread, it should be quite easy to fill me in
>> and/or correct my understanding where necessary.
>>
>> For a cross-platform solution (PC/OSX at least, or in part)
>> my questions/problems are about what preliminary steps are
>> needed to get an SQL or XML query "to work" in R to begin
>> with, what the appropriate data-file formats are, and how to
>> convert to them if starting out with data in, say, a
>> delimited ASCII text file. Very basic examples should
>> suffice, say, a table with 20 random observations, a grouping
>> variable with 2 levels, and a factor with 2 levels.
>>
>> ## untested code
>> set.seed(1024)
>> write.table(data.frame(Subj=c(rep(1,10), rep(2,10)),
>>                        block=rep(c(rep(-1,5), rep(1,5)), 2),
>>                        obs=rnorm(20, 0, 1)),
>>             file="junk.txt")
>>
>> Specifically,
>>
>> 1- what are the minimum required non-R components that are
>> needed to support SQL or XML functionality, which may or may
>> not need to be installed?
>>
>> 2- what R packages need to be installed, at a minimum (also
>> as a cross-PC/Mac solution if possible or at least as much as
>> possible)?
>>
>> 3- I keep seeing references to connections of a given name "if
>> previously setup". What kind of setup is needed outside of R, if any?
>>
>> 4- what steps are needed in R to then connect to a file and
>> import a subset based on a query?
>>
>> 5- Do I then use standard R routines (e.g. write()) to export
>> as a DB, or an RDB/XML-specific function?
>>
>> Sincerely,
>> KeithC.
>> [U.S]
>>
>> 1/k^c
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793
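A minimal, self-contained sketch of the same xmlTreeParse + xpathApply approach, runnable without the original files. The inline document, its element and attribute names (tile, sgnInt, idx, lane, median) and the helper xq() are hypothetical stand-ins for illustration. Unlike the real files, the toy document declares no namespace, and this sketch reads attributes with xmlGetAttr() on element nodes rather than querying "@attr" paths with xmlValue; treat it as one possible variant, not the method used above.

```r
library(XML)  # CRAN package providing xmlTreeParse(), xpathApply(), xmlGetAttr()

## Hypothetical inline document standing in for one file on disk;
## it declares no namespace, so no prefix is needed in the XPath queries.
txt <- '<run>
  <tile idx="1" lane="1"><sgnInt median="120.5"/></tile>
  <tile idx="2" lane="1"><sgnInt median="98.0"/></tile>
  <tile idx="1" lane="2"><sgnInt median="143.2"/></tile>
</run>'

## Parse into an internal (C-level) document so the XPath queries run in C.
doc <- xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE)

## Hypothetical helper: apply an extractor to every node matched by an
## XPath query (extra arguments are passed on to 'fun') and flatten the
## resulting list to a vector, in document order.
xq <- function(doc, path, fun, ...) unlist(xpathApply(doc, path, fun, ...))

res <- data.frame(
  idx       = as.numeric(xq(doc, "//tile", xmlGetAttr, "idx")),
  lane      = as.numeric(xq(doc, "//tile", xmlGetAttr, "lane")),
  medSigInt = as.numeric(xq(doc, "//tile/sgnInt", xmlGetAttr, "median"))
)
res
```

Each xq() call returns values in document order, so the columns line up only because every tile in this toy document has exactly one sgnInt child; with real data that structural assumption needs checking, which is what the rep(..., each=4) in the example above handles for its files.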