Thanks! That helps a lot! A quick follow-up question: I can't really tell which part of the commands tells it to look only at the child nodes of <C>. Is there any way to also access the fields that sit on the <C> level of the hierarchy (i.e. S, D, C, and F)?
I wouldn't necessarily want those repeated thousands of times in the data frame, but C and F are useful reference points, as they are actually the row numbers where specific events occurred.

Thanks again for all the help!
-Brigid

On Wed, May 20, 2009 at 5:16 PM, Duncan Temple Lang <dun...@wald.ucdavis.edu> wrote:
> Hi Brigid.
>
> Here are a few commands that should do what you want:
>
> bri = xmlParse("myDataFile.xml")
>
> tmp = t(xmlSApply(xmlRoot(bri), xmlAttrs))[, -1]
> dd = as.data.frame(tmp, stringsAsFactors = FALSE,
>                    row.names = 1:nrow(tmp))
>
> And then you can convert the columns to whatever types you want
> using regular R commands.
>
> The basic idea is that for each of the child nodes of C,
> i.e. the <T>'s, we want the character vector of attributes
> which we can get with xmlAttrs().
>
> Then we stack them together into a matrix, drop the "N"
> and then convert the result to a data frame, avoiding
> duplicate row names which are all "T".
>
> (BTW, make certain the '-' on the second line is not in the XML content.
> I assume that came from bringing the text into mail.)
>
> HTH
> D.
>
>
> Brigid Mooney wrote:
>>
>> Hi,
>>
>> I am trying to parse XML files and read them into R as a data frame,
>> but have been unable to find examples which I could apply
>> successfully.
>>
>> I'm afraid I don't know much about XML, which makes this all the more
>> difficult. If someone could point me in the right direction to a
>> resource (preferably with an example or two), it would be greatly
>> appreciated.
>>
>> Here is a snippet from one of the XML files that I am looking to read,
>> and I am aiming to be able to get it into a data frame with columns N,
>> T, A, B, C as in the 2nd level of the hierarchy.
>>
>> <?xml version="1.0" encoding="utf-8" ?>
>> - <C S="UnitA" D="1/3/2007" C="24745" F="24648">
>>   <T N="1" T="9:30:13 AM" A="30.05" B="29.85" C="30.05" />
>>   <T N="2" T="9:31:05 AM" A="29.89" B="29.78" C="30.05" />
>>   <T N="3" T="9:31:05 AM" A="29.9" B="29.86" C="29.87" />
>>   <T N="4" T="9:31:05 AM" A="29.86" B="29.86" C="29.87" />
>>   <T N="5" T="9:31:05 AM" A="29.89" B="29.86" C="29.87" />
>>   <T N="6" T="9:31:06 AM" A="29.89" B="29.85" C="29.86" />
>>   <T N="7" T="9:31:06 AM" A="29.89" B="29.85" C="29.86" />
>>   <T N="8" T="9:31:06 AM" A="29.89" B="29.85" C="29.86" />
>> </C>
>>
>> Thanks for any help or direction anyone can provide.
>>
>> As a point of reference, I am using R 2.8.1 and have loaded the XML
>> package.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
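
A possible answer to the follow-up at the top of this thread, offered as a minimal sketch rather than a reply from anyone quoted above. In the quoted commands, xmlRoot(bri) is the top-level <C> node, and xmlSApply() applies its function to each child of the node it is given, so that is the part that restricts the work to the <T> nodes. The attributes of <C> itself should then be available by calling xmlAttrs() directly on the root. The inline XML string (two rows copied from the snippet, without the stray '-'), the variable names doc, root, top and eventRows, and the "header" attribute are all assumptions made for illustration.

```r
library(XML)

# Inline copy of two rows of the sample, so the sketch runs without a file.
# The real data would be read with xmlParse("myDataFile.xml") instead.
txt <- paste0(
  '<?xml version="1.0" encoding="utf-8" ?>',
  '<C S="UnitA" D="1/3/2007" C="24745" F="24648">',
  '<T N="1" T="9:30:13 AM" A="30.05" B="29.85" C="30.05" />',
  '<T N="2" T="9:31:05 AM" A="29.89" B="29.78" C="30.05" />',
  '</C>'
)

doc  <- xmlParse(txt, asText = TRUE)
root <- xmlRoot(doc)   # the top-level <C> node

# Attributes of <C> itself: a named character vector with names S, D, C, F.
top <- xmlAttrs(root)

# C and F are row numbers, so convert just those two to integers.
eventRows <- as.integer(top[c("C", "F")])

# Attributes of the children of <C> (the <T> nodes), as in the quoted reply.
tmp <- t(xmlSApply(root, xmlAttrs))[, -1]
dd  <- as.data.frame(tmp, stringsAsFactors = FALSE,
                     row.names = 1:nrow(tmp))

# Keep the top-level values once, alongside the data frame,
# instead of repeating them on every row.
attr(dd, "header") <- top
```

Storing the <C> values as an attribute of the data frame is only one option; keeping top as a separate object, or a list holding both pieces, would work just as well.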