Hi,
May be this also helps: s <- c(" <data>", " <row BRAND=\"GMC\" NUM=\"1\" YEAR=\"1999\" VALUE=\"10000\" />", " <row BRAND=\"FORD\" NUM=\"1\" YEAR=\"2000\" VALUE=\"12000\" />", " <row BRAND=\"GMC\" NUM=\"1\" YEAR=\"2001\" VALUE=\"12500\" />", " <row BRAND=\"FORD\" NUM=\"1\" YEAR=\"2002\" VALUE=\"13000\" />", " <row BRAND=\"GMC\" NUM=\"1\" YEAR=\"2003\" VALUE=\"14000\" />", " <row BRAND=\"FORD\" NUM=\"1\" YEAR=\"2004\" VALUE=\"17000\" />", " <row BRAND=\"GMC\" NUM=\"1\" YEAR=\"2005\" VALUE=\"15000\" />", " <row BRAND=\"GMC\" NUM=\"1\" YEAR=\"1967\" VALUE=\"PRICLESS\" />", " <row BRAND=\"FORD\" NUM=\"1\" YEAR=\"2007\" VALUE=\"17500\" />", " <row BRAND=\"GMC\" NUM=\"1\" YEAR=\"2008\" VALUE=\"22000\" />", " </data>") Lines1<-gsub("^\\s+| \\s+$","",gsub("[^0-9A-Z]"," ",s)) dat1<-read.table(text=Lines1[Lines1!=""],sep="",header=F,stringsAsFactors=F) dat1New<-dat1[,seq(2,ncol(dat1),by=2)] colnames(dat1New)<- unlist(unique(dat1[,seq(1,ncol(dat1),by=2)])) str(dat1New) #'data.frame': 10 obs. of 4 variables: # $ BRAND: chr "GMC" "FORD" "GMC" "FORD" ... # $ NUM : int 1 1 1 1 1 1 1 1 1 1 # $ YEAR : int 1999 2000 2001 2002 2003 2004 2005 1967 2007 2008 # $ VALUE: chr "10000" "12000" "12500" "13000" ... #or Lines2<-gsub(" <.*>","",gsub("^.*=\"(.*)\"\\s+.*=\"(.*)\"\\s+.*=\"(.*)\"\\s+.*=\"(.*)\".*","\\1 \\2 \\3 \\4",s)) dat2<-read.table(text=Lines2[Lines2!=""&Lines2!=" "],sep="",header=FALSE,stringsAsFactors=FALSE) colnames(dat2)<- unlist(unique(dat1[,seq(1,ncol(dat1),by=2)])) str(dat2) 'data.frame': 10 obs. of 4 variables: # $ BRAND: chr "GMC" "FORD" "GMC" "FORD" ... # $ NUM : int 1 1 1 1 1 1 1 1 1 1 # $ YEAR : int 1999 2000 2001 2002 2003 2004 2005 1967 2007 2008 # $ VALUE: chr "10000" "12000" "12500" "13000" ... head(dat2,3) # BRAND NUM YEAR VALUE #1 GMC 1 1999 10000 #2 FORD 1 2000 12000 #3 GMC 1 2001 12500 A.K. ----- Original Message ----- From: Ben Tupper <btup...@bigelow.org> To: Adam Gabbert <adamjgabb...@gmail.com> Cc: r-help@r-project.org Sent: Tuesday, January 22, 2013 10:13 PM Subject: Re: [R] Creating a Data Frame from an XML On Jan 22, 2013, at 3:11 PM, Adam Gabbert wrote: > Hello, > > I'm attempting to read information from an XML into a data frame in R using > the "XML" package. I am unable to get the data into a data frame as I would > like. I have some sample code below. > > *XML Code:* > > Header... > > Data I want in a data frame: > > <data> > <row BRAND="GMC" NUM="1" YEAR="1999" VALUE="10000" /> > <row BRAND="FORD" NUM="1" YEAR="2000" VALUE="12000" /> > <row BRAND="GMC" NUM="1" YEAR="2001" VALUE="12500" /> > <row BRAND="FORD" NUM="1" YEAR="2002" VALUE="13000" /> > <row BRAND="GMC" NUM="1" YEAR="2003" VALUE="14000" /> > <row BRAND="FORD" NUM="1" YEAR="2004" VALUE="17000" /> > <row BRAND="GMC" NUM="1" YEAR="2005" VALUE="15000" /> > <row BRAND="GMC" NUM="1" YEAR="1967" VALUE="PRICLESS" /> > <row BRAND="FORD" NUM="1" YEAR="2007" VALUE="17500" /> > <row BRAND="GMC" NUM="1" YEAR="2008" VALUE="22000" /> > </data> > > *R Code:* > > doc< -xmlInternalTreeParse ("Sample2.xml") > top <- xmlRoot (doc) > xmlName (top) > names (top) > art <- top [["row"]] > art > ** > *Output:* > >> art<row BRAND="GMC" NUM="1" YEAR="1999" VALUE="10000"/> > > > > > This is where I am having difficulties. I am unable to "access" additional > rows; ( i.e. <row BRAND="GMC" NUM="1" YEAR="1967" VALUE="PRICLESS" /> ) > > and I am unable to access the individual entries to actually create the > data frame. The data frame I would like is as follows: > > BRAND NUM YEAR VALUE > GMC 1 1999 10000 > FORD 2 2000 12000 > GMC 1 2001 12500 > etc........ > > Any help or suggestions would be appreciated. Conversly, my eventual goal > would be to take a data frame and write it into an XML in the previously > shown format. > Hi, You are so close! You have a number of nodes with the name 'row'. The "[[" function selects just one item from a list, and when there's a number that have that name it returns just the first. So you really want to use the "[" function instead and then select by order index using "[[" library(XML) > s <- c(" <data>", " <row BRAND=\"GMC\" NUM=\"1\" YEAR=\"1999\" > VALUE=\"10000\" />", " <row BRAND=\"FORD\" NUM=\"1\" YEAR=\"2000\" VALUE=\"12000\" />", " <row BRAND=\"GMC\" NUM=\"1\" YEAR=\"2001\" VALUE=\"12500\" />", " <row BRAND=\"FORD\" NUM=\"1\" YEAR=\"2002\" VALUE=\"13000\" />", " <row BRAND=\"GMC\" NUM=\"1\" YEAR=\"2003\" VALUE=\"14000\" />", " <row BRAND=\"FORD\" NUM=\"1\" YEAR=\"2004\" VALUE=\"17000\" />", " <row BRAND=\"GMC\" NUM=\"1\" YEAR=\"2005\" VALUE=\"15000\" />", " <row BRAND=\"GMC\" NUM=\"1\" YEAR=\"1967\" VALUE=\"PRICLESS\" />", " <row BRAND=\"FORD\" NUM=\"1\" YEAR=\"2007\" VALUE=\"17500\" />", " <row BRAND=\"GMC\" NUM=\"1\" YEAR=\"2008\" VALUE=\"22000\" />", " </data>") > x <- xmlRoot(xmlTreeParse(s, asText = TRUE, useInternalNodes = TRUE)) > x["row"][[1]] <row BRAND="GMC" NUM="1" YEAR="1999" VALUE="10000"/> > x["row"][[2]] <row BRAND="FORD" NUM="1" YEAR="2000" VALUE="12000"/> Your rows are set up so the attributes have the values you want - use xmlAttrs to retrieve them. > xmlAttrs(x["row"][[2]]) BRAND NUM YEAR VALUE "FORD" "1" "2000" "12000" You can use lapply to iterate through each row and apply the xmlAttrs function. You'll end up with a list if character vectors. > y <- lapply(x["row"], xmlAttrs) > str(y) List of 10 $ row: Named chr [1:4] "GMC" "1" "1999" "10000" ..- attr(*, "names")= chr [1:4] "BRAND" "NUM" "YEAR" "VALUE" $ row: Named chr [1:4] "FORD" "1" "2000" "12000" ..- attr(*, "names")= chr [1:4] "BRAND" "NUM" "YEAR" "VALUE" $ row: Named chr [1:4] "GMC" "1" "2001" "12500" ..- attr(*, "names")= chr [1:4] "BRAND" "NUM" "YEAR" "VALUE" . . . Next make a character matrix using do.call and rbind ... > m <- do.call(rbind, y) > str(m) chr [1:10, 1:4] "GMC" "FORD" "GMC" "FORD" "GMC" "FORD" "GMC" "GMC" "FORD" ... - attr(*, "dimnames")=List of 2 ..$ : chr [1:10] "row" "row" "row" "row" ... ..$ : chr [1:4] "BRAND" "NUM" "YEAR" "VALUE" And then on to a data.frame... > d <- as.data.frame(m) > str(d) 'data.frame': 10 obs. of 4 variables: $ BRAND: chr "GMC" "FORD" "GMC" "FORD" ... $ NUM : chr "1" "1" "1" "1" ... $ YEAR : chr "1999" "2000" "2001" "2002" ... $ VALUE: chr "10000" "12000" "12500" "13000" ... Cheers, Ben > Thank you > > AG > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. Ben Tupper Bigelow Laboratory for Ocean Sciences 180 McKown Point Rd. P.O. Box 475 West Boothbay Harbor, Maine 04575-0475 http://www.bigelow.org ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.