Hey Sowmiyan,

I would recommend taking a look at the xml2 package, rather than the XML package, for a start. It's a lot more structured, and traversing between elements is far easier :)
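As a rough sketch of what that could look like: the inline XML below is a made-up stand-in for Dummy.xml, with element names guessed from the column headers in the question, so the XPath expressions would need adjusting to the real file. The helper text_at() is hypothetical, not part of xml2. The point is only that xml_find_first() gives back one result per record, so a missing field should come out as NA rather than spilling into the next column.

```r
library(xml2)

# Hypothetical stand-in for Dummy.xml; the real file's layout is an assumption.
doc <- read_xml('<Nominations>
  <Nomination>
    <DateCreated>2007-11-25T17:01:32</DateCreated>
    <Creator referenceNumber="15351">
      <UserAccountName>mkolker</UserAccountName>
      <PersonName>Merryn Kolker</PersonName>
    </Creator>
    <AdditionalComment>Good work</AdditionalComment>
  </Nomination>
  <Nomination>
    <DateCreated>2007-11-25T17:18:01</DateCreated>
    <Creator referenceNumber="15351">
      <UserAccountName>mkolker</UserAccountName>
      <PersonName>Merryn Kolker</PersonName>
    </Creator>
  </Nomination>
</Nominations>')

# One node per output row.
records <- xml_find_all(doc, ".//Nomination")

# Hypothetical helper: text of the first match of `xpath` inside each record.
# Records without a match yield NA, so columns stay aligned.
text_at <- function(xpath) xml_text(xml_find_first(records, xpath))

res <- data.frame(
  DateCreated             = text_at("./DateCreated"),
  Creator.UserAccountName = text_at("./Creator/UserAccountName"),
  Creator.PersonName      = text_at("./Creator/PersonName"),
  # Attributes are read with xml_attr() on the element that carries them.
  Creator.referenceNumber = xml_attr(xml_find_first(records, "./Creator"),
                                     "referenceNumber"),
  AdditionalComment       = text_at("./AdditionalComment"),
  stringsAsFactors = FALSE
)

print(res)
```

From there, write.csv(res, "dummy.csv", row.names = FALSE) would produce the CSV. Nested fields such as the NominationParcel coordinator should follow the same pattern with a longer XPath, assuming they sit under each Nomination.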

On 24 January 2016 at 12:27, sowmiyan <sowmiyan0...@gmail.com> wrote:
> I am working with a XML, which can be found in the link Sample XML file
> <https://www.dropbox.com/s/8kn9g8xev2u5n8o/Dummy.xml?dl=0&preview=Dummy.xml>
>
> I am trying to extract each and every fields information to a csv file. I
> want my output to be as below: Required output:
> *Total of 20 columns and 2 rows*
> DateCreated DateModified Creator.UserAccountName Creator.PersonName
> Creator..attrs.referenceNumber Modifier.UserAccountName Modifier.PersonName
> Modifier..attrs.referenceNumber AdditionalEmailStr AdditionalComment
> DateIssued DocumentaryInstructions NominationParcel.attr.Referencenumber
> NominationParcel.SecondContractNumber
> NominationParcel.Coordinator.RefernceNumber
> NominationParcel.Coordinator.Username NominationParcel.Coordinator.Email
> NominationParcel.Coordinator.Office.Name
> NominationParcel.Coordinator.Office.Email
> NominationParcel.Coordinator.Office.attrs.referenceNumber
> Nomination 2007-11-25T17:01:32 2007-11-25T17:11:09 mkolker Merryn Kolker
> 15351 mkolker Merryn Kolker 15351 Good work 7 sam
> Nomination 2007-11-25T17:18:01 2007-11-25T17:19:11 mkolker Merryn Kolker
> 15351 mkolker Merryn Kolker 15351 Nicely Performed 10 107 102
>
> But I am not able to get my output in the required format. I have tried in
> two different ways
>
> 1 Below is my first code, the problem with this is that my NULL fields are
> not getting captured correctly and there is spillover of data. Also I am
> not able to capture all the fields of nested lists in the XML
>
> *Code 1*
>
> doc <- xmlParse("Dummy.xml")
> lst<-xmlToList(doc)
> f <- function(col) do.call(rbind, lapply(lst, function(x) unlist(x[cols])));
> cols <-c("DateCreated","DateModified","Creator","Modifier","AdditionalEmailStr","AdditionalComment","DateIssued",
> "DocumentaryInstructions", "NominationParcel" );
> res <- setNames(lapply(cols, f), cols);
> list2env(res, .GlobalEnv)
>
> *Output 1*
>
> DateCreated DateModified Creator.UserAccountName Creator.PersonName
> Creator..attrs.referenceNumber Modifier.UserAccountName Modifier.PersonName
> Modifier..attrs.referenceNumber AdditionalComment
> NominationParcel.Coordinator.UserAccountName
> NominationParcel.Coordinator.Office..attrs.referenceNumber
> NominationParcel.Coordinator..attrs.referenceNumber
> NominationParcel..attrs.referenceNumber
> Nomination 2007-11-25T17:01:32 2007-11-25T17:11:09 mkolker Merryn Kolker
> 15351 mkolker Merryn Kolker 15351 Good Work sam 7
> Nomination 2007-11-25T17:18:01 2007-11-25T17:19:11 mkolker Merryn Kolker
> 15351 mkolker Merryn Kolker 15351 Nicely performed 102 107 10
> 2007-11-25T17:18:01
>
> 2 To avoid spillover of information of one cell to other because of "NULL",
> I have used for loop to replace the NULL cells with NA.
> By using this I was able to capture the correct data, but I could not
> get all the fields information present in the XML
>
> *Code 2*
>
> doc <- xmlParse("Dummy.xml")
> lstsub<-xmlToList(doc)
> for(i in 1:length(lstsub))
> {
>   for(j in 1:length(lstsub[[i]]))
>   {
>     lstsub[[i]][[j]]= ifelse(is.null(lstsub[[i]][[j]]),NA,lstsub[[i]][[j]])
>     if(length(lstsub[[i]][[j]])>1)
>     {
>       for(k in 1:length(lstsub[[i]][[j]]))
>       {
>         lstsub[[i]][[j]][[k]]= ifelse(is.null(lstsub[[i]][[j]][[k]]),NA,lstsub[[i]][[j]][[k]])
>         if(length(lstsub[[i]][[j]][[k]])>1)
>         {
>           for(l in 1:length(lstsub[[i]][[j]][[k]]))
>           {
>             lstsub[[i]][[j]][[k]][[l]]= ifelse(is.null(lstsub[[i]][[j]][[k]][[l]]),NA,lstsub[[i]][[j]][[k]][[l]])
>           }
>         }
>       }
>     }
>   }
> }
> f <- function(col) do.call(rbind, lapply(lstsub, function(x) unlist(x[cols])));
> cols <- c("DateCreated","DateModified","Creator","Modifier","AdditionalEmailStr","AdditionalComment","DateIssued",
> "DocumentaryInstructions", "NominationParcel" );
> res <- setNames(lapply(cols, f), cols);
> list2env(res, .GlobalEnv)
> write.csv(Creator,"dummy_2.csv")
>
> *Output 2*
>
> DateCreated DateModified Creator Modifier
> AdditionalEmailStr AdditionalComment DateIssued DocumentaryInstructions
>
> Nomination 2007-11-25T17:01:32 2007-11-25T17:11:09 mkolker mkolker NA
> Good Work NA NA
> Nomination 2007-11-25T17:18:01 2007-11-25T17:19:11 mkolker mkolker NA
> Nicely performed NA NA
>
> Could somebody please help me in how could I get the required output
>
> I have posted the same question in Stackoverflow and the link is here (it
> might help in giving more clear picture)
>
> http://stackoverflow.com/questions/34963724/extracting-complete-information-from-nested-lists-in-xml-to-a-data-frame-using-r/34963821#34963821
>
> Regards,
> Sowmiyan
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Oliver Keyes
Count Logula
Wikimedia Foundation