I have a modest-sized XML file (52 MB) in a format suited to xmlToDataFrame (package XML).

I have successfully read it into R by splitting the file 10 ways, running xmlToDataFrame on each part, and then combining the results with rbind.fill (package plyr). This takes about 530 s in total and produces a data.frame with 71k rows and an object.size of 21 MB. But running xmlToDataFrame on the whole file takes forever (> 10000 s so far), even though xmlParse of the same file takes only 0.8 s.

To investigate, I ran xmlToDataFrame on the first 10% of the file, then on that 10% repeated twice, then three times (with the outer tags adjusted, of course). Timings:

  1 copy:   111 s  (111 s per copy)
  2 copies: 311 s  (155 s per copy)
  3 copies: 626 s  (209 s per copy)

The runtime is superlinear. What is going on here? Is there a better approach?

Thanks,
-s
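
For reference, a minimal sketch of the chunked workflow described above. The part file names are hypothetical; each part is assumed to have been pre-split by hand into a well-formed document with the same outer tag as the original.

library(XML)
library(plyr)

## Hypothetical chunk files: the 52 MB file pre-split into 10 well-formed parts
part_files <- sprintf("part%02d.xml", 1:10)

## Convert each chunk to a data.frame separately (fast per chunk,
## unlike running xmlToDataFrame on the whole file)
parts <- lapply(part_files, function(f) xmlToDataFrame(xmlParse(f)))

## Row-bind the chunks, filling in any columns missing from a given chunk
combined <- rbind.fill(parts)
dim(combined)   # roughly 71k rows, per the figures above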
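
And a sketch of the scaling test, again with hypothetical file names: chunk1x.xml, chunk2x.xml and chunk3x.xml are assumed to contain one, two and three concatenated copies of the first 10% of the records, with the outer tags adjusted.

## Time xmlToDataFrame on 1x, 2x and 3x copies of the same 10% chunk
files <- c("chunk1x.xml", "chunk2x.xml", "chunk3x.xml")
elapsed <- sapply(files, function(f) system.time(xmlToDataFrame(f))["elapsed"])
elapsed          # observed above: 111, 311, 626 s
elapsed / (1:3)  # per-copy cost: 111, 155, 209 s -- superlinear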
I have successfully read it into R by splitting the file 10 ways then running xmlToDataFrame on each part, then rbind.fill (package plyr) on the result. This takes about 530 s total, and results in a data.frame with 71k rows and object.size of 21MB. But trying to run xmlToDataFrame on the whole file takes forever (> 10000 s so far). xmlParse of this file takes only 0.8 s. I tried running xmlToDataFrame on the first 10% of the file, then the first 10% repeated twice, then three times (with the outer tags adjusted of course). Timings: 1 copy: 111 s = 111 per copy 2 copy: 311 s = 155 " " 3 copy: 626 s = 209 " " The runtime is superlinear. What is going on here? Is there a better approach? Thanks, -s [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.