Dear R-help,

I'm currently trying to combine a large number (about 30 x 30) of large CSV files (each with at least 10,000 records). They are organized by plot, hence 30 x 30: each group of CSVs sits in a folder that corresponds to one plot. The unmerged CSVs all have the same number of columns (5). The fifth column has a different name in each CSV, and the number of rows differs from file to file.
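To make that layout concrete, here is a toy stand-in for one plot folder. It is only a sketch: the helper make_toy_plot, the file names, the column names (x, y, z, depth, amp_<i>) and the row counts are all made up for illustration, and it assumes the first four columns are named identically across files, which may not hold for the real data.

```r
# Toy stand-in for one plot folder. Every name and size here is hypothetical.
make_toy_plot <- function(dir, n_files = 3, seed = 1) {
  set.seed(seed)
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  for (i in seq_len(n_files)) {
    n <- sample(5:10, 1)                     # row count differs per file
    d <- data.frame(x = runif(n), y = runif(n), z = runif(n),
                    depth = runif(n), v = runif(n))
    names(d)[5] <- paste0("amp_", i)         # fifth column name differs per file
    write.csv(d, file.path(dir, sprintf("file_%02d.csv", i)),
              row.names = FALSE)
  }
  # Return the paths of the csv files now present in the folder
  invisible(list.files(dir, pattern = "\\.csv$", full.names = TRUE))
}

toy_dir <- file.path(tempdir(), "plot_01")
toy_files <- make_toy_plot(toy_dir)
```

Each file read back with read.csv() has five columns, and only the fifth name changes from file to file.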
The combined CSVs are of course quite large, and the code I'm running is quite slow. I'm currently running it on a computer with 10 GB of RAM, an SSD, and a quad-core 2.3 GHz processor; it has taken 8 hours and is only 75% of the way through (it has been hung up on one of the largest data groupings for an hour now, using 3.5 GB of RAM).

I know that R isn't the most efficient way of doing this, but I'm not familiar with SQL or C. I wonder if anyone has suggestions for a different way to do this in the R environment. For instance, the key function now is merge, but I haven't tried join from the plyr package or rbind from base. I'm willing to provide a Dropbox link to a couple of these files if you'd like to see the data.

My code is as follows:

# multmerge is based on code by Tony Cookson,
# http://www.r-bloggers.com/merging-multiple-data-files-into-one-data-frame/
# The function takes a path. This path should be the name of a folder that
# contains all of the files you would like to read and merge together, and
# only those files.
multmerge = function(mypath){
  filenames = list.files(path=mypath, full.names=TRUE)
  datalist = try(lapply(filenames, function(x){read.csv(file=x, header=T)}))
  try(Reduce(function(x,y) {merge(x, y, all=TRUE)}, datalist))
}

# this function renames the columns using a fixed list and outputs a .csv
merepk <- function (path, nf.name) {
  output <- multmerge(mypath=path)
  name <- list("x", "y", "z", "depth", "amplitude")
  try(names(output) <- name)
  write.csv(output, nf.name)
}

# assumes all folders are in the same directory, with nothing else there
merge.by.folder <- function (folderpath){
  foldernames <- list.files(path=folderpath)
  n <- length(foldernames)
  setwd(folderpath)
  for (i in 1:n){
    path <- paste(folderpath, foldernames[i], sep="\\")
    nf.name <- as.character(paste(foldernames[i], ".csv", sep=""))
    merepk(path, nf.name)
  }
}

folderpath <- "yourpath"
merge.by.folder(folderpath)

Thanks for looking, and happy Friday!
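For reference, a minimal sketch of the rbind route mentioned above; it has not been run on the real data. It assumes the goal is to stack the rows of all files in a folder, that every file has the same five columns in the same order, and that the fifth column can be given one common name. The function stack_folder, the extra source column, and the demo files are hypothetical names introduced only for this illustration.

```r
# Stack every csv in one folder by rows instead of merging pairwise.
# ASSUMPTION: all files share the same five columns in the same order,
# and the fifth column can be renamed to one common name.
stack_folder <- function(mypath,
                         col_names = c("x", "y", "z", "depth", "amplitude")) {
  filenames <- list.files(path = mypath, pattern = "\\.csv$",
                          full.names = TRUE)
  datalist <- lapply(filenames, function(f) {
    d <- read.csv(f, header = TRUE)
    stopifnot(ncol(d) == length(col_names))  # fail early on unexpected layout
    names(d) <- col_names                    # unify names before stacking
    d$source <- basename(f)                  # hypothetical: record origin file
    d
  })
  do.call(rbind, datalist)                   # one row-bind over the whole list
}

# Tiny self-contained check with two made-up files
demo_dir <- file.path(tempdir(), "stack_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2, y = 1:2, z = 1:2, depth = 1:2,
                     a1 = c(0.1, 0.2)),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5, y = 3:5, z = 3:5, depth = 3:5,
                     b7 = c(0.3, 0.4, 0.5)),
          file.path(demo_dir, "b.csv"), row.names = FALSE)
stacked <- stack_folder(demo_dir)
```

Renaming the columns before stacking is what lets rbind line the files up. Whether stacking rather than merging on the shared columns is the right result depends on what the combined file is meant to contain.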
Ben Caldwell
PhD Candidate
University of California, Berkeley