On Wed, Aug 06, 2008 at 05:42:21PM +0000, zack holden wrote:
>
> Dear R wizards,
>
> I have a folder containing 1000 files. For each file, I need to extract the
> first row of each file, paste it to a new file, then write out that file.
> Then I need to repeat this operation for each additional row (row 2, then row
> 3, etc) for 23 rows in each file.
>
> I can do this with a for loop (as below).

Hi Zack,

There are a few problems with your sketched-out for loop (see below), but if
I've understood your problem, then here are a couple of solutions that use for
loops in the way you were intending. They both take line i from file 1, line i
from file 2, ..., and write them to a file called lines_i, for i in 1:23.

The first one is for the case when you have tabular data, so it uses
read.table and write.table. You might want to mess about with the arguments to
read.table and write.table, specifying whether you have a header, and whether
you want the row.names printed out, etc. The second one is similar but just
works line by line, regardless of what the line looks like (i.e. it doesn't
assume you have tabular data in the files).

collate.lines.1 <- function(folder, nrows=23) {
    files <- list.files(folder, full.names=TRUE)
    for(file in files) {
        file.as.data.frame <- read.table(file)
        for(row in 1:nrows) {
            outfile <- paste("lines_", row, ".csv", sep="")
            ## append=TRUE: each input file adds one row to this output file
            write.table(file.as.data.frame[row,], file=outfile, append=TRUE,
                        row.names=FALSE, col.names=FALSE, sep=",")
        }
    }
}

collate.lines.2 <- function(folder, nrows=23) {
    files <- list.files(folder, full.names=TRUE)
    for(file in files) {
        ## one element per line of the file
        file.as.character.vector <- scan(file, what="", sep="\n")
        for(row in 1:nrows) {
            outfile <- paste("lines", row, sep="_")
            ## sep="" avoids a stray space before the newline
            cat(file.as.character.vector[row], "\n", sep="",
                file=outfile, append=TRUE)
        }
    }
}

> Is there a way to use some of the indexing power of R to get around this
> nasty loop?

If you really mean that you want a solution without explicit for loops in R,
then that is possible. But I would recommend that you stick to a
straightforward solution until you're completely comfortable with programming
in that style. It's conceivable that the no-for-loop versions might be faster
if you have lots of files / rows, but don't worry about speed until it's a
problem. Here's my effort at doing it without for loops; it's a bit of a
stretch and wasn't as easy to write down as the first two.
I've probably missed a cleaner solution.

collate.lines.1.fancy <- function(folder, nrows=23) {
    outfiles <- paste("lines_", 1:nrows, ".csv", sep="")
    files <- list.files(folder, full.names=TRUE)
    files.as.data.frames <- lapply(files, read.table)

    ## split all rows apart (first nrows only, so they line up with outfiles)
    x <- lapply(files.as.data.frames,
                function(df) split(df[1:nrows, , drop=FALSE], f=factor(1:nrows)))

    ## collate rows from different data frames
    x <- do.call(mapply, c(x, list(FUN=function(...) rbind(...), SIMPLIFY=FALSE)))

    write.function <- function(dataframe, outfile)
        write.table(dataframe, file=outfile, row.names=FALSE, col.names=FALSE, sep=",")
    invisible(mapply(write.function, x, outfiles))
}

> Thank you in advance for any suggestions
>
> ###################
> newoutfile <- data.frame()
> list <- list.files("c:/data")   ## 'list' not such a good name as it's a
>                                 ## built-in function
>
> file = 1                        ## you don't need this
> for(file in list) {
> row <- file[1, ]                ## that's not going to work; 'list' is a
>                                 ## character vector, you haven't got the
>                                 ## files as data.frames yet
> newoutfile <- rbind(row, newoutfile)
> file = file + 1
> write.csv(outfile, file = "output.csv")
> }
> ####################
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
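
For anyone wanting to try the line-by-line approach on throwaway data first,
here is a minimal sketch. It is not part of the original reply: the name
collate.lines.sketch, the outdir argument and the demo file names are all
assumptions made for illustration. It reads the first nrows lines of each file
with readLines, binds them into a character matrix (one column per file), and
writes row i of that matrix to a file called lines_i.

```r
## Hypothetical helper, not from the thread: collate line i of every file
## in 'folder' into outdir/lines_i.
collate.lines.sketch <- function(folder, nrows = 23, outdir = tempdir()) {
    files <- sort(list.files(folder, full.names = TRUE))
    ## first 'nrows' lines of each file, as a list of character vectors
    per.file <- lapply(files, readLines, n = nrows)
    ## this sketch assumes every file has at least 'nrows' lines
    stopifnot(all(lengths(per.file) == nrows))
    ## character matrix: one row per line number, one column per input file
    m <- do.call(cbind, per.file)
    outfiles <- file.path(outdir, paste("lines", seq_len(nrows), sep = "_"))
    ## row i of the matrix holds line i of every file; writeLines overwrites,
    ## so re-running does not duplicate output
    for (i in seq_len(nrows)) writeLines(m[i, ], outfiles[i])
    invisible(outfiles)
}

## Throwaway demo: three files of four lines each, in temporary folders
indir  <- file.path(tempdir(), "collate_demo_in")
outdir <- file.path(tempdir(), "collate_demo_out")
dir.create(indir,  showWarnings = FALSE)
dir.create(outdir, showWarnings = FALSE)
for (k in 1:3)
    writeLines(paste0("file", k, "_line", 1:4),
               file.path(indir, paste0("f", k, ".txt")))

out <- collate.lines.sketch(indir, nrows = 4, outdir = outdir)
print(readLines(out[2]))
```

With the three demo files above, readLines(out[2]) should give line 2 of each
input file, in the order list.files returns them. Unlike the append=TRUE
versions, this sketch holds the first nrows lines of every file in memory at
once, which is fine for 1000 files of 23 lines but is a trade-off to keep in
mind for much larger inputs.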