Hi: See inline.
On Mon, Nov 15, 2010 at 4:26 PM, Nate Miller <natemille...@gmail.com> wrote:

> Hi All!
>
> I have some experience with R, but less experience writing scripts using
> R, and have run into a challenge that I hope someone can help me with.
>
> I have multiple .csv files of data, each with the same 3 columns of
> data but potentially of varying lengths (some data files are from short
> measurements, others from longer ones). One file, for example, might
> look like this:
>
> Time, O2_conc, Chla_conc
> 0, 270, 300
> 10, 260, 280
> 20, 245, 268
> 30, 233, 238
> 40, 222, 212
> 50, 215, 201
> 60, 208, 193
> 70, 206, 191
> 80, 207, 189
> 90, 206, 186
> 100, 206, 183
> 110, 207, 178
> 120, 205, 174
> 130, 240, 171
> 140, 270, 155
>
> I am looking for an efficient means of batch (or sequentially)
> processing these files so that I can
>
> 1. import each data file
>
> 2. find the minimum value recorded in column 2 and the previous 5 data
> points

I don't know what you mean by the 'previous 5 data points' -- are you
referring to a rolling minimum?

> 3. and average these 10 values to get a mean minimum value.

If the surmise above is correct, you should get 11 rolling minima for a
vector of length 15. Here's an example using the rollapply() function
from the zoo package:

library(zoo)
> x <- rpois(15, 10)
> x
 [1] 17 12 17  9  8 10  7 11 15 11 11 15  5  9 12
> rollapply(zoo(x), 5, FUN = min)
 3  4  5  6  7  8  9 10 11 12 13
 8  8  7  7  7  7  7 11  5  5  5
> mean(rollapply(zoo(x), 5, FUN = min))
[1] 7

> Currently I have imported the data files using the following
>
> filenames = list.files()
> library(plyr)
> import.list = adply(filenames, 1, read.csv)

This seems to be a reasonable approach. Does the result keep a column
for the file names?

> and I know how to write code to calculate the minimum value and the 5
> preceding values in a single column, in a single file.
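On the file-name question: a minimal sketch (untested against your data --
the demo writes two throwaway csv files to a temp directory). In my
experience adply() over a character vector only adds a positional index
column (X1), not the name itself, so the wrapper below records the actual
file name alongside each row:

```r
library(plyr)

# two throwaway csv files for demonstration
d <- data.frame(Time = c(0, 10), O2_conc = c(270, 260), Chla_conc = c(300, 280))
f1 <- tempfile(fileext = ".csv"); write.csv(d, f1, row.names = FALSE)
f2 <- tempfile(fileext = ".csv"); write.csv(d, f2, row.names = FALSE)

filenames <- c(f1, f2)
import.list <- adply(filenames, 1, function(f) {
  dat <- read.csv(f)
  dat$file <- basename(f)   # record which file each row came from
  dat
})
```

With the 'file' column in place, per-file summaries become a one-liner
later on.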
> I think the problem I am running into is scaling this code up so that
> I can import multiple files and calculate the mean minimum value for
> the 2nd column in each of them.

As long as you have an indicator for each file, this should be pretty
straightforward. Write a function that produces the summaries for one
data frame and then do something like

ddply(slurpedFiles, .(dsIndicator), myfunction)

to map the function to all of them. The data.table package is an
alternative, where you should be able to do similar things using the
data set names as a key. data.table 'thinks' more like SQL, but it can
be very efficient. You should be able to do what you're asking for with
either package.

HTH,
Dennis

> Can anyone offer some advice on how to batch process a whole bunch of
> files? I need to load them in, but then analyze them too.
>
> Thank you so much,
>
> Nate
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
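P.S. If the rolling-minimum reading is right, the whole pipeline might
look like the sketch below. It's illustrative only: the column and
function names are mine, and I've faked the imported data with two small
in-memory 'files' (each carrying a 'file' indicator column) in place of
your real csv files.

```r
library(plyr)
library(zoo)

# stand-in for the imported data: two short 'files', stacked with an indicator
import.list <- rbind(
  data.frame(file = "a.csv", Time = seq(0, 90, 10),
             O2_conc = c(270, 260, 245, 233, 222, 215, 208, 206, 207, 206)),
  data.frame(file = "b.csv", Time = seq(0, 90, 10),
             O2_conc = c(300, 280, 268, 238, 212, 201, 193, 191, 189, 186))
)

# summary for one file: mean of the rolling minima (window of 5) of column 2
min_o2 <- function(d) {
  data.frame(mean_min = mean(rollapply(zoo(d$O2_conc), 5, FUN = min)))
}

# one summary row per file
results <- ddply(import.list, .(file), min_o2)
results
```

Swapping the fake import.list for the real one read from list.files()
should be all that's needed.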