Hi: See inline.
On Mon, Nov 15, 2010 at 4:26 PM, Nate Miller <natemille...@gmail.com> wrote:

> Hi All!
>
> I have some experience with R, but less experience writing scripts using
> R, and have run into a challenge that I hope someone can help me with.
>
> I have multiple .csv files of data, each with the same 3 columns of
> data but potentially of varying lengths (some data files are from short
> measurements, others from longer ones). One file, for example, might
> look like this:
>
> Time, O2_conc, Chla_conc
> 0, 270, 300
> 10, 260, 280
> 20, 245, 268
> 30, 233, 238
> 40, 222, 212
> 50, 215, 201
> 60, 208, 193
> 70, 206, 191
> 80, 207, 189
> 90, 206, 186
> 100, 206, 183
> 110, 207, 178
> 120, 205, 174
> 130, 240, 171
> 140, 270, 155
>
> I am looking for an efficient means of batch (or sequentially)
> processing these files so that I can
>
> 1. import each data file
>
> 2. find the minimum value recorded in column 2 and the previous 5 data
> points

I don't know what you mean by the 'previous 5 data points' -- are you
referring to a rolling minimum?

> 3. and average these 10 values to get a mean minimum value.

If the surmise above is correct, you should get 11 rolling minima for a
vector of length 15. Here's an example using the rollapply() function
from the zoo package:

library(zoo)
> x <- rpois(15, 10)
> x
 [1] 17 12 17  9  8 10  7 11 15 11 11 15  5  9 12
> rollapply(zoo(x), 5, FUN = min)
 3  4  5  6  7  8  9 10 11 12 13
 8  8  7  7  7  7  7 11  5  5  5
> mean(rollapply(zoo(x), 5, FUN = min))
[1] 7

> Currently I have imported the data files using the following
>
> filenames = list.files()
> library(plyr)
> import.list = adply(filenames, 1, read.csv)

This seems to be a reasonable approach. Does the result keep a column
for the file names?

> and I know how to write code to calculate the minimum value and the 5
> preceding values in a single column, in a single file.
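On the file-name question: a minimal sketch (untested against your data --
the demo writes two throwaway csv files to a temp directory). In my
experience adply() over a character vector only adds a positional index
column (X1), not the name itself, so the wrapper below records the actual
file name alongside each row:

```r
library(plyr)

# two throwaway csv files for demonstration
d <- data.frame(Time = c(0, 10), O2_conc = c(270, 260), Chla_conc = c(300, 280))
f1 <- tempfile(fileext = ".csv"); write.csv(d, f1, row.names = FALSE)
f2 <- tempfile(fileext = ".csv"); write.csv(d, f2, row.names = FALSE)

filenames <- c(f1, f2)
import.list <- adply(filenames, 1, function(f) {
  dat <- read.csv(f)
  dat$file <- basename(f)   # record which file each row came from
  dat
})
```

With the 'file' column in place, per-file summaries become a one-liner
later on.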
> I think the problem I am running into is scaling this code up so that
> I can import multiple files and calculate the mean minimum value for
> the 2nd column in each of them.

As long as you have an indicator for each file, this should be pretty
straightforward. Write a function that produces the summaries for one
data frame and then do something like

ddply(slurpedFiles, .(dsIndicator), myfunction)

to map the function to all of them. The data.table package is an
alternative, where you should be able to do similar things using the
data set names as a key. data.table 'thinks' more like SQL, but it can
be very efficient. You should be able to do what you're asking for with
either package.

HTH,
Dennis

> Can anyone offer some advice on how to batch process a whole bunch of
> files? I need to load them in, but then analyze them too.
>
> Thank you so much,
>
> Nate
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
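P.S. If the rolling-minimum reading is right, the whole pipeline might
look like the sketch below. It's illustrative only: the column and
function names are mine, and I've faked the imported data with two small
in-memory 'files' (each carrying a 'file' indicator column) in place of
your real csv files.

```r
library(plyr)
library(zoo)

# stand-in for the imported data: two short 'files', stacked with an indicator
import.list <- rbind(
  data.frame(file = "a.csv", Time = seq(0, 90, 10),
             O2_conc = c(270, 260, 245, 233, 222, 215, 208, 206, 207, 206)),
  data.frame(file = "b.csv", Time = seq(0, 90, 10),
             O2_conc = c(300, 280, 268, 238, 212, 201, 193, 191, 189, 186))
)

# summary for one file: mean of the rolling minima (window of 5) of column 2
min_o2 <- function(d) {
  data.frame(mean_min = mean(rollapply(zoo(d$O2_conc), 5, FUN = min)))
}

# one summary row per file
results <- ddply(import.list, .(file), min_o2)
results
```

Swapping the fake import.list for the real one read from list.files()
should be all that's needed.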