Hi Ista,

Many thanks, the plyr package was just what I needed.
Because I did such a bad job with my question (no data, etc., etc.), here is my current solution.

First, I grabbed my data from PostgreSQL as follows:

    library('RPostgreSQL')
    m <- dbDriver("PostgreSQL")
    con <- dbConnect(m, user="user", password="yourpassword",
                     host="super.secret.host.edu", dbname="yourdb")
    rs <- dbSendQuery(con,
        "select vds_id, to_char(ts,'Dy') as dow,
                date_trunc('hour'::text, ts) as tshour,
                n1,o1, n2,o2, n3,o3, n4,o4, n5,o5
         from data_raw
         where vds_id=1201087
           and (ts>'Mar 1, 2007' and ts<'Apr 1, 2007 00:00:00')")
    df.i1 <- fetch(rs, n = -1)

This time period grabs 88,000-odd records. The data look like this:

    > summary(df.i1)
        vds_id            dow                tshour
     Min.   :1201087   Length:88070       Min.   :2007-03-01 00:00:00
     1st Qu.:1201087   Class :character   1st Qu.:2007-03-08 19:00:00
     Median :1201087   Mode  :character   Median :2007-03-16 15:00:00
     Mean   :1201087                      Mean   :2007-03-16 13:51:31
     3rd Qu.:1201087                      3rd Qu.:2007-03-24 08:00:00
     Max.   :1201087                      Max.   :2007-03-31 23:00:00
           ts                            n1               o1
     Min.   :2007-03-01 00:00:30   Min.   : 0.000   Min.   :0.00000
     1st Qu.:2007-03-08 19:09:37   1st Qu.: 3.000   1st Qu.:0.01620
     Median :2007-03-16 15:08:45   Median : 8.000   Median :0.04860
     Mean   :2007-03-16 14:21:18   Mean   : 8.147   Mean   :0.05024
     3rd Qu.:2007-03-24 08:25:52   3rd Qu.:12.000   3rd Qu.:0.07330
     Max.   :2007-03-31 23:59:30   Max.   :35.000   Max.   :0.79440
           n2               o2               n3               o3
     Min.   : 0.000   Min.   :0.00000   Min.   : 0.000   Min.   :0.00000
     1st Qu.: 4.000   1st Qu.:0.02160   1st Qu.: 3.000   1st Qu.:0.01940
     Median : 8.000   Median :0.04580   Median : 6.000   Median :0.03900
     Mean   : 7.268   Mean   :0.04584   Mean   : 5.682   Mean   :0.04178
     3rd Qu.:10.000   3rd Qu.:0.06370   3rd Qu.: 8.000   3rd Qu.:0.05910
     Max.   :27.000   Max.   :0.77750   Max.   :27.000   Max.   :0.63060
           n4               o4               n5               o5
     Min.   : 0.000   Min.   :0.00000   Min.   : 0.000   Min.   :0.00000
     1st Qu.: 2.000   1st Qu.:0.01450   1st Qu.: 1.000   1st Qu.:0.00640
     Median : 5.000   Median :0.03400   Median : 3.000   Median :0.02040
     Mean   : 5.418   Mean   :0.03811   Mean   : 3.706   Mean   :0.03085
     3rd Qu.: 8.000   3rd Qu.:0.05510   3rd Qu.: 6.000   3rd Qu.:0.04530
     Max.   :28.000   Max.   :0.59930   Max.   :27.000   Max.   :0.66210

    > df.i1[1,]
       vds_id dow     tshour                  ts n1     o1 n2     o2 n3     o3 n4     o4 n5     o5
    1 1201087 Thu 2007-03-01 2007-03-01 00:00:30  6 0.0373  5 0.0291  2 0.0217  2 0.0109  1 0.0086

The 'n_i' values are vehicle counts in lane i on a freeway over 30 seconds, and the 'o_i' values are occupancies for that lane over the same 30 seconds, ranging from 0 (nothing over the sensor) to 1 (somebody parked over the sensor). I want to isolate just those time periods when n and o are highly correlated, as I'm exploring whether that indicates free-flow traffic conditions.

My final function, after hacking around a bit last night, looks like this:

    cor.dat <- function(df) {
        cor.l1 <- cor.test(df$n1, df$o1)
        cor.l2 <- cor.test(df$n2, df$o2)
        cor.l3 <- cor.test(df$n3, df$o3)
        cor.l4 <- cor.test(df$n4, df$o4)
        cor.l5 <- cor.test(df$n5, df$o5)
        c(l1=cor.l1, l2=cor.l2, l3=cor.l3, l4=cor.l4, l5=cor.l5)
    }

(I'm not really sure what the best way to handle that return value is, but at least this works for what I want... see below.)

Run that function hourly with plyr:

    output.hourly <- dlply(df.i1, "tshour", cor.dat)

Because I used c(...) the output is ugly, but I can do this:

    get.cor <- function(l, w) { l[w] }

So to get the "estimate" of the parameter for lane 1:

    > g <- lapply(output.hourly, get.cor, "l1.estimate")
    > g[1]
    $`2007-03-01 00:00:00`
    $`2007-03-01 00:00:00`$l1.estimate
          cor
    0.9845006

    > plot(unlist(g))

etc. Super ugly, but I'm slowly remembe-R-ing.

Again, thanks a lot for the tip.

Regards,
James

On Tue, Mar 09, 2010 at 10:24:05PM -0500, Ista Zahn wrote:
> Hi James,
> It would really help if you gave us a sample of the data you are
> working with. The following is not tested, because I don't have your
> data and am too lazy to construct a similar example dataset for you,
> but it might get you started.
>
> You can try using a for loop along the lines of
>
> output <- data.frame(obsfivemin = obsfivemin,
>                      5min.cor = vector(length=length(obsfivemin)))
> for (f in fivemin){
>    output$5min.cor[obsfivemin==f] <- cor(df[obsfivemin==f, c("v", "o")])
> }
>
> Or you can try with the plyr package something like
>
> cor.dat <- function(df) {
>    cor(df[,c("v", "o")])
> }
>
> library(plyr)
> dlply(df, obsfivemin, cor.dat)
>
> Good luck,
> Ista
>
>
> On Tue, Mar 9, 2010 at 9:36 PM, James Marca <jma...@translab.its.uci.edu> wrote:
> > Hello,
> >
> > I do not understand the correct way to approach the following problem
> > in R.
> >
> > I have observations of pairs of variables, v1, o1, v2, o2, etc,
> > observed every 30 seconds. What I would like to do is compute the
> > correlation matrix, but not for all my data, just for, say, 5 minute
> > or 1 hour chunks.
> >
> > In sql, what I would say is
> >
> > select id, date_trunc('hour'::text, ts) as tshour, corr(n1,o1) as corr1
> > from raw30s
> > where id = 1201087 and
> >       (ts between 'Mar 1, 2007' and 'Apr 1, 2007')
> > group by id, tshour order by id, tshour;
> >
> > I've pulled data from PostgreSQL into R, and have a dataframe
> > containing a timestamp column, v, and o (both numeric).
> >
> > I created a grouping index for every 5 minutes along these lines:
> >
> > obsfivemin <- trunc(obsts, units="hours")
> >               + (floor(obsts$min / 5) * 5 * 60)
> >
> > (where obsts is the sql timestamp converted into a DateTime object)
> >
> > Then I tried aggregate(df, by=obsfivemin, cor), but that seemed to pass
> > just a single column at a time to cor, not the entire data frame. It
> > worked for mean and sum, but not cor.
> >
> > In desperation, I tried looping over the different 5 minute levels and
> > computing cor, but I'm so R-clueless I couldn't even figure out how to
> > assign to a variable inside of that loop!
> >
> > code such as
> >
> > for (f in fivemin){
> >    output[f] <- cor(df[grouper==f,]); }
> >
> > failed, as I couldn't figure out how to initialize output so that
> > output[f] would accept the output of cor.
> >
> > Any help or steering towards the proper R-way would be appreciated.
> >
> > Regards,
> >
> > James Marca
> >
> > --
> > This message has been scanned for viruses and
> > dangerous content by MailScanner, and is
> > believed to be clean.
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Ista Zahn
> Graduate student
> University of Rochester
> Department of Clinical and Social Psychology
> http://yourpsyche.org

--
James E. Marca, PhD
Researcher
Institute of Transportation Studies
AIRB Suite 4000
University of California
Irvine, CA 92697-3600
jma...@translab.its.uci.edu
(949) 824-6287
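PS for the archives: one possible cleanup of that ugly cor.dat return value, sketched on simulated data rather than the real df.i1. This is untested against the real table; the two-lane cut-down and the made-up 0.006 occupancy slope are there only so the example is self-contained and the correlations come out high.

```r
## Fake two-lane, two-hour version of df.i1 (the real one comes from
## the PostgreSQL query earlier in this thread).
set.seed(42)
fake <- data.frame(
  tshour = rep(as.POSIXct("2007-03-01 00:00:00", tz = "UTC") + 3600 * 0:1,
               each = 120),
  n1 = rpois(240, 8),
  n2 = rpois(240, 7))
## Occupancy roughly tracks count, plus a little noise (made-up slope).
fake$o1 <- fake$n1 * 0.006 + runif(240, 0, 0.01)
fake$o2 <- fake$n2 * 0.006 + runif(240, 0, 0.01)

## Instead of c()-ing whole cor.test objects together, return one named
## estimate per lane; sapply collects them into a plain numeric vector.
cor.dat2 <- function(df, lanes = 1:2) {
  sapply(lanes, function(i) {
    ct <- cor.test(df[[paste0("n", i)]], df[[paste0("o", i)]])
    setNames(ct$estimate, paste0("l", i, ".estimate"))
  })
}

## Hourly grouping in base R (no plyr needed): one row per hour,
## one column per lane.
hourly <- do.call(rbind, by(fake, fake$tshour, cor.dat2))
```

This drops the full cor.test objects and keeps only the estimates, so no get.cor/lapply step is needed afterwards; if the p-values matter too, ct$p.value could be collected the same way.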