Just one further point. If you do run out of memory using #2, try the following instead. It is the same as #2 but adds a dbname argument, which forces the computation to be done on disk rather than in memory.

sqldf("select d1.x - d2.x, count(*) from d1, d2 group by d1.x - d2.x", dbname = tempfile())

On Mon, Feb 15, 2010 at 10:45 PM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:
> Here are two approaches to try:
>
>> # test data
>> d1 <- data.frame(x = Sys.Date() + 1:3)
>> d2 <- data.frame(x = Sys.Date() - 1:3)
>
>> # 1. you might not have enough memory for this but it's short
>> table(outer(1:3, -(1:3), "-"))
>
> 2 3 4 5 6
> 1 2 3 2 1
>
>> # 2. this one performs all the operations outside of R, getting only
>> # the result back in, so it won't need as much memory
>>
>> library(sqldf)
>> sqldf("select d1.x - d2.x, count(*) from d1, d2 group by d1.x - d2.x")
>   d1.x - d2.x count(*)
> 1           2        1
> 2           3        2
> 3           4        3
> 4           5        2
> 5           6        1
>
>
> On Mon, Feb 15, 2010 at 9:17 PM, Jonathan <jonsle...@gmail.com> wrote:
>> Let me fix a couple of typos in that email:
>>
>> Hi All:
>>
>> Let's say I have two dataframes (Condition1 and Condition2); each
>> being on the order of 12,000 and 16,000 rows; 1 column. The entries
>> contain dates.
>>
>> I'd like to calculate, for each possible pair of dates (that is:
>> Condition1[1:12,000] and Condition2[1:16,000]), the number of days
>> difference between the dates in the pair. The result should be a
>> matrix 12,000 by 16,000, which I'll call M. The purpose of building
>> such a matrix M is to create a histogram of all the values contained
>> within it.
>>
>> Ex):
>> Condition1 <- data.frame('dates' = rep(c('2001-02-10','1998-03-14'),6000))
>> Condition2 <- data.frame('dates' = rep(c('2003-07-06','2007-03-11'),8000))
>>
>> First, my instinct is to try and vectorize the operation. I tried
>> this by expanding each vector into a matrix of repeated vectors (I'd
>> then just subtract the two resultant matrices to get matrix M).
>> I got the following error:
>>
>>> expandedCondition1 <- matrix(rep(Condition1[[1]], nrow(Condition2)),
>>> byrow=TRUE, ncol=nrow(Condition1))
>> Error: cannot allocate vector of size 732.4 Mb
>>> expandedCondition2 <- matrix(rep(Condition2[[1]], nrow(Condition1)),
>>> byrow=FALSE, nrow=nrow(Condition2))
>> Error: cannot allocate vector of size 732.4 Mb
>>
>> Since it seems these matrices are too large, I'm wondering whether
>> there's a better way to call a hist command without actually building
>> the said matrix..
>>
>> I'd greatly appreciate any ideas!
>>
>> Best,
>> Jonathan
>>
>> On Mon, Feb 15, 2010 at 8:19 PM, Jonathan <jonsle...@gmail.com> wrote:
>>> Hi All:
>>>
>>> Let's say I have two dataframes (Condition1 and Condition2); each
>>> being on the order of 12,000 and 16,000 rows; 1 column. The entries
>>> contain dates.
>>>
>>> I'd like to calculate, for each possible pair of dates (that is:
>>> Condition1[1:10,000] and Condition2[1:10,000], the number of days
>>> difference between the dates in the pair. The result should be a
>>> matrix 12,000 by 16,000. Really, what I need is a histogram of all
>>> the values in this matrix.
>>>
>>> Ex):
>>> Condition1 <- data.frame('dates' = rep(c('2001-02-10','1998-03-14'),6000))
>>> Condition2 <- data.frame('dates' = rep(c('2003-07-06','2007-03-11'),8000))
>>>
>>> First, my instinct is to try and vectorize the operation. I tried
>>> this by expanding each vector into a matrix of repeated vectors (I'd
>>> then just subtract the two).
>>> I got the following error:
>>>
>>>> expandedCondition1 <- matrix(rep(Condition1[[1]], nrow(Condition2)),
>>>> byrow=TRUE, ncol=nrow(Condition1))
>>> Error: cannot allocate vector of size 732.4 Mb
>>>> expandedCondition2 <- matrix(rep(Condition2[[1]], nrow(Condition1)),
>>>> byrow=FALSE, nrow=nrow(Condition2))
>>> Error: cannot allocate vector of size 732.4 Mb
>>>
>>> Since it seems these matrices are too large, I'm wondering whether
>>> there's a better way to call a hist command without actually building
>>> the said matrix..
>>>
>>> I'd greatly appreciate any ideas!
>>>
>>> Best,
>>> Jonathan
>>>
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
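
For comparison with the sqldf query, here is a rough base R sketch of the same "count each difference without building the full matrix" idea. It is not from the thread: diff_counts is a hypothetical helper name, and the approach assumes the dates repeat a lot, so that the number of distinct dates in each data frame is much smaller than the number of rows. It tabulates the distinct dates first and then works only on the distinct values, weighting each pair by how many original pairs it stands for.

```r
# Hypothetical helper (not part of sqldf or base R): tabulate x - y over
# all pairs without forming the full length(x) by length(y) matrix.
# Assumes x and y are Dates (or ISO "YYYY-MM-DD" strings/factors).
diff_counts <- function(x, y) {
  tx <- table(x)                          # occurrences of each distinct date in x
  ty <- table(y)                          # occurrences of each distinct date in y
  ux <- as.numeric(as.Date(names(tx)))    # distinct dates as day numbers
  uy <- as.numeric(as.Date(names(ty)))
  diffs  <- outer(ux, uy, "-")            # day differences between distinct dates
  counts <- outer(as.numeric(tx), as.numeric(ty))  # original pairs per cell
  # Add up the pair counts for each distinct difference value.
  tapply(as.vector(counts), as.vector(diffs), sum)
}

# Same test data as in approach #2
d1 <- data.frame(x = Sys.Date() + 1:3)
d2 <- data.frame(x = Sys.Date() - 1:3)
freq <- diff_counts(d1$x, d2$x)
freq

# One way to draw the histogram from the counts:
# plot(as.numeric(names(freq)), freq, type = "h")
```

On the toy data this should give the same counts (1 2 3 2 1 for differences 2 to 6) as the two approaches quoted above. If nearly every date is distinct, the two outer() calls are as large as the original matrix and the sqldf route with dbname is the better bet.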