Thanks Gabor, The method works perfectly for my needs! Phil, Thanks for the tip about set.seed() and for the code. I really like the simplicity of using mapply, and your example is very close to accomplishing what I was looking for, but it breaks down under one type of situation. In particular - if there are no cases for one or more of the categories in one of the data frames, but not in the other. I imagine mapply works by applying the function specified to the list of arguments provided (in this case, we create two lists using calls to split() ). It appears to "line up" the elements in both lists and go, pair-by-pair, through the two lists, applying the function to each pair. But the pairs become "misaligned" if one of the elements in one of the lists is empty. Thus, if there are no members of category 3 in my second data frame, I'd wind up applying the function (in my case, 'outer') to category 3 vs 4 (and 4 vs 5, etc.), instead of skipping category 3 altogether and moving to 4 vs 4. There's probably a way to mitigate that, but I suppose at this point, I can implement the method using merge().
Regards, Jonathan On Fri, May 14, 2010 at 6:44 PM, Gabor Grothendieck <ggrothendi...@gmail.com > wrote: > And here is a pure R solution: > > > m <- merge(df1, df2, by = "category") > > m$datediff <- m$date.x - m$date.y > > m <- m[order(m$category, m$date.x, m$date.y), ] > > m > category A.x date.x A.y date.y datediff > 2 1 124 2003-02-08 28 2003-05-17 -98 days > 1 1 124 2003-02-08 116 2003-11-29 -294 days > 6 1 22 2008-08-16 28 2003-05-17 1918 days > 5 1 22 2008-08-16 116 2003-11-29 1722 days > 4 1 96 2008-11-29 28 2003-05-17 2023 days > 3 1 96 2008-11-29 116 2003-11-29 1827 days > 10 2 18 2001-12-01 12 2005-02-26 -1183 days > 9 2 18 2001-12-01 6 2008-10-25 -2520 days > 8 2 150 2002-01-12 12 2005-02-26 -1141 days > 7 2 150 2002-01-12 6 2008-10-25 -2478 days > 14 3 24 2003-09-13 109 2005-10-01 -749 days > 13 3 24 2003-09-13 92 2007-08-18 -1435 days > 12 3 175 2009-08-01 109 2005-10-01 1400 days > 11 3 175 2009-08-01 92 2007-08-18 714 days > 24 4 126 2000-11-04 65 2000-11-18 -14 days > 26 4 126 2000-11-04 91 2003-05-10 -917 days > 25 4 126 2000-11-04 15 2003-07-26 -994 days > 23 4 126 2000-11-04 54 2008-11-22 -2940 days > 16 4 70 2004-03-13 65 2000-11-18 1211 days > 18 4 70 2004-03-13 91 2003-05-10 308 days > 17 4 70 2004-03-13 15 2003-07-26 231 days > 15 4 70 2004-03-13 54 2008-11-22 -1715 days > 20 4 64 2007-06-02 65 2000-11-18 2387 days > 22 4 64 2007-06-02 91 2003-05-10 1484 days > 21 4 64 2007-06-02 15 2003-07-26 1407 days > 19 4 64 2007-06-02 54 2008-11-22 -539 days > > > On Fri, May 14, 2010 at 6:38 PM, Gabor Grothendieck > <ggrothendi...@gmail.com> wrote: > > Generating df1 and df2 as in your post try this (and see > > http://sqldf.googlecode.com for more info): > > > >> library(sqldf) > >> out <- sqldf("select category, > > + df1.date date1, > > + df2.date date2, > > + df1.date - df2.date datediff > > + from df1 join df2 using(category) > > + order by category, date1, date2") > >> > >> out[[2]] <- as.Date(out[[2]], origin = "1970-01-01") > >> out[[3]] <- as.Date(out[[3]], origin = "1970-01-01") > >> out > > category date1 date2 datediff > > 1 1 2003-02-08 2003-05-17 -98 > > 2 1 2003-02-08 2003-11-29 -294 > > 3 1 2008-08-16 2003-05-17 1918 > > 4 1 2008-08-16 2003-11-29 1722 > > 5 1 2008-11-29 2003-05-17 2023 > > 6 1 2008-11-29 2003-11-29 1827 > > 7 2 2001-12-01 2005-02-26 -1183 > > 8 2 2001-12-01 2008-10-25 -2520 > > 9 2 2002-01-12 2005-02-26 -1141 > > 10 2 2002-01-12 2008-10-25 -2478 > > 11 3 2003-09-13 2005-10-01 -749 > > 12 3 2003-09-13 2007-08-18 -1435 > > 13 3 2009-08-01 2005-10-01 1400 > > 14 3 2009-08-01 2007-08-18 714 > > 15 4 2000-11-04 2000-11-18 -14 > > 16 4 2000-11-04 2003-05-10 -917 > > 17 4 2000-11-04 2003-07-26 -994 > > 18 4 2000-11-04 2008-11-22 -2940 > > 19 4 2004-03-13 2000-11-18 1211 > > 20 4 2004-03-13 2003-05-10 308 > > 21 4 2004-03-13 2003-07-26 231 > > 22 4 2004-03-13 2008-11-22 -1715 > > 23 4 2007-06-02 2000-11-18 2387 > > 24 4 2007-06-02 2003-05-10 1484 > > 25 4 2007-06-02 2003-07-26 1407 > > 26 4 2007-06-02 2008-11-22 -539 > > > > On Fri, May 14, 2010 at 6:13 PM, Jonathan <jonsle...@gmail.com> wrote: > >> Hi All, > >> I've come up with a solution for this problem that relies on a for > loop, > >> and I was wondering if anybody had any insight into a more elegant > method: > >> > >> I have two data frames, each has a column for categorical data and a > column > >> for date. What I'd like to do, ideally, is calculate the number of days > >> between all pairs of dates in data frame 1 and data frame 2 (*but only > for > >> members of the same category*). The number of members of each category > >> varies between the two data frames. > >> > >> For example: > >> > >> > >>> d <- seq(as.Date("2000-02-12"), as.Date("2009-08-18"), by="weeks") > >> > >>> df1 <- data.frame('A'=sample(1:200,10), > 'date'=d[sample(1:length(d),10)],'category'=sample(1:4,10,replace=TRUE)) > >> > >>> df2 <- data.frame('A'=sample(1:200,10), > 'date'=d[sample(1:length(d),10)],'category'=sample(1:4,10,replace=TRUE)) > >> > >> > >>> df1 > >> A date category > >> 1 93 2004-02-28 3 > >> 2 105 2001-03-17 3 > >> 3 189 2009-07-04 2 > >> 4 130 2003-07-05 2 > >> 5 160 2005-09-24 2 > >> 6 32 2004-11-06 2 > >> 7 117 2007-03-17 1 > >> 8 161 2003-07-19 4 > >> 9 153 2001-09-15 3 > >> 10 173 2005-08-27 1 > >> > >> > >>> df2 > >> A date category > >> 1 102 2006-08-19 3 > >> 2 68 2004-11-27 2 > >> 3 137 2003-01-11 1 > >> 4 39 2002-12-28 2 > >> 5 127 2004-03-06 4 > >> 6 125 2002-02-23 2 > >> 7 150 2002-05-18 4 > >> 8 19 2003-02-22 1 > >> 9 80 2000-08-05 1 > >> 10 94 2003-12-27 1 > >> > >> > >> Within a loop, I'd do the following (i is my counter; for the example, > >> I set it to 1): > >> > >> > >>> i<-1 > >> > >> # Create the data frames: > >> > >>> yeari_1 <- df1[which(df1['category']==i),]; yeari_2 <- > df2[which(df2['category']==i),] > >> > >> # Select only the data from category i > >> > >>> yeari_1 > >> A date category > >> 7 117 2007-03-17 1 > >> 10 173 2005-08-27 1 > >> > >>> yeari_2 > >> A date category > >> 3 137 2003-01-11 1 > >> 8 19 2003-02-22 1 > >> 9 80 2000-08-05 1 > >> 10 94 2003-12-27 1 > >> > >> # Convert dates to integers > >> > >> year1_i[[2]] <- as.integer(as.Date(yeari_1[[2]])); yeari_2[[2]] <- > >> as.integer(as.Date(yeari_2[[2]])); > >> > >>> yeari_1 > >> A date category > >> 7 117 13589 1 > >> 10 173 13022 1 > >>> yeari_2 > >> A date category > >> 3 137 12063 1 > >> 8 19 12105 1 > >> 9 80 11174 1 > >> 10 94 12413 1 > >> > >> # Get differences of all pairs: > >> > >>> result <- outer(yeari_1[[2]],yeari_2[[2]],'-') > >>> result > >> [,1] [,2] [,3] [,4] > >> [1,] 1526 1484 2415 1176 > >> [2,] 959 917 1848 609 > >> > >> # Now, merge the results with the results from all the earlier > >> iterations for previous values of i, increment i to the next value, > >> and repeat. > >> > >> > >> ---- > >> > >> Ideally, I could accomplish this in some sort of vectorized manner, > >> although the Force is not yet strong with me. Any ideas would be > >> appreciated! > >> > >> > >> Regards, > >> > >> Jonathan > >> > >> [[alternative HTML version deleted]] > >> > >> ______________________________________________ > >> R-help@r-project.org mailing list > >> https://stat.ethz.ch/mailman/listinfo/r-help > >> PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > >> and provide commented, minimal, self-contained, reproducible code. > >> > > > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.