Jonathan-
When you provide an example that uses sample() or random
number generators, it's a good idea to call set.seed() before
generating the data, so that others can reproduce it.
But to answer your question:
mapply(function(d1,d2)outer(d1$date,d2$date,'-'),
split(df1,df1$category),
split(df2,df2$category))
will return a list with the information you want.
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spec...@stat.berkeley.edu
On Fri, 14 May 2010, Jonathan wrote:
Hi All,
I've come up with a solution for this problem that relies on a for loop,
and I was wondering if anybody had any insight into a more elegant method:
I have two data frames, each has a column for categorical data and a column
for date. What I'd like to do, ideally, is calculate the number of days
between all pairs of dates in data frame 1 and data frame 2 (*but only for
members of the same category*). The number of members of each category
varies between the two data frames.
For example:
d <- seq(as.Date("2000-02-12"), as.Date("2009-08-18"), by="weeks")
df1 <- data.frame('A'=sample(1:200,10),
'date'=d[sample(1:length(d),10)],'category'=sample(1:4,10,replace=TRUE))
df2 <- data.frame('A'=sample(1:200,10),
'date'=d[sample(1:length(d),10)],'category'=sample(1:4,10,replace=TRUE))
df1
A date category
1 93 2004-02-28 3
2 105 2001-03-17 3
3 189 2009-07-04 2
4 130 2003-07-05 2
5 160 2005-09-24 2
6 32 2004-11-06 2
7 117 2007-03-17 1
8 161 2003-07-19 4
9 153 2001-09-15 3
10 173 2005-08-27 1
df2
A date category
1 102 2006-08-19 3
2 68 2004-11-27 2
3 137 2003-01-11 1
4 39 2002-12-28 2
5 127 2004-03-06 4
6 125 2002-02-23 2
7 150 2002-05-18 4
8 19 2003-02-22 1
9 80 2000-08-05 1
10 94 2003-12-27 1
Within a loop, I'd do the following (i is my counter; for the example,
I set it to 1):
i<-1
# Create the data frames:
yeari_1 <- df1[which(df1['category']==i),]; yeari_2 <-
df2[which(df2['category']==i),]
# Select only the data from category i
yeari_1
A date category
7 117 2007-03-17 1
10 173 2005-08-27 1
yeari_2
A date category
3 137 2003-01-11 1
8 19 2003-02-22 1
9 80 2000-08-05 1
10 94 2003-12-27 1
# Convert dates to integers
year1_i[[2]] <- as.integer(as.Date(yeari_1[[2]])); yeari_2[[2]] <-
as.integer(as.Date(yeari_2[[2]]));
yeari_1
A date category
7 117 13589 1
10 173 13022 1
yeari_2
A date category
3 137 12063 1
8 19 12105 1
9 80 11174 1
10 94 12413 1
# Get differences of all pairs:
result <- outer(yeari_1[[2]],yeari_2[[2]],'-')
result
[,1] [,2] [,3] [,4]
[1,] 1526 1484 2415 1176
[2,] 959 917 1848 609
# Now, merge the results with the results from all the earlier
iterations for previous values of i, increment i to the next value,
and repeat.
----
Ideally, I could accomplish this in some sort of vectorized manner,
although the Force is not yet strong with me. Any ideas would be
appreciated!
Regards,
Jonathan
[[alternative HTML version deleted]]
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.