On Jul 8, 2009, at 12:17 PM, tathta wrote:
From an email suggestion, here are two sample datasets, and my
ideal output:
dataA <- data.frame(unique.id=c("A","B","C","B"),x=11:14,y=5:2)
dataB <- data.frame(unique.id=c("A","B","A","B","A","C","D","A"),x=27:20,y=22:29)
## mystery operation(s) happen here....
## ideal output would be:
dataA <- data.frame(unique.id=c("A","B","C","B"),x=11:14,y=5:2,
                    countA=c(1,2,1,2),countB=c(4,2,1,2))
so my mystery operation(s) would count the number of times the unique id
shows up in a given dataset.
my ideal outputs are as follows:
countA is the "mystery operation" applied to dataA (counting occurrences
within the same dataset)
countB is applied to dataB (counting occurrences within a second dataset).
My best try so far is to do:
tempA <- aggregate(dataA$unique.id,list(dataA$unique.id),length)
which gives me a data frame with ONE instance of each unique.id and the
counts...
(and which I thought was kinda cute)
but it only works within a single dataset!
<snip>
Modify my initial proposal:
countA <- as.data.frame(table(dataA$unique.id), responseName = "countA")
countB <- as.data.frame(table(dataB$unique.id), responseName = "countB")
> countA
  Var1 countA
1    A      1
2    B      2
3    C      1
> countB
  Var1 countB
1    A      4
2    B      2
3    C      1
4    D      1
dataA <- merge(dataA, countA, by.x = "unique.id", by.y = "Var1")
dataA <- merge(dataA, countB, by.x = "unique.id", by.y = "Var1")
> dataA
  unique.id  x y countA countB
1         A 11 5      1      4
2         B 12 4      2      2
3         B 14 2      2      2
4         C 13 3      1      1
Note that without 'all.x = TRUE' in the merge() calls, only those
unique.id's that are common to both datasets will be in the result. If
you want to keep unique.id's that are in A but not in B, use
'all.x = TRUE'; their countB will then be NA.
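
As a rough sketch of the 'all.x = TRUE' case: the data frame dataA2 below
is hypothetical (not from the original post) and contains an id "E" that
does not occur in dataB. Replacing the resulting NA with 0 is an assumption
about what is wanted here, not something the original poster asked for.

```r
# Hypothetical first dataset with an id ("E") that is absent from dataB
dataA2 <- data.frame(unique.id = c("A", "B", "E"), x = 1:3)
dataB  <- data.frame(unique.id = c("A","B","A","B","A","C","D","A"),
                     x = 27:20, y = 22:29)

countB <- as.data.frame(table(dataB$unique.id), responseName = "countB")

# all.x = TRUE keeps every row of dataA2, matched in dataB or not
result <- merge(dataA2, countB, by.x = "unique.id", by.y = "Var1",
                all.x = TRUE)

# The unmatched id "E" gets countB = NA; recode to 0 if a count is wanted
result$countB[is.na(result$countB)] <- 0L
```

Here result keeps all three ids, with countB of 4, 2 and 0 for "A", "B"
and "E" respectively.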
Note also that by default (sort = TRUE in merge()), 'unique.id' will be
alpha sorted in the output, so the original row order of dataA is not
preserved.
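
If the original row order matters, one possible alternative (a sketch, not
part of the proposal above) is to skip merge() and look the counts up
directly with match(). The helper names idsA, tabA and tabB are made up
for illustration.

```r
dataA <- data.frame(unique.id = c("A","B","C","B"), x = 11:14, y = 5:2)
dataB <- data.frame(unique.id = c("A","B","A","B","A","C","D","A"),
                    x = 27:20, y = 22:29)

idsA <- as.character(dataA$unique.id)
tabA <- table(dataA$unique.id)
tabB <- table(dataB$unique.id)

# match() finds each id's position in the table names; ids that are
# missing from a table yield NA rather than being dropped
dataA$countA <- as.vector(tabA)[match(idsA, names(tabA))]
dataA$countB <- as.vector(tabB)[match(idsA, names(tabB))]
```

With the sample data this leaves the rows in their original order
(A, B, C, B) with countA = 1, 2, 1, 2 and countB = 4, 2, 1, 2, which is
the ideal output given in the question.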
HTH,
Marc Schwartz