On Jul 8, 2009, at 10:09 AM, tathta wrote:
I have two dataframes, the first column of each dataframe is a
unique id
number (the rest of the columns are data variables).
I would like to figure out how many times each id number appears in
each
dataframe.
So far I can use:
length( match (dataframeA$unique.id[1], dataframeB$unique.id) )
but this only works on each row of dataframe A one-at-a-time.
I would like to do this for all of the rows in dataframe A, and then
put the
results in a new variable: dataframeA$count
I'm new to R, so please be patient with me!
Sorry if this question has already been answered, my search of the
archives
only brought up one relevant post, and I didn't understand the
answer to
it.... http://www.nabble.com/match-to20799206.html#a20799206
If I am correctly understanding what you are looking for, you could do
something like the following:
# Create some simple data. Note that only a subset of the ID's (3:5)
will match across the two DF's:
set.seed(1)
DF.A <- data.frame(ID = sample(1:5, 10, replace = TRUE))
DF.B <- data.frame(ID = sample(3:7, 10, replace = TRUE))
> DF.A
ID
1 2
2 2
3 3
4 5
5 2
6 5
7 5
8 4
9 4
10 1
> DF.B
ID
1 4
2 3
3 6
4 4
5 6
6 5
7 6
8 7
9 4
10 6
Now, create counts of the IDs in each, coercing the results to data
frames and setting the count column name for each:
TAB.A <- as.data.frame(table(DF.A$ID), responseName = "Count.A")
TAB.B <- as.data.frame(table(DF.B$ID), responseName = "Count.B")
> TAB.A
Var1 Count.A
1 1 1
2 2 3
3 3 1
4 4 2
5 5 3
> TAB.B
Var1 Count.B
1 3 1
2 4 3
3 5 1
4 6 4
5 7 1
Now, use merge() to join each of the two above. 'all = TRUE' will
include non-matching keys:
> merge(TAB.A, TAB.B, by = "Var1", all = TRUE)
Var1 Count.A Count.B
1 1 1 NA
2 2 3 NA
3 3 1 1
4 4 2 3
5 5 3 1
6 6 NA 4
7 7 NA 1
Note that you will get NAs for any non-matching ID's (Var1).
See ?table, ?as.data.frame and ?merge for more information.
HTH,
Marc Schwartz
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.