Could you kindly test the following code? I found a strange answer
when 'aggregate()' is used with a formula.
I am trying to count how many missing values there are in each group.
For this exercise, I created the data below:
> tmp <- data.frame(grp=c(2,3,2,3), y=c(NA, 0.5, 3, 0.5))
> tmp
  grp   y
1   2  NA
2   3 0.5
3   2 3.0
4   3 0.5
The observations (variable y) fall into two groups (variable grp).
In group 2, y is NA and 3.0. In group 3, y is 0.5 and 0.5. Hence, the
number of missing values is 1 for group 2 and 0 for group 3. These
counts can be obtained with 'aggregate()' from the 'stats' package as
below:
> aggregate(x=tmp$y, by=list(grp=tmp$grp), function(x) sum(is.na(x)))
  grp x
1   2 1
2   3 0
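As a cross-check on the expected counts, the same tabulation can be
sketched with 'tapply()' from base R, which applies a function to y
within each level of grp. The helper name 'count_na' is only
illustrative and is not part of the code above.

```r
# Same example data as above
tmp <- data.frame(grp = c(2, 3, 2, 3), y = c(NA, 0.5, 3, 0.5))

# Hypothetical helper: number of missing values in a vector
count_na <- function(x) sum(is.na(x))

# Apply count_na to y within each group; the result is named by group
na_by_group <- tapply(tmp$y, tmp$grp, count_na)
print(na_by_group)  # group 2 has 1 missing value, group 3 has 0
```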
A formula can be used as below:
> aggregate(y~grp, data=tmp, function(x) sum(is.na(x)))
  grp y
1   2 0
2   3 0
What a surprise! Is this a bug? I would appreciate it if you could
share your results after testing the code. Thank you very much in
advance for your help!
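
One thing that may be worth checking when testing: the formula method
of 'aggregate()' has an 'na.action' argument whose documented default is
'na.omit', so rows with a missing y would be dropped before the function
is applied. The sketch below compares that default with 'na.pass'. It
assumes this default is the cause of the difference, and the name
'count_na' is only illustrative.

```r
# Same example data as above
tmp <- data.frame(grp = c(2, 3, 2, 3), y = c(NA, 0.5, 3, 0.5))

# Hypothetical helper: number of missing values in a vector
count_na <- function(x) sum(is.na(x))

# Default na.action (na.omit): the row with y = NA is removed first,
# so the function never sees a missing value
res_default <- aggregate(y ~ grp, data = tmp, FUN = count_na)

# na.pass keeps rows with missing values, so they reach count_na
res_pass <- aggregate(y ~ grp, data = tmp, FUN = count_na,
                      na.action = na.pass)

print(res_default)
print(res_pass)
```

If the assumption holds, 'res_pass' should agree with the result of the
non-formula call, which has no 'na.action' step.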
Chel Hee Lee