Could you kindly test the following code? I am asking because I got a strange answer when 'aggregate()' is used with a formula.

I am trying to count how many missing values there are in each group. For this exercise, I created the data below:

> tmp <- data.frame(grp=c(2,3,2,3), y=c(NA, 0.5, 3, 0.5))
> tmp
  grp   y
1   2  NA
2   3 0.5
3   2 3.0
4   3 0.5

The observations (variable y) fall into two groups (variable grp). In group 2, y takes the values NA and 3.0. In group 3, y takes the values 0.5 and 0.5. Hence, the number of missing values is 1 for group 2 and 0 for group 3. These counts can be obtained with 'aggregate()' from the 'stats' package as below:

> aggregate(x=tmp$y, by=list(grp=tmp$grp), function(x) sum(is.na(x)))
  grp x
1   2 1
2   3 0
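
As a cross-check of the expected counts, the same numbers can be obtained without 'aggregate()' at all. The snippet below is only a small sketch using 'tapply()' from base R on the same data; the name 'na_counts' is introduced here for illustration.

```r
# Same example data as above
tmp <- data.frame(grp = c(2, 3, 2, 3), y = c(NA, 0.5, 3, 0.5))

# is.na() flags the missing entries; tapply() sums the flags within each grp
na_counts <- tapply(is.na(tmp$y), tmp$grp, sum)
print(na_counts)
```

This should report one missing value for group 2 and none for group 3, in agreement with the 'by=' call above.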

The same computation can also be written with the formula interface, as below:

> aggregate(y~grp, data=tmp, function(x) sum(is.na(x)))
  grp y
1   2 0
2   3 0

What a surprise! Is this a bug? I would appreciate it if you could share your results after testing the code. Thank you so much for your help in advance!
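
One possible explanation, which I have not confirmed beyond reading the help page: the formula method of 'aggregate()' has an 'na.action' argument whose default is 'na.omit', so rows with a missing y may be dropped before the function is applied. The sketch below compares the default with 'na.action = na.pass'; the names 'count_na', 'res_default' and 'res_pass' are made up for this illustration.

```r
# Same example data as above
tmp <- data.frame(grp = c(2, 3, 2, 3), y = c(NA, 0.5, 3, 0.5))

# Helper that counts the missing values in a vector
count_na <- function(x) sum(is.na(x))

# Default: na.action = na.omit, so the row with y = NA is removed first
res_default <- aggregate(y ~ grp, data = tmp, FUN = count_na)

# na.pass keeps the incomplete rows, so count_na() actually sees the NA
res_pass <- aggregate(y ~ grp, data = tmp, FUN = count_na, na.action = na.pass)

print(res_default)
print(res_pass)
```

If this reading is right, 'res_pass' should agree with the 'by=' version, and the zeros in the formula version would be documented behaviour rather than a bug.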

Chel Hee Lee

