The aggregate function does almost everything I need to summarize a dataset, except that I can't exclude NAs without a bit of hassle.

> set.seed(143)
> m <- data.frame(A=sample(LETTERS[1:5], 20, TRUE), B=sample(LETTERS[1:10], 20, TRUE),
+                 C=sample(c(NA, 1:4), 20, TRUE), D=sample(c(NA, 1:4), 20, TRUE))
> m
   A B  C  D
1  E I  1 NA
2  A C NA NA
3  D I NA  3
4  C I  2  4
5  A C  3  2
6  E J  1  2
7  D J  2  2
8  C G  4  1
9  C D NA  3
10 B G  3 NA
11 C B  4  2
12 A B NA NA
13 E A NA  4
14 B B  3  3
15 E I  4  1
16 E J  3  1
17 B J  4  4
18 B J  1  3
19 D D  4  2
20 B B  4  3

> aggregate(m[,-c(1:2)], by=list(m[,1]), sum)
  Group.1  C  D
1       A NA NA
2       B 15 NA
3       C NA 10
4       D NA  7
5       E NA NA
> aggregate(m[,-c(1:2)], by=list(m[,1]), length)
  Group.1 C D
1       A 3 3
2       B 5 5
3       C 4 4
4       D 3 3
5       E 5 5

Here are my own versions of length and sum that exclude NAs:

> mylength <- function(x) { sum(!is.na(x)) }     # count the non-NA values
> mysum <- function(x) { sum(x, na.rm=TRUE) }    # sum the non-NA values

> aggregate(m[,-c(1:2)], by=list(m[,1]), mysum)     # this computes correctly
  Group.1  C  D
1       A  3  2
2       B 15 13
3       C 10 10
4       D  6  7
5       E  9  8

> aggregate(m[,-c(1:2)], by=list(m[,1]), mylength)  # this computes correctly
  Group.1 C D
1       A 1 1
2       B 5 4
3       C 3 4
4       D 2 3
5       E 4 4

I also need to compute other statistics, e.g. var and sd, and it is a hassle to write a customized NA-excluding version of each one. Are there any alternative approaches?

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
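
One possibility worth checking, offered as a sketch rather than a definitive answer: aggregate takes further arguments after the function and passes them on to it, so na.rm=TRUE can be supplied once in the call instead of through a wrapper per statistic. The small data frame d and the helper name drop_na below are made up for illustration and are not part of the post above. length has no na.rm argument, so counting non-NA values still needs a helper.

```r
# Hypothetical toy data (not the data frame m from the post)
d <- data.frame(g = c("a", "a", "a", "b", "b"),
                x = c(1, NA, 3, 2, 4),
                y = c(NA, 2, 6, 1, NA))

# Arguments given after FUN are passed on to FUN, so na.rm = TRUE
# reaches sum() and sd() (and likewise var(), mean(), ...) directly.
s   <- aggregate(d[, c("x", "y")], by = list(d$g), FUN = sum, na.rm = TRUE)
sdv <- aggregate(d[, c("x", "y")], by = list(d$g), FUN = sd,  na.rm = TRUE)

# length() does not accept na.rm, so counting needs a helper.
# drop_na is a hypothetical name: it wraps any function so that
# NAs are removed from the input before the function is applied.
drop_na <- function(f) function(x, ...) f(x[!is.na(x)], ...)
n <- aggregate(d[, c("x", "y")], by = list(d$g), FUN = drop_na(length))

print(s)
print(sdv)
print(n)
```

The same drop_na wrapper would also work for functions that have no na.rm argument of their own, at the cost of one extra function call per group.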