I was using summarize() from Hmisc on a data set in which one of the levels of the by variable was "". Each summary statistic was shifted by one level (the value for "" was reported under "A", and so on), and the "" level was missing from the output data frame. I tried to report this as a bug, but I could not log into the Hmisc bug reporting website. I also searched the email archives; if it has been reported there, I failed to find it. Should I pursue this as a bug, or am I using summarize() incorrectly? Here is my example along with the output:
> tst1 <- data.frame(a=factor(c("", "A", "B", "C")),
+                    x=1:4)
> tst1
  a x
1   1
2 A 2
3 B 3
4 C 4
> with(tst1, summarize(x, by=llist(a), FUN=mean))
  a x
1 A 1
2 B 2
3 C 3
> with(tst1, aggregate(x, by=list(a), FUN=mean))
  Group.1 x
1         1
2       A 2
3       B 3
4       C 4
> sessionInfo()
R version 2.9.0 (2009-04-17)
i486-pc-linux-gnu

locale:
LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY=C;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Hmisc_3.6-0

loaded via a namespace (and not attached):
[1] cluster_1.11.13 grid_2.9.0      lattice_0.17-22

Michael