Hi,

When 'x' is a vector of doubles, it's not clear how 'factor(x)' compares its values to determine the levels. For example, all the values in 'x' below are "conceptually" the same:
x <- c(11/3, 2/3 + 4/3 + 5/3, 50 + 11/3 - 50, 7.00001 - 1000003/300000)

However, because of floating-point rounding errors, they are not strictly equal:

> duplicated(x)
[1] FALSE FALSE FALSE FALSE
> unique(x)
[1] 3.666667 3.666667 3.666667 3.666667

but they are nearly equal:

> all.equal(x, rep(11/3, 4))
[1] TRUE

Now factor(), and therefore table() (which seems to use factor() internally), have a different opinion:

> factor(x)
[1] 3.66666666666667 3.66666666666667 3.66666666666666 3.66666666666667
Levels: 3.66666666666666 3.66666666666667
> table(x)
x
3.66666666666666 3.66666666666667
               1                3

So factor() doesn't seem to be using either "strict equality" or "near equality" to determine the levels. What does it use? Sorry if I missed it, but I couldn't find any information about this in its man page.

Wouldn't it be better if factor() were consistent with either duplicated() or all.equal(), instead of introducing its own way of comparing doubles that lies somewhere in between?

Cheers,
H.

> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.12.0

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
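One plausible explanation for the two levels that factor(x) and table(x) report for the vector 'x' in the example: reading the R source of factor() suggests (the man page does not say so) that it converts the values with as.character() and matches the resulting strings. Because as.character() on a double keeps at most 15 significant digits, values that agree to 15 digits would land in the same level. The sketch below checks that assumption; the variable names (n_strict, levels_by_string, and so on) are illustrative only.

```r
# Assumption being tested: factor() builds its levels from as.character()
# of the values, and as.character() on a double keeps at most 15
# significant digits.
x <- c(11/3, 2/3 + 4/3 + 5/3, 50 + 11/3 - 50, 7.00001 - 1000003/300000)

# Strict equality, as used by duplicated()/unique(): count distinct doubles.
n_strict <- length(unique(x))

# The same values after conversion to character strings.
x_chr <- as.character(x)
n_chr <- length(unique(x_chr))

# Levels built the way the assumption describes: sort the distinct doubles
# numerically, convert them to strings, then drop duplicated strings.
levels_by_string <- unique(as.character(sort(unique(x))))

n_levels <- nlevels(factor(x))
same_levels <- identical(levels(factor(x)), levels_by_string)

# Integer codes obtained by matching each value's string against those levels.
codes_by_string <- match(x_chr, levels_by_string)
same_codes <- identical(as.integer(factor(x)), codes_by_string)

print(c(strict = n_strict, as_character = n_chr, factor_levels = n_levels))
print(c(same_levels = same_levels, same_codes = same_codes))
```

If both comparisons come out TRUE, factor() is effectively using "equality after formatting to 15 significant digits", which is neither duplicated()'s strict equality nor all.equal()'s tolerance-based comparison.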