>>>>> "PD" == Peter Dalgaard <p.dalga...@biostat.ku.dk>
>>>>>     on Sun, 12 Jul 2009 11:11:37 +0200 writes:

PD> m.craw...@imperial.ac.uk wrote:
>> In a Box and Whisker plot, I thought that when there are outliers both above
>> and below the whiskers, then the whiskers should both be the same length
>> (plus or minus 1.5 times the inter-quartile range).

PD> Not according to the docs:

PD> range: this determines how far the plot whiskers extend out from the
PD>        box. If 'range' is positive, the whiskers extend to the most
PD>        extreme data point which is no more than 'range' times the
PD>        interquartile range from the box. A value of zero causes the
PD>        whiskers to extend to the data extremes.

PD> And the code itself has

PD> stats[c(1, 5)] <- range(x[!out], na.rm = TRUE)

PD> So the whisker won't be equal to 1.5 IQR unless there happens to be an
PD> observation there.

PD> Now, this might be wrong, but people have tried very hard to make the
PD> implementation follow the original definition due to Tukey. I.e., if you
PD> can point out that Tukey specified it otherwise, then we'd change it,
PD> otherwise it is just not a bug.

I'd bet pretty large amounts that we (and S and S-plus and probably quite a
few other packages) have implemented the whiskers the way JWT defined them,
very purposefully.

One of JWT's points *was* exactly that most of the values "drawn" represent
*observations* (and those that do not use exact midpoints of observations).
It's not by coincidence or even queerness that the box is *not* delineated
by the usual quartiles, but rather by the *hinges*.

[ Digression about hinges vs quartiles: ?boxplot.stats has a section
  'Details' to which I added such information about a decade ago.
  Whereas our R help pages (?boxplot.stats, ?fivenum) do use the correct
  definitions, unfortunately many other places do *not*; e.g., even the
  Wikipedia page http://en.wikipedia.org/wiki/Five-number_summary wrongly
  talks about the 1st and 3rd quartiles, but then at least uses a numerical
  example based on the hinges. ]

Martin Maechler, ETH Zurich

>> If you look at the plot for SilwoodWeather on p.155 of The R Book you will
>> see that for November (month = 11) the upper whisker is shorter than the
>> lower, while for other months with outliers both above and below, the lines
>> are the same lengths.

PD> For easier reproduction (reproducible examples should not refer to files
PD> on your C: drive...):

>> diff(boxplot({set.seed(9);x<-rnorm(50)})$stats)
PD>           [,1]
PD> [1,] 1.2525857
PD> [2,] 0.5412128
PD> [3,] 0.6083348
PD> [4,] 1.4625057

PD> -- 
PD>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
PD>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
PD>  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
PD> ~~~~~~~~~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45) 35327907
PD> ______________________________________________
PD> R-devel@r-project.org mailing list
PD> https://stat.ethz.ch/mailman/listinfo/r-devel
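
The R sketch below reproduces the whisker rule quoted from the docs. It is a minimal illustration: `tukey_whiskers()` is a hypothetical helper, not part of R. It takes the hinges from `fivenum()` and places each fence `coef` times the hinge spread beyond its hinge. Each whisker then stops at the most extreme *observation* inside its fence, which is why the two whiskers usually differ in length. The result is checked against `boxplot.stats()`, the function `boxplot()` uses to compute its `$stats`.

```r
## Hypothetical helper (not part of R): recompute the whisker ends by hand.
## Assumes the rule quoted from ?boxplot: fences lie 'coef' times the hinge
## spread beyond the hinges; each whisker ends at the most extreme
## observation still inside its fence.
tukey_whiskers <- function(x, coef = 1.5) {
  x <- x[!is.na(x)]
  h <- fivenum(x)[c(2, 4)]          # Tukey's lower and upper hinges
  spread <- coef * diff(h)          # coef * (upper hinge - lower hinge)
  out <- x < h[1] - spread | x > h[2] + spread
  c(lower = min(x[!out]), upper = max(x[!out]))
}

set.seed(9); x <- rnorm(50)         # same data as PD's example
s <- boxplot.stats(x)$stats         # what boxplot() draws
w <- tukey_whiskers(x)
stopifnot(isTRUE(all.equal(unname(w), s[c(1, 5)])))

## Whisker lengths (rows [1,] and [4,] of PD's diff() output) versus a
## fixed 1.5 * hinge spread, which would make both whiskers equal:
c(lower = s[2] - s[1], upper = s[5] - s[4], fixed = 1.5 * (s[4] - s[2]))

## Hinges (fivenum) versus default type-7 quartiles (quantile) for the same x:
rbind(hinges = fivenum(x)[c(2, 4)], quartiles = quantile(x, c(0.25, 0.75)))
```

The parameter is named `coef` to match `boxplot.stats()` and to avoid shadowing base `range()`. `boxplot()` passes its own `range` argument through as `coef`.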