(Ted Harding) wrote:
> Hi Folks,
> I'd like to know how hist() decides how many cells to use
> when it ignores my "suggestion" to use say 'hist(...,breaks=50)'.
>
> More specifically, I have the results of 10000 simulations,
> each returning an 8-vector, therefore 8 variables each with
> 10000 values. Some of these 8 have somewhat skew distributions.
> Say one of these 8 variables is X.
>
> I ask for H <- hist(X,breaks=50), and get a histogram which
> usually has a different number of cells than what I intended.
>
> For instance, for one of these simulations, the 8 different
> values of length(H$breaks) are:
>
>   70, 44, 38, 68, 50, 40, 46, 45
>
> ?hist tells me
>
> A)
>   breaks: one of:
>           * a vector giving the breakpoints between histogram
>             cells,
>           * a single number giving the number of cells for the
>             histogram,
>           * a character string naming an algorithm to compute the
>             number of cells (see Details),
>           * a function to compute the number of cells.
>
>           In the last three cases the number is a suggestion only.
>
> B)
>   The default for 'breaks' is '"Sturges"': see 'nclass.Sturges'.
>
> If I look at the code for nclass.Sturges() I see
>
>   function (x) ceiling(log2(length(x)) + 1)
>
> and, for length(X) = 10000, this gives 15. This is not related
> to any of the numbers of breaks I actually got, in any way obvious
> to me.
>
> So:
> Question 1: hist() has apparently ignored my "suggestion" of
> "breaks=50". Why? What is the criterion for ignoring?
>
> Question 2: Presumably, if it ignores the "suggestion", it
> does something else, of its choice. I would then, perhaps,
> expect it to fall back to its default, which is (allegedly)
> Sturges. But the result from nclass.Sturges looks different
> from what it actually did. So what did it actually do, and
> how did it decide on this?
>

No, it is not ignoring you.
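For reference, the arithmetic in the question can be checked in R, together with what hist() does with a single number of cells. This is a sketch that assumes hist.default() turns that number into breakpoints with pretty(range(x), n = breaks, min.n = 1), which is how the R sources I have seen do it; the seed and the simulated normal data are arbitrary stand-ins for Ted's X, not his data.

```r
## Sturges' rule for 10000 values: ceiling(log2(10000) + 1) = ceiling(14.29) = 15
print(nclass.Sturges(seq_len(10000)))        # 15

## Arbitrary stand-in for one of the simulated variables (assumption)
set.seed(1)
x <- rnorm(10000)

## A single number is only a target handed to pretty(), which picks
## "nice" breakpoints (cell width 1, 2 or 5 times a power of 10)
h50 <- hist(x, breaks = 50, plot = FALSE)
b50 <- pretty(range(x), n = 50, min.n = 1)   # assumed to mirror hist.default()
print(isTRUE(all.equal(h50$breaks, b50)))    # TRUE if the assumption holds
print(length(h50$breaks))                    # breakpoints actually used
print(unique(round(diff(h50$breaks), 6)))    # the single "nice" cell width

## The default asks pretty() for 15 cells, which it rounds in the same way
print(length(hist(x, plot = FALSE)$breaks))
```

So the 15 from Sturges and the 50 from breaks=50 are both targets for pretty(), not exact cell counts.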
Try

  hist(rnorm(10000))
  length(hist(rnorm(10000),breaks=50)$breaks)

and repeat a dozen times or so. Chances are that you'll mostly see
lengths around 40, but definitely more than the 17 or so that you'll
see without the breaks=50. Next, try

  diff(hist(rnorm(10000),breaks=50)$breaks)

and notice that this is usually 0.2, although if you repeat enough
times, you might get a couple of cases with 0.1 and a length of
75(-ish). Get it? Otherwise look at help(pretty), since that is what
is doing the work.

-p

> With thanks,
> Ted.
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
> Fax-to-email: +44 (0)870 094 0861
> Date: 19-May-08                                       Time: 10:31:20
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])              FAX: (+45) 35327907
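The "repeat a dozen times or so" experiment can also be run in a loop. A minimal sketch, assuming that hist(x, breaks = 50) gets its breakpoints from pretty(range(x), n = 50, min.n = 1); the helper name cell_info is hypothetical, and the seed and the 200 repetitions are arbitrary choices for illustration.

```r
## Hypothetical helper: number of breakpoints and cell width that pretty()
## chooses when asked for about n cells over the range of x
cell_info <- function(x, n = 50) {
  b <- pretty(range(x), n = n, min.n = 1)
  c(n_breaks = length(b), width = round(diff(b)[1], 6))
}

set.seed(2)                                        # arbitrary
res <- t(replicate(200, cell_info(rnorm(10000))))  # 200 rows, 2 columns

print(table(res[, "width"]))      # cell widths chosen across the 200 runs
print(table(res[, "n_breaks"]))   # breakpoint counts across the 200 runs
```

For N(0,1) samples of this size the tables should show what the message describes: a width of 0.2 with around 40 breakpoints in most runs, and the occasional run with a narrower range getting 0.1 and roughly twice as many.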