Hi:

On Thu, Jan 13, 2011 at 10:37 AM, Longe <longeli...@gmail.com> wrote:
> Dear list,
>
> I'm new to R, please bear with my silly questions. I'm trying to get an
> understanding of why the results I get from a call to hist() are not as I
> thought I would get. When I use the parameter freq=FALSE, I think the plot
> will contain bars that none of them is larger than 1, because they're
> probabilities. But for my code, the bars exceeded 1.
>

Your perception is incorrect, I'm afraid; the bars in a histogram drawn with freq=FALSE are not probabilities, but rather crude estimates of the density in each subinterval. The *area* of each rectangle approximates the probability content (the integral of the density) of the corresponding interval. (Think of Riemann integration from calculus as an analogy.)

An example of a continuous distribution whose density is greater than 1 is the Uniform(0, 0.5) distribution (or any uniform distribution defined on an interval of width < 1). Its density is a rectangle with width 0.5 and area 1 (since every probability density has total area 1 under its curve by definition). The height of the rectangle is the density of the uniform distribution, which here is 1/0.5 = 2. As the width of the interval gets smaller, the density (height) must get bigger since the area is fixed; in the uniform case the height is exactly the reciprocal of the width.

> The actual data seems immaterial. I tried with dummy data:
>
> > hist(runif(1000), freq=FALSE)
>
> and the histogram includes bars well over 1 in height. The man page says
> that freq=FALSE produces densities, so that the total area is 1. Clearly if
> all the values are between 0 and 1, as is the case here, some of the bars
> stand out above 1, for the area to be 1. I thought that it is the sum of
> the bar heights that would be 1, so that the bars reflect probabilities for
> each interval, rather than densities. So, the answer to my question would
> be "because it's densities, not probabilities", but then the question is,
> why densities and not probabilities?
>
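The area-versus-height point can be checked numerically. What follows is only a minimal sketch, assuming the default binning of hist(); the seed and the names x, h, widths and areas are illustrative choices, not part of the original post.

```r
# Density of Uniform(0, 0.5) inside its support: 1 / width = 2.
dunif(0.25, min = 0, max = 0.5)

set.seed(1)                   # arbitrary seed, only for reproducibility
x <- runif(1000)
h <- hist(x, plot = FALSE)    # same bins as hist(x, freq = FALSE), no plot

widths <- diff(h$breaks)      # width of each bin
areas  <- h$density * widths  # area of each bar = estimated probability

sum(areas)                    # total area under the histogram
all.equal(areas, h$counts / length(x))  # areas equal relative frequencies
max(h$density)                # a bar height, which is allowed to exceed 1
all(areas <= 1)               # a bar area, which cannot exceed 1
```

If probabilities per bin are what is wanted, plotting `h$counts / length(x)` (for instance with barplot()) gives bars whose heights sum to 1, at the cost of the heights depending on the chosen bin width.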
Histograms are meant to estimate continuous probability density functions. In a bar chart of a discrete distribution, on the other hand, the relative frequencies are estimated probabilities of each category, because the probabilities are point masses that add to 1. Perhaps this is the source of your confusion: a histogram does not have the same interpretation as a bar chart, because it estimates a smooth curve over a continuous interval rather than a set of (probability) masses at fixed points.

HTH,
Dennis

> Regards,
> L.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.