Re: [R] Question about histogram

Dennis Murphy Thu, 13 Jan 2011 13:47:24 -0800

Hi:

On Thu, Jan 13, 2011 at 10:37 AM, Longe <longeli...@gmail.com> wrote:


> Dear list,
>
> I'm new to R, please bear with my silly questions.  I'm trying to
> understand why the results I get from a call to hist() are not what I
> expected.  When I use the parameter freq=FALSE, I expect that none of the
> bars in the plot will be taller than 1, because they're probabilities.
> But for my code, the bars exceeded 1.
>

Your perception is incorrect, I'm afraid; the bar heights in a histogram are
not probabilities, but rather crude estimates of the density in each
subinterval. The *area* of each rectangle (height times bin width) gives an
approximation to the probability content (the integral of the density) in
the corresponding interval. (Think of the process of Riemann integration
from calculus as an analogy.)
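
To make the height-versus-area point concrete, here is a minimal sketch in R.
It assumes the same kind of dummy data as in the question; the seed and the
variable names (x, h, widths, areas) are only illustrative choices.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
x <- runif(1000)                 # dummy data on (0, 1)
h <- hist(x, plot = FALSE)       # compute the histogram without drawing it

widths <- diff(h$breaks)         # width of each bin
areas  <- h$density * widths     # area of each rectangle

max(h$density)                   # heights are densities and can exceed 1
sum(h$density)                   # the heights do not sum to 1 in general
sum(areas)                       # the areas sum to 1 (up to rounding error)

# Each area is the relative frequency of its bin, i.e. the estimated
# probability of falling in that interval.
all.equal(areas, h$counts / length(x))
```

The density component returned by hist() is what freq=FALSE plots, so the
check on the areas applies directly to the picture you saw.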

An example of a continuous distribution whose density is greater than 1 is
the Uniform(0, 0.5) distribution (or any uniform distribution defined on an
interval of width < 1). Its density is a rectangle with width 0.5 and
area 1 (since all continuous probability densities have total area 1 under
the density function by definition). The height of the rectangle is the
density of the uniform distribution...

As the interval gets narrower, the density (height) must get larger, since
the area is fixed at 1; in the uniform case the height is exactly the
reciprocal of the width.
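
A quick check of the uniform case in R, as a sketch: the interval widths below
are arbitrary illustrative values, and the names widths and heights are mine.

```r
dunif(0.25, min = 0, max = 0.5)  # density of Uniform(0, 0.5) at 0.25
punif(0.5,  min = 0, max = 0.5)  # total probability on (0, 0.5)

widths  <- c(2, 1, 0.5, 0.1)     # illustrative interval widths
# Density of Uniform(0, w), evaluated at the midpoint of each interval
heights <- dunif(widths / 2, min = 0, max = widths)

heights                          # equal to 1 / widths
heights * widths                 # area of each rectangle is 1
```

The density exceeds 1 as soon as the width drops below 1, while the area stays
at 1 throughout.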

>
> The actual data seems immaterial.  I tried with dummy data:
>
> > hist(runif(1000), freq=FALSE)
>
> and the histogram includes bars well over 1 in height.  The man page says
> that freq=FALSE produces densities, so that the total area is 1.  Clearly, if
> all the values are between 0 and 1, as is the case here, some of the bars
> must rise above 1 for the area to be 1.  I thought that it is the sum of
> the bar heights that would be 1, so that the bars reflect probabilities for
> each interval, rather than densities.  So, the answer to my question would
> be "because it's densities, not probabilities", but then the question is,
> why densities and not probabilities?
>

Histograms are meant to estimate continuous probability density functions.
OTOH, in a bar chart of a discrete distribution, relative frequencies are
estimated probabilities of each category because the probabilities are point
masses that add to 1. Perhaps this is the source of your confusion - a
histogram does not have the same interpretation as a bar chart, because it's
estimating a smooth curve over a continuous interval rather than a set of
(probability) masses at fixed points.
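
The contrast can be sketched in R as follows. The die-rolling data are a
made-up example of a discrete distribution, and rescaling the bin counts is
just one possible way to get per-bin proportions if that is what you want to
look at; all variable names here are illustrative.

```r
set.seed(1)                                      # arbitrary seed

# Discrete case: relative frequencies are estimated probabilities.
rolls    <- sample(1:6, size = 600, replace = TRUE)  # simulated die rolls
rel_freq <- table(rolls) / length(rolls)
sum(rel_freq)                                    # bar heights sum to 1
# barplot(rel_freq)                              # bar chart of the point masses

# Continuous case: per-bin proportions can be recovered from the counts,
# but they depend on the bin widths and are not density estimates.
x        <- runif(1000)
h        <- hist(x, plot = FALSE)
bin_prob <- h$counts / sum(h$counts)
sum(bin_prob)                                    # also sums to 1
# barplot(bin_prob, names.arg = h$mids)          # proportions, not densities
```

Note that bin_prob changes if you change the breaks, whereas the density
estimate stays on the same scale; that is one reason hist() reports densities
when freq=FALSE.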

HTH,
Dennis

>
> Regards,
> L.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
