On Mon, 20 Sep 2010, Duncan Murdoch wrote:
On 20/09/2010 9:54 AM, A wrote:
Dear all,
I'm performing a t-test on two normal distributions with identical mean&
standard deviation, and repeating this tests a very large number of times
to
describe an representative p value distribution in a null case. As a part
of
this, the program bins these values in 10 evenly distributed bins between 0
and 1 and reports the number of observations in each bin. What I have
noticed is that even after 500,000 replications the number in my lowest bin
is consistently ~5% smaller than the number in all the other bins, which
are
similar within about 1% of each other. Is there any reason, perhaps to do
with random number generation in R or the nature of the normal distribution
simulated by the rnorm function that could explain this depletion?
No, equal sized bins should expect equal numbers of entries. But your code
may have errors in it.
It's actually more interesting than that.
Here's the code in my preferred format for small simulations
one.test <- function (m = -0.065, s = 0.0837)
{
x <- rnorm(6, m, s)
y <- rnorm(6, m, s)
t.test(x, y)$p.value
}
pvals <- replicate(1e5, one.test())
Now, binning could be done wrong, and in any case isn't a good way to compare
distributions, so we try
qqplot(pvals, ppoints(1e5))
and, since the issue is with small p-values
qqplot(log10(pvals), log10(ppoints(1e5)))
and we see that there are in fact fewer small p-values than there should be,
starting at about 10^(-2.5).
After thinking about this for a while, we realize that the t.test() function
shouldn't be expected to give exactly uniform p-values -- it's not as if it is
doing an exact test, after all.
We smugly repeat the exercise with
one.ev.test <- function (m = -0.065, s = 0.0837)
{
x <- rnorm(6, m, s)
y <- rnorm(6, m, s)
t.test(x, y, var.equal=TRUE)$p.value
}
ev.pvals <- replicate(1e5, one.test())
and the problem is solved.
-thomas
Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.