Re: [R] null distribution of binom.test p values

Chris Wallace Fri, 27 Jan 2012 07:28:26 -0800

Greg, Thomas,

thank you dot the detailed and lucid replies. I understand now. I amdoing multiple tests and wanted to present an overview of results usingpp plots, which were looking very underdispersed. Now I understand why,I think I can generate a "correct" expected distribution which should atleast reassure the biologists when they view them.


Many thanks, Chris.

On 26/01/12 20:03, Thomas Lumley wrote:

On Fri, Jan 27, 2012 at 5:36 AM, Chris Wallace
<chris.wall...@cimr.cam.ac.uk>  wrote:

Greg, thanks for the reply.

Unfortunately, I remain unconvinced!

I ran a longer simulation, 100,000 reps.  The size of the test is
consistently too small (see below) and the histogram shows increasing bars
even within the parts of the histogram with even bar spacing.  See
https://www-gene.cimr.cam.ac.uk/staff/wallace/hist.png

y<-sapply(1:100000, function(i,n=100)
  binom.test(sum(rnorm(n)>0),n,p=0.5,alternative="two")$p.value)
mean(y<0.01)
# [1] 0.00584
mean(y<0.05)
# [1] 0.03431
mean(y<0.1)
# [1] 0.08646

Can that really be due to the discreteness of the distribution?


Yes.  All so-called exact tests tend to be conservative due to
discreteness, and there's quite a lot of discreteness in the tails

The problem is far worse for Fisher's exact test, and worse still for
Fisher's other exact test (of Hardy-Weinberg equilibrium --
http://www.genetics.org/content/180/3/1609.full).

You don't need to rely on finite-sample simulations here: you can
evaluate the level exactly.  Using binom.test() you find that the
rejection regions are y<=39 and y>=61, so the level at nominal 0.05
is:

pbinom(39,100,0.5)+pbinom(60,100,0.5,lower.tail=FALSE)

[1] 0.0352002
agreeing very well with your 0.03431

At nominal 0.01 the exact level is

pbinom(36,100,0.5)+pbinom(63,100,0.5,lower.tail=FALSE)

[1] 0.006637121
and at 0.1 it is

pbinom(41,100,0.5)+pbinom(58,100,0.5,lower.tail=FALSE)

[1] 0.08862608

Your result at nominal 0.01 is a bit low, but I think that's bad luck.
  When I ran your code I got 0.00659 for the estimated level at nominal
0.01, which matches the exact calculations very well


Theoreticians sweep this under the carpet by inventing randomized
tests, where you interpolate a random p-value between the upper and
lower values from a discrete distribution.  It's a very elegant idea
that I'm glad to say I haven't seen used in practice.

      -thomas


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] null distribution of binom.test p values

Reply via email to