On Fri, Jan 27, 2012 at 5:36 AM, Chris Wallace <chris.wall...@cimr.cam.ac.uk> wrote: > Greg, thanks for the reply. > > Unfortunately, I remain unconvinced! > > I ran a longer simulation, 100,000 reps. The size of the test is > consistently too small (see below) and the histogram shows increasing bars > even within the parts of the histogram with even bar spacing. See > https://www-gene.cimr.cam.ac.uk/staff/wallace/hist.png > > y<-sapply(1:100000, function(i,n=100) > binom.test(sum(rnorm(n)>0),n,p=0.5,alternative="two")$p.value) > mean(y<0.01) > # [1] 0.00584 > mean(y<0.05) > # [1] 0.03431 > mean(y<0.1) > # [1] 0.08646 > > Can that really be due to the discreteness of the distribution?
Yes. All so-called exact tests tend to be conservative due to discreteness, and there's quite a lot of discreteness in the tails The problem is far worse for Fisher's exact test, and worse still for Fisher's other exact test (of Hardy-Weinberg equilibrium -- http://www.genetics.org/content/180/3/1609.full). You don't need to rely on finite-sample simulations here: you can evaluate the level exactly. Using binom.test() you find that the rejection regions are y<=39 and y>=61, so the level at nominal 0.05 is: > pbinom(39,100,0.5)+pbinom(60,100,0.5,lower.tail=FALSE) [1] 0.0352002 agreeing very well with your 0.03431 At nominal 0.01 the exact level is > pbinom(36,100,0.5)+pbinom(63,100,0.5,lower.tail=FALSE) [1] 0.006637121 and at 0.1 it is > pbinom(41,100,0.5)+pbinom(58,100,0.5,lower.tail=FALSE) [1] 0.08862608 Your result at nominal 0.01 is a bit low, but I think that's bad luck. When I ran your code I got 0.00659 for the estimated level at nominal 0.01, which matches the exact calculations very well Theoreticians sweep this under the carpet by inventing randomized tests, where you interpolate a random p-value between the upper and lower values from a discrete distribution. It's a very elegant idea that I'm glad to say I haven't seen used in practice. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.