Re: [R] A question about the hypergeometric distribution and phyper()

Stefan Evert Wed, 10 Sep 2008 07:13:47 -0700


On 10 Sep 2008, at 15:19, michael watson (IAH-C) wrote:

> Example 1: I have a universe of 6187 objects, and 164 have a particular attribute, therefore 6187-164 do not have that attribute. I sample 249 of those objects and find that 19 have that attribute. I get a p-value
> here (looking at just over-representation):
>
> phyper(19, 164, 6187-164, 249, lower.tail=FALSE)
> [1] 7.816235e-06


Actually, if you look at ?phyper, you'll see that this should be

phyper(18, 164, 6187-164, 249, lower.tail=FALSE)
[1] 2.775819e-05

if you want to calculate Pr(X >= 19) = Pr(X > 18). Similarly:
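A quick way to convince yourself of the off-by-one is to sum the point probabilities from dhyper() directly. Here is a small sketch using the numbers from Example 1; the variable names are my own, and only phyper() and dhyper() come from base R:

```r
# Example 1: 6187 objects in total, 164 with the attribute, 249 sampled, 19 observed
m <- 164          # objects with the attribute ("white balls" in ?phyper)
n <- 6187 - m     # objects without the attribute ("black balls")
k <- 249          # sample size (number of objects drawn)
x <- 19           # observed count in the sample

# With lower.tail = FALSE, phyper(q, ...) returns P(X > q),
# so P(X >= x) needs q = x - 1
p_ge <- phyper(x - 1, m, n, k, lower.tail = FALSE)

# Cross-check: add up the point probabilities P(X = x), ..., P(X = min(m, k))
p_sum <- sum(dhyper(x:min(m, k), m, n, k))

# Passing q = x instead silently drops the P(X = x) term, i.e. computes P(X > x)
p_gt <- phyper(x, m, n, k, lower.tail = FALSE)

print(c(p_ge = p_ge, p_sum = p_sum, p_gt = p_gt))
```

p_ge and p_sum agree, while p_gt is smaller by exactly dhyper(19, m, n, k).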

> phyper(4, 12, 6187-12, 249, lower.tail=FALSE)
> [1] 6.368919e-05


phyper(3, 12, 6187-12, 249, lower.tail=FALSE)
[1] 0.0009816739

Which you'll still find counterintuitive, of course.

> It seems counter-intuitive to me that the probability of seeing 19 out of 164 in a sample of 249 is lower than the probability of seeing 4 out of 12 in a sample
> of the same size.

> Secondly, can someone point me to some documentation explaining why
> these seemingly counter-intuitive p-values occur?

I think it's just because the hypergeometric distribution becomes very skewed and non-normal for expected values < 1 (the expected counts are roughly 6.6 in the first case and 0.5 in the second case). Perhaps it helps to visualize the two distributions?

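# Point probabilities P(X = 0), ..., P(X = 20) for both examples
M <- rbind(dhyper(0:20, 164, 6187-164, 249),   # 164 out of 6187 have the attribute
           dhyper(0:20, 12, 6187-12, 249))     # 12 out of 6187 have the attribute

rownames(M) <- c("164 out of 6187", "12 out of 6187")
colnames(M) <- 0:20    # number of attribute-bearing objects in a sample of 249
barplot(M, beside=TRUE, legend.text=TRUE)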
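The expected counts mentioned above follow from the mean of the hypergeometric distribution, E[X] = k * m / (m + n), using the same m, n, k parametrization as ?phyper. A minimal sketch to check them (expected_count() is a hypothetical helper, not part of base R):

```r
# Hypothetical helper: mean of the hypergeometric distribution when k objects
# are drawn without replacement from m + n objects, m of which have the attribute
expected_count <- function(m, n, k) k * m / (m + n)

N <- 6187   # universe size
k <- 249    # sample size
e1 <- expected_count(164, N - 164, k)   # Example 1: approximately 6.60
e2 <- expected_count(12,  N - 12,  k)   # Example 2: approximately 0.48

# Cross-check against the mean computed from the full dhyper() distribution
mean1 <- sum((0:164) * dhyper(0:164, 164, N - 164, k))
mean2 <- sum((0:12)  * dhyper(0:12,  12,  N - 12,  k))

print(round(c(e1 = e1, e2 = e2, mean1 = mean1, mean2 = mean2), 2))
```

With an expected count below 1, nearly all of the probability mass sits at 0 and 1, so even a modest count such as 4 lands far out in the right tail.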


Best regards,
Stefan Evert

[ http://purl.org/stefan.evert ]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
