Dear All,

I have a question about the hypergeometric distribution.

Example 1: I have a universe of 6187 objects, of which 164 have a particular attribute, so 6187-164 do not have that attribute. I sample 249 of those objects and find that 19 have the attribute. I get a p-value here (looking only at over-representation):

    phyper(19, 164, 6187-164, 249, lower.tail=FALSE)
    [1] 7.816235e-06

Example 2: I have a universe of 6187 objects, of which 12 have a particular attribute, so 6187-12 do not have that attribute. I sample 249 of those objects and find that 4 have the attribute. I get a p-value here (looking only at over-representation):

    phyper(4, 12, 6187-12, 249, lower.tail=FALSE)
    [1] 6.368919e-05

It seems counter-intuitive to me that the probability of seeing 19 out of 164 in a sample of 249 is lower than the probability of seeing 4 out of 12 in a sample of the same size.

First, am I using phyper() properly? Second, can someone point me to documentation explaining why these seemingly counter-intuitive p-values occur?

Thanks,
Mick

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
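
For anyone reproducing the two examples, here is a minimal sketch that puts them side by side. It assumes that "over-representation" means seeing at least the observed number of hits, P(X >= k); with lower.tail=FALSE, phyper(q, ...) returns the strict tail P(X > q), so that reading corresponds to passing k - 1. The variable names (N, n, K, k, p_gt, p_ge, expected, fold) are illustrative and not from the original post.

```r
# Values taken from the two examples in the post.
N <- 6187                      # universe size
n <- 249                       # sample size
K <- c(ex1 = 164, ex2 = 12)    # objects with the attribute in the universe
k <- c(ex1 = 19,  ex2 = 4)     # objects with the attribute in the sample

# As written in the post: lower.tail = FALSE gives P(X > k), a strict inequality.
p_gt <- phyper(k, K, N - K, n, lower.tail = FALSE)

# Assumption: over-representation is meant as "k or more", i.e. P(X >= k),
# which is obtained by passing k - 1 as the quantile.
p_ge <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# Expected number of hits under random sampling, and observed/expected ratio.
expected <- n * K / N
fold <- k / expected

print(rbind(p_gt, p_ge, expected, fold))
```

The expected counts are roughly 6.6 for Example 1 and 0.48 for Example 2. Comparing observed with expected counts, rather than the raw fractions 19/164 and 4/12, may help when judging whether the ordering of the two p-values is really surprising.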