On 20/12/11 10:24, Michael Fuller wrote:
TOPIC
My question regards the philosophy behind how R implements corrections to
chi-square statistical tests. At least in recent versions (I'm using 2.13.1
(2011-07-08) on OS X 10.6.8), the chisq.test function applies the Yates
continuity correction for 2 by 2 contingency tables. But when used as a
goodness of fit test (GoF, aka likelihood ratio test), chisq.test does not
appear to implement any corrections for widely recognized problems, such as
small sample size, non-uniform expected frequencies, and one D.F.
From the help page:
"In the goodness-of-fit case simulation is done by random sampling from the discrete
distribution specified by p, each sample being of size n = sum(x)."
Is the thinking that random sampling completely obviates the need for
corrections?
Yes.
Wouldn't the same statistical issues still apply
No.
(e.g. poor continuity approximation with one D.F.,
There are no degrees of freedom involved, and no continuity approximation
either: the simulated p-value makes no reference to the chi-squared
distribution.
The observed test statistic (say "Stat") is compared with a number of
test statistics, Stat_1, ..., Stat_N, calculated from data sets simulated
under the null hypothesis. If the null is true, then Stat and Stat_1, ...,
Stat_N are all of ``equal status''. If there are m values of the Stat_i
which are greater than Stat, then the ``probability of observing, under
the null hypothesis, data as extreme as, or more extreme than, what you
actually observed'' is the probability of randomly selecting one of a
specified set of m+1 ``slots'' out of a total of N+1 slots (where each
slot has probability 1/(N+1)). Thus the p-value is (exactly) equal to
(m+1)/(N+1).
The only restriction is that there be no ties amongst the values of Stat
and Stat_1, ..., Stat_N. There being ties is of fairly low probability,
but is not of zero probability, since there is a finite number of possible
samples and hence of statistic values. So this restriction is a mild worry.
However a ``continuity correction'' would be of no help whatsoever.
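A minimal sketch of the calculation described above, in case it helps.
The counts x, the null probabilities p and the number of simulations N
are made up for illustration (they are not from your post), and
chisq_stat is just a helper name I have chosen. The last line calls
chisq.test's own simulation for comparison; as far as I know it counts
simulated values at least as large as the observed one, so the two
answers need not agree exactly, both because of ties and because the
random draws differ.

```r
set.seed(1)                     # only so that the sketch is reproducible

x <- c(3, 7, 2)                 # hypothetical observed counts
p <- c(0.2, 0.5, 0.3)           # hypothetical null probabilities
N <- 1999                       # number of simulated data sets

n <- sum(x)                     # each simulated sample has size n = sum(x)
E <- n * p                      # expected counts under the null

# Pearson chi-squared statistic for one vector of counts
chisq_stat <- function(counts) sum((counts - E)^2 / E)

Stat <- chisq_stat(x)           # the observed statistic

# Simulate N data sets under the null; each column is one table of counts
sims <- rmultinom(N, size = n, prob = p)
Stat_i <- apply(sims, 2, chisq_stat)

m <- sum(Stat_i > Stat)         # simulated statistics greater than Stat
p_value <- (m + 1) / (N + 1)    # the (m+1)/(N+1) rule
p_value

# How many simulated statistics tie with the observed one (up to rounding)
n_ties <- sum(abs(Stat_i - Stat) < 1e-9)
n_ties

# For comparison: the simulation built into chisq.test
chisq.test(x, p = p, simulate.p.value = TRUE, B = N)$p.value
```

With counts this small it is worth looking at n_ties: the statistic can
take only a limited number of distinct values, so ties may be less rare
than one would like.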
problems with non-uniform expected frequencies, etc) with random sampling?
Don't understand what you mean by this.
cheers,
Rolf Turner
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help