On 20/12/11 10:24, Michael Fuller wrote:
TOPIC
My question regards the philosophy behind how R implements corrections to 
chi-square statistical tests. At least in recent versions (I'm using 2.13.1
(2011-07-08) on OS X 10.6.8), the chisq.test function applies the Yates
continuity correction for 2 by 2 contingency tables. But when used as a 
goodness of fit test (GoF, aka likelihood ratio test), chisq.test does not 
appear to implement any corrections for widely recognized problems, such as 
small sample size, non-uniform expected frequencies, and one D.F.

From the help page:
"In the goodness-of-fit case simulation is done by random sampling from the discrete 
distribution specified by p, each sample being of size n = sum(x)."

Is the thinking that random sampling completely obviates the need for 
corrections?
    Yes.
Wouldn't the same statistical issues still apply
    No.
(e.g. poor continuity approximation with one D.F.,
    There are no degrees of freedom involved.  There is no continuity
    involved.  The observed test statistic (say "Stat") is compared with
    a number of test statistics, Stat_1, ..., Stat_N, calculated from
    data sets simulated under the null hypothesis.  If the null is true,
    then Stat and Stat_1, ..., Stat_N are all of ``equal status''.  If
    there are m values of the Stat_i which are greater than Stat, then
    the ``probability of observing, under the null hypothesis, data as
    extreme as, or more extreme than, what you actually observed'' is the
    probability of randomly selecting one of a specified set of m+1
    ``slots'' (the m+1 largest ranks) out of a total of N+1 slots (where
    each slot has probability 1/(N+1)).

    Thus the p-value is (exactly) equal to (m+1)/(N+1).
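
    The (m+1)/(N+1) calculation can be sketched by hand in R.  This is
    only an illustration: the counts x, the null probabilities p, the
    number of simulations N and the helper gof_stat() are all made up
    for the example, and counting ties as "at least as extreme" is a
    choice made here, not something stated above.

```r
# Sketch: Monte Carlo p-value for a chi-square goodness-of-fit test.
# x, p and N are hypothetical values chosen for illustration.
set.seed(1)
x <- c(3, 9)          # observed counts (hypothetical)
p <- c(0.5, 0.5)      # null cell probabilities (hypothetical)
N <- 1999             # number of simulated data sets
n <- sum(x)           # each simulated sample has size n = sum(x)

# Pearson statistic for one vector of counts (helper defined here)
gof_stat <- function(counts, p) {
  expected <- sum(counts) * p
  sum((counts - expected)^2 / expected)
}

stat <- gof_stat(x, p)

# Each column of sims is one data set drawn under the null hypothesis
sims <- rmultinom(N, size = n, prob = p)
sim_stats <- apply(sims, 2, gof_stat, p = p)

# m = number of simulated statistics at least as large as the observed
# one; the small tolerance guards against floating-point rounding
m <- sum(sim_stats >= stat - 1e-9)
p_value <- (m + 1) / (N + 1)
p_value

# Built-in simulation; B plays the role of N
chisq.test(x, p = p, simulate.p.value = TRUE, B = N)$p.value
```

    The two p-values will generally differ a little, since each call
    draws its own random samples.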

    The only restriction is that there be no ties amongst the values of
    Stat and Stat_1, ..., Stat_N.  There being ties is of fairly low
    probability, but is not of zero probability, since there is a finite
    number of possible samples and hence of statistic values.  So this
    restriction is a mild worry.
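
    To see the finiteness at work, here is a rough sketch that counts
    ties in a deliberately tiny, hypothetical example (two cells,
    n = 12), where ties are far more frequent than they would be for
    larger tables.  All values and variable names are assumptions made
    for illustration.

```r
# Sketch: ties between the observed and simulated statistics.
# Counts and probabilities are hypothetical.
set.seed(1)
x <- c(3, 9)
p <- c(0.5, 0.5)
N <- 1999
n <- sum(x)

expected  <- n * p
stat      <- sum((x - expected)^2 / expected)
sims      <- rmultinom(N, size = n, prob = p)   # one data set per column
sim_stats <- colSums((sims - expected)^2 / expected)

# With n = 12 and two cells there are only 13 possible samples, and
# the statistic can take at most 7 distinct values.
length(unique(round(sim_stats, 8)))

# Simulated statistics tied with the observed one (up to rounding)
n_ties <- sum(abs(sim_stats - stat) < 1e-9)
n_ties

# p-value with ties excluded from, or included in, the count m
m_strict <- sum(sim_stats > stat + 1e-9)
c(strict    = (m_strict + 1) / (N + 1),
  with_ties = (m_strict + n_ties + 1) / (N + 1))
```

    The gap between the two values shows how much the treatment of ties
    can matter when the sample is very small.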

    However a ``continuity correction'' would be of no help whatsoever.
problems with non-uniform expected frequencies, etc) with random sampling?

    Don't understand what you mean by this.

        cheers,

            Rolf Turner

