Re: [R] On Corrections for Chi-Sq Goodness of Fit Test

Rolf Turner Thu, 22 Dec 2011 19:57:44 -0800

On 20/12/11 10:24, Michael Fuller wrote:

TOPIC
My question regards the philosophy behind how R implements corrections to 
chi-square statistical tests. At least in recent versions (I'm using 2.13.1 
(2011-07-08) on OSX 10.6.8.), the chisq.test function applies the Yates 
continuity correction for 2 by 2 contingency tables. But when used as a 
goodness of fit test (GoF, aka likelihood ratio test), chisq.test does not 
appear to implement any corrections for widely recognized problems, such as 
small sample size, non-uniform expected frequencies, and one D.F.


From the help page:
"In the goodness-of-fit case simulation is done by random sampling from the discrete 
distribution specified by p, each sample being of size n = sum(x)."

Is the thinking that random sampling completely obviates the need for 
corrections?

    Yes.

Wouldn't the same statistical issues still apply

    No.

(e.g. poor continuity approximation with one D.F.,

    There are no degrees of freedom involved. There is no continuity involved.

    The observed test statistic (say "Stat") is compared with a number of
    test statistics, Stat_1, ..., Stat_N, calculated from data sets
    simulated under the null hypothesis. If the null is true, then Stat
    and Stat_1, ..., Stat_N are all of ``equal status''. If there are m
    values of the Stat_i which are greater than Stat, then the
    ``probability of observing, under the null hypothesis, data as extreme
    as, or more extreme than, what you actually observed'' is the
    probability of randomly selecting one of a specified set of m+1
    ``slots'' out of a total of N+1 slots (where each slot has probability
    1/(N+1)).

    Thus the p-value is (exactly) equal to (m+1)/(N+1).

    The only restriction is that there be no ties amongst the values of
    Stat and Stat_1, ..., Stat_N. There being ties is of fairly low
    probability, but is not of zero probability, since there is a finite
    number of possible samples and hence of statistic values. So this
    restriction is a mild worry.

    However a ``continuity correction'' would be of no help whatsoever.
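
    For concreteness, the (m+1)/(N+1) calculation described above can be
    sketched as follows. This is only a minimal illustration in Python,
    not the code that chisq.test itself runs; the function names
    (chisq_stat, simulate_counts, monte_carlo_pvalue) and the arguments
    n_sim and seed are made up for the example. It counts simulated
    statistics that are greater than or equal to the observed one, so any
    ties are treated as "at least as extreme".

```python
import random


def chisq_stat(counts, probs):
    """Pearson statistic: sum of (O - E)^2 / E, with E = n * p."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))


def simulate_counts(n, probs, rng):
    """Draw n observations from the discrete distribution `probs`
    and tabulate them into cell counts."""
    counts = [0] * len(probs)
    for k in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[k] += 1
    return counts


def monte_carlo_pvalue(observed, probs, n_sim=2000, seed=1):
    """Monte Carlo p-value (m + 1) / (N + 1), where N = n_sim.

    Assumption for this sketch: ties count as "at least as extreme"
    (>=), and floating-point near-ties get no special treatment.
    """
    rng = random.Random(seed)
    n = sum(observed)  # each simulated sample has size n = sum(x)
    stat = chisq_stat(observed, probs)
    # m = number of simulated statistics at least as large as the observed one
    m = sum(
        chisq_stat(simulate_counts(n, probs, rng), probs) >= stat
        for _ in range(n_sim)
    )
    return (m + 1) / (n_sim + 1)
```

    With observed = [5, 5, 5, 5] and equal cell probabilities the observed
    statistic is 0, so every simulated value is at least as large and the
    function returns (N+1)/(N+1) = 1. No degrees of freedom and no
    reference chi-squared distribution enter the calculation at any point.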

problems with non-uniform expected frequencies, etc) with random sampling?


    Don't understand what you mean by this.

        cheers,

            Rolf Turner

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
