Re: [Rd] Numerical stability in chisq.test

Kurt Hornik Thu, 28 Dec 2017 04:08:39 -0800

>>>>> Jan Motl writes:

> The chisq.test on line 57 contains following code:
>       STATISTIC <- sum(sort((x - E)^2/E, decreasing = TRUE))


The preceding 2 lines seem relevant:

            ## Sorting before summing may look strange, but seems to be
            ## a sensible way to deal with rounding issues (PR#3486):
            STATISTIC <- sum(sort((x - E) ^ 2 / E, decreasing = TRUE))

-k

> However, based on book "Accuracy and stability of numerical algorithms" 
> available from:
>       
> http://ftp.demec.ufpr.br/CFD/bibliografia/Higham_2002_Accuracy%20and%20Stability%20of%20Numerical%20Algorithms.pdf
> Table 4.1 on page 89, it is better to sort the data in increasing order than 
> in decreasing order, when the data are non-negative.

> An example:
>       x = matrix(c(rep(1.1, 10000)), 10^16, nrow = 10001, ncol = 1)    # We 
> have a vector with 10000*1.1 and 1*10^16
>       c(sum(sort(x, decreasing = TRUE)), sum(sort(x, decreasing = FALSE)))
> The result:
>       10000000000010996 10000000000011000
> When we sort the data in the increasing order, we get the correct result. If 
> we sort the data in the decreasing order, we get a result that is off by 4.

> Shouldn't the sort be in the increasing order rather than in the decreasing 
> order?

> Best regards,
> Jan Motl


> PS: This post is based on discussion on 
> https://stackoverflow.com/questions/47847295/why-does-chisq-test-sort-data-in-descending-order-before-summation
>  and the response from the post to [email protected].
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Numerical stability in chisq.test

Reply via email to