> On 28 Dec 2017, at 13:08 , Kurt Hornik <kurt.hor...@wu.ac.at> wrote:
> 
>>>>>> Jan Motl writes:
> 
>> The chisq.test source contains the following code on line 57:
>>      STATISTIC <- sum(sort((x - E)^2/E, decreasing = TRUE))
> 
> The preceding 2 lines seem relevant:
> 
>            ## Sorting before summing may look strange, but seems to be
>            ## a sensible way to deal with rounding issues (PR#3486):
>            STATISTIC <- sum(sort((x - E) ^ 2 / E, decreasing = TRUE))
> 
> -k

My thoughts too. PR 3486 is about simulated tables that theoretically have 
STATISTIC equal to the one observed, but come out slightly different, messing 
up the simulated p value. The sort is not actually intended to squeeze the very 
last bit of accuracy out of the computation, just to make sure that the 
round-off affects equivalent tables in the same way. "Fixing" the code may 
therefore unfix PR#3486; at the very least some care is required if this is 
modified.  
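
To illustrate the point about equivalent tables, here is a minimal sketch. It is not the chisq.test code itself: the values in contrib are made up for illustration, and contrib and permuted are hypothetical names. It only shows that sorting before summing fixes the order of summation, so two tables with the same cell contributions in different positions get bit-identical statistics.

```r
# Hypothetical cell contributions (x - E)^2 / E for one table (made-up values).
contrib <- c(0.1, 0.2, 0.3, 1e-3, 2.5, 0.7)

# An "equivalent" table: the same contributions, in a different cell order.
set.seed(1)
permuted <- sample(contrib)

# Plain sums add the terms in different orders, so they may or may not
# agree to the last bit, depending on the values and the platform.
plain_equal <- identical(sum(contrib), sum(permuted))

# Sorting first fixes the summation order, so any round-off is the same
# for both tables and the results are bit-identical.
sorted_equal <- identical(sum(sort(contrib, decreasing = TRUE)),
                          sum(sort(permuted, decreasing = TRUE)))

print(c(plain = plain_equal, sorted = sorted_equal))
```

Sorting in increasing order would give the same guarantee, which is why a change of direction is possible in principle but should be checked against the PR#3486 case.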

-pd


> 
>> However, based on the book "Accuracy and Stability of Numerical Algorithms" 
>> available from:
>>      
>> http://ftp.demec.ufpr.br/CFD/bibliografia/Higham_2002_Accuracy%20and%20Stability%20of%20Numerical%20Algorithms.pdf
>> Table 4.1 on page 89, it is better to sort the data in increasing order than 
>> in decreasing order, when the data are non-negative.
> 
>> An example:
>>      # A vector with 10000 copies of 1.1 and one value of 10^16
>>      x = matrix(c(rep(1.1, 10000), 10^16), nrow = 10001, ncol = 1)
>>      c(sum(sort(x, decreasing = TRUE)), sum(sort(x, decreasing = FALSE)))
>> The result:
>>      10000000000010996 10000000000011000
>> When we sort the data in increasing order, we get the correct result. If 
>> we sort the data in decreasing order, we get a result that is off by 4.
> 
>> Shouldn't the sort be in increasing order rather than in decreasing 
>> order?
> 
>> Best regards,
>> Jan Motl
> 
> 
>> PS: This post is based on discussion on 
>> https://stackoverflow.com/questions/47847295/why-does-chisq-test-sort-data-in-descending-order-before-summation
>>  and the response from the post to r-h...@r-project.org.
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd....@cbs.dk  Priv: pda...@gmail.com
