Re: [Rd] stats::line() does not produce correct Tukey line when n mod 6 is 2 or 3

Serguei Sokol Wed, 31 May 2017 08:30:47 -0700

Le 31/05/2017 à 16:39, Joris Meys a écrit :

Seriously, if a method gives a wrong result, it's wrong.

I did not understand why you and others were using the term "wrong"
for something that I considered just a "different" implementation.
A more thorough reading revealed that I had overlooked this phrase in
line()'s documentation: "left and right /thirds/ of the data" (emphasis mine).


Should I be exiled to Excel department for this sin? That's tough ;)
Serguei.

line() does NOT implement Tukey's algorithm, not even after the patch.
We're not discussing Excel here, are we?

The method of Tukey is rather clear, and it is NOT using the default quantile definition from the quantile function. Actually, it doesn't even use quantiles to define the groups. It just says that the groups should be more or less equally spaced. As the method of Tukey relies on the medians of the subgroups, it would make sense to pick a method that is approximately unbiased with regard to the median. That would be type 8 imho.


To get the size of the outer groups, Tukey would've been more than happy
with:

> floor(length(dfr$time) / 3)
[1] 6

There you have the size of your left and right group, and now we can discuss
which median type should be used for the robust fitting.
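As a rough illustration of the count-based split described above, here is a minimal sketch. It assumes hypothetical data (not the thread's dfr), distinct x values, and an exactly linear y so the expected slope is obvious. It only shows the grouping and the slope from the outer medians; it is not the full algorithm, and it is not what stats::line() does internally.

```r
# Hypothetical data: 18 distinct x values (NOT the dfr used in this thread)
x <- c(1, 2, 3, 5, 8, 9, 12, 13, 15, 18, 21, 22, 26, 27, 30, 33, 35, 40)
y <- 2 * x + 1                 # exactly linear, so the slope should be 2

n <- length(x)
k <- floor(n / 3)              # size of each outer group
o <- order(x)                  # indices that sort x

left  <- o[seq_len(k)]         # the k smallest x values
right <- o[seq(n - k + 1, n)]  # the k largest x values

# Slope through the medians of the two outer groups
slope <- (median(y[right]) - median(y[left])) /
         (median(x[right]) - median(x[left]))
```

With ties in x, the group boundaries would need extra care, which this sketch deliberately ignores.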

But I can honestly not understand why anyone in his right mind would defend a method that is clearly wrong while not working at Microsoft's spreadsheet department.


Cheers
Joris

On Wed, May 31, 2017 at 4:03 PM, Serguei Sokol <[email protected] 
<mailto:[email protected]>> wrote:

    Le 31/05/2017 à 15:40, Joris Meys a écrit :

        OTOH,

        > sapply(1:9, function(i){
        +   sum(dfr$time <= quantile(dfr$time, 1./3., type = i))
        + })
        [1] 8 8 6 6 6 6 8 6 6

        Only the default (type = 7) and the first two types give the result
        line() gives now. I think there are plenty of reasons why any of the
        other 6 types might be better suited to Tukey's method.

        So to my mind, changing the definition of line() to give sensible
        output that is in accordance with the theory does not imply any
        inconsistency with the quantile definition in R. At least not with
        6 out of the 9 different ones ;-)

    Nice shot.
    But OTOE (on the other end ;)
    > sapply(1:9, function(i){
    +   sum(dfr$time >= quantile(dfr$time, 2./3., type = i))
    + })
    [1] 8 8 8 8 6 6 8 6 6

    Here "8" gains 5 votes against 4 for "6". Two defector methods changed
    their point count between the left and right side and should be discarded.
    That leaves us with a score of 3:4, still in favor of "6", but the default
    method should prevail in my opinion.

    Serguei.
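The two tallies quoted above can be put side by side for all nine quantile types. The sketch below assumes a hypothetical vector with ties (not the thread's dfr), so the counts it produces will not match the ones quoted here. The names time, tally and symmetric are illustrative only.

```r
# Hypothetical data with ties (NOT the dfr used in this thread)
time <- c(1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8, 8)

# One row per quantile type: points at or below the 1/3 quantile (left)
# and points at or above the 2/3 quantile (right)
tally <- t(sapply(1:9, function(i) {
  c(type  = i,
    left  = sum(time <= quantile(time, 1/3, type = i)),
    right = sum(time >= quantile(time, 2/3, type = i)))
}))

# Types for which the left and right groups have the same size
symmetric <- tally[tally[, "left"] == tally[, "right"], "type"]

print(tally)
```

Types missing from symmetric are the ones that pick a different number of points on each side, which is the "defector" behaviour discussed above.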




--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel :  +32 (0)9 264 61 79
[email protected]
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



--
Serguei Sokol
Ingenieur de recherche INRA
Metabolisme Integre et Dynamique des Systemes Metaboliques (MetaSys)

LISBP, INSA/INRA UMR 792, INSA/CNRS UMR 5504
135 Avenue de Rangueil
31077 Toulouse Cedex 04

tel: +33 5 6155 9276
fax: +33 5 6704 8825
email: [email protected]
http://metasys.insa-toulouse.fr
http://www.lisbp.fr

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
