Re: [Rd] stats::line() does not produce correct Tukey line when n mod 6 is 2 or 3

Serguei Sokol Thu, 01 Jun 2017 02:40:49 -0700

On 31/05/2017 at 22:00, Martin Maechler wrote:

Serguei Sokol <so...@insa-toulouse.fr>
     on Wed, 31 May 2017 18:46:34 +0200 writes:

     > On 31/05/2017 at 17:30, Serguei Sokol wrote:
     >>
     >> More thorough reading revealed that I have overlooked this phrase in the
     >> line's doc: "left and right /thirds/ of the data" (emphasis is mine).
     > Oops. I read the first result returned by Google, and it happened to be
     > TIBCO's doc, not R's. The layout is very similar, hence my mistake.
     > The latter does not mention "thirds" but ...
     > Anyway, here is a new patch for line() which still gives a result
     > slightly different from MMline(). The slope is the same but not the
     > intercept. What are the exact terms for the intercept calculation that
     > should be implemented?


     > Serguei.

Sorry Serguei, I have had a new version of line.c since yesterday,
and will not be disturbed anymore.

Note that I *did* give the literature, and it seems most
discussants don't have paper books in physical libraries anymore.
In this case, interestingly, you need one of those, I think:
almost everything I found online did not have the exact details.

Fortunately, you keep good old habits regarding paper books ;)

Peter Dalgaard definitely was right that Tukey did not use
quantiles at all, and notably did *not* define the three groups
via   {i;  x_i <= x_L}  and {i; x_i >= x_R}  which (as I think
you noticed) may make the groups quite unbalanced in case of duplicated x's.
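
To make the imbalance concrete, here is a minimal sketch in Python (not R, and not the code in line.c). The helper names and the choice of cut-point positions are assumptions made only for illustration; the thread does not specify Tukey's exact rule for splitting n into three groups. The sketch compares group sizes when membership is decided by the thresholds x_L and x_R against a plain split by counts.

```python
def sizes_by_threshold(x):
    """Sizes of (left, middle, right) when the outer groups are
    {i: x_i <= x_L} and {i: x_i >= x_R}.

    Assumption: x_L and x_R are the sorted values at positions
    (n-1)//3 from each end, and x_L < x_R.
    """
    xs = sorted(x)
    n = len(xs)
    k = (n - 1) // 3
    x_l, x_r = xs[k], xs[n - 1 - k]
    left = sum(v <= x_l for v in xs)    # ties with x_L all land on the left
    right = sum(v >= x_r for v in xs)   # ties with x_R all land on the right
    return left, n - left - right, right


def sizes_by_count(x):
    """Sizes of (left, middle, right) when the outer groups each take
    n//3 points by rank (one simple convention, assumed here)."""
    n = len(x)
    k = n // 3
    return k, n - 2 * k, k


x = [1, 1, 1, 1, 1, 2, 3, 4, 5]   # five duplicated values at the low end
print(sizes_by_threshold(x))       # (5, 1, 3)
print(sizes_by_count(x))           # (3, 3, 3)
```

With five tied values at the low end, the threshold rule puts all of them in the left group and leaves a single point in the middle, while the count-based split stays balanced.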

But then, for now I had decided to fix the bug (namely computing
the x-medians wrongly as you diagnosed correctly(!) -- but your
first 2 patches only fixed it partly) *and* go at least one step in
the direction of Tukey's original, namely by allowing iteration via a
new 'iter' argument.

Hm, I did not use iterations. A newly introduced indx is used to keep
index permutation when x is sorted.
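
The idea of keeping an index permutation can be sketched as follows. This is a Python illustration, not the patch's C code; the helper `order` and the n//3 group size are assumptions chosen for the example. Sorting x through a permutation, rather than sorting x itself, lets the y values of each group be picked up by the same indices.

```python
def order(x):
    """Return the permutation of indices that sorts x (like R's order())."""
    return sorted(range(len(x)), key=lambda i: x[i])


x = [3.0, 1.0, 2.0, 5.0, 4.0, 6.0]
y = [30, 10, 20, 50, 40, 60]

indx = order(x)                        # [1, 2, 0, 4, 3, 5]
k = len(x) // 3                        # assumed size of each outer group

left_y = [y[i] for i in indx[:k]]      # y values paired with the k smallest x
right_y = [y[i] for i in indx[-k:]]    # y values paired with the k largest x
print(indx, left_y, right_y)
```

Because x and y are never reordered in place, each y stays attached to its own x whatever the original order of the data.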

I have also updated the help page to document what  line()  has
been computing all these years {apart from the bug which
typically shows for non-equidistant x[]}.

You mean "non equally sized"? (bis ;) )

We could also consider eventually adding a new   'method = <string>'
argument to line(),  one version of which would continue to
compute the current solution,

If the current solution is considered plainly wrong, why continue
to implement it? Unless by "current version" you mean your implementation
equivalent to my patch2, which fixes the group sizes.

  another would compute the one
corresponding to Velleman & Hoaglin (1981)'s  FORTRAN
implementation (which had to be corrected for some infinite-loop
cases!)... not in the near future though

What would be the interest of this Fortran version? Faster? More accurate?

Given all the discussion here, I think I should commit what I
currently have  ASAP.

+1.

Serguei.

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
