Tukey divides the points (the (x, y) pairs) into three groups; he does not split the x values and the y values separately.
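To make the distinction concrete, here is a minimal Python sketch (not R's `line.c`, and not a quote from Tukey) of one pass of a three-group resistant line. The points are sorted by x and split into three groups of points, so each group's median x and median y come from the same observations. Two details are assumptions until the book can be checked: the group-size rule (an extra point goes to the middle group when n = 3k + 1, and to each outer group when n = 3k + 2) and taking the intercept as the mean of the three group-level intercepts. Iterative polishing and ties in x are left out; the helper names are hypothetical.

```python
from statistics import median


def group_sizes(n):
    """Split n points into (left, middle, right) counts, as equal as possible.

    Assumption: for n = 3k + 1 the extra point goes to the middle group;
    for n = 3k + 2 the two extra points go to the outer groups.
    """
    k, r = divmod(n, 3)
    if r == 0:
        return k, k, k
    if r == 1:
        return k, k + 1, k
    return k + 1, k, k + 1


def tukey_resistant_line(x, y):
    """Return (intercept, slope) from a single, unpolished pass.

    The (x, y) points are grouped together, so each group's median x and
    median y summarize the same observations. Ties in x are not treated
    specially (a simplifying assumption).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) < 3:
        raise ValueError("need at least 3 points")
    pts = sorted(zip(x, y))  # sort the points by x (then by y)
    n_left, n_mid, _ = group_sizes(len(pts))
    groups = [
        pts[:n_left],
        pts[n_left:n_left + n_mid],
        pts[n_left + n_mid:],
    ]
    # Summarize each group of points by its median x and median y.
    xs = [median(px for px, _ in g) for g in groups]
    ys = [median(py for _, py in g) for g in groups]
    (xl, _, xr), (yl, _, yr) = xs, ys
    slope = (yr - yl) / (xr - xl)
    # Intercept (assumed rule): average of the three group-level intercepts.
    intercept = sum(yi - slope * xi for xi, yi in zip(xs, ys)) / 3
    return intercept, slope
```

For example, `tukey_resistant_line(range(1, 10), range(1, 10))` groups the points as {1, 2, 3}, {4, 5, 6}, {7, 8, 9}, whose medians are (2, 2), (5, 5), (8, 8), giving intercept 0 and slope 1, the answer the bug report expects. Because x and y are always summarized over the same points, y = x gives slope 1 and intercept 0 for every n under this grouping, with no cycle of length 6.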
I'll try to get hold of the book for a direct quote; it might take a couple of days.

On Mon, May 29, 2017 at 8:40 AM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

> On 27/05/2017 9:28 PM, GlenB wrote:
>
>> Bug: stats::line() does not produce correct Tukey line when n mod 6 is 2
>> or 3
>>
>> Example: line(1:9,1:9) should have intercept 0 and slope 1 but it gives
>> intercept -1 and slope 1.2
>>
>> Trying line(1:i,1:i) across a range of i makes it clear there's a cycle of
>> length 6, with four of every six correct.
>>
>> Bug has been present across many versions.
>>
>> The machine I just tried it on just now has R 3.2.3:
>>
>
> If you look at the source (in src/library/stats/src/line.c), the
> explanation is clear: the x value is chosen as the 1/6 quantile (according
> to a particular definition of quantile), and the y value is chosen as the
> median of the y values where x is less than or equal to the 1/3 quantile.
> Those are different definitions (though I think they would be
> asymptotically equivalent under pretty weak assumptions), so it's not
> surprising the x value doesn't correspond perfectly to the y value, and the
> line ends up "wrong".
>
> So is it a bug? Well, that depends on Tukey's definition. I don't have a
> copy of his book handy so I can't really say. Maybe the R function is
> doing exactly what Tukey said it should, and that's not a bug. Or maybe R
> is wrong.
>
> Duncan Murdoch

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel