>>>>> Duncan Murdoch <murdoch.dun...@gmail.com> >>>>> on Sat, 11 Sep 2010 10:32:38 -0400 writes:
> On 11/09/2010 10:04 AM, Martin Maechler wrote: >>>>>>> "SW" == Samuel Wuest <wue...@tcd.ie> >>>>>>> on Thu, 26 Aug 2010 14:34:26 +0100 writes: >> SW> Hi Greg, SW> thanks for the suggestion: >> SW> I have attached some small dataset that can be used to reproduce the SW> odd behavior of the approxfun-function. >> SW> If it gets stripped off my email, it can also be downloaded at: SW> http://bioinf.gen.tcd.ie/approx.data.Rdata >> SW> Strangely, the problem seems specific to the data structure in my SW> expression set, when I use simulated data, everything worked fine. >> SW> Here is some code that I run and resulted in the strange output that I SW> have described in my initial post: >> >> >> ### load the data: a list called approx.data >> >> load(file="approx.data.Rdata") >> >> ### contains the slots "x", "y", "input" >> >> names(approx.data) SW> [1] "x" "y" "input" >> >> ### with y ranging between 0 and 1 >> >> range(approx.data$y) SW> [1] 0 1 >> >> ### compare ranges of x and input-x values (the latter is a small subset of 500 data points): >> >> range(approx.data$x) SW> [1] 3.098444 7.268812 >> >> range(approx.data$input) SW> [1] 3.329408 13.026700 >> >> >> >> >> >> ### generate the interpolation function (warning message benign) >> >> interp <- approxfun(approx.data$x, approx.data$y, yleft=1, yright=0, rule=2) SW> Warning message: SW> In approxfun(approx.data$x, approx.data$y, yleft = 1, yright = 0, : SW> collapsing to unique 'x' values >> >> >> >> ### apply to input-values >> >> y.out <- sapply(approx.data$input, interp) >> >> >> >> ### still I find output values >1, even though yleft=1: >> >> range(y.out) SW> [1] 0.000000 7.207233 >> >> >> I get completely different (and correct) results, >> by the way the *same* you have in the bug report you've >> submitted >> (https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14377) >> and which does *not* show any bug: >> >>> range(y.out) >> [1] 0.0000000 0.9816907 >> >> Of course, I do believe that you've seen the above problems, >> -- on 64-bit Mac ? as you report in sessionInfo() ? -- >> but I cannot reproduce them. >> >> And also, you seem yourself to be able to get different results >> for the same data... what are the circumstances? > I think Greg Snow's analysis (linked from the bug report) is right. > length(unique(x)) can be different than length(levels(as.factor(x))) > (which is what tapply uses to determine the number of levels). > I've fixed it by replacing > y <- as.vector(tapply(y,x,ties)) > with > y <- as.vector(tapply(y,match(x,x),ties)) > since match() uses the same rules as unique. > An argument could be made that tapply should be changed instead, but it > is behaving as documented, so I didn't want to do that. I think you've worked around the problem (and I had seen Greg's analysis), thank you. BTW, the problem wasn't there in R 2.8.x, as we since had improved factor() .... {.. and btw, the same changes should happen to approx(), spline(), splinefun() ..} *However* why on earth can the results become *random*, i.e., differ from one call to the other, with identical data in the same R session? I currently guess that sometimes the operands (of the linear interpolation rational arithmetic) are kept in high-precision registers and sometimes not, I must say I am puzzled that this happens dynamically rather than determined by the exact compiler flags at (R's) build time ?? (and yes, this thread should have been on R-devel rather than R-help from the start ...) Martin ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.