On 11/09/2010 10:04 AM, Martin Maechler wrote:
"SW" == Samuel Wuest <wue...@tcd.ie>
    on Thu, 26 Aug 2010 14:34:26 +0100 writes:

    SW> Hi Greg,
    SW> thanks for the suggestion:

    SW> I have attached some small dataset that can be used to reproduce the
    SW> odd behavior of the approxfun-function.

    SW> If it gets stripped off my email, it can also be downloaded at:
    SW> http://bioinf.gen.tcd.ie/approx.data.Rdata

    SW> Strangely, the problem seems specific to the data structure in my
    SW> expression set, when I use simulated data, everything worked fine.

    SW> Here is some code that I run and resulted in the strange output that I
    SW> have described in my initial post:

    >> ### load the data: a list called approx.data
    >> load(file="approx.data.Rdata")
    >> ### contains the slots "x", "y", "input"
    >> names(approx.data)
    SW> [1] "x"     "y"     "input"
    >> ### with y ranging between 0 and 1
    >> range(approx.data$y)
    SW> [1] 0 1
    >> ### compare ranges of x and input-x values (the latter is a small subset 
of 500 data points):
    >> range(approx.data$x)
    SW> [1] 3.098444 7.268812
    >> range(approx.data$input)
    SW> [1]  3.329408 13.026700
>> >> >> ### generate the interpolation function (warning message benign)
    >> interp <- approxfun(approx.data$x, approx.data$y, yleft=1, yright=0, 
rule=2)
    SW> Warning message:
    SW> In approxfun(approx.data$x, approx.data$y, yleft = 1, yright = 0,  :
    SW> collapsing to unique 'x' values
>> >> ### apply to input-values
    >> y.out <- sapply(approx.data$input, interp)
>> >> ### still I find output values >1, even though yleft=1:
    >> range(y.out)
    SW> [1] 0.000000 7.207233


I get completely different (and correct) results,
by the way the *same* you have in the bug report you've
submitted (https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14377)
and which does *not* show any bug:

range(y.out)
[1] 0.0000000 0.9816907

Of course, I do believe that you've seen the above problems,
-- on 64-bit Mac ? as you report in sessionInfo() ? --
but I cannot reproduce them.

And also, you seem yourself to be able to get different results
for the same data... what are the circumstances?

I think Greg Snow's analysis (linked from the bug report) is right. length(unique(x)) can be different than length(levels(as.factor(x))) (which is what tapply uses to determine the number of levels).

I've fixed it by replacing

y <- as.vector(tapply(y,x,ties))

with

y <- as.vector(tapply(y,match(x,x),ties))

since match() uses the same rules as unique.

An argument could be made that tapply should be changed instead, but it is behaving as documented, so I didn't want to do that.

Duncan Murdoch

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to