It looks like you have found a bug, I can confirm that with your data on my computer that I am getting nonsense results for some cases. I even found that when calling the function on element 164 of the input vector that I don't even get consistent results, I ran it several times and many times got different results.
I looked at the code for the function and I don't think that the problem is in the R code portion, but is in the compiled C portion. You should send this in as a bug report. One note on your code, you don't need sapply, you could just say interp(approx.data$input) and it would give the vector of interpolations. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 > -----Original Message----- > From: Samuel Wuest [mailto:wue...@tcd.ie] > Sent: Thursday, August 26, 2010 7:34 AM > To: Greg Snow > Cc: r-help@r-project.org > Subject: Re: [R] approxfun-problems (yleft and yright ignored) > > Hi Greg, > thanks for the suggestion: > > I have attached some small dataset that can be used to reproduce the > odd behavior of the approxfun-function. > > If it gets stripped off my email, it can also be downloaded at: > http://bioinf.gen.tcd.ie/approx.data.Rdata > > Strangely, the problem seems specific to the data structure in my > expression set, when I use simulated data, everything worked fine. > > Here is some code that I run and resulted in the strange output that I > have described in my initial post: > > > ### load the data: a list called approx.data > > load(file="approx.data.Rdata") > > ### contains the slots "x", "y", "input" > > names(approx.data) > [1] "x" "y" "input" > > ### with y ranging between 0 and 1 > > range(approx.data$y) > [1] 0 1 > > ### compare ranges of x and input-x values (the latter is a small > subset of 500 data points): > > range(approx.data$x) > [1] 3.098444 7.268812 > > range(approx.data$input) > [1] 3.329408 13.026700 > > > > > > ### generate the interpolation function (warning message benign) > > interp <- approxfun(approx.data$x, approx.data$y, yleft=1, yright=0, > rule=2) > Warning message: > In approxfun(approx.data$x, approx.data$y, yleft = 1, yright = 0, : > collapsing to unique 'x' values > > > > ### apply to input-values > > y.out <- sapply(approx.data$input, interp) > > > > ### still I find output values >1, even though yleft=1: > > range(y.out) > [1] 0.000000 7.207233 > > hist(y.out) > > > > ### and the input-data points for which strange interpolation does > occur have no unusual distribution (however, they lie close to max(x)): > > hist(approx.data$input[which(y.out>1)]) > > The session info can be found below, thanks a million for any help. > > Sam > > On 25 August 2010 19:31, Greg Snow <greg.s...@imail.org> wrote: > > The plots did not come through, see the posting guide for which > attachments are allowed. It will be easier for us to help if you can > send reproducible code (we can copy and paste to run, then examine, > edit, etc.). Try finding a subset of your data for which the problem > still occurs, then send the data if possible, or similar simulated data > if you cannot send original data. > > > > -- > > Gregory (Greg) L. Snow Ph.D. > > Statistical Data Center > > Intermountain Healthcare > > greg.s...@imail.org > > 801.408.8111 > > > > > >> -----Original Message----- > >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- > >> project.org] On Behalf Of Samuel Wuest > >> Sent: Wednesday, August 25, 2010 8:20 AM > >> To: r-help@r-project.org > >> Subject: [R] approxfun-problems (yleft and yright ignored) > >> > >> Dear all, > >> > >> I have run into a problem when running some code implemented in the > >> Bioconductor panp-package (applied to my own expression data), > whereby > >> gene > >> expression values of known true negative probesets (x) are > interpolated > >> onto > >> present/absent p-values (y) between 0 and 1 using the *approxfun - > >> function*{stats}; when I have used R version 2.8, everything had > >> worked fine, > >> however, after updating to R 2.11.1., I got unexpected output > >> (explained > >> below). > >> > >> Please correct me here, but as far as I understand, the yleft and > >> yright > >> arguments set the extreme values of the interpolated y-values in > case > >> the > >> input x-values (on whose approxfun is applied) fall outside > range(x). > >> So if > >> I run approxfun with yleft=1 and yright=0 with y-values between 0 > and > >> 1, > >> then I should never get any values higher than 1. However, this is > not > >> the > >> case, as this code-example illustrates: > >> > >> > ### define the x-values used to construct the approxfun, basically > >> these > >> are 2000 expression values ranging from ~ 3 to 7: > >> > xNeg <- NegExprs[, 1] > >> > xNeg <- sort(xNeg, decreasing = TRUE) > >> > > >> > ### generate 2000 y-values between 0 and 1: > >> > yNeg <- seq(0, 1, 1/(length(xNeg) - 1)) > >> > ### define yleft and yright as well as the rule to clarify what > >> should > >> happen if input x-values lie outside range(x): > >> > interp <- approxfun(xNeg, yNeg, yleft = 1, yright = 0, rule=2) > >> Warning message: > >> In approxfun(xNeg, yNeg, yleft = 1, yright = 0, rule = 2) : > >> collapsing to unique 'x' values > >> > ### apply the approxfun to expression data that range from ~2.9 to > >> 13.9 > >> and can therefore lie outside range(xNeg): > >> > PV <- sapply(AllExprs[, 1], interp) > >> > range(PV) > >> [1] 0.000 6208.932 > >> > summary(PV) > >> Min. 1st Qu. Median Mean 3rd Qu. Max. > >> 0.000e+00 0.000e+00 2.774e-03 1.299e+00 3.164e-01 6.209e+03 > >> > >> So the resulting output PV object contains data ranging from 0 to > 6208, > >> the > >> latter of which lies outside yleft and is not anywhere close to > extreme > >> y-values that were used to set up the interp-function. This seems > wrong > >> to > >> me, and from what I understand, yleft and yright are simply ignored? > >> > >> I have attached a few histograms that visualize the data > distributions > >> of > >> the objects I xNeg, yNeg, AllExprs[,1] (== input x-values) and PV > (the > >> output), so that it is easier to make sense of the data > structures... > >> > >> Does anyone have an explanation for this or can tell me how to fix > the > >> problem? > >> > >> Thanks a million for any help, best, Sam > >> > >> > sessionInfo() > >> R version 2.11.1 (2010-05-31) > >> x86_64-apple-darwin9.8.0 > >> > >> locale: > >> [1] en_IE.UTF-8/en_IE.UTF-8/C/C/en_IE.UTF-8/en_IE.UTF-8 > >> > >> attached base packages: > >> [1] stats graphics grDevices utils datasets methods base > >> > >> other attached packages: > >> [1] panp_1.18.0 affy_1.26.1 Biobase_2.8.0 > >> > >> loaded via a namespace (and not attached): > >> [1] affyio_1.16.0 preprocessCore_1.10.0 > >> > >> > >> -- > >> ----------------------------------------------------- > >> Samuel Wuest > >> Smurfit Institute of Genetics > >> Trinity College Dublin > >> Dublin 2, Ireland > >> Phone: +353-1-896 2444 > >> Web: http://www.tcd.ie/Genetics/wellmer-2/index.html > >> Email: wue...@tcd.ie > >> ------------------------------------------------------ > > > > > > > > -- > ----------------------------------------------------- > Samuel Wuest > Smurfit Institute of Genetics > Trinity College Dublin > Dublin 2, Ireland > Phone: +353-1-896 2444 > Web: http://www.tcd.ie/Genetics/wellmer-2/index.html > Email: wue...@tcd.ie > ------------------------------------------------------ ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.