Hi Greg, thanks for the suggestion: I have attached some small dataset that can be used to reproduce the odd behavior of the approxfun-function.
If it gets stripped off my email, it can also be downloaded at: http://bioinf.gen.tcd.ie/approx.data.Rdata Strangely, the problem seems specific to the data structure in my expression set, when I use simulated data, everything worked fine. Here is some code that I run and resulted in the strange output that I have described in my initial post: > ### load the data: a list called approx.data > load(file="approx.data.Rdata") > ### contains the slots "x", "y", "input" > names(approx.data) [1] "x" "y" "input" > ### with y ranging between 0 and 1 > range(approx.data$y) [1] 0 1 > ### compare ranges of x and input-x values (the latter is a small subset of > 500 data points): > range(approx.data$x) [1] 3.098444 7.268812 > range(approx.data$input) [1] 3.329408 13.026700 > > > ### generate the interpolation function (warning message benign) > interp <- approxfun(approx.data$x, approx.data$y, yleft=1, yright=0, rule=2) Warning message: In approxfun(approx.data$x, approx.data$y, yleft = 1, yright = 0, : collapsing to unique 'x' values > > ### apply to input-values > y.out <- sapply(approx.data$input, interp) > > ### still I find output values >1, even though yleft=1: > range(y.out) [1] 0.000000 7.207233 > hist(y.out) > > ### and the input-data points for which strange interpolation does occur have > no unusual distribution (however, they lie close to max(x)): > hist(approx.data$input[which(y.out>1)]) The session info can be found below, thanks a million for any help. Sam On 25 August 2010 19:31, Greg Snow <greg.s...@imail.org> wrote: > The plots did not come through, see the posting guide for which attachments > are allowed. It will be easier for us to help if you can send reproducible > code (we can copy and paste to run, then examine, edit, etc.). Try finding a > subset of your data for which the problem still occurs, then send the data if > possible, or similar simulated data if you cannot send original data. > > -- > Gregory (Greg) L. Snow Ph.D. > Statistical Data Center > Intermountain Healthcare > greg.s...@imail.org > 801.408.8111 > > >> -----Original Message----- >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- >> project.org] On Behalf Of Samuel Wuest >> Sent: Wednesday, August 25, 2010 8:20 AM >> To: r-help@r-project.org >> Subject: [R] approxfun-problems (yleft and yright ignored) >> >> Dear all, >> >> I have run into a problem when running some code implemented in the >> Bioconductor panp-package (applied to my own expression data), whereby >> gene >> expression values of known true negative probesets (x) are interpolated >> onto >> present/absent p-values (y) between 0 and 1 using the *approxfun - >> function*{stats}; when I have used R version 2.8, everything had >> worked fine, >> however, after updating to R 2.11.1., I got unexpected output >> (explained >> below). >> >> Please correct me here, but as far as I understand, the yleft and >> yright >> arguments set the extreme values of the interpolated y-values in case >> the >> input x-values (on whose approxfun is applied) fall outside range(x). >> So if >> I run approxfun with yleft=1 and yright=0 with y-values between 0 and >> 1, >> then I should never get any values higher than 1. However, this is not >> the >> case, as this code-example illustrates: >> >> > ### define the x-values used to construct the approxfun, basically >> these >> are 2000 expression values ranging from ~ 3 to 7: >> > xNeg <- NegExprs[, 1] >> > xNeg <- sort(xNeg, decreasing = TRUE) >> > >> > ### generate 2000 y-values between 0 and 1: >> > yNeg <- seq(0, 1, 1/(length(xNeg) - 1)) >> > ### define yleft and yright as well as the rule to clarify what >> should >> happen if input x-values lie outside range(x): >> > interp <- approxfun(xNeg, yNeg, yleft = 1, yright = 0, rule=2) >> Warning message: >> In approxfun(xNeg, yNeg, yleft = 1, yright = 0, rule = 2) : >> collapsing to unique 'x' values >> > ### apply the approxfun to expression data that range from ~2.9 to >> 13.9 >> and can therefore lie outside range(xNeg): >> > PV <- sapply(AllExprs[, 1], interp) >> > range(PV) >> [1] 0.000 6208.932 >> > summary(PV) >> Min. 1st Qu. Median Mean 3rd Qu. Max. >> 0.000e+00 0.000e+00 2.774e-03 1.299e+00 3.164e-01 6.209e+03 >> >> So the resulting output PV object contains data ranging from 0 to 6208, >> the >> latter of which lies outside yleft and is not anywhere close to extreme >> y-values that were used to set up the interp-function. This seems wrong >> to >> me, and from what I understand, yleft and yright are simply ignored? >> >> I have attached a few histograms that visualize the data distributions >> of >> the objects I xNeg, yNeg, AllExprs[,1] (== input x-values) and PV (the >> output), so that it is easier to make sense of the data structures... >> >> Does anyone have an explanation for this or can tell me how to fix the >> problem? >> >> Thanks a million for any help, best, Sam >> >> > sessionInfo() >> R version 2.11.1 (2010-05-31) >> x86_64-apple-darwin9.8.0 >> >> locale: >> [1] en_IE.UTF-8/en_IE.UTF-8/C/C/en_IE.UTF-8/en_IE.UTF-8 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] panp_1.18.0 affy_1.26.1 Biobase_2.8.0 >> >> loaded via a namespace (and not attached): >> [1] affyio_1.16.0 preprocessCore_1.10.0 >> >> >> -- >> ----------------------------------------------------- >> Samuel Wuest >> Smurfit Institute of Genetics >> Trinity College Dublin >> Dublin 2, Ireland >> Phone: +353-1-896 2444 >> Web: http://www.tcd.ie/Genetics/wellmer-2/index.html >> Email: wue...@tcd.ie >> ------------------------------------------------------ > > -- ----------------------------------------------------- Samuel Wuest Smurfit Institute of Genetics Trinity College Dublin Dublin 2, Ireland Phone: +353-1-896 2444 Web: http://www.tcd.ie/Genetics/wellmer-2/index.html Email: wue...@tcd.ie ------------------------------------------------------
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.