Hi Greg,
thanks for the suggestion:

I have attached some small dataset that can be used to reproduce the
odd behavior of the approxfun-function.

If it gets stripped off my email, it can also be downloaded at:
http://bioinf.gen.tcd.ie/approx.data.Rdata

Strangely, the problem seems specific to the data structure in my
expression set, when I use simulated data, everything worked fine.

Here is some code that I run and resulted in the strange output that I
have described in my initial post:

> ### load the data: a list called approx.data
> load(file="approx.data.Rdata")
> ### contains the slots "x", "y", "input"
> names(approx.data)
[1] "x"     "y"     "input"
> ### with y ranging between 0 and 1
> range(approx.data$y)
[1] 0 1
> ### compare ranges of x and input-x values (the latter is a small subset of 
> 500 data points):
> range(approx.data$x)
[1] 3.098444 7.268812
> range(approx.data$input)
[1]  3.329408 13.026700
>
>
> ### generate the interpolation function (warning message benign)
> interp <- approxfun(approx.data$x, approx.data$y, yleft=1, yright=0, rule=2)
Warning message:
In approxfun(approx.data$x, approx.data$y, yleft = 1, yright = 0,  :
  collapsing to unique 'x' values
>
> ### apply to input-values
> y.out <- sapply(approx.data$input, interp)
>
> ### still I find output values >1, even though yleft=1:
> range(y.out)
[1] 0.000000 7.207233
> hist(y.out)
>
> ### and the input-data points for which strange interpolation does occur have 
> no unusual distribution (however, they lie close to max(x)):
> hist(approx.data$input[which(y.out>1)])

The session info can be found below, thanks a million for any help.

Sam

On 25 August 2010 19:31, Greg Snow <greg.s...@imail.org> wrote:
> The plots did not come through, see the posting guide for which attachments 
> are allowed.  It will be easier for us to help if you can send reproducible 
> code (we can copy and paste to run, then examine, edit, etc.).  Try finding a 
> subset of your data for which the problem still occurs, then send the data if 
> possible, or similar simulated data if you cannot send original data.
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111
>
>
>> -----Original Message-----
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
>> project.org] On Behalf Of Samuel Wuest
>> Sent: Wednesday, August 25, 2010 8:20 AM
>> To: r-help@r-project.org
>> Subject: [R] approxfun-problems (yleft and yright ignored)
>>
>> Dear all,
>>
>> I have run into a problem when running some code implemented in the
>> Bioconductor panp-package (applied to my own expression data), whereby
>> gene
>> expression values of known true negative probesets (x) are interpolated
>> onto
>> present/absent p-values (y) between 0 and 1 using the *approxfun -
>> function*{stats}; when I have used R version 2.8, everything had
>> worked fine,
>> however, after updating to R 2.11.1., I got unexpected output
>> (explained
>> below).
>>
>> Please correct me here, but as far as I understand, the yleft and
>> yright
>> arguments set the extreme values of the interpolated y-values in case
>> the
>> input x-values (on whose approxfun is applied) fall outside range(x).
>> So if
>> I run approxfun with yleft=1 and yright=0 with y-values between 0 and
>> 1,
>> then I should never get any values higher than 1. However, this is not
>> the
>> case, as this code-example illustrates:
>>
>> > ### define the x-values used to construct the approxfun, basically
>> these
>> are 2000 expression values ranging from ~ 3 to 7:
>> > xNeg <- NegExprs[, 1]
>> > xNeg <- sort(xNeg, decreasing = TRUE)
>> >
>> > ### generate 2000 y-values between 0 and 1:
>> > yNeg <- seq(0, 1, 1/(length(xNeg) - 1))
>> > ### define yleft and yright as well as the rule to clarify what
>> should
>> happen if input x-values lie outside range(x):
>> > interp <- approxfun(xNeg, yNeg, yleft = 1, yright = 0, rule=2)
>> Warning message:
>> In approxfun(xNeg, yNeg, yleft = 1, yright = 0, rule = 2) :
>>   collapsing to unique 'x' values
>> > ### apply the approxfun to expression data that range from ~2.9 to
>> 13.9
>> and can therefore lie outside range(xNeg):
>> >  PV <- sapply(AllExprs[, 1], interp)
>> > range(PV)
>> [1]    0.000 6208.932
>> > summary(PV)
>>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
>> 0.000e+00 0.000e+00 2.774e-03 1.299e+00 3.164e-01 6.209e+03
>>
>> So the resulting output PV object contains data ranging from 0 to 6208,
>> the
>> latter of which lies outside yleft and is not anywhere close to extreme
>> y-values that were used to set up the interp-function. This seems wrong
>> to
>> me, and from what I understand, yleft and yright are simply ignored?
>>
>> I have attached a few histograms that visualize the data distributions
>> of
>> the objects I xNeg, yNeg, AllExprs[,1] (== input x-values) and PV (the
>> output), so that it is easier to make sense of the data structures...
>>
>> Does anyone have an explanation for this or can tell me how to fix the
>> problem?
>>
>> Thanks a million for any help, best, Sam
>>
>> > sessionInfo()
>> R version 2.11.1 (2010-05-31)
>> x86_64-apple-darwin9.8.0
>>
>> locale:
>> [1] en_IE.UTF-8/en_IE.UTF-8/C/C/en_IE.UTF-8/en_IE.UTF-8
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] panp_1.18.0   affy_1.26.1   Biobase_2.8.0
>>
>> loaded via a namespace (and not attached):
>> [1] affyio_1.16.0         preprocessCore_1.10.0
>>
>>
>> --
>> -----------------------------------------------------
>> Samuel Wuest
>> Smurfit Institute of Genetics
>> Trinity College Dublin
>> Dublin 2, Ireland
>> Phone: +353-1-896 2444
>> Web: http://www.tcd.ie/Genetics/wellmer-2/index.html
>> Email: wue...@tcd.ie
>> ------------------------------------------------------
>
>



-- 
-----------------------------------------------------
Samuel Wuest
Smurfit Institute of Genetics
Trinity College Dublin
Dublin 2, Ireland
Phone: +353-1-896 2444
Web: http://www.tcd.ie/Genetics/wellmer-2/index.html
Email: wue...@tcd.ie
------------------------------------------------------
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to