I'm coming late to this, but this *does* need a correction, if only for the archives!
>>>>> "MS" == Marc Schwartz <[EMAIL PROTECTED]>
>>>>>     on Sat, 01 Dec 2007 13:33:21 -0600 writes:

MS> On Sat, 2007-12-01 at 18:40 +0000, David Winsemius wrote:
>> David Winsemius <[EMAIL PROTECTED]> wrote in
>> news:[EMAIL PROTECTED]:
>>
>> > "tom soyer" <[EMAIL PROTECTED]> wrote in
>> > news:[EMAIL PROTECTED]:
>> >
>> >> John,
>> >>
>> >> The Excel's percentrank function works like this: if one has a number,
>> >> x for example, and one wants to know the percentile of this number in
>> >> a given data set, dataset, one would type =percentrank(dataset,x) in
>> >> Excel to calculate the percentile. So for example, if the data set is
>> >> c(1:10), and one wants to know the percentile of 2.5 in the data set,
>> >> then using the percentrank function one would get 0.166, i.e., 2.5 is
>> >> in the 16.6th percentile.
>> >>
>> >> I am not sure how to program this function in R. I couldn't find it as
>> >> a built-in function in R either. It seems to be an obvious choice for
>> >> a built-in function. I am very surprised, but maybe we both missed it.
>> >
>> > My nomination for a function with a similar result would be ecdf(), the
>> > empirical cumulative distribution function. It is of class "function" so
>> > efforts to index ecdf(.)[.] failed for me.

I think you did not understand ecdf() !!!
It *returns* a function that you can then apply to old (or new) data;
see below.

MS> You can use ls.str() to look into the function environment:

>> ls.str(environment(ecdf(x)))
MS> f : num 0
MS> method : int 2
MS> n : int 25
MS> x : num [1:25] -2.215 -1.989 -0.836 -0.820 -0.626 ...
MS> y : num [1:25] 0.04 0.08 0.12 0.16 0.2 0.24 0.28 0.32 0.36 0.4 ...
MS> yleft : num 0
MS> yright : num 1

MS> You can then use get() or mget() within the function environment to
MS> return the requisite values.
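For comparison with the Excel behaviour tom describes, here is a minimal sketch of an Excel-style percent rank, written in the same "function that returns a function" style as ecdf() by using approxfun(). The rules it encodes are assumptions, not confirmed in this thread: a data value v gets (number of values < v) / (n - 1), values between data points are linearly interpolated, values outside the data range give NA, and Excel truncates the result to 3 digits. excel_percentrank() and trunc3() are hypothetical helper names.

```r
## Hypothetical helper: Excel-style PERCENTRANK as a function factory.
## Assumption: a data value v gets (number of values < v) / (n - 1);
## points strictly between data values are linearly interpolated;
## values outside range(data) give NA (approxfun's rule = 1).
excel_percentrank <- function(data) {
    s <- sort(data)
    n <- length(s)
    stopifnot(n >= 2)
    ## the i-th smallest value gets (i - 1)/(n - 1); with ties = min a
    ## tied group keeps the rank of its first member, which equals the
    ## count of values strictly below it, divided by (n - 1)
    approxfun(s, (seq_len(n) - 1) / (n - 1), ties = min, rule = 1)
}

## Assumption: Excel truncates (not rounds) to 3 decimal digits
trunc3 <- function(p) trunc(p * 1000) / 1000

pr <- excel_percentrank(1:10)
pr(2.5)            # 1/6, about 0.1667
trunc3(pr(2.5))    # 0.166, the value quoted for Excel above
pr(c(1, 10, 11))   # 0, 1 and NA (11 lies outside the data)
```

Like ecdf(), the factory is called once per data set and the returned function is then applied to any number of query values.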
MS> Something along the lines of the following
MS> within the function percentrank():

MS> percentrank <- function(x, val)
MS> {
MS>   env.x <- environment(ecdf(x))
MS>   res <- mget(c("x", "y"), env.x)
MS>   Ind <- which(sapply(seq(length(res$x)),
MS>                function(i) isTRUE(all.equal(res$x[i], val))))
MS>   res$y[Ind]
MS> }

Sorry Marc, but "Yuck!!"  This percentrank()
 - only works when you apply it to original x[i] values
 - only works for 'val' of length 1
 - is a complicated hack and absolutely unneeded (see below)

MS> Thus:

MS> set.seed(1)
MS> x <- rnorm(25)

>> x
MS>  [1] -0.62645381  0.18364332 -0.83562861  1.59528080  0.32950777
MS>  [6] -0.82046838  0.48742905  0.73832471  0.57578135 -0.30538839
MS> [11]  1.51178117  0.38984324 -0.62124058 -2.21469989  1.12493092
MS> [16] -0.04493361 -0.01619026  0.94383621  0.82122120  0.59390132
MS> [21]  0.91897737  0.78213630  0.07456498 -1.98935170  0.61982575

>> percentrank(x, 0.48742905)
MS> [1] 0.56

[gives 0.52 in my version of R]

Well, that is *THE SAME* as using ecdf() the way you should have used it:

  ecdf(x)(0.48742905)

{in two lines, that is

  mypercR <- ecdf(x)
  mypercR(0.48742905)

 which may be easier to understand if you have never used the nice concept
 that underlies all of approxfun(), splinefun() and ecdf(): each of them
 *returns a function*
}

You can also use ecdf(x)(x) and indeed check that it is identical to the
convoluted percentrank() function above:

> ecdf(x)(0.48742905)
[1] 0.52
> ecdf(x)(x)
 [1] 0.20 0.44 0.12 1.00 0.48 0.16 0.56 0.72 0.60 0.28 0.96 0.52 0.24 0.04 0.92
[16] 0.32 0.36 0.88 0.80 0.64 0.84 0.76 0.40 0.08 0.68
> all(ecdf(x)(x) == sapply(x, function(v) percentrank(x,v)))
[1] TRUE
>

Regards (and apologies for my apparent indignation ;-)
by the author of ecdf(),

Martin Maechler, ETH Zurich

MS> One other approach, which returns the values and their respective rank
MS> percentiles is:

>> cumsum(prop.table(table(x)))

[...... snip ........]
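To make the "ecdf() returns a function" point concrete without depending on rnorm(), here is a small sketch on a made-up toy vector. It shows that the returned function is vectorized, accepts values that are not in the data, and agrees with Marc's cumsum(prop.table(table(x))) at the distinct observed values. A related detail in the output above: ecdf(x)(x)[7] is 0.56 while ecdf(x)(0.48742905) is 0.52, presumably because 0.48742905 is only the 8-digit printed form of x[7], so the typed value lies just below the stored one; passing x[7] itself avoids that.

```r
x  <- c(3, 1, 4, 1, 5)   # toy data, made up for illustration
Fn <- ecdf(x)            # Fn is a step function, not a vector

class(Fn)                # "ecdf" "stepfun" "function"

## Fn(v) = proportion of x that is <= v, for any numeric vector v
Fn(c(0, 1, 3.5, 10))     # 0.0 0.4 0.6 1.0
Fn(x)                    # 0.6 0.4 0.8 0.4 1.0  (percent ranks of the data)

## The table-based variant gives the same proportions,
## but only at the distinct observed values
cp <- cumsum(prop.table(table(x)))
all.equal(as.vector(cp), Fn(sort(unique(x))))   # TRUE
```

The comparison uses all.equal() rather than == because the cumulative sum of the proportions can differ from the exact fractions in the last bit.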
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.