[R] CDF of Sample Quantile

Bentley Coffey Mon, 07 Mar 2011 10:33:18 -0800

Just to tie up this thread, I wanted to report my solution:

When (n-1)p is an integer, there is a closed form solution:
pbinom(j-1,n,...)


When it is not an integer, its fairly easy to approximate the solution by
interpolating between the closed-form solutions: fitting log(1 - probability
from closed form solution) on an orthogonal polynomial in n.  This is a
_very_ fast and fairly accurate solution.

Thanks to all who offered their help...

On Thu, Feb 17, 2011 at 11:11 PM, Bentley Coffey
<bentleygcof...@gmail.com>wrote:

>
> Duncan,
>
> I'm not sure how I missed your message. Sorry. What you describe is what I
> do when (n-1)p is an integer so that R computes the sample quantile using a
> single order statistic. My later posting has that exact binomial expression
> in there as a special case. When (n-1)p is not an integer then R uses a
> weighted average of 2 order statistics, in which case I'm left with my
> standing problem...
>
>
> On Mon, Feb 14, 2011 at 2:26 PM, Duncan Murdoch 
> <murdoch.dun...@gmail.com>wrote:
>
>> On 14/02/2011 9:58 AM, Bentley Coffey wrote:
>>
>>> I need to calculate the probability that a sample quantile will exceed a
>>> threshold given the size of the iid sample and the parameters describing
>>> the
>>> distribution of each observation (normal, in my case). I can compute the
>>> probability with brute force simulation: simulate a size N sample, apply
>>> R's
>>> quantile() function on it, compare it to the threshold, replicate this
>>> MANY
>>> times, and count the number of times the sample quantile exceeded the
>>> threshold (dividing by the total number of replications yields the
>>> probability of interest). The problem is that the number of replications
>>> required to get sufficient precision (3 digits say) is so HUGE that this
>>> takes FOREVER. I have to perform this task so much in my script
>>> (searching
>>> over the sample size and repeated for several different distribution
>>> parameters) that it takes too many hours to run.
>>>
>>> I've searched for pre-existing code to do this in R and haven't found
>>> anything. Perhaps I'm missing something. Is anyone aware of an R function
>>> to
>>> compute this probability?
>>>
>>> I've tried writing my own code using the fact that R's quantile()
>>> function
>>> is a linear combination of 2 order statistics. Basically, I wrote down
>>> the
>>> mathematical form of the joint pdf for the 2 order statistics (a function
>>> of
>>> the sample size and the distribution parameters) then performed a
>>> pseudo-Monte Carlo integration (i.e. using Halton Draws rather than R's
>>> random draws) over the region where the sample quantile exceeds the
>>> threshold. In theory, this should work and it takes about 1000 times
>>> fewer
>>> clock cycles to compute than the Brute Force approach. My problem is that
>>> there is a significant discrepancy between the results using Brute Force
>>> and
>>> using this more efficient approach that I have coded up. I believe that
>>> the
>>> problem is numerical error but it could be some programming bug;
>>> regardless,
>>> I have been unable to locate the source of this problem and have spent
>>> over
>>> 20 hours trying to identify it this weekend. Please, somebody help!!!
>>>
>>> So, again, my question: is there code in R for quickly evaluating the CDF
>>> of
>>> a Sample Quantile given the sample size and the parameters governing the
>>> distribution of each iid point in the sample?
>>>
>>
>> I think the answer to your question is no, but I think it's the wrong
>> question.  Suppose Xm:n is the mth sample quantile in a sample of size n,
>> and you want to calculate P(Xm:n > x).  Let X be a draw from the underlying
>> distribution, and suppose P(X > x) = p.  Then the event Xm:n > x
>> is the same as the event that out of n independent draws of X, at least
>> n-m+1 are bigger than x:  a binomial probability.  And R can calculate
>> binomial probabilities using pbinom().
>>
>> Duncan Murdoch
>>
>>
>

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] CDF of Sample Quantile

Reply via email to