Re: [R] Code find exact distribution for runs test?

Greg Snow Thu, 11 Feb 2010 19:06:02 -0800

Try this:

druns2 <- function(x, n1, n2) {
        
        if( x %% 2 ) { # odd x
                choose( n1 - 1, n1 - (x+1)/2 ) * choose( n2 - 1, n2 - (x-1)/2 ) 
/ choose( n1+n2, n1 ) + 
                choose( n2 - 1, n2 - (x+1)/2 ) * choose( n1 - 1, n1 - (x-1)/2 ) 
/ choose( n1+n2, n2 )
        } else { # even x
                choose( n1 - 1, n1 - x/2 ) * choose( n2 - 1, n2 - x/2 ) / 
choose( n1 + n2, n1 ) +
                choose( n2 - 1, n2 - x/2 ) * choose( n1 - 1, n1 - x/2 ) / 
choose( n1 + n2, n2 )
        }
}


exp.2 <- sapply( 2:7, druns2, n1=3, n2=4 )
exp.2 - exp.r/factorial(7)



-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -----Original Message-----
> From: Dale Steele [mailto:dale.w.ste...@gmail.com]
> Sent: Thursday, February 11, 2010 5:28 PM
> To: Greg Snow
> Cc: R-help@r-project.org
> Subject: Re: [R] Code find exact distribution for runs test?
> 
> Thanks for this!  My original query was probably unclear.  I think you
> have answered a different, possibly more interesting question.  My
> goal was to find an exact distribution of a total number of runs R in
> samples of two types of size (n1, n2) under the null hypothesis of
> randomness.
> 
> The horribly inefficient code below generates results which match
> Table 10 in Wackerly, Mendehall and Scheaffer for the distribution of
> the total number of runs R in samples of  size (n1, n2); P(R <= r),
> under the null hypothesis of randomness.  Horribly inefficient because
> couldn't figure out how to generate only the distinguishable
> permutations of my sample vector. Hope this brute force approach
> illustrates what I am after.
> 
> 
> nruns <- function(x) {
>     signs <- sign(x)
>     runs <- rle(signs)
>     r <- length(runs$lengths)
>     return(r)
> }
> 
> library(combinat)
> # for example (n1=3, n2=4)
> n1 <- 3;  n2 <- 4; n <- n1+n2
> vect <- rep( c(-1,1), c(3,4))
> vect
> 
> # ! 'nruns(vect)' generates factorial(7) permutations, only
> choose(7,3) are distinguishable
> 
> exp.r <- table(unlist(permn(vect, nruns )))
> cumsum(dist/factorial(7))
> 
> # Result agrees with Table 10, pg 814 (n1=3, n2=4)
> #in Wackerly, Mendenhall and Scheaffer, 2002, presumably done using
> # exact calculations.
> 
> Thanks.  --Dale
> 
> On Thu, Feb 11, 2010 at 6:08 PM, Greg Snow <greg.s...@imail.org> wrote:
> > OK, I think I have it worked out for both cases:
> >
> > library(vcd)
> >
> > druns <- function(x, n) { # x runs in n data points, not vectorized
> >                                                # based on sample
> median
> >
> >        if( n%%2 ) stop('n must be even')
> >
> >        if( x %% 2 ) { # odd x
> >                choose( n/2 - 1, n/2-(x+1)/2 ) * choose( n/2 - 1, n/2-
> (x-1)/2 )/
> >                                choose(n,n/2) * 2
> >        } else {                # even x
> >                choose( n/2 - 1, n/2-x/2 ) * choose( n/2 - 1, n/2-x/2
> )/
> >                                choose(n,n/2) * 2
> >        }
> > }
> >
> > ## population median
> > out1 <- replicate( 100000, {x <- rnorm(10);
> length(rle(sign(x))$lengths) } )
> >
> > rootogram( table(out1), dbinom(0:9, size=9, prob=0.5)*100000 )
> > chisq.test( table(out1), p=dbinom(0:9, size=9, prob=0.5) )
> >
> >
> > ## sample median
> > out2 <- replicate( 100000, {x <- rnorm(10); x <- x - median(x);
> length(rle(sign(x))$lengths) } )
> >
> > fit <- sapply( 2:10, druns, n=10 )
> >
> > rootogram( table(out2), fit * 100000 )
> > chisq.test( table(out2), p=fit )
> >
> >
> > The tests only fail to prove me wrong, not a firm proof that I am
> correct.  But given the simulation size I am at least pretty close.  I
> can see why people worked out approximations before we had good
> computers, I would not want to calculate the above by hand.
> >
> > Enjoy,
> >
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > Statistical Data Center
> > Intermountain Healthcare
> > greg.s...@imail.org
> > 801.408.8111
> >
> >
> >> -----Original Message-----
> >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> >> project.org] On Behalf Of Greg Snow
> >> Sent: Thursday, February 11, 2010 12:19 PM
> >> To: Dale Steele; R-help@r-project.org
> >> Subject: Re: [R] Code find exact distribution for runs test?
> >>
> >> I am not an expert in this area, but here are some thoughts that may
> >> get you started towards an answer.
> >>
> >> First, there are 2 ways (possibly more) that can lead to the data
> for a
> >> runs test that lead to different theoretical distributions:
> >>
> >> 1. We have a true or hypothesized value of the median that we
> >> subtracted from the data, therefore each value has 50% probability
> of
> >> being positive/negative and all are independent of each other
> (assuming
> >> being exactly equal to the median is impossible or discarded).
> >>
> >> 2. We have subtracted the sample median from each sample value (and
> >> discarded any 0's) leaving us with exactly half positive and half
> >> negative and not having independence.
> >>
> >> In case 1, the 1st observation will always start a run.  The second
> >> observation has a 50% chance of being the same sign (F) or different
> >> sign (S), with the probability being 0.5 for each new observation
> and
> >> them all being independent (under assumption of random selections
> from
> >> population with known/hypothesized median) and the number of runs
> >> equaling the number of S's, this looks like a binomial to me (with
> some
> >> '-1's inserted in appropriate places.
> >>
> >> In case 2, this looks like a hypergeometric distribution, there
> would
> >> be n!/((n/2)!(n/2)!) possible permutations, just need to compute how
> >> many of those permutations result in x runs to get the probability.
> >> There is probably a way to think about this in terms of balls and
> urns,
> >> but I have not worked that out yet.
> >>
> >> Hope this helps,
> >>
> >> --
> >> Gregory (Greg) L. Snow Ph.D.
> >> Statistical Data Center
> >> Intermountain Healthcare
> >> greg.s...@imail.org
> >> 801.408.8111
> >>
> >>
> >> > -----Original Message-----
> >> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> >> > project.org] On Behalf Of Dale Steele
> >> > Sent: Wednesday, February 10, 2010 6:16 PM
> >> > To: R-help@r-project.org
> >> > Subject: [R] Code find exact distribution for runs test?
> >> >
> >> > I've been attempting to understand the one-sample run test for
> >> > randomness.  I've found run.test{tseries} and run.test{lawstat}.
> >> Both
> >> > use a large sample approximation for distribution of the total
> number
> >> > of runs in a sample of n1 observations of one type and n2
> >> observations
> >> > of another type.
> >> >
> >> > I've been unable to find R code to generate the exact distribution
> >> and
> >> > would like to see how this could be done (not homework).
> >> >
> >> > For example, given the data:
> >> >
> >> > dtemp <- c(12, 13, 12, 11, 5, 2, -1, 2, -1, 3, 2, -6, -7, -7, -12,
> -
> >> 9,
> >> > 6, 7, 10, 6, 1, 1, 3, 7, -2, -6, -6, -5, -2, -1)
> >> >
> >> > The Monte Carlo permutation approach seems to get me part way.
> >> >
> >> >
> >> > # calculate the number of runs in the data vector
> >> > nruns <- function(x) {
> >> >     signs <- sign(x)
> >> >     runs <- rle(signs)
> >> >     r <- length(runs$lengths)
> >> >     return(r)
> >> > }
> >> >
> >> > MC.runs <- function(x, nperm) {
> >> > RUNS <- numeric(nperm)
> >> > for (i in  1:nperm) {
> >> >     RUNS[i] <- nruns(sample(x))
> >> > }
> >> >     cdf <- cumsum(table(RUNS))/nperm
> >> >     return(list(RUNS=RUNS, cdf=cdf, nperm=nperm))
> >> > }
> >> >
> >> > MC.runs(dtemp, 100000)
> >> >
> >> > Thanks.  --Dale
> >> >
> >> > ______________________________________________
> >> > R-help@r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide http://www.R-project.org/posting-
> >> > guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-
> >> guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Code find exact distribution for runs test?

Reply via email to