Try this: druns2 <- function(x, n1, n2) { if( x %% 2 ) { # odd x choose( n1 - 1, n1 - (x+1)/2 ) * choose( n2 - 1, n2 - (x-1)/2 ) / choose( n1+n2, n1 ) + choose( n2 - 1, n2 - (x+1)/2 ) * choose( n1 - 1, n1 - (x-1)/2 ) / choose( n1+n2, n2 ) } else { # even x choose( n1 - 1, n1 - x/2 ) * choose( n2 - 1, n2 - x/2 ) / choose( n1 + n2, n1 ) + choose( n2 - 1, n2 - x/2 ) * choose( n1 - 1, n1 - x/2 ) / choose( n1 + n2, n2 ) } }
exp.2 <- sapply( 2:7, druns2, n1=3, n2=4 ) exp.2 - exp.r/factorial(7) -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 > -----Original Message----- > From: Dale Steele [mailto:dale.w.ste...@gmail.com] > Sent: Thursday, February 11, 2010 5:28 PM > To: Greg Snow > Cc: R-help@r-project.org > Subject: Re: [R] Code find exact distribution for runs test? > > Thanks for this! My original query was probably unclear. I think you > have answered a different, possibly more interesting question. My > goal was to find an exact distribution of a total number of runs R in > samples of two types of size (n1, n2) under the null hypothesis of > randomness. > > The horribly inefficient code below generates results which match > Table 10 in Wackerly, Mendehall and Scheaffer for the distribution of > the total number of runs R in samples of size (n1, n2); P(R <= r), > under the null hypothesis of randomness. Horribly inefficient because > couldn't figure out how to generate only the distinguishable > permutations of my sample vector. Hope this brute force approach > illustrates what I am after. > > > nruns <- function(x) { > signs <- sign(x) > runs <- rle(signs) > r <- length(runs$lengths) > return(r) > } > > library(combinat) > # for example (n1=3, n2=4) > n1 <- 3; n2 <- 4; n <- n1+n2 > vect <- rep( c(-1,1), c(3,4)) > vect > > # ! 'nruns(vect)' generates factorial(7) permutations, only > choose(7,3) are distinguishable > > exp.r <- table(unlist(permn(vect, nruns ))) > cumsum(dist/factorial(7)) > > # Result agrees with Table 10, pg 814 (n1=3, n2=4) > #in Wackerly, Mendenhall and Scheaffer, 2002, presumably done using > # exact calculations. > > Thanks. --Dale > > On Thu, Feb 11, 2010 at 6:08 PM, Greg Snow <greg.s...@imail.org> wrote: > > OK, I think I have it worked out for both cases: > > > > library(vcd) > > > > druns <- function(x, n) { # x runs in n data points, not vectorized > > # based on sample > median > > > > if( n%%2 ) stop('n must be even') > > > > if( x %% 2 ) { # odd x > > choose( n/2 - 1, n/2-(x+1)/2 ) * choose( n/2 - 1, n/2- > (x-1)/2 )/ > > choose(n,n/2) * 2 > > } else { # even x > > choose( n/2 - 1, n/2-x/2 ) * choose( n/2 - 1, n/2-x/2 > )/ > > choose(n,n/2) * 2 > > } > > } > > > > ## population median > > out1 <- replicate( 100000, {x <- rnorm(10); > length(rle(sign(x))$lengths) } ) > > > > rootogram( table(out1), dbinom(0:9, size=9, prob=0.5)*100000 ) > > chisq.test( table(out1), p=dbinom(0:9, size=9, prob=0.5) ) > > > > > > ## sample median > > out2 <- replicate( 100000, {x <- rnorm(10); x <- x - median(x); > length(rle(sign(x))$lengths) } ) > > > > fit <- sapply( 2:10, druns, n=10 ) > > > > rootogram( table(out2), fit * 100000 ) > > chisq.test( table(out2), p=fit ) > > > > > > The tests only fail to prove me wrong, not a firm proof that I am > correct. But given the simulation size I am at least pretty close. I > can see why people worked out approximations before we had good > computers, I would not want to calculate the above by hand. > > > > Enjoy, > > > > -- > > Gregory (Greg) L. Snow Ph.D. > > Statistical Data Center > > Intermountain Healthcare > > greg.s...@imail.org > > 801.408.8111 > > > > > >> -----Original Message----- > >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- > >> project.org] On Behalf Of Greg Snow > >> Sent: Thursday, February 11, 2010 12:19 PM > >> To: Dale Steele; R-help@r-project.org > >> Subject: Re: [R] Code find exact distribution for runs test? > >> > >> I am not an expert in this area, but here are some thoughts that may > >> get you started towards an answer. > >> > >> First, there are 2 ways (possibly more) that can lead to the data > for a > >> runs test that lead to different theoretical distributions: > >> > >> 1. We have a true or hypothesized value of the median that we > >> subtracted from the data, therefore each value has 50% probability > of > >> being positive/negative and all are independent of each other > (assuming > >> being exactly equal to the median is impossible or discarded). > >> > >> 2. We have subtracted the sample median from each sample value (and > >> discarded any 0's) leaving us with exactly half positive and half > >> negative and not having independence. > >> > >> In case 1, the 1st observation will always start a run. The second > >> observation has a 50% chance of being the same sign (F) or different > >> sign (S), with the probability being 0.5 for each new observation > and > >> them all being independent (under assumption of random selections > from > >> population with known/hypothesized median) and the number of runs > >> equaling the number of S's, this looks like a binomial to me (with > some > >> '-1's inserted in appropriate places. > >> > >> In case 2, this looks like a hypergeometric distribution, there > would > >> be n!/((n/2)!(n/2)!) possible permutations, just need to compute how > >> many of those permutations result in x runs to get the probability. > >> There is probably a way to think about this in terms of balls and > urns, > >> but I have not worked that out yet. > >> > >> Hope this helps, > >> > >> -- > >> Gregory (Greg) L. Snow Ph.D. > >> Statistical Data Center > >> Intermountain Healthcare > >> greg.s...@imail.org > >> 801.408.8111 > >> > >> > >> > -----Original Message----- > >> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- > >> > project.org] On Behalf Of Dale Steele > >> > Sent: Wednesday, February 10, 2010 6:16 PM > >> > To: R-help@r-project.org > >> > Subject: [R] Code find exact distribution for runs test? > >> > > >> > I've been attempting to understand the one-sample run test for > >> > randomness. I've found run.test{tseries} and run.test{lawstat}. > >> Both > >> > use a large sample approximation for distribution of the total > number > >> > of runs in a sample of n1 observations of one type and n2 > >> observations > >> > of another type. > >> > > >> > I've been unable to find R code to generate the exact distribution > >> and > >> > would like to see how this could be done (not homework). > >> > > >> > For example, given the data: > >> > > >> > dtemp <- c(12, 13, 12, 11, 5, 2, -1, 2, -1, 3, 2, -6, -7, -7, -12, > - > >> 9, > >> > 6, 7, 10, 6, 1, 1, 3, 7, -2, -6, -6, -5, -2, -1) > >> > > >> > The Monte Carlo permutation approach seems to get me part way. > >> > > >> > > >> > # calculate the number of runs in the data vector > >> > nruns <- function(x) { > >> > signs <- sign(x) > >> > runs <- rle(signs) > >> > r <- length(runs$lengths) > >> > return(r) > >> > } > >> > > >> > MC.runs <- function(x, nperm) { > >> > RUNS <- numeric(nperm) > >> > for (i in 1:nperm) { > >> > RUNS[i] <- nruns(sample(x)) > >> > } > >> > cdf <- cumsum(table(RUNS))/nperm > >> > return(list(RUNS=RUNS, cdf=cdf, nperm=nperm)) > >> > } > >> > > >> > MC.runs(dtemp, 100000) > >> > > >> > Thanks. --Dale > >> > > >> > ______________________________________________ > >> > R-help@r-project.org mailing list > >> > https://stat.ethz.ch/mailman/listinfo/r-help > >> > PLEASE do read the posting guide http://www.R-project.org/posting- > >> > guide.html > >> > and provide commented, minimal, self-contained, reproducible code. > >> > >> ______________________________________________ > >> R-help@r-project.org mailing list > >> https://stat.ethz.ch/mailman/listinfo/r-help > >> PLEASE do read the posting guide http://www.R-project.org/posting- > >> guide.html > >> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.