OK, I think I have it worked out for both cases: library(vcd)
druns <- function(x, n) { # x runs in n data points, not vectorized # based on sample median if( n%%2 ) stop('n must be even') if( x %% 2 ) { # odd x choose( n/2 - 1, n/2-(x+1)/2 ) * choose( n/2 - 1, n/2-(x-1)/2 )/ choose(n,n/2) * 2 } else { # even x choose( n/2 - 1, n/2-x/2 ) * choose( n/2 - 1, n/2-x/2 )/ choose(n,n/2) * 2 } } ## population median out1 <- replicate( 100000, {x <- rnorm(10); length(rle(sign(x))$lengths) } ) rootogram( table(out1), dbinom(0:9, size=9, prob=0.5)*100000 ) chisq.test( table(out1), p=dbinom(0:9, size=9, prob=0.5) ) ## sample median out2 <- replicate( 100000, {x <- rnorm(10); x <- x - median(x); length(rle(sign(x))$lengths) } ) fit <- sapply( 2:10, druns, n=10 ) rootogram( table(out2), fit * 100000 ) chisq.test( table(out2), p=fit ) The tests only fail to prove me wrong, not a firm proof that I am correct. But given the simulation size I am at least pretty close. I can see why people worked out approximations before we had good computers, I would not want to calculate the above by hand. Enjoy, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 > -----Original Message----- > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- > project.org] On Behalf Of Greg Snow > Sent: Thursday, February 11, 2010 12:19 PM > To: Dale Steele; R-help@r-project.org > Subject: Re: [R] Code find exact distribution for runs test? > > I am not an expert in this area, but here are some thoughts that may > get you started towards an answer. > > First, there are 2 ways (possibly more) that can lead to the data for a > runs test that lead to different theoretical distributions: > > 1. We have a true or hypothesized value of the median that we > subtracted from the data, therefore each value has 50% probability of > being positive/negative and all are independent of each other (assuming > being exactly equal to the median is impossible or discarded). > > 2. We have subtracted the sample median from each sample value (and > discarded any 0's) leaving us with exactly half positive and half > negative and not having independence. > > In case 1, the 1st observation will always start a run. The second > observation has a 50% chance of being the same sign (F) or different > sign (S), with the probability being 0.5 for each new observation and > them all being independent (under assumption of random selections from > population with known/hypothesized median) and the number of runs > equaling the number of S's, this looks like a binomial to me (with some > '-1's inserted in appropriate places. > > In case 2, this looks like a hypergeometric distribution, there would > be n!/((n/2)!(n/2)!) possible permutations, just need to compute how > many of those permutations result in x runs to get the probability. > There is probably a way to think about this in terms of balls and urns, > but I have not worked that out yet. > > Hope this helps, > > -- > Gregory (Greg) L. Snow Ph.D. > Statistical Data Center > Intermountain Healthcare > greg.s...@imail.org > 801.408.8111 > > > > -----Original Message----- > > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- > > project.org] On Behalf Of Dale Steele > > Sent: Wednesday, February 10, 2010 6:16 PM > > To: R-help@r-project.org > > Subject: [R] Code find exact distribution for runs test? > > > > I've been attempting to understand the one-sample run test for > > randomness. I've found run.test{tseries} and run.test{lawstat}. > Both > > use a large sample approximation for distribution of the total number > > of runs in a sample of n1 observations of one type and n2 > observations > > of another type. > > > > I've been unable to find R code to generate the exact distribution > and > > would like to see how this could be done (not homework). > > > > For example, given the data: > > > > dtemp <- c(12, 13, 12, 11, 5, 2, -1, 2, -1, 3, 2, -6, -7, -7, -12, - > 9, > > 6, 7, 10, 6, 1, 1, 3, 7, -2, -6, -6, -5, -2, -1) > > > > The Monte Carlo permutation approach seems to get me part way. > > > > > > # calculate the number of runs in the data vector > > nruns <- function(x) { > > signs <- sign(x) > > runs <- rle(signs) > > r <- length(runs$lengths) > > return(r) > > } > > > > MC.runs <- function(x, nperm) { > > RUNS <- numeric(nperm) > > for (i in 1:nperm) { > > RUNS[i] <- nruns(sample(x)) > > } > > cdf <- cumsum(table(RUNS))/nperm > > return(list(RUNS=RUNS, cdf=cdf, nperm=nperm)) > > } > > > > MC.runs(dtemp, 100000) > > > > Thanks. --Dale > > > > ______________________________________________ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting- > > guide.html > > and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.