Here is one quick way using the combinat package: > library(combinat) > > tmpfun <- function(x) { + tmp <- rep(1,5) + tmp[x] <- -1 + tmp + } > > combn(5,2, tmpfun) [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] -1 -1 -1 -1 1 1 1 1 1 1 [2,] -1 1 1 1 -1 -1 -1 1 1 1 [3,] 1 -1 1 1 -1 1 1 -1 -1 1 [4,] 1 1 -1 1 1 -1 1 -1 1 -1 [5,] 1 1 1 -1 1 1 -1 1 -1 -1 >
Of course in this case the tmpfun function needs to be rewritten for each vector size, so is not generalizable. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 > -----Original Message----- > From: Dale Steele [mailto:dale.w.ste...@gmail.com] > Sent: Friday, February 12, 2010 6:34 AM > To: Greg Snow > Cc: R-help@r-project.org > Subject: Re: [R] Code find exact distribution for runs test? > > Wow! Your reply amply demonstrates the power of understanding the > math (or finding someone who does) before turning on the computer. > > One last R question...how could I efficiently enumerate all > distinguishable permutations of a vector of signs? > > vect <- c( -1, -1, 1, 1, 1) > > permn(vect) #length: factorial(length(vect)) > ???? #length: choose(n, n1) > > Best Regards. --Dale > > > On Thu, Feb 11, 2010 at 10:04 PM, Greg Snow <greg.s...@imail.org> > wrote: > > Try this: > > > > druns2 <- function(x, n1, n2) { > > > > if( x %% 2 ) { # odd x > > choose( n1 - 1, n1 - (x+1)/2 ) * choose( n2 - 1, n2 - > (x-1)/2 ) / choose( n1+n2, n1 ) + > > choose( n2 - 1, n2 - (x+1)/2 ) * choose( n1 - 1, n1 - > (x-1)/2 ) / choose( n1+n2, n2 ) > > } else { # even x > > choose( n1 - 1, n1 - x/2 ) * choose( n2 - 1, n2 - x/2 > ) / choose( n1 + n2, n1 ) + > > choose( n2 - 1, n2 - x/2 ) * choose( n1 - 1, n1 - x/2 > ) / choose( n1 + n2, n2 ) > > } > > } > > > > exp.2 <- sapply( 2:7, druns2, n1=3, n2=4 ) > > exp.2 - exp.r/factorial(7) > > > > > > > > -- > > Gregory (Greg) L. Snow Ph.D. > > Statistical Data Center > > Intermountain Healthcare > > greg.s...@imail.org > > 801.408.8111 > > > > > >> -----Original Message----- > >> From: Dale Steele [mailto:dale.w.ste...@gmail.com] > >> Sent: Thursday, February 11, 2010 5:28 PM > >> To: Greg Snow > >> Cc: R-help@r-project.org > >> Subject: Re: [R] Code find exact distribution for runs test? > >> > >> Thanks for this! My original query was probably unclear. I think > you > >> have answered a different, possibly more interesting question. My > >> goal was to find an exact distribution of a total number of runs R > in > >> samples of two types of size (n1, n2) under the null hypothesis of > >> randomness. > >> > >> The horribly inefficient code below generates results which match > >> Table 10 in Wackerly, Mendehall and Scheaffer for the distribution > of > >> the total number of runs R in samples of size (n1, n2); P(R <= r), > >> under the null hypothesis of randomness. Horribly inefficient > because > >> couldn't figure out how to generate only the distinguishable > >> permutations of my sample vector. Hope this brute force approach > >> illustrates what I am after. > >> > >> > >> nruns <- function(x) { > >> signs <- sign(x) > >> runs <- rle(signs) > >> r <- length(runs$lengths) > >> return(r) > >> } > >> > >> library(combinat) > >> # for example (n1=3, n2=4) > >> n1 <- 3; n2 <- 4; n <- n1+n2 > >> vect <- rep( c(-1,1), c(3,4)) > >> vect > >> > >> # ! 'nruns(vect)' generates factorial(7) permutations, only > >> choose(7,3) are distinguishable > >> > >> exp.r <- table(unlist(permn(vect, nruns ))) > >> cumsum(dist/factorial(7)) > >> > >> # Result agrees with Table 10, pg 814 (n1=3, n2=4) > >> #in Wackerly, Mendenhall and Scheaffer, 2002, presumably done using > >> # exact calculations. > >> > >> Thanks. --Dale > >> > >> On Thu, Feb 11, 2010 at 6:08 PM, Greg Snow <greg.s...@imail.org> > wrote: > >> > OK, I think I have it worked out for both cases: > >> > > >> > library(vcd) > >> > > >> > druns <- function(x, n) { # x runs in n data points, not > vectorized > >> > # based on sample > >> median > >> > > >> > if( n%%2 ) stop('n must be even') > >> > > >> > if( x %% 2 ) { # odd x > >> > choose( n/2 - 1, n/2-(x+1)/2 ) * choose( n/2 - 1, > n/2- > >> (x-1)/2 )/ > >> > choose(n,n/2) * 2 > >> > } else { # even x > >> > choose( n/2 - 1, n/2-x/2 ) * choose( n/2 - 1, n/2- > x/2 > >> )/ > >> > choose(n,n/2) * 2 > >> > } > >> > } > >> > > >> > ## population median > >> > out1 <- replicate( 100000, {x <- rnorm(10); > >> length(rle(sign(x))$lengths) } ) > >> > > >> > rootogram( table(out1), dbinom(0:9, size=9, prob=0.5)*100000 ) > >> > chisq.test( table(out1), p=dbinom(0:9, size=9, prob=0.5) ) > >> > > >> > > >> > ## sample median > >> > out2 <- replicate( 100000, {x <- rnorm(10); x <- x - median(x); > >> length(rle(sign(x))$lengths) } ) > >> > > >> > fit <- sapply( 2:10, druns, n=10 ) > >> > > >> > rootogram( table(out2), fit * 100000 ) > >> > chisq.test( table(out2), p=fit ) > >> > > >> > > >> > The tests only fail to prove me wrong, not a firm proof that I am > >> correct. But given the simulation size I am at least pretty close. > I > >> can see why people worked out approximations before we had good > >> computers, I would not want to calculate the above by hand. > >> > > >> > Enjoy, > >> > > >> > -- > >> > Gregory (Greg) L. Snow Ph.D. > >> > Statistical Data Center > >> > Intermountain Healthcare > >> > greg.s...@imail.org > >> > 801.408.8111 > >> > > >> > > >> >> -----Original Message----- > >> >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- > >> >> project.org] On Behalf Of Greg Snow > >> >> Sent: Thursday, February 11, 2010 12:19 PM > >> >> To: Dale Steele; R-help@r-project.org > >> >> Subject: Re: [R] Code find exact distribution for runs test? > >> >> > >> >> I am not an expert in this area, but here are some thoughts that > may > >> >> get you started towards an answer. > >> >> > >> >> First, there are 2 ways (possibly more) that can lead to the data > >> for a > >> >> runs test that lead to different theoretical distributions: > >> >> > >> >> 1. We have a true or hypothesized value of the median that we > >> >> subtracted from the data, therefore each value has 50% > probability > >> of > >> >> being positive/negative and all are independent of each other > >> (assuming > >> >> being exactly equal to the median is impossible or discarded). > >> >> > >> >> 2. We have subtracted the sample median from each sample value > (and > >> >> discarded any 0's) leaving us with exactly half positive and half > >> >> negative and not having independence. > >> >> > >> >> In case 1, the 1st observation will always start a run. The > second > >> >> observation has a 50% chance of being the same sign (F) or > different > >> >> sign (S), with the probability being 0.5 for each new observation > >> and > >> >> them all being independent (under assumption of random selections > >> from > >> >> population with known/hypothesized median) and the number of runs > >> >> equaling the number of S's, this looks like a binomial to me > (with > >> some > >> >> '-1's inserted in appropriate places. > >> >> > >> >> In case 2, this looks like a hypergeometric distribution, there > >> would > >> >> be n!/((n/2)!(n/2)!) possible permutations, just need to compute > how > >> >> many of those permutations result in x runs to get the > probability. > >> >> There is probably a way to think about this in terms of balls and > >> urns, > >> >> but I have not worked that out yet. > >> >> > >> >> Hope this helps, > >> >> > >> >> -- > >> >> Gregory (Greg) L. Snow Ph.D. > >> >> Statistical Data Center > >> >> Intermountain Healthcare > >> >> greg.s...@imail.org > >> >> 801.408.8111 > >> >> > >> >> > >> >> > -----Original Message----- > >> >> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- > >> >> > project.org] On Behalf Of Dale Steele > >> >> > Sent: Wednesday, February 10, 2010 6:16 PM > >> >> > To: R-help@r-project.org > >> >> > Subject: [R] Code find exact distribution for runs test? > >> >> > > >> >> > I've been attempting to understand the one-sample run test for > >> >> > randomness. I've found run.test{tseries} and > run.test{lawstat}. > >> >> Both > >> >> > use a large sample approximation for distribution of the total > >> number > >> >> > of runs in a sample of n1 observations of one type and n2 > >> >> observations > >> >> > of another type. > >> >> > > >> >> > I've been unable to find R code to generate the exact > distribution > >> >> and > >> >> > would like to see how this could be done (not homework). > >> >> > > >> >> > For example, given the data: > >> >> > > >> >> > dtemp <- c(12, 13, 12, 11, 5, 2, -1, 2, -1, 3, 2, -6, -7, -7, - > 12, > >> - > >> >> 9, > >> >> > 6, 7, 10, 6, 1, 1, 3, 7, -2, -6, -6, -5, -2, -1) > >> >> > > >> >> > The Monte Carlo permutation approach seems to get me part way. > >> >> > > >> >> > > >> >> > # calculate the number of runs in the data vector > >> >> > nruns <- function(x) { > >> >> > signs <- sign(x) > >> >> > runs <- rle(signs) > >> >> > r <- length(runs$lengths) > >> >> > return(r) > >> >> > } > >> >> > > >> >> > MC.runs <- function(x, nperm) { > >> >> > RUNS <- numeric(nperm) > >> >> > for (i in 1:nperm) { > >> >> > RUNS[i] <- nruns(sample(x)) > >> >> > } > >> >> > cdf <- cumsum(table(RUNS))/nperm > >> >> > return(list(RUNS=RUNS, cdf=cdf, nperm=nperm)) > >> >> > } > >> >> > > >> >> > MC.runs(dtemp, 100000) > >> >> > > >> >> > Thanks. --Dale > >> >> > > >> >> > ______________________________________________ > >> >> > R-help@r-project.org mailing list > >> >> > https://stat.ethz.ch/mailman/listinfo/r-help > >> >> > PLEASE do read the posting guide http://www.R- > project.org/posting- > >> >> > guide.html > >> >> > and provide commented, minimal, self-contained, reproducible > code. > >> >> > >> >> ______________________________________________ > >> >> R-help@r-project.org mailing list > >> >> https://stat.ethz.ch/mailman/listinfo/r-help > >> >> PLEASE do read the posting guide http://www.R- > project.org/posting- > >> >> guide.html > >> >> and provide commented, minimal, self-contained, reproducible > code. > >> > > > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.