Thanks Marc, The sampling is so easy that I often forget that we can do the exact permutation test for smaller samples (and I can never remember when small is small enough for this). With the exact permutations we really don't need to do the prop.test or binom.test, I usually do that to get the confidence interval on the p-value due to sampling from the permutations rather than doing all possible (and this tells me if I need to increase the number of permutations to be sure my p-value is precise enough). With all possible permutations, there is no sampling, and no need for an interval, the p-value is exact.
Thanks again, I need to remember combn. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 > -----Original Message----- > From: Marc Schwartz [mailto:marc_schwa...@me.com] > Sent: Friday, September 25, 2009 12:17 PM > To: Greg Snow > Cc: John Sorkin; r-help@r-project.org > Subject: Re: [R] Non-parametric test for location with two unpaired > sets of data measured on ordinal scale. > > Greg and John, > > Just to throw it out there, the data sets here are small enough that > you co do a fully enumerable permutation test by replacing your > replicate() call with: > > perms <- combn(17, 9, function(x) median(sets[x]) - median(sets[- > x])) > > This is based on an off-list communication that I had with Peter > Dalgaard about 3 years ago for a different scenario and gives you: > > > choose(17, 9) > [1] 24310 > > permutations. > > > It does not take long: > > > system.time(perms <- combn(17, 9, function(x) median(sets[x]) - > median(sets[-x]))) > user system elapsed > 3.863 0.019 3.898 > > > Which yields: > > > str(perms) > num [1:24310(1d)] -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 ... > > > > table(perms) > perms > -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 > 285 175 2595 7000 8050 875 3720 1425 185 > > > perms <- c(orig,perms) > > prop.test( sum(perms>=orig), length(perms) ) > # or > binom.test(sum(perms >= orig), length(perms)) > > > # Variation on the graphic... > plot(table(perms), type = "h") > abline(v = orig, col = "blue") > > > See ?combn for more information. > > HTH, > > Marc Schwartz > > > > On Sep 25, 2009, at 11:47 AM, Greg Snow wrote: > > > Yes, I agree that the median makes the most sense here, but there > > could be other measures of location that would be of interest > > (quartiles, some version of the rank sum). > > > > Here is some sample code for a permutation test on the medians > > (there are a couple of packages that will do this as well, but this > > is pretty straight forward with straight R code): > > > > set1 <- c(1,3,2,2,4,3,3,2,2) > > set2 <- c(4,4,4,3,3,5,4,4) > > > > sets <- c(set1,set2) > > g1 <- seq_along(set1) > > > > orig <- median( sets[ -g1 ] ) - median( sets[ g1 ] ) > > > > perms <- replicate( 1999, { tmp <- sample(sets) > > median( tmp[ -g1 ] ) - median( tmp[ g1 ] ) } ) > > > > # or > > > > pb <- winProgressBar(max=1999) > > > > setWinProgressBar(pb, 0) > > > > perms <- replicate(1999, { setWinProgressBar( pb, > > getWinProgressBar(pb) + 1 ) > > tmp <- sample(sets) > > median( tmp[ -g1 ] ) - median( tmp[ g1 ] ) } ) > > > > close(pb) > > > > > > perms <- c(orig,perms) > > sum( perms >= orig ) > > mean( perms >= orig ) > > prop.test( sum(perms>=orig), length(perms) ) > > hist(perms) > > abline(v=orig, col='blue') > > > > (if you want the progress bar on an os other than windows, then use > > the tcltk package and the tkProgressBar). > > > > Hope this helps, > > > > > > -- > > Gregory (Greg) L. Snow Ph.D. > > Statistical Data Center > > Intermountain Healthcare > > greg.s...@imail.org > > 801.408.8111 > > > > > >> -----Original Message----- > >> From: John Sorkin [mailto:jsor...@grecc.umaryland.edu] > >> Sent: Thursday, September 24, 2009 2:52 PM > >> To: Greg Snow; r-help@r-project.org > >> Subject: Re: [R] Non-parametric test for location with two unpaired > >> setsof data measured on ordinal scale. > >> > >> Greg, > >> I used the term location because I did not want to use the terms > mean > >> or median for the exact reason that you gave; these to values can be > >> different in a given distribution. I want to test the null > hypothesis > >> that the data come from a single distribution. This is often done by > >> comparing a measure of location (e.g. mean for ANOVOA), but as you > >> know > >> the mean need not be the only measure of location that is tested. > >> Giiven that my data are measured on an ordinal scale, the mean is > >> without meaning, so I suspect that the best measure for me would be > a > >> comparison of medians, but I am open to other suggestions. > >> John > >> > >> John David Sorkin M.D., Ph.D. > >> Chief, Biostatistics and Informatics > >> University of Maryland School of Medicine Division of Gerontology > >> Baltimore VA Medical Center > >> 10 North Greene Street > >> GRECC (BT/18/GR) > >> Baltimore, MD 21201-1524 > >> (Phone) 410-605-7119 > >> (Fax) 410-605-7913 (Please call phone number above prior to faxing) > >> > >>>>> Greg Snow <greg.s...@imail.org> 9/24/2009 4:30 PM >>> > >> What do you mean by location? I can think of examples where 2 > >> distributions have the same median but different means, or the same > >> means but different medians. > >> > >> Are you willing to assume that the distributions are exactly the > same > >> under the null hypothesis? (not just the same 'center/location') > >> > >> I would probably do a permutation test on the difference between the > >> means or medians (which ever you think is more meaningful), this > >> assumes the exact same distribution under the null. > >> > >> You can also do a Mann-Whitney/Wilcoxin test (but I don't like > >> explaining, or sometimes even thinking about, what it is actually > >> testing), you could do a bootstrap confidence interval on the > >> difference between means/medians (does not assume distributions are > >> the > >> same, just have same mean/median), or you could just replace all > >> values > >> by their ranks and do a t-test (essentially transforms the data to a > >> uniform distribution, the CLT for the uniform kicks in around n=5, > >> but > >> I would simulate just to check). > >> > >> This is not the nice simple answer that you were probably looking > >> for, > >> but hopefully it gives you some things to think about that will > help, > >> > >> -- > >> Gregory (Greg) L. Snow Ph.D. > >> Statistical Data Center > >> Intermountain Healthcare > >> greg.s...@imail.org > >> 801.408.8111 > >> > >> > >>> -----Original Message----- > >>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- > >>> project.org] On Behalf Of John Sorkin > >>> Sent: Thursday, September 24, 2009 1:08 PM > >>> To: r-help@r-project.org > >>> Subject: [R] Non-parametric test for location with two unpaired > sets > >> of > >>> data measured on ordinal scale. > >>> > >>> Please forgive a stats question. > >>> > >>> I have to sets of data (unpaired) measured on an ordinal scale. I > >> want > >>> to test to see if the two sets are different (i.e. do they have the > >>> same location): > >>> > >>> set1: 1,3,2,2,4,3,3,2,2 > >>> set: 4,4,4,3,3,5,4,4 > >>> > >>> What is the most appropriate non-parametric test to test location? > >>> > >>> Thanks, > >>> John > >>> > >>> Confidentiality Statement: > >>> This email message, including any attachments, is for > >>> th...{{dropped:6}} > >>> > >>> ______________________________________________ > >>> R-help@r-project.org mailing list > >>> https://stat.ethz.ch/mailman/listinfo/r-help > >>> PLEASE do read the posting guide http://www.R-project.org/posting- > >>> guide.html > >>> and provide commented, minimal, self-contained, reproducible code. > >> > >> ______________________________________________ > >> R-help@r-project.org mailing list > >> https://stat.ethz.ch/mailman/listinfo/r-help > >> PLEASE do read the posting guide http://www.R-project.org/posting- > >> guide.html > >> and provide commented, minimal, self-contained, reproducible code. > >> > >> Confidentiality Statement: > >> This email message, including any attachments, is for ...{{dropped: > >> 6}} > > > > ______________________________________________ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.