Greg and John,

Just to throw it out there, the data sets here are small enough that you co do a fully enumerable permutation test by replacing your replicate() call with:

  perms <- combn(17, 9, function(x) median(sets[x]) - median(sets[-x]))

This is based on an off-list communication that I had with Peter Dalgaard about 3 years ago for a different scenario and gives you:

> choose(17, 9)
[1] 24310

permutations.


It does not take long:

> system.time(perms <- combn(17, 9, function(x) median(sets[x]) - median(sets[-x])))
   user  system elapsed
  3.863   0.019   3.898


Which yields:

> str(perms)
 num [1:24310(1d)] -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...


> table(perms)
perms
  -2 -1.5   -1 -0.5    0  0.5    1  1.5    2
 285  175 2595 7000 8050  875 3720 1425  185


perms <- c(orig,perms)

prop.test( sum(perms>=orig), length(perms) )
# or
binom.test(sum(perms >= orig), length(perms))


# Variation on the graphic...
plot(table(perms), type = "h")
abline(v = orig, col = "blue")


See ?combn for more information.

HTH,

Marc Schwartz



On Sep 25, 2009, at 11:47 AM, Greg Snow wrote:

Yes, I agree that the median makes the most sense here, but there could be other measures of location that would be of interest (quartiles, some version of the rank sum).

Here is some sample code for a permutation test on the medians (there are a couple of packages that will do this as well, but this is pretty straight forward with straight R code):

set1 <- c(1,3,2,2,4,3,3,2,2)
set2 <- c(4,4,4,3,3,5,4,4)

sets <- c(set1,set2)
g1 <- seq_along(set1)

orig <- median( sets[ -g1 ] ) - median( sets[ g1 ] )

perms <- replicate( 1999, { tmp <- sample(sets)
        median( tmp[ -g1 ] ) - median( tmp[ g1 ] ) } )

#  or

pb <- winProgressBar(max=1999)

setWinProgressBar(pb, 0)

perms <- replicate(1999, { setWinProgressBar( pb, getWinProgressBar(pb) + 1 )
        tmp <- sample(sets)
        median( tmp[ -g1 ] ) - median( tmp[ g1 ] ) } )

close(pb)


perms <- c(orig,perms)
sum( perms >= orig )
mean( perms >= orig )
prop.test( sum(perms>=orig), length(perms) )
hist(perms)
abline(v=orig, col='blue')

(if you want the progress bar on an os other than windows, then use the tcltk package and the tkProgressBar).

Hope this helps,


--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


-----Original Message-----
From: John Sorkin [mailto:jsor...@grecc.umaryland.edu]
Sent: Thursday, September 24, 2009 2:52 PM
To: Greg Snow; r-help@r-project.org
Subject: Re: [R] Non-parametric test for location with two unpaired
setsof data measured on ordinal scale.

Greg,
I used the term location because I did not want to use the terms mean
or median for the exact reason that you gave; these to values can be
different in a given distribution. I want to test the null hypothesis
that the data come from a single distribution. This is often done by
comparing a measure of location (e.g. mean for ANOVOA), but as you know
the mean need not be the only measure of location that is tested.
Giiven that my data are measured on an ordinal scale, the mean is
without meaning, so I suspect that the best measure for me would be a
comparison of medians, but I am open to other suggestions.
John

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

Greg Snow <greg.s...@imail.org> 9/24/2009 4:30 PM >>>
What do you mean by location?  I can think of examples where 2
distributions have the same median but different means, or the same
means but different medians.

Are you willing to assume that the distributions are exactly the same
under the null hypothesis? (not just the same 'center/location')

I would probably do a permutation test on the difference between the
means or medians (which ever you think is more meaningful), this
assumes the exact same distribution under the null.

You can also do a Mann-Whitney/Wilcoxin test (but I don't like
explaining, or sometimes even thinking about, what it is actually
testing), you could do a bootstrap confidence interval on the
difference between means/medians (does not assume distributions are the same, just have same mean/median), or you could just replace all values
by their ranks and do a t-test (essentially transforms the data to a
uniform distribution, the CLT for the uniform kicks in around n=5, but
I would simulate just to check).

This is not the nice simple answer that you were probably looking for,
but hopefully it gives you some things to think about that will help,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
project.org] On Behalf Of John Sorkin
Sent: Thursday, September 24, 2009 1:08 PM
To: r-help@r-project.org
Subject: [R] Non-parametric test for location with two unpaired sets
of
data measured on ordinal scale.

Please forgive a stats question.

I have to sets of data (unpaired) measured on an ordinal scale. I
want
to test to see if the two sets are different (i.e. do they have the
same location):

set1: 1,3,2,2,4,3,3,2,2
set:   4,4,4,3,3,5,4,4

What is the most appropriate non-parametric test to test location?

Thanks,
John

Confidentiality Statement:
This email message, including any attachments, is for
th...{{dropped:6}}

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-
guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-
guide.html
and provide commented, minimal, self-contained, reproducible code.

Confidentiality Statement:
This email message, including any attachments, is for ...{{dropped: 6}}

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to