On Feb 14, 2009, at 3:23 AM, Thomas Lumley wrote:
On Fri, 13 Feb 2009, David Winsemius wrote:
I must disagree with both this general characterization of the
Wilcoxon test and with the specific example offered. First, we
ought to spell the author's name correctly and then clarify that it is
the Wilcoxon rank-sum test that is being considered. Next, the WRS
test is a test for differences in the location parameter of
independent samples conditional on the samples having been drawn
from the same distribution. The WRS test would have no
discriminatory power for samples drawn from the same distribution
having equal location parameters and differing only in
dispersion. Look at the formula, for Pete's sake. It
summarizes differences in ranking, so it is in fact designed NOT to
be sensitive to the spread of the values in the sample. It would
have no power, for instance, to test the variances of two samples,
both with a mean of 0, and one having a variance of 1 with the
other having a variance of 3. One can think of the WRS as a test
for unequal medians.
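The variance example in this paragraph can be checked by simulation. The sketch below is not from the original message: the sample size (30 per group), the number of replications, the seed, and the helper name `reject_rate` are all assumptions chosen for illustration. It draws both samples from Normal distributions with mean 0 and variances 1 and 3, and records how often `wilcox.test` rejects at the 5% level.

```r
set.seed(1)

# Proportion of simulated datasets in which the WRS test rejects at level alpha.
# Both groups have mean 0; only the standard deviations differ.
reject_rate <- function(n_rep = 2000, n = 30, sd_x = 1, sd_y = sqrt(3),
                        alpha = 0.05) {
  p_values <- replicate(n_rep, {
    x <- rnorm(n, mean = 0, sd = sd_x)
    y <- rnorm(n, mean = 0, sd = sd_y)
    wilcox.test(x, y)$p.value
  })
  mean(p_values < alpha)
}

rate_unequal_variance <- reject_rate()
rate_unequal_variance
```

A rejection rate that stays near the nominal 0.05 would indicate essentially no power against a pure variance difference in this symmetric setting.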
One can, and it may be helpful to do so, as long as one knows it
isn't actually true. Unfortunately, some textbooks claim or
strongly imply it is true.
Yes. I have been corrected on that point before, which was why I chose
the words I did. Doing a Google search on "derivation wilcoxon rank-sum
test", the first hit is a text, "Introductory Biostatistics" by
Le, that is an example of such a textbook, as are many others further
down the hit list.
To make the test consistent for differences in the median you have
to know in advance that the distributions differ only by a location
shift, and then it is also consistent for differences in mean (or in
any other location parameter).
That is a typical assumption in the derivation of sampling
distributions of the WRS W-statistic, is it not?
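To see why the location-shift assumption matters, here is a small sketch of a case where the medians are equal but the shapes differ. The two distributions, the sample size, the seed, and the variable names are assumptions made up for illustration and do not come from the thread. X is Exp(1), and Y is its mirror image about log(2), so both have median log(2), yet P(X < Y) is roughly 0.40 rather than 0.5.

```r
set.seed(2)

n <- 200        # observations per group (assumed)
n_rep <- 500    # number of simulated datasets (assumed)
m <- log(2)     # median of the Exp(1) distribution

p_values <- replicate(n_rep, {
  x <- rexp(n)           # right-skewed, median log(2)
  y <- 2 * m - rexp(n)   # Exp(1) reflected about log(2): left-skewed, same median
  wilcox.test(x, y)$p.value
})

# Share of datasets in which the WRS test rejects at the 5% level,
# even though the two population medians are identical.
reject_rate_equal_medians <- mean(p_values < 0.05)
reject_rate_equal_medians
```

A rejection rate well above 0.05 here means the test is responding to something other than a difference in medians, which is the sense in which "a test for unequal medians" is not literally true without the shift assumption.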
Troendle's article in Statistics in Medicine 18, 2763-2773 (1999)
(available only to subscribers and libraries):
http://www3.interscience.wiley.com.online.uchc.edu/journal/66002289/abstract
An interesting discussion, accessible online, by O'Brien and Castelloe:
http://www.amstat.org/sections/SRMS/Proceedings/y2005/Files/JSM2005-000930.pdf
Googling also brought up a Univ. of Minn. website that has R scripts
illustrating permutation tests (including the WRS) from Hollander and
Wolfe, and a page for the WRS:
http://www.stat.umn.edu/geyer/old/5601/examp/perm.html
http://www.stat.umn.edu/geyer/5601/examp/ranksum.html#test
Also, the operating characteristics aren't particularly similar to a
real test for medians, which has pretty low efficiency at the Normal
location-shift model (2/pi, IIRC) and is much more sensitive to ties
in the data.
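The difference in operating characteristics can be illustrated with a rough power comparison under a Normal location shift. This is only a sketch under stated assumptions: the median test used here is a Mood-style test carried out with `fisher.test`, chosen as a stand-in for "a real test for medians", and the helper `median_test_p`, the sample size, the shift of 0.6, and the seed are all hypothetical choices rather than anything from the thread.

```r
set.seed(3)

# Mood-style median test: classify each observation as above or not above
# the pooled median, then test the 2x2 table with Fisher's exact test.
median_test_p <- function(x, y) {
  pooled_median <- median(c(x, y))
  above <- c(x, y) > pooled_median
  group <- rep(c("x", "y"), c(length(x), length(y)))
  fisher.test(table(group, above))$p.value
}

n <- 30         # observations per group (assumed)
shift <- 0.6    # location shift in standard-deviation units (assumed)
n_rep <- 1000   # number of simulated datasets (assumed)

# Each column holds the three p-values for one simulated dataset.
p_values <- replicate(n_rep, {
  x <- rnorm(n)
  y <- rnorm(n, mean = shift)
  c(t      = t.test(x, y)$p.value,
    wrs    = wilcox.test(x, y)$p.value,
    median = median_test_p(x, y))
})

# Estimated power of each test at the 5% level.
power <- rowMeans(p_values < 0.05)
power
```

In runs of this kind the WRS power tends to sit close to that of the t-test, while the median test lags well behind, which is the pattern the efficiency figures describe.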
My memory from Conover and Iman (only having seen the first edition)
was that the Pitman efficiency of the WRS in the Gaussian case of
unequal means was around 95% (3/pi) relative to the t-test. I suppose the
choice of a central measure for reporting ought to be based on the
purposes of investigation. If one is planning classification and the
distributions are skewed, then the median might be preferable because
its sampling variance is smaller:
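> # Sampling variance of the median over 500 lognormal samples of size 20
> var( apply( sapply(1:500, function(x) rlnorm(20)), 2, median))
[1] 0.08123678
> # Sampling variance of the mean for the same design
> var( apply( sapply(1:500, function(x) rlnorm(20)), 2, mean))
[1] 0.2168887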
Thank you for the clarification.
--
David Winsemius
And I could go on and on about non-transitivity, but I won't. Anyone
who is interested can Google for 'Efron dice'.
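For readers who would rather compute than Google, the non-transitivity can be verified directly. The face values below are one commonly quoted version of Efron's dice and are supplied here as an assumption, as is the helper name `p_beats`; neither appears in the original message. Because the WRS statistic is built on P(X > Y), the same kind of cycle can arise when comparing several samples pairwise.

```r
# Efron's four dice (one commonly quoted set of face values; assumed here).
dice <- list(
  A = c(4, 4, 4, 4, 0, 0),
  B = c(3, 3, 3, 3, 3, 3),
  C = c(6, 6, 2, 2, 2, 2),
  D = c(5, 5, 5, 1, 1, 1)
)

# Exact probability that a roll of die x exceeds a roll of die y,
# obtained by enumerating all 36 equally likely face pairs.
p_beats <- function(x, y) mean(outer(x, y, ">"))

cycle <- c(
  A_beats_B = p_beats(dice$A, dice$B),
  B_beats_C = p_beats(dice$B, dice$C),
  C_beats_D = p_beats(dice$C, dice$D),
  D_beats_A = p_beats(dice$D, dice$A)
)
cycle  # each entry is 2/3
```

Each die beats the next with probability 2/3, and the last beats the first, so "tends to be larger than" does not define an ordering.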
-thomas
Thomas Lumley Assoc. Professor, Biostatistics
tlum...@u.washington.edu University of Washington, Seattle