On Feb 14, 2009, at 3:23 AM, Thomas Lumley wrote:

On Fri, 13 Feb 2009, David Winsemius wrote:

I must disagree with both this general characterization of the Wilcoxon test and with the specific example offered. First, we ought to spell the author's name correctly and then clarify that it is the Wilcoxon rank-sum test that is being considered. Next, the WRS test is a test for differences in the location parameter of independent samples conditional on the samples having been drawn from the same distribution. The WRS test would have no discriminatory power for samples drawn from the same distribution having equal location parameters and differing only in dispersion. Look at the formula, for Pete's sake. It summarizes differences in ranking, so it is in fact designed NOT to be sensitive to the spread of the values in the sample. It would have no power, for instance, to test the variances of two samples, both with a mean of 0, and one having a variance of 1 with the other having a variance of 3. One can think of the WRS as a test for unequal medians.


One can, and it may be helpful to do so, as long as one knows it isn't actually true. Unfortunately, some textbooks claim or strongly imply it is true.

Yes. I have been corrected on that point before, which was why I chose the words I did. Doing a Google search on "derivation wilcoxon rank-sum test", the first hit is to a text, "Introductory Biostatistics" by Le, that is an example of such a text ... and there are many others further down the hit list.

To make the test consistent for differences in the median you have to know in advance that the distributions differ only by a location shift, and then it is also consistent for differences in mean (or in any other location parameter).

That is a typical assumption in the derivation of sampling distributions of the WRS W-statistic, is it not?
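
The role of the location-shift assumption can be illustrated with a small simulation. This is only a sketch under assumed settings (the seed, sample size, number of replications and the choice of distributions are all illustrative, not from the thread): X is a shifted exponential and Y is its mirror image, so both have population median 0, yet P(X > Y) = (1 + 2 log 2)/4, roughly 0.597, rather than 1/2.

```r
# Sketch: two distributions with the same median (0) that are NOT location shifts
# of one another. All settings below are illustrative assumptions.
set.seed(1)      # arbitrary seed, for reproducibility only
n    <- 500      # per-group sample size (assumed)
reps <- 200      # number of simulated data sets (assumed)

# X is right-skewed, Y is its mirror image; both have population median 0.
draw_x <- function(n) rexp(n) - log(2)
draw_y <- function(n) log(2) - rexp(n)

# Proportion of simulated data sets in which the rank-sum test rejects at 5%.
pvals       <- replicate(reps, wilcox.test(draw_x(n), draw_y(n))$p.value)
reject_rate <- mean(pvals < 0.05)

# Monte Carlo estimate of P(X > Y), the quantity the rank-sum statistic tracks.
p_x_gt_y <- mean(draw_x(1e5) > draw_y(1e5))

reject_rate
p_x_gt_y
```

With these settings the test should reject far more often than 5% of the time even though the medians are equal, which is the sense in which "test for unequal medians" holds only under a pure location shift.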

Troendle's article in Statistics in Medicine 18, 2763-2773 (1999) (available only to subscribers and libraries):
http://www3.interscience.wiley.com.online.uchc.edu/journal/66002289/abstract

An interesting discussion by O'Brien and Castelloe, accessible online:
http://www.amstat.org/sections/SRMS/Proceedings/y2005/Files/JSM2005-000930.pdf

Googling also brought up a University of Minnesota website that has R scripts illustrating permutation tests (including the WRS) from Hollander and Wolfe, and a page for the WRS:

http://www.stat.umn.edu/geyer/old/5601/examp/perm.html

http://www.stat.umn.edu/geyer/5601/examp/ranksum.html#test

Also, the operating characteristics aren't particularly similar to a real test for medians, which has pretty low efficiency at the Normal location-shift model (2/pi, IIRC) and is much more sensitive to ties in the data.
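
The efficiency comparison can be sketched by simulation. The usual textbook asymptotic relative efficiencies at the Normal shift model are 3/pi (about 0.955) for the rank-sum test and 2/pi (about 0.637) for the median test, both relative to the t-test. The code below is a rough illustration only: the sample size, shift, seed and replication count are assumptions, and `median_test_p` is a hypothetical helper implementing a Mood-type median test via Fisher's exact test, not a function from any package.

```r
# Sketch: simulated power of the t-test, rank-sum test and a median test
# under a Normal location shift. All settings are illustrative assumptions.
set.seed(2)
n     <- 50     # per-group sample size (assumed)
shift <- 0.5    # location shift in SD units (assumed)
reps  <- 500    # number of simulated data sets (assumed)

# Hypothetical helper: classify values as above/below the pooled median,
# then test the resulting 2x2 table with Fisher's exact test.
median_test_p <- function(x, y) {
  pooled <- c(x, y)
  above  <- factor(pooled > median(pooled), levels = c(FALSE, TRUE))
  group  <- factor(rep(c("x", "y"), c(length(x), length(y))))
  fisher.test(table(group, above))$p.value
}

# p-values from the three tests on one simulated pair of samples.
one_run <- function() {
  x <- rnorm(n)
  y <- rnorm(n, mean = shift)
  c(t        = t.test(x, y)$p.value,
    wilcoxon = wilcox.test(x, y)$p.value,
    median   = median_test_p(x, y))
}

# Rejection rate at the 5% level for each test.
power <- rowMeans(replicate(reps, one_run()) < 0.05)
power
```

In runs like this the rank-sum test should land close to the t-test, with the median test clearly behind, which matches the ordering of the efficiencies quoted above.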

My memory from Conover and Iman (only having seen the first edition) was that the Pitman efficiency of the WRS in the Gaussian case of unequal means was around 85% relative to the t-test. I suppose the choice of a central measure for reporting ought to be based on the purposes of investigation. If one is planning classification, and the distributions were skewed, then the median might be preferable because it is less subject to sampling effects:

> var( apply( sapply(1:500, function(x) rlnorm(20)), 2, median))
[1] 0.08123678
> var( apply( sapply(1:500, function(x) rlnorm(20)), 2, mean))
[1] 0.2168887

Thank you for the clarification.

--
David Winsemius




And I could go on and on about non-transitivity, but I won't. Anyone who is interested can Google for 'Efron dice'.
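
For readers who do not want to search, the non-transitivity can be checked directly. The face values below are one commonly quoted set of Efron's dice (an assumption here, since the thread does not list them), and `p_beats` is a hypothetical helper. The link to the rank-sum test is that the Mann-Whitney form of the statistic estimates P(X > Y), and the relation "X tends to exceed Y" need not be transitive.

```r
# Sketch: exact win probabilities for one commonly quoted set of Efron's dice.
dice <- list(
  A = c(4, 4, 4, 4, 0, 0),
  B = c(3, 3, 3, 3, 3, 3),
  C = c(6, 6, 2, 2, 2, 2),
  D = c(5, 5, 5, 1, 1, 1)
)

# Hypothetical helper: probability that die x shows a higher face than die y,
# found by enumerating all 36 equally likely pairs of faces.
p_beats <- function(x, y) mean(outer(x, y, ">"))

# Each die beats the next one in the cycle A -> B -> C -> D -> A.
cycle <- c(A_beats_B = p_beats(dice$A, dice$B),
           B_beats_C = p_beats(dice$B, dice$C),
           C_beats_D = p_beats(dice$C, dice$D),
           D_beats_A = p_beats(dice$D, dice$A))
cycle
```

Each entry of `cycle` equals 2/3, so every die is favoured over the next one around the loop and no die is "largest".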

      -thomas


Thomas Lumley                   Assoc. Professor, Biostatistics
tlum...@u.washington.edu        University of Washington, Seattle



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
