On Apr 13, 2011, at 01:57 , Michael G Rupert wrote:

> I have a question concerning the Wilcoxon signed-rank test, and
> specifically, which R subroutine I should use for my particular dataset.
> There are three different commands in R (that I'm aware of) that calculate
> the Wilcoxon signed-rank test: wilcox.test, wilcox.exact, and
> wilcoxsign_test. When I run the three commands on the same dataset, I get
> different p-values. I'm hoping that someone can give me guidance on the
> strengths and weaknesses of each command, why they produce different
> p-values, and which one is the most appropriate for my particular needs.
Well, there are two versions of zero-handling, and for each of these you can
have exact p values or asymptotic p values with or without continuity
correction, so that's 6 possibilities already.

> First, let me describe the dataset I am working with. The project I am
> working on collected water samples from groups/networks of about 30 water
> wells and analyzed them for nitrate, major ions, and other chemical
> constituents. We revisited those same wells about 10 years later and
> analyzed the water samples for the same chemical constituents. I now have
> a paired dataset, and the question I would like to answer is whether there
> was a "significant" change in concentrations of those chemical
> constituents (such as nitrate or chloride). Concentrations measured in
> water from some wells have increased, some have decreased, and some have
> stayed the same over the ten-year time period. In water from some wells,
> the concentrations were below the laboratory detection limits, so those
> concentrations are "tied" at the reporting level. The following is an
> example of the data I am evaluating.
>
> x <- c(13.60, 9.10, 22.01, 9.08, 1.97, 2.81, 0.66, 0.97, 0.21, 2.23, 0.08,
>        0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06,
>        0.06, 0.06, 3.44, 15.18, 5.25, 4.27, 17.81)
> y <- c( 4.32, 3.39, 16.36, 7.10, 0.08, 2.02, 0.19, 0.59, 0.06, 2.15, 0.06,
>        0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06,
>        0.06, 0.06, 4.02, 16.13, 7.30, 7.98, 24.37)
>
> The nonparametric Wilcoxon signed-rank test seems to be the most
> appropriate test for these data. There are two different methods to
> calculate the signed-rank test. The first is by Wilcoxon (1945), who
> discards any tied data and then calculates the signed ranks. The second
> method incorporates tied values in the ranking procedure (see J.W. Pratt,
> 1959, Remarks on zeros and ties in the Wilcoxon signed rank procedure:
> Journal of the American Statistical Association, Vol. 54, No. 287, pp.
> 655-667). There are two commands in R that calculate the original method
> by Wilcoxon (that I know of), wilcox.test and wilcoxsign_test (make sure
> to include the argument zero.method = c("Wilcoxon")). There are two
> other commands in R that incorporate ties in the signed-rank test,
> wilcox.exact and wilcoxsign_test (make sure to include the argument
> zero.method = c("Pratt")).
>
> Here's my problem. I get different p-values from each of the 4 signed-rank
> tests in R, and I don't know which one to believe. wilcox.test and
> wilcoxsign_test(zero.method = c("Wilcoxon")) calculate the standard
> Wilcoxon signed-rank test. Even though they are not designed to deal with
> tied data, they should at least calculate the same p-value, but they do
> not.

They do if you turn off the continuity correction in wilcox.test:

> wilcox.test(x, y, alternative='two.sided', paired=TRUE, correct=F)

        Wilcoxon signed rank test

data:  x and y
V = 39, p-value = 0.05061
alternative hypothesis: true location shift is not equal to 0

> I ran the same datasets in SYSTAT and Minitab to check on the results
> from R. Minitab gives the same results as wilcox.test, and SYSTAT gives
> the same results as wilcoxsign_test(zero.method = c("Wilcoxon")).

So one does the continuity correction and the other does not.

> Similarly, wilcox.exact and wilcoxsign_test(zero.method = c("Pratt")) are
> designed to incorporate ties, but they give different p-values from each
> other.

They still handle zeros differently. wilcox.exact does not handle the Pratt
ranking.
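To see what the Pratt ranking actually does, here is a minimal sketch (the
variable names d and r are mine, not from any of the packages) applied to
the constructed example data that you quote at the end of your message; it
also shows where the vector in the perm.test() call below comes from:

x <- c(1, 1, 1, 1, 1, 7, 10, 12, 13, 16, 17)   # the constructed "Pratt" data (quoted below)
y <- c(1, 1, 3, 4, 6, 1, 1, 1, 1, 1, 1)
d <- x - y                    # paired differences; two of them are zero
r <- rank(abs(d)) * sign(d)   # rank |d| with the zeros still included, then re-attach the signs
r[d != 0]                     # only then discard the zero ranks
# [1] -3 -4 -5  6  7  8  9 10 11

With the Wilcoxon zero-handling you would instead drop the zeros first and
re-rank, giving -1, -2, -3, 4, 5, ..., 9, and hence the V = 39 in the
wilcox.test output above.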
To get exact p values for Pratt ranks, try

> perm.test(c(-3,-4,-5,6:11))

        1-sample Permutation Test

data:  c(-3, -4, -5, 6:11)
T = 51, p-value = 0.08984
alternative hypothesis: true mu is not equal to 0

... and for the asymptotic counterpart:

> perm.test(c(-3,-4,-5,6:11), exact=F)

        Asymptotic 1-sample Permutation Test

data:  c(-3, -4, -5, 6:11)
T = 51, p-value = 0.08144
alternative hypothesis: true mu is not equal to 0

> The signed-rank test procedure is relatively straightforward, so
> I'm surprised I'm not getting identical results.
>
> To check on these R commands, I calculated the signed-rank tests using the
> dataset shown on pages 658-659 of Pratt (1959).

Not found. Apparently, you _constructed_ a data set to get the same set of
ranks.

> These R routines do not
> produce the same results as that listed in Pratt, which makes me think
> that the R routines are not calculating the statistics correctly.

Apparently, the Pratt paper predates the convention that a p value is the
probability of observing "the test statistic or more extreme", and he
switches back and forth between "less than" and "less than or equal" (to a
negative rank sum of 6 and 12, respectively). Also, his p-values are
one-sided. Using modern technology, it is pretty easy to generate the
enumerations that Pratt is referring to:

> M <- as.matrix(do.call("expand.grid", rep(list(0:1),9)))
> table(M%*%1:9)

 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
 1  1  1  2  2  3  4  5  6  8  9 10 12 13 15 17 18 19 21 21 22 23 23
23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
23 23 22 21 21 19 18 17 15 13 12 10  9  8  6  5  4  3  2  2  1  1  1

> table(M%*%3:11)

 0  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 1  1  1  1  1  2  2  3  3  4  4  5  6  7  7  8 10 10 11 12 13 13 15
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
15 16 16 17 17 18 17 17 18 17 17 16 16 15 15 13 13 12 11 10 10  8  7
48 49 50 51 52 53 54 55 56 57 58 59 60 63
 7  6  5  4  4  3  3  2  2  1  1  1  1  1

From these, you can see that the probability of "6 or less" with the
Wilcoxon ranking is 14/512 and that of "12 or less" with the Pratt procedure
is 23/512. To get two-sided tests, include the (symmetric) opposite tail,
i.e., multiply by two and get

> 14/256
[1] 0.0546875
> 23/256
[1] 0.08984375

Both of those should be recognizable from the exact tests in
coin/exactRankTests.
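For the record, a sketch (not rerun here) of the exact-test calls that I
would expect to reproduce those two numbers on the same constructed data;
distribution = "exact" is the relevant switch in wilcoxsign_test:

library(coin)
library(exactRankTests)
x <- c(1, 1, 1, 1, 1, 7, 10, 12, 13, 16, 17)   # same constructed data as above and below
y <- c(1, 1, 3, 4, 6, 1, 1, 1, 1, 1, 1)
# exact test, zeros handled the Wilcoxon way -- should give 14/256 = 0.0546875
wilcoxsign_test(x ~ y, zero.method = "Wilcoxon", distribution = "exact")
# exact test, zeros handled the Pratt way -- should give 23/256 = 0.08984375
wilcoxsign_test(x ~ y, zero.method = "Pratt", distribution = "exact")
# wilcox.exact drops the zeros, so it should agree with the "Wilcoxon" call above
wilcox.exact(x, y, paired = TRUE)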
> The following text shows the commands I use in R to calculate the
> signed-rank test using these different R commands:
>
> Thanks in advance for any assistance.
>
> --Mike
>
> #################################################################################
> library(exactRankTests)  # this loads the package for calculating the modified signed-rank test
> library(coin)  # this adds additional routines for the Wilcoxon signed-rank test and the Pratt signed-rank test
> #
> # Data from page 658 of Pratt
> x <- c(1, 1, 1, 1, 1, 7, 10, 12, 13, 16, 17)
> y <- c(1, 1, 3, 4, 6, 1, 1, 1, 1, 1, 1)
> #
> # STANDARD WILCOXON SIGNED-RANK USING WILCOX.TEST
> wilcox.test(x, y, alternative='two.sided', paired=TRUE)
> #
> # STANDARD WILCOXON SIGNED-RANK USING WILCOXSIGN_TEST
> wilcoxsign_test(x ~ y, zero.method = c("Wilcoxon"))
> #
> # MODIFIED WILCOXON SIGNED-RANK USING WILCOX.EXACT
> wilcox.exact(x, y, alternative='two.sided', paired=TRUE, mu=0)
> #
> # PRATT SIGNED-RANK TEST
> wilcoxsign_test(x ~ y, zero.method = c("Pratt"))

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk   Priv: pda...@gmail.com