On 01-Dec-09 23:11:20, rkevinbur...@charter.net wrote:
> If I have data that I feed into shapiro.test and jarque.bera.test,
> they seem to disagree. What do I use for a decision?
>
> For my data set I have a p.value of 0.05496421 returned from the
> shapiro.test and 0.882027 returned from the jarque.bera.test. I have
> included the data set below.
>
> Thank you.
>
> Kevin

[Data snipped]
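(For reference: the comparison in question amounts to running the two tests side by side on the same data. A minimal sketch of the calls, assuming the snipped data have been read into a numeric vector x and that jarque.bera.test() is the one provided by the tseries package:)

  library(tseries)       # provides jarque.bera.test()
  shapiro.test(x)        # Shapiro-Wilk normality test (base R, package stats)
  jarque.bera.test(x)    # Jarque-Bera test, asymptotic chi-squared(2) P-value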
The reason is that the Jarque-Bera test (JB) works with the squared sample skewness plus 1/4 of (sample kurtosis - 3)^2, scaled by n/6. For a Normal distribution the skewness is zero and the kurtosis is 3. Hence large values of JB are evidence against the hypothesis that the distribution is Normal, on the basis that the skewness and/or the kurtosis depart from the values to be expected for a Normal distribution.

However, it is perfectly possible to get skewness near 0, and kurtosis near 3, for manifestly non-Normal distributions. I get Skewness = 0.014 and Kurtosis = 3.32 for your data, both quite close to the Normal values. However, you only have to look at the histogram to see that the distribution has a distinctly non-Normal appearance:

  hist(x, breaks=20)

The Shapiro-Wilk test, on the other hand, works (broadly speaking) in terms of the standardised spacings between the order statistics of the sample, compared with what they should be for a Normal. It is therefore sensitive to features of the sample which are rather different from the features that the J-B test is sensitive to. Given the appearance of the histogram, it is to be expected that many of the spacings between order statistics differ from what would be expected for a Normal distribution.

Much of this, and also the insensitivity of the J-B test, arises from the clump of values at the top of the range (6 out of the 59 between 1.83 and 2.00). Leave these out and you get quite different results.

The following is an explicit implementation of the J-B test, based on the Wikipedia description and using the chi-squared(2) approximation for the P-value (it also returns the skewness and kurtosis):

  jarque.bera <- function(x){
    m1 <- mean(x)        ; m2 <- mean((x-m1)^2)   # 1st and 2nd central moments
    m3 <- mean((x-m1)^3) ; m4 <- mean((x-m1)^4)   # 3rd and 4th central moments
    n  <- length(x)
    S  <- m3/(m2^(3/2))                           # sample skewness
    K  <- m4/(m2^2)                               # sample kurtosis
    JB <- (n/6)*(S^2 + ((K-3)^2)/4)               # J-B statistic
    P  <- 1 - pchisq(JB, 2)                       # asymptotic chi-squared(2) P-value
    list(JB=JB, P=P, S=S, K=K)
  }

For your original data x (as explicitly extracted by Ben Bolker):

  jarque.bera(x)
  # $JB
  # [1] 0.251065
  # $P
  # [1] 0.882027       ##### (as you found yourself)
  # $S
  # [1] 0.01396711
  # $K
  # [1] 3.318352

For the data excluding the 6 values above 1.83:

  jarque.bera(x[x <= 1.83])
  # $JB
  # [1] 6.047885
  # $P
  # [1] 0.04860919
  # $S
  # [1] -0.6831185
  # $K
  # [1] 3.933842

So excluding these values has produced a distinctly negative skewness and a kurtosis distinctly greater than 3. Hence those 6 values were primarily responsible for almost completely eliminating the skewness and the excess kurtosis of the remainder of the distribution, and hence for frustrating the J-B test.

Now compare with the Shapiro-Wilk test:

  shapiro.test(x)
  #         Shapiro-Wilk normality test
  # data:  x
  # W = 0.9608, p-value = 0.05496

so the S-W P-value of 0.05496 for the full data is close to the J-B P-value of 0.04861 for the reduced data. Now compare with the S-W test on the reduced data:

  shapiro.test(x[x <= 1.83])
  #         Shapiro-Wilk normality test
  # data:  x[x <= 1.83]
  # W = 0.9595, p-value = 0.06968

The S-W P-value has increased slightly (from 0.055 to 0.070), but the S-W test is still picking up the non-Normality in the reduced dataset.

The summary is that the S-W test and the J-B test are looking at different aspects of the data. The J-B test depends on only two summary statistics (skewness and kurtosis) as indices of non-Normality, while the S-W test is sensitive to a much wider range of fine detail in the distribution of the data.
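(A further way to see what the S-W test is responding to is a Normal Q-Q plot, which plots the ordered sample values (the order statistics) against the corresponding quantiles of a Normal distribution. A minimal sketch, again assuming the data are in x:)

  # Q-Q plots for the full data and for the data with the top 6 values removed;
  # departures from the reference line are roughly what Shapiro-Wilk responds to.
  op <- par(mfrow = c(1, 2))
  qqnorm(x, main = "Full data")           ; qqline(x)
  qqnorm(x[x <= 1.83], main = "x <= 1.83"); qqline(x[x <= 1.83])
  par(op)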
The failure of the J-B test to detect the non-Normality in the data is primarily due to the fact that the 6 data values at the top end have, in effect, compensated for the marked skewness and kurtosis in the remainder of the data.

The ultimate lesson from all this is that different tests test for different kinds of departure from the Null Hypothesis. See also Uwe Ligges's remarks ...

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 02-Dec-09   Time: 11:53:22
------------------------------ XFMail ------------------------------

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.