Re: [R] Normality test

Duncan Murdoch Wed, 03 Sep 2008 08:25:28 -0700

On 03/09/2008 10:33 AM, Williams, Robin wrote:

Hi,
I am looking for a normality test in R to see if a vector of data I have
can be assumed to be normally distributed and hence used in a linear
regression.

Raw data that is suitable for standard linear regression is normally distributed, but the mean varies from observation to observation. The necessary assumption is that the errors are normally distributed with zero mean, but the data itself also includes the non-random parts of the model. The effect of the varying means is that the data will generally *not* appear to come from a normal distribution if you just throw it all into a vector and look at it.
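A small simulation may make this concrete. It is only a sketch: the predictor, coefficients, sample size and seed below are made up for illustration. The errors are drawn from an exact normal distribution, yet the pooled response looks far from normal because its mean depends on x.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
x <- rep(c(0, 10), each = 100)     # hypothetical predictor with two levels
y <- 2 + 3 * x + rnorm(200)        # errors are exactly N(0, 1)

p_raw <- shapiro.test(y)$p.value   # raw response: two well-separated clusters
fit   <- lm(y ~ x)
p_res <- shapiro.test(resid(fit))$p.value  # residuals: the varying mean is removed

p_raw   # essentially zero here, although the errors are perfectly normal
p_res   # varies from run to run, as a p-value under the null should
```

The first test rejects only because the varying mean was ignored, not because anything is wrong with the error distribution.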

So let's assume you're working with residuals from a linear fit. The residuals should be normally distributed with mean zero, but their variances won't be equal. It may be that in a large dataset this will be enough to get a false declaration of non-normality even with perfectly normal errors. In a small dataset you'll rarely have enough power to detect non-normality.
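To see the unequal variances, recall that for ordinary least squares the i-th residual has variance sigma^2 * (1 - h_ii), where h_ii is the leverage of observation i. The sketch below uses the built-in cars data purely as a stand-in; any lm fit would do.

```r
fit <- lm(dist ~ speed, data = cars)   # cars ships with R; used only as an example

h <- hatvalues(fit)                    # leverages h_ii
range(1 - h)                           # Var(e_i) is proportional to 1 - h_ii, so not constant

# Standardized residuals divide each residual by its own estimated
# standard deviation, sigma_hat * sqrt(1 - h_ii).
r      <- rstandard(fit)
manual <- resid(fit) / (summary(fit)$sigma * sqrt(1 - h))
all.equal(unname(r), unname(manual))
```

Working with rstandard(fit) rather than resid(fit) removes the variance differences due to leverage, although the standardized residuals are still not independent of one another.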

So overall, don't use something like shapiro.test for what you have in mind. Any recent regression text should give advice on model diagnostics that will do a better job.
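As one possible starting point (an assumption about which diagnostics are meant, not a summary of any particular text), base R's plotting method for lm objects gives the usual graphical checks. The cars data is again just a placeholder.

```r
fit <- lm(dist ~ speed, data = cars)   # placeholder model on a built-in dataset

plot(fit, which = 1)    # residuals vs fitted: look for curvature or changing spread
plot(fit, which = 2)    # normal Q-Q plot of standardized residuals

# The same Q-Q plot built by hand from standardized residuals.
r <- rstandard(fit)
qqnorm(r)
qqline(r)
```

These plots show what kind of departure is present (skewness, heavy tails, outliers, non-constant variance), which a single p-value cannot.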

help.search("normality test")

suggests the Shapiro test, ?shapiro.test.
Now maybe I am interpreting things incorrectly (as is usually the case),
am I right in assuming that this is a composite test for normality, and
hence a high p-value would suggest that the sample is normally
distributed?

A low p-value (e.g. p < 0.05) could suggest there is evidence of non-normality, but p > 0.05 just shows a lack of evidence. In the case where the data is truly normally distributed, you'd expect p to be uniformly distributed between 0 and 1. (I have an article in the current American Statistician suggesting ways to teach p-values to emphasize this; unfortunately, it seems to be a surprise to a lot of people.)
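The uniform behaviour is easy to check by simulation. This is a rough sketch; the sample size, number of replications and seed are arbitrary choices.

```r
set.seed(42)                           # arbitrary seed, only for reproducibility

# Repeat the test many times on data that really are normal.
pvals <- replicate(2000, shapiro.test(rnorm(100))$p.value)

hist(pvals, breaks = 20)               # should look roughly flat
mean(pvals < 0.05)                     # should be close to 0.05
```

So widely varying p-values from repeated calls to shapiro.test(rnorm(4500)) are what one should expect, and roughly 1 run in 20 will fall below 0.05 even though the data are normal.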


Duncan Murdoch

As a test I did

shapiro.test(rnorm(4500))
a few times, and achieved very different p-values, so I cannot be sure.
I had assumed that a random sample of 4500 would have a very high
p-value on all occasions but it appears not, this is interesting.
  Are there any other tests that people would recommend over this one in
the base packages? I assume not as help.search did not suggest any.
  So am I right about a high p-value suggesting normality?

Many thanks for any help.

Robin Williams
Met Office summer intern - Health Forecasting

[EMAIL PROTECTED]

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

