What is the distribution of the p-value when the null hypothesis is true?

This is an important question that unfortunately tends to get glossed over or 
left out completely in many courses due to the amount of information that needs 
to be packed into them.

For most appropriate tests, when the null hypothesis is true and all other 
assumptions hold, the p-value is distributed as uniform(0,1).  Hence the 
probability of a type I error is alpha for any value of alpha.  Therefore, when 
the null is true, a p-value near 0.99, 0.051, 0.049, or 0.0001 is equally 
likely: the p-value is just as likely to land in any interval of a given width 
as in any other.
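A quick simulation makes this concrete. The sketch below assumes only base R's `shapiro.test()` and `rnorm()`; the sample size of 100, the 2000 replications and the seed are arbitrary choices. It runs the normality test many times on data that really are normal and tallies where the p-values fall.

```r
set.seed(2008)  # arbitrary seed, for reproducibility only

# 2000 Shapiro-Wilk tests, each on a sample that really is normal
pvals <- replicate(2000, shapiro.test(rnorm(100))$p.value)

# Under the null, each tenth of (0,1) should hold roughly 10% of the p-values
print(table(cut(pvals, breaks = seq(0, 1, by = 0.1))) / length(pvals))

# Rejection rate at alpha = 0.05 should be close to 0.05
type1_rate <- mean(pvals < 0.05)
print(type1_rate)
```

Running it again with a different seed gives different individual p-values but the same roughly flat spread, which is the "very different p-values" behaviour described in the question below.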

If you want a high p-value from a normality test, just collect only 1 data 
point: no matter what its value is, it is completely consistent with the 
assumption that it came from some normal distribution (p-value=1).
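R's `shapiro.test()` will not accept fewer than 3 observations, so the one-point case is a thought experiment, but the same effect shows up with small samples. The sketch below is illustrative only: the exponential alternative, the sample sizes and the helper name `reject_rate` are assumptions, not anything from the thread. It estimates how often clearly skewed data are flagged as non-normal.

```r
set.seed(1)  # arbitrary seed

# Hypothetical helper: share of `reps` Shapiro-Wilk tests on exponential
# (clearly non-normal) samples of size n that reject at level alpha
reject_rate <- function(n, reps = 1000, alpha = 0.05) {
  mean(replicate(reps, shapiro.test(rexp(n))$p.value < alpha))
}

small_n <- reject_rate(5)   # tiny sample: the skew is often missed
large_n <- reject_rate(50)  # larger sample: the skew is almost always detected
print(c(n5 = small_n, n50 = large_n))
```

A high p-value from the small sample says more about the lack of data than about normality.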

For large sample sizes the important question is not "did this data come from 
an exact normal distribution?", but rather, "Is the distribution this data came 
from close enough to normal?".
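Conversely, with thousands of observations even a small departure from normality tends to be flagged. As a sketch (the t distribution with 10 degrees of freedom, the replication count and the helper `reject_rate_t` are illustrative assumptions), compare how often `shapiro.test()` rejects data that are only slightly heavier-tailed than normal at n = 20 versus the n = 4500 used in the question below.

```r
set.seed(3)  # arbitrary seed

# Hypothetical helper: rejection rate of shapiro.test() for samples of size n
# drawn from a t distribution with 10 df (bell-shaped, slightly heavy tails)
reject_rate_t <- function(n, reps = 200, alpha = 0.05) {
  mean(replicate(reps, shapiro.test(rt(n, df = 10))$p.value < alpha))
}

rate_20   <- reject_rate_t(20)
rate_4500 <- reject_rate_t(4500)
print(c(n20 = rate_20, n4500 = rate_4500))
```

The data-generating distribution is identical in both cases; only the sample size changes, so the test answers "exactly normal?" rather than "close enough?".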

If you really feel the need for a test of normality with large samples, then 
see this post:
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/136160.html

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111



> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Williams, Robin
> Sent: Wednesday, September 03, 2008 8:34 AM
> To: r-help@r-project.org
> Subject: [R] Normality test
>
> Hi,
> I am looking for a normality test in R to see if a vector of
> data I have can be assumed to be normally distributed and
> hence used in a linear regression.
> > help.search("normality test")
> suggests the Shapiro test, ?shapiro.test.
> Now maybe I am interpreting things incorrectly (as is usually
> the case), am I right in assuming that this is a composite
> test for normality, and hence a high p-value would suggest
> that the sample is normally distributed? As a test I did
> shapiro.test(rnorm(4500))
> a few times, and achieved very different p-values, so I
> cannot be sure.
> I had assumed that a random sample of 4500 would have a very
> high p-value on all occasions but it appears not, this is interesting.
>   Are there any other tests that people would recommend over
> this one in the base packages? I assume not as help.search
> did not suggest any.
>   So am I right about a high p-value suggesting normality?
> Many thanks for any help.
>
>
> Robin Williams
> Met Office summer intern - Health Forecasting
> [EMAIL PROTECTED]
>
>
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
