Re: [R] strange (to me) p-value distribution

Wolfgang Huber Sat, 07 Jun 2008 15:42:39 -0700


Dear Mark,

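Try out the example code below. Such a p-value distribution often occurs if you have "batch" effects, i.e. if the between-group variability is in fact less than the within-group variability. The batch differences then inflate the variance estimate in the denominator of the test statistic, so small p-values become rarer than a uniform distribution would give.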

In the example below, I do, for each row of x, a t-test between the values in the even and odd columns; for rt2, a "batch effect" has been added to columns 1:10, which contain both even and odd columns.


 hope this helps
        Wolfgang


library("genefilter")

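nr = 31000   ## number of rows, roughly the ~31k end-points in Mark's data
nc = 20      ## number of columns (samples)

## pure noise: no true difference between any of the columns
x  = matrix(rnorm(nr*nc), nrow=nr, ncol=nc)

## row-wise t-test, odd versus even columns
rt1 = rowttests(x, factor(1:nc %% 2))

## add a batch effect
x[, 1:10] = x[, 1:10] + pi/2
rt2 = rowttests(x, factor(1:nc %% 2))

## top panel: without batch effect; bottom panel: with batch effect
par(mfrow=c(2,1))
hist(rt1$p.value, breaks=100, col="mistyrose")
hist(rt2$p.value, breaks=100, col="mistyrose")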
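
If genefilter is not installed, the same effect can probably be checked in base R alone. The following is only a sketch under some assumptions: it uses an equal-variance t-test per row through t.test(..., var.equal=TRUE), a smaller nr of 2000 to keep the run time short, and an arbitrary seed. The names grp, row_p, p1, p2 and frac are illustrative and not part of the example above.

```r
## Base-R sketch of the same simulation (no genefilter needed)
set.seed(1)                     ## arbitrary seed, for reproducibility only
nr  = 2000                      ## assumption: fewer rows than above, for speed
nc  = 20
grp = factor(1:nc %% 2)         ## odd versus even columns

## p-value of an equal-variance two-sample t-test for one row
row_p = function(v) t.test(v ~ grp, var.equal=TRUE)$p.value

x  = matrix(rnorm(nr*nc), nrow=nr, ncol=nc)
p1 = apply(x, 1, row_p)         ## without batch effect

x[, 1:10] = x[, 1:10] + pi/2    ## same batch shift as in the example above
p2 = apply(x, 1, row_p)         ## with batch effect

## fraction of rows with p < 0.05; about 0.05 is expected with no batch effect
frac = c(no_batch = mean(p1 < 0.05), batch = mean(p2 < 0.05))
print(frac)
```

With the batch shift in place, the second fraction should come out well below the first, which is the same deficit of small p-values that Mark describes.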


------------------------------------------------------------------
Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber


Mark Kimpel wrote on 07/06/2008 18:39:

I'm working with a genomic data-set with ~31k end-points and have
performed an F-test across 5 groups for each end-point. The QA
measurements on the individual micro-arrays all look good. One of the
first things I do in my work-flow is take a look at the p-value
distribution. It is my understanding that, if the findings are due to
chance alone, the p-value distribution should be uniform. In this case
the histogram, even with 1000 break points, starts low on the left and
climbs almost linearly to the right. In other words, very skewed
towards high p-values. I understand that this could be happening by
chance alone, but the same behavior is seen in the two contrasts of
interest I looked at and I have seen it in a couple of our other
genomic, high-dimensional experiments as well. I might also add that I
looked at the actual numbers of genes with p-val < X and indeed, for
each X < 0.05, there are far fewer sig. genes than one would expect by
chance.

I can't figure out what is causing this and, if there is a cause, I'd
like to be able to tell the experimenter if it indicates a technical
factor. I've had other experiments where the p-value dist approximates
normal and of course those that have nice spikes at low p-values
indicating we have some significant genes.

I'm addressing this here rather than to BioC because I suspect there
is some basic statistical mechanism that could explain this. Is there?

Mark


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
