Dear Mark,
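Try out the example code below. Such a p-value distribution often occurs if you have "batch" effects, i.e. if a factor unrelated to your groups inflates the within-group variability, so that the between-group variability is in fact smaller, relative to the within-group variability, than the test expects under the null hypothesis.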
In the example below, I run, for each row of x, a t-test between the values in the even and the odd columns; for rt2, a "batch effect" has been added to columns 1:10, which affects five columns in each of the two groups.
Hope this helps,
Wolfgang

library("genefilter")

nr = 31000
nc = 20
x = matrix(rnorm(nr*nc), nrow=nr, ncol=nc)
rt1 = rowttests(x, factor(1:nc %% 2))

## add a batch effect
x[, 1:10] = x[, 1:10] + pi/2
rt2 = rowttests(x, factor(1:nc %% 2))

par(mfrow=c(2,1))
hist(rt1$p.value, breaks=100, col="mistyrose")
hist(rt2$p.value, breaks=100, col="mistyrose")

------------------------------------------------------------------
Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber

Mark Kimpel wrote on 07/06/2008 18:39:
I'm working with a genomic data set with ~31k end-points and have performed an F-test across 5 groups for each end-point. The QA measurements on the individual microarrays all look good. One of the first things I do in my workflow is take a look at the p-value distribution. It is my understanding that, if the findings are due to chance alone, the p-value distribution should be uniform. In this case the histogram, even with 1000 break points, starts low on the left and climbs almost linearly to the right. In other words, it is very skewed towards high p-values.

I understand that this could be happening by chance alone, but the same behavior is seen in the two contrasts of interest I looked at, and I have seen it in a couple of our other genomic, high-dimensional experiments as well. I might also add that I looked at the actual numbers of genes with p-val < X and indeed, for each X < 0.05, there are far fewer significant genes than one would expect by chance.

I can't figure out what is causing this and, if there is a cause, I'd like to be able to tell the experimenter whether it indicates a technical factor. I've had other experiments where the p-value distribution approximates normal, and of course those that have nice spikes at low p-values, indicating we have some significant genes.

I'm addressing this here rather than to BioC because I suspect there is some basic statistical mechanism that could explain this. Is there?

Mark
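The setting described in the question (an F-test across 5 groups, followed by a count of genes with p-value below a cutoff) can be reproduced on simulated noise. The following is only a rough sketch under assumptions that are not in the original posts: 5 groups of 4 arrays each, 2000 rows instead of ~31k, and a hypothetical batch shift of pi/2 on two arrays in every group. The helper row_f_pvalues() is written out in base R for illustration and is not a function from any package.

```r
## Assumed layout (hypothetical): 5 groups x 4 arrays, 2000 rows of pure noise
set.seed(1)
nr  <- 2000
grp <- factor(rep(1:5, each = 4))
nc  <- length(grp)
x   <- matrix(rnorm(nr * nc), nrow = nr, ncol = nc)

## Row-wise one-way ANOVA F-test p-values, computed with base R only
row_f_pvalues <- function(x, grp) {
  k  <- nlevels(grp)
  n  <- ncol(x)
  ng <- as.vector(table(grp))                      # group sizes
  gm <- sapply(levels(grp),                        # per-row group means
               function(g) rowMeans(x[, grp == g, drop = FALSE]))
  ssb <- rowSums(sweep((gm - rowMeans(x))^2, 2, ng, "*"))  # between groups
  sst <- rowSums((x - rowMeans(x))^2)                      # total
  ssw <- sst - ssb                                         # within groups
  f   <- (ssb / (k - 1)) / (ssw / (n - k))
  pf(f, k - 1, n - k, lower.tail = FALSE)
}

p1 <- row_f_pvalues(x, grp)

## Hypothetical batch effect: shift the first two arrays of every group,
## so all group means move together but the within-group spread grows
batch <- rep(c(TRUE, TRUE, FALSE, FALSE), times = 5)
x2 <- x
x2[, batch] <- x2[, batch] + pi/2
p2 <- row_f_pvalues(x2, grp)

## Observed counts below each cutoff versus the count expected by chance
cutoffs <- c(0.001, 0.01, 0.05)
tally   <- function(p) sapply(cutoffs, function(a) sum(p < a))
res <- data.frame(cutoff     = cutoffs,
                  expected   = nr * cutoffs,
                  no_batch   = tally(p1),
                  with_batch = tally(p2))
print(res)
```

In this sketch the no_batch column should sit near the expected counts, while the with_batch column should fall well short of them, which is the same deficit of small p-values that the question describes. Calling hist(p2, breaks = 100) on the result shows the corresponding climb towards high p-values.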
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.