Hi all, I am having a problem with the function "p.adjust" in stats. I have looked at the manuals and searched the R site, but didn't get anything that seems directly relevant. Can anybody throw any light on it or confirm my suspicion that this might be a bug?
I am trying to use the p.adjust() function to do Benjamini/Hochberg FDR control on a vector of p-values that are the smallest K p-values selected from an N-length vector. So I am using p.adjust(p, n=N, method="BH") This seems to give smaller adjusted p-values than if I use p.adjust(p, method="BH"). For example (K=10, N=20) > p=runif(10) > p [1] 0.4690849 0.5108430 0.8703507 0.1950010 0.9938629 0.8582079 0.7987490 0.9312799 0.9195361 0.7814532 > p.adjust(p, n=20, method="BH") [1] 0.7818081 0.7859123 0.9802947 0.3545472 0.9938629 0.9802947 0.9802947 0.9802947 0.9802947 0.9802947 > p.adjust(p, method="BH") [1] 0.9938629 0.9938629 0.9938629 0.9938629 0.9938629 0.9938629 0.9938629 0.9938629 0.9938629 0.9938629 This seemed unusual, since if p is being selected from a larger vector, one would expect the FDR to increase. Then I figured out that what is happening is that the p.adjust function is working as if p is being padded with zeros to make it N-length. For example > p.adjust(c(p, rep(0,10)), method="BH") [1] 0.7818081 0.7859123 0.9802947 0.3545472 0.9938629 0.9802947 0.9802947 0.9802947 0.9802947 0.9802947 0.0000000 [12] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 Which gives the same adjusted p-values as when I had used n=20 in the function. This looks contrary to what the manual for the function actually says: "Note that you can set 'n' larger than 'length(p)' which means the unobserved p-values are assumed to be greater than all the observed p for '"bonferroni"' and '"holm"' methods and equal to 1 for the other methods." Instead of assuming un-observed p-values to be equal to 1, the function seems to assume un-observed p-values to be 0. I am using R version 2.7.2 on Windows XP Professional. sessionInfo() data pasted at end of email. Regards Sudhir Varma R version 2.7.2 (2008-08-25) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] tools stats graphics grDevices utils datasets methods base other attached packages: [1] GEOquery_2.4.1 RCurl_0.9-3 Biobase_2.0.1 mvtnorm_0.9-2 ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.