On Sat, Apr 9, 2011 at 10:24 AM, Sean Farris <farris...@vcu.edu> wrote:
> I am in need of someone's help in correlating gene expression. I'm somewhat
> new to R, and can't seem to find anyone local to help me with what I think
> is a simple problem.
>
> I need to obtain Pearson and Spearman correlation coefficients, and
> corresponding p-values for all of the genes in my dataset that correlate to
> one specific gene of interest. I'm working with mouse Affymetrix Mouse 430
> 2.0 arrays, so I've got about 45,000 probesets (rows; with 1st column
> containing identifiers) and 30 biological replicates (columns; with the top
> row containing the header information).
Sean,

I'm the maintainer of the WGCNA package, which does correlation network analysis of gene expression data. I recommend you check out the package and the tutorials at
http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/index.html

The package contains a couple of useful functions for correlation p-values. Unlike cor.test, which only takes two vectors (not matrices), the function corAndPvalue calculates Pearson correlations and the corresponding p-values for matrices. If you already have the correlation matrix pre-calculated AND you have no missing data (i.e., a constant number of observations), you can also use corPvalueStudent to calculate the p-values.

We don't use Spearman correlations much (we prefer the biweight midcorrelation, functions bicor and bicorAndPvalue, as a robust alternative to Pearson correlation), but you can approximate the Spearman p-values by the Student p-values (the ones used for Pearson correlations). Statisticians who read this, please don't execute me for this suggestion :)

To use the function cor(), you need to transpose the data so that genes are in columns and samples in rows. Just be aware that to correlate all probe sets at once you need a 40k+ by 40k+ matrix to hold the result. Only a large computer (at least 32GB of memory, possibly needing 64GB) will be able to handle such a matrix and the necessary manipulations. The WGCNA package contains methods to construct co-expression networks on such big sets if necessary.

HTH,

Peter
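
For the one-gene-against-all case in the question, a minimal sketch in base R may help. It is not code from the message above: the simulated matrix, the probe name "probe1" and the helper studentPvalue are made up for illustration. It transposes the data so genes are in columns, correlates one gene with every other gene, and derives p-values from the Student t distribution, which is only an approximation for the Spearman coefficients.

```r
# Simulated stand-in for the real data: 200 probe sets (rows) x 30 samples (columns).
set.seed(1)
nGenes <- 200
nSamples <- 30
expr <- matrix(rnorm(nGenes * nSamples), nrow = nGenes,
               dimnames = list(paste0("probe", seq_len(nGenes)),
                               paste0("sample", seq_len(nSamples))))

# cor() correlates columns, so put genes in columns and samples in rows.
datExpr <- t(expr)
target <- "probe1"   # hypothetical gene of interest

# Two-sided Student p-value for a correlation r from n complete observations.
# (Hypothetical helper; assumes no missing data.)
studentPvalue <- function(r, n) {
  tStat <- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(abs(tStat), df = n - 2, lower.tail = FALSE)
}

# Leave the target out of the comparison set (its self-correlation is 1).
others <- setdiff(colnames(datExpr), target)
x <- datExpr[, target]

# cor(matrix, vector) returns a one-column matrix; keep it as a named vector.
pearsonR  <- cor(datExpr[, others], x)[, 1]
spearmanR <- cor(datExpr[, others], x, method = "spearman")[, 1]

result <- data.frame(
  probe     = others,
  pearson   = pearsonR,
  pearsonP  = studentPvalue(pearsonR, nSamples),
  spearman  = spearmanR,
  spearmanP = studentPvalue(spearmanR, nSamples),  # Student approximation
  row.names = NULL
)

# Most significant Pearson correlations first.
result <- result[order(result$pearsonP), ]
head(result)
```

Because only one gene is compared against the rest, the result is a single vector per method rather than a full gene-by-gene matrix; a full 45,000 by 45,000 matrix of doubles would take about 16 GB (45000^2 times 8 bytes) before any copies are made. With WGCNA installed, corAndPvalue(datExpr[, others], x) is expected to give the same Pearson coefficients and p-values in its cor and p components, and it also handles missing data.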