Peter,

Many thanks for suggesting this package. I very much believe that it will help me. However, I was trying to correlate all probes (correlation between entities, not variables) in order to calculate differentially coexpressed gene sets using the coXpress package in R. I could not reduce the number of probes on the basis of intensity, because most of the genes are either down-regulated or up-regulated under the treated conditions. Those genes are the ones I am interested in, so they cannot be removed from the control samples (I have to compare both conditions).
Can you also suggest an alternative for differential coexpression analysis? This is what I need to know most: the gene sets that behave differently across conditions. Has anyone ever used the coXpress package?

Regards,
Jyotasana

----- Original Message -----
From: "Peter Langfelder" <peter.langfel...@gmail.com>
To: "Jyotasana Gulati" <jgul...@ice.mpg.de>
Cc: r-help@r-project.org
Sent: Thursday, September 30, 2010 4:05:44 AM
Subject: Re: [R] cor() alternative for huge data set

On Wed, Sep 29, 2010 at 1:27 PM, Jyotasana Gulati <jgul...@ice.mpg.de> wrote:
> Hi,
>
> I have a data set of around 43000 probes (rows) and have to calculate
> a correlation matrix. When I run the cor function in R, it throws an
> error message about a RAM shortage, which was to be expected for such a
> huge number of rows. I cannot find a logical way to cut down this huge
> number of entities. Is there an alternative to Pearson correlation, or
> to the other dist() methods (e.g. Euclidean), that can be run on such a
> huge data set?
> Any help will be appreciated.

Hmm... Are you calculating a correlation of 43000 probes, or of some number of samples across 43000 probes? If the former, read below. If the latter, I'm surprised you are running out of memory. Issuing garbage collection (gc()) before the calculation, closing all other programs, removing all other large objects from the R workspace etc. may help.

If you really need the 43k times 43k correlation matrix of your 43k probes, read on.

[Disclosure: this is a shameless plug for the package WGCNA (Weighted Gene Co-expression Network Analysis, also known as Weighted Correlation Network Analysis), from the package author, namely me.]

First, since the distance matrix will be huge, you will not gain anything by using other distance methods either.

Second, depending on what you want to do with the 43k probes, the package WGCNA may help you. It has methods for creating correlation networks among a large number of probes.
The idea is to pre-cluster the probes using what I call projective K-means, function projectiveKMeans. The pre-clustering will return what we call blocks of probes (or genes). We assume (this is a big assumption) that correlations among probes belonging to different blocks can be neglected. Then we treat each block separately for network construction (or, in your case, possibly a simple calculation of correlation).

Although this isn't strictly an R topic but rather a microarray analysis issue, in my experience it is often useful to filter out probes before actually calculating and interpreting large correlation matrices. In conjunction with filtering, it can be advantageous to keep only one probe per gene (presumably there is more than one probe per gene in your data set). The filtering criterion varies from analysis to analysis, but if your data represent intensities, it is often a good idea to throw away probes whose intensity is always low, because such signals are mostly noise.

If you decide to check out WGCNA, look at
http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA/.

Peter

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
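
For illustration only, the memory and filtering points discussed in this thread can be sketched in base R. Everything here is an assumption rather than anyone's actual analysis: the expression matrix is simulated, the intensity threshold (6) and block size (500) are arbitrary, and the block-wise loop is a generic workaround for not holding the full matrix in RAM, not what WGCNA or coXpress do internally. Note that "always low" is taken to mean low in every sample, so a probe that is high in either control or treated samples is kept.

```r
# Memory for a dense double-precision 43000 x 43000 correlation matrix.
full_bytes <- 43000^2 * 8          # 8 bytes per double
full_bytes / 1e9                   # about 14.8 GB, before any copies

# Simulated stand-in for the real data: rows are probes, columns are samples.
set.seed(1)
n_demo <- 2000                     # hypothetical, small enough to run anywhere
means <- runif(n_demo, 4, 12)      # per-probe mean intensity (made up)
expr <- matrix(rnorm(n_demo * 12, mean = rep(means, times = 12)),
               nrow = n_demo,
               dimnames = list(paste0("probe", seq_len(n_demo)),
                               paste0("sample", 1:12)))

# Drop probes whose intensity is low in ALL samples (arbitrary threshold).
threshold <- 6
keep <- apply(expr, 1, max) > threshold
expr_f <- expr[keep, , drop = FALSE]

# cor() correlates columns, so transpose to correlate probes.
x <- t(expr_f)                     # samples x probes

# Compute the correlation matrix one block of probes at a time and reduce
# each block immediately, so the full matrix is never held in memory.
# For 43000 probes, a 500 x 43000 block is about 172 MB.
block_size <- 500
strong_pairs <- 0
for (s in seq(1, ncol(x), by = block_size)) {
  idx <- s:min(s + block_size - 1, ncol(x))
  block_cor <- cor(x[, idx, drop = FALSE], x)   # block probes x all probes
  strong_pairs <- strong_pairs + sum(abs(block_cor) > 0.9)
}

# Each pair was counted twice and every probe was counted with itself.
n_strong <- (strong_pairs - ncol(x)) / 2
```

The reduction step (counting pairs with |r| > 0.9) is a placeholder; in practice one would store or test whatever summary the downstream analysis needs from each block.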