Hmm, that is interesting. I ran this on our server machine, which has about 200 cores, so memory is not an issue. Also, building the data frame takes a few minutes at most for me. My code is similar to yours, except that I create my data frame with read.delim("filename") and then drop the first column because it contains characters. I don't know why it takes so long on my machine.
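One thing worth checking, offered as an assumption rather than a diagnosis: read.delim() defaults to check.names = TRUE, which passes the header through make.names(unique = TRUE). Repeated column names then come back with suffixes such as "a.1", so every column forms its own "group". That would not make the code slower, but it would silently change the result. A minimal sketch of reading the file with the duplicated names kept, dropping the character column, and converting to a numeric matrix once (the file contents below are made up for illustration):

```r
# Made-up example file: a character ID column followed by numeric
# columns whose header names repeat, as in the original question.
path <- tempfile(fileext = ".txt")
writeLines(c("id\ta\tb\ta\tb",
             "g1\t1\t10\t3\t30",
             "g2\t2\t20\t4\t40"), path)

# With the default check.names = TRUE, make.names(unique = TRUE) renames
# the repeated headers, so the column groups are silently lost.
df_default <- read.delim(path)
print(colnames(df_default))

# check.names = FALSE keeps the header exactly as written in the file.
df <- read.delim(path, check.names = FALSE, stringsAsFactors = FALSE)

# Save the group names first: subsetting a data frame with duplicated
# names may rename them, so they are restored on the matrix afterwards.
grp <- colnames(df)[-1]
m <- as.matrix(df[, -1])   # drop the character ID column, convert once
colnames(m) <- grp
print(colnames(m))
```

Converting to a matrix once up front also avoids calling as.matrix() on a data frame slice for every group inside the loop.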
On Wed, May 23, 2012 at 11:26 AM, Benno Pütz <pu...@mpipsykl.mpg.de> wrote:
> I wonder how you do this (or maybe on what kind of machine you execute it).
>
> I tried it out of curiosity and get
>
> > df = as.data.frame(lapply(1:300,function(x)sample(200,250000,T)))
> > colnames(df) = sample(letters[1:20],300,T)
> > system.time(dfmed<-lapply(unique(colnames(df)), function(x)
> + rowMedians(as.matrix(df[,colnames(df) == x]),na.rm=TRUE)))
>    user  system elapsed
>   5.680   0.952   7.171
>
> and those times are in seconds! The time consuming part was building the
> data.frame not the calculation.
>
> The only thing I noticed is that my R process claims some 1.4 GB of memory
> but that should not be a problem on any recent hardware but my guess at
> answering your question would be that this might be your problem,
> especially if you have other memory-hogging variables like this data frame
> lying around and you see severe memory swapping effects
>
> Benno
>
> Hello Everybody,
>
> The code:
>
> dfmed<-lapply(unique(colnames(df)), function(x)
> rowMedians(as.matrix(df[,colnames(df) == x]),na.rm=TRUE))
>
> takes really long time to execute (in hours). Is there a faster way to do
> this?
>
> Thanks!
>
> On Tue, May 22, 2012 at 3:46 PM, Preeti <pre...@sci.utah.edu> wrote:
> > Thanks Henrik! Here is the one-liner that I wrote:
> >
> > dfmed<-lapply(unique(colnames(df)), function(x)
> > rowMedians(as.matrix(df[,colnames(df) == x]),na.rm=TRUE))
> >
> > Thanks again!
> >
> > On Tue, May 22, 2012 at 3:23 PM, Henrik Bengtsson <h...@biostat.ucsf.edu> wrote:
> >
> > See rowMedians() of the matrixStats package for replacing apply(x,
> > MARGIN=1, FUN=median). /Henrik
> >
> > On Tue, May 22, 2012 at 12:34 PM, Preeti <pre...@sci.utah.edu> wrote:
> > Hi,
> >
> > I have a 250,000 by 300 matrix. I am trying to calculate the median of
> > those columns (by row) with column names that are identical. I would like
> > this to be efficient since apply(x,1,median) where x is created by choosing
> > only those columns with same column name and looping on this is taking a
> > really long time. Is there an efficient way to do this?
> >
> > Thanks!
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> Benno Pütz
> Statistical Genetics
> MPI of Psychiatry
> Kraepelinstr. 2-10
> 80804 Munich, Germany
> T: ++49-(0)89-306 22 222
> F: ++49-(0)89-306 22 601