Hmm, that is interesting. I ran this on our server, which has about 200
cores, so memory should not be an issue. Also, building the data frame
takes a few minutes at most for me. My code is similar to yours, except
that I create my data frame with read.delim("filename") and then drop the
first column because it contains characters. I don't know why it takes so
long on my machine.
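
For comparing the two setups, here is a minimal sketch (not the code from either machine) that converts the data frame to a numeric matrix once and then takes row medians per group of identically named columns. The helper name group_row_medians and the toy data are made up for illustration. The sketch assumes every remaining column is numeric, and falls back to a slower base R apply() if matrixStats is not installed. Note that read.delim() uses check.names = TRUE by default, which makes duplicated column names unique, so grouping by identical names only works if the names were read with check.names = FALSE or assigned afterwards.

```r
# Hypothetical helper: row medians per group of identically named columns.
group_row_medians <- function(df) {
  m <- as.matrix(df)        # convert once, not once per group
  stopifnot(is.numeric(m))  # a leftover character column would fail here
  groups <- split(seq_len(ncol(m)), colnames(m))  # column indices by name
  row_med <- if (requireNamespace("matrixStats", quietly = TRUE)) {
    function(x) matrixStats::rowMedians(x, na.rm = TRUE)
  } else {
    function(x) apply(x, 1, median, na.rm = TRUE)  # slower base R fallback
  }
  # drop = FALSE keeps a single-column group as a matrix
  vapply(groups, function(idx) row_med(m[, idx, drop = FALSE]),
         numeric(nrow(m)))
}

# Toy data (made up); check.names = FALSE keeps the duplicate names
df <- data.frame(a = c(1, 4), b = c(10, 20), a = c(3, 6), a = c(5, 5),
                 check.names = FALSE)
res <- group_row_medians(df)
res  # one column per distinct name: a = 3, 5 and b = 10, 20
```

The stopifnot() check is there because a single non-numeric column turns the whole as.matrix() result into a character matrix, which is worth ruling out when the input comes from a file.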

On Wed, May 23, 2012 at 11:26 AM, Benno Pütz <pu...@mpipsykl.mpg.de> wrote:

> I wonder how you do this (or maybe on what kind of machine you execute it).
>
> I tried it out of curiosity and get
>
> > df = as.data.frame(lapply(1:300,function(x)sample(200,250000,T)))
> > colnames(df) = sample(letters[1:20],300,T)
> > system.time(dfmed<-lapply(unique(colnames(df)), function(x)
> + rowMedians(as.matrix(df[,colnames(df) == x]),na.rm=TRUE)))
>    user  system elapsed
>   5.680   0.952   7.171
>
> and those times are in seconds! The time-consuming part was building the
> data.frame, not the calculation.
>
> The only thing I noticed is that my R process claims some 1.4 GB of memory.
> That should not be a problem on any recent hardware, but my best guess at
> answering your question is that memory might be your problem, especially
> if you have other memory-hogging variables like this data frame lying
> around and you see severe swapping.
>
> Benno
>
> Hello Everybody,
>
> The code:
>
> dfmed<-lapply(unique(colnames(df)), function(x)
> rowMedians(as.matrix(df[,colnames(df) == x]),na.rm=TRUE))
>
> takes a really long time to execute (hours). Is there a faster way to do
> this?
>
> Thanks!
>
> On Tue, May 22, 2012 at 3:46 PM, Preeti <pre...@sci.utah.edu> wrote:
>
> Thanks Henrik! Here is the one-liner that I wrote:
>
>
> dfmed<-lapply(unique(colnames(df)), function(x)
> rowMedians(as.matrix(df[,colnames(df) == x]),na.rm=TRUE))
>
> Thanks again!
>
>
>
> On Tue, May 22, 2012 at 3:23 PM, Henrik Bengtsson <h...@biostat.ucsf.edu> wrote:
>
> See rowMedians() of the matrixStats package for replacing
> apply(x, MARGIN=1, FUN=median). /Henrik
>
>
> On Tue, May 22, 2012 at 12:34 PM, Preeti <pre...@sci.utah.edu> wrote:
>
> Hi,
>
>
> I have a 250,000 by 300 matrix. I am trying to calculate the median of
> those columns (by row) with column names that are identical. I would like
> this to be efficient since apply(x,1,median) where x is created by choosing
> only those columns with same column name and looping on this is taking a
> really long time. Is there an efficient way to do this?
>
>
> Thanks!
>
>
>       [[alternative HTML version deleted]]
>
>
> ______________________________________________
>
> R-help@r-project.org mailing list
>
> https://stat.ethz.ch/mailman/listinfo/r-help
>
> PLEASE do read the posting guide
>
> http://www.R-project.org/posting-guide.html
>
> and provide commented, minimal, self-contained, reproducible code.
>
> Benno Pütz
> Statistical Genetics
> MPI of Psychiatry
> Kraepelinstr. 2-10
> 80804 Munich, Germany
> T: ++49-(0)89-306 22 222
> F: ++49-(0)89-306 22 601