I wonder how you do this (or maybe on what kind of machine you execute it).
I tried it out of curiosity and got:
> library(matrixStats)
> df = as.data.frame(lapply(1:300, function(x) sample(200, 250000, TRUE)))
> colnames(df) = sample(letters[1:20], 300, TRUE)
> system.time(dfmed <- lapply(unique(colnames(df)), function(x)
+     rowMedians(as.matrix(df[, colnames(df) == x]), na.rm = TRUE)))
   user  system elapsed
  5.680   0.952   7.171
And those times are in seconds! The time-consuming part was building the
data.frame, not the calculation.
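
In case it helps, here is a smaller variant you could run to check timings on your side. It is only a sketch under my own assumptions: the sizes are arbitrary, the names m, idx and row_med are made up for illustration, and it converts to a matrix once instead of subsetting the data frame for every name. It falls back to apply() if matrixStats is not installed.

```r
# Smaller stand-in for the 250,000 x 300 data (sizes here are arbitrary).
set.seed(1)
n_row <- 1000
n_col <- 60
m <- matrix(sample(200, n_row * n_col, replace = TRUE), nrow = n_row)
colnames(m) <- sample(letters[1:20], n_col, replace = TRUE)

# Use matrixStats::rowMedians() when installed, otherwise fall back to apply().
row_med <- if (requireNamespace("matrixStats", quietly = TRUE)) {
  function(x) matrixStats::rowMedians(x, na.rm = TRUE)
} else {
  function(x) apply(x, 1, median, na.rm = TRUE)
}

# Group column positions by name once, then take row medians per group.
# drop = FALSE keeps a one-column group as a matrix.
idx <- split(seq_len(ncol(m)), colnames(m))
timing <- system.time(
  dfmed <- lapply(idx, function(j) row_med(m[, j, drop = FALSE]))
)
print(timing)
```

The result is a list with one vector of row medians per distinct column name, named after that column name.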
The only thing I noticed is that my R process claims some 1.4 GB of memory.
That should not be a problem on any recent hardware, but my guess is that
memory might be your problem, especially if you have other memory-hogging
variables like this data frame lying around and are seeing severe swapping.
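
To see whether large objects are the culprit, something along these lines might help. It is a rough sketch using only base R; the helper object_sizes_mb and the demo environment are hypothetical names of mine, and for your session you would call it on globalenv() instead.

```r
# Hypothetical helper: size in MB of every object in an environment,
# largest first.
object_sizes_mb <- function(env = globalenv()) {
  nms <- ls(envir = env)
  bytes <- vapply(
    nms,
    function(nm) as.numeric(object.size(get(nm, envir = env))),
    numeric(1)
  )
  sort(bytes, decreasing = TRUE) / 1024^2
}

# Demo on a throwaway environment so the example is self-contained.
demo_env <- new.env()
assign("big", numeric(1e6), envir = demo_env)   # about 8 MB of doubles
assign("small", 1:10, envir = demo_env)
sizes <- object_sizes_mb(demo_env)
print(round(sizes, 2))

# gc() triggers a collection and reports memory currently in use.
print(gc())
```

Anything large that you no longer need can be dropped with rm() followed by gc() before running the medians.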
Benno
> Hello Everybody,
>
> The code:
>
> dfmed<-lapply(unique(colnames(df)), function(x)
> rowMedians(as.matrix(df[,colnames(df) == x]),na.rm=TRUE))
>
> takes a really long time to execute (hours). Is there a faster way to do
> this?
>
> Thanks!
>
> On Tue, May 22, 2012 at 3:46 PM, Preeti <[email protected]> wrote:
>
>> Thanks Henrik! Here is the one-liner that I wrote:
>>
>> dfmed<-lapply(unique(colnames(df)), function(x)
>> rowMedians(as.matrix(df[,colnames(df) == x]),na.rm=TRUE))
>>
>> Thanks again!
>>
>>
>> On Tue, May 22, 2012 at 3:23 PM, Henrik Bengtsson <[email protected]> wrote:
>>
>>> See rowMedians() of the matrixStats package for replacing apply(x,
>>> MARGIN=1, FUN=median). /Henrik
>>>
>>> On Tue, May 22, 2012 at 12:34 PM, Preeti <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> I have a 250,000 by 300 matrix. I am trying to calculate the median of
>>>> those columns (by row) with column names that are identical. I would like
>>>> this to be efficient since apply(x,1,median) where x is created by choosing
>>>> only those columns with same column name and looping on this is taking a
>>>> really long time. Is there an efficient way to do this?
>>>>
>>>> Thanks!
>>>>
>>>> ______________________________________________
>>>> [email protected] mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
Benno Pütz
Statistical Genetics
MPI of Psychiatry
Kraepelinstr. 2-10
80804 Munich, Germany
T: ++49-(0)89-306 22 222
F: ++49-(0)89-306 22 601