Re: [R] help to speed up loops in r

Jeremiah Rounds Tue, 09 Jun 2009 02:22:27 -0700

Other than re-engineering the approach, one thing that helps: don't 
index rows of data frames in loops... ever.  It is actually faster to convert 
to a matrix, do the operations, and then convert back to a data frame if you 
have to.
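
The round trip can be sketched as follows. This is only a minimal illustration on a made-up data frame (`df` and its column names are hypothetical, not from the original script), and it assumes every column is numeric:

```r
# Hypothetical all-numeric data frame standing in for the real data
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(0, 0, 0))

m <- as.matrix(df)            # convert once, outside the loop
for (i in seq_len(nrow(m))) {
    # element access and assignment on a matrix is cheap
    m[i, "c"] <- (m[i, "a"] + m[i, "b"]) / 2
}
df2 <- as.data.frame(m)       # convert back once, after the loop
```

The point of the design is that both conversions happen once, so their cost does not grow with the number of loop iterations.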


 

As an example I have your code in a function:

 

foo = function(averagedreplicates, zz){
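    # note: dim(...)[2] is the column count (21), so this loop covers only
    # rows 1 to 21; use dim(averagedreplicates)[1] to loop over every row
    iindex = 1:(dim(averagedreplicates)[2])
    for (i in iindex) {
     cat(i,'\n')  #calculates means
     #Sample A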
     averagedreplicates[i,2] <- (zz[i,2] + zz[i,3])/2
     averagedreplicates[i,3] <- (zz[i,4] + zz[i,5])/2
     averagedreplicates[i,4] <- (zz[i,6] + zz[i,7])/2
     averagedreplicates[i,5] <- (zz[i,8] + zz[i,9])/2
     averagedreplicates[i,6] <- (zz[i,10] + zz[i,11])/2
     #Sample B
     averagedreplicates[i,7] <- (zz[i,12] + zz[i,13])/2
     averagedreplicates[i,8] <- (zz[i,14] + zz[i,15])/2
     averagedreplicates[i,9] <- (zz[i,16] + zz[i,17])/2
     averagedreplicates[i,10] <- (zz[i,18] + zz[i,19])/2
     averagedreplicates[i,11] <- (zz[i,20] + zz[i,21])/2
     #Sample C
     averagedreplicates[i,12] <- (zz[i,22] + zz[i,23])/2
     averagedreplicates[i,13] <- (zz[i,24] + zz[i,25])/2
     averagedreplicates[i,14] <- (zz[i,26] + zz[i,27])/2
     averagedreplicates[i,15] <- (zz[i,28] + zz[i,29])/2
     averagedreplicates[i,16] <- (zz[i,30] + zz[i,31])/2
     #Sample D
     averagedreplicates[i,17] <- (zz[i,32] + zz[i,33])/2
     averagedreplicates[i,18] <- (zz[i,34] + zz[i,35])/2
     averagedreplicates[i,19] <- (zz[i,36] + zz[i,37])/2
     averagedreplicates[i,20] <- (zz[i,38] + zz[i,39])/2
     averagedreplicates[i,21] <- (zz[i,40] + zz[i,41])/2
    }
    return(averagedreplicates)
}

 

I then make matrix and data.frame versions of things similar in size to what 
you are working with:

 

zz.as.m = matrix(runif(95000*41),95000,41)
zz.as.df = as.data.frame(zz.as.m)
ar.as.m = matrix(0,95000,21)
ar.as.df = as.data.frame(ar.as.m)


 

And we can time the matrix versions:

 

start = Sys.time()
x = foo(ar.as.m,zz.as.m)
stop = Sys.time()
stop-start  # .06 seconds for me

 

 

And on the data frame versions?

 

#using the data frame versions
start = Sys.time()
x = foo(ar.as.df,zz.as.df)
stop = Sys.time()
stop-start  # 31 seconds for me

 

 

For me, the data frame version takes about 516 times as long as the matrix 
version to do the same work.
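
A similar comparison can be run with `system.time`, which is part of base R. The sketch below is not the `foo` function above: it uses a smaller loop of my own (`row_loop`) and an assumed small size `n` so that it finishes quickly. Timings depend on the machine and the R version, so none are quoted here.

```r
n <- 2000                                   # assumed small size for a quick run
zz_m  <- matrix(runif(n * 3), n, 3)
zz_df <- as.data.frame(zz_m)

# Same row-by-row loop, applied to whichever structure is passed in
row_loop <- function(out, zz) {
    for (i in seq_len(nrow(zz))) {
        out[i, 1] <- (zz[i, 2] + zz[i, 3]) / 2
    }
    out
}

t_m  <- system.time(res_m  <- row_loop(matrix(0, n, 1), zz_m))
t_df <- system.time(res_df <- row_loop(data.frame(V1 = numeric(n)), zz_df))

# Elapsed seconds for each structure, side by side
c(matrix = t_m[["elapsed"]], data.frame = t_df[["elapsed"]])
```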

 

People say "never use loops in R", and I wish they wouldn't say it like that 
because it distracts from the fact of the matter, which is that sometimes 
looping in R is quite reasonably fast.  And sometimes... like when you are 
indexing rows of a data frame, it is horrible.  These are the little things I 
learned combing through my Master's project for speed.
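
That said, for this particular averaging task the loop can be dropped altogether, because arithmetic on whole columns is vectorised in R. This is not what the function above does; it is a sketch of an alternative, shown on a small made-up matrix and assuming, as in the original script, that columns 2 to 41 of `zz` hold consecutive replicate pairs. The names `first` and `second` are mine.

```r
# Small made-up stand-in for zz: 5 rows, 41 columns
zz <- matrix(runif(5 * 41), 5, 41)
averagedreplicates <- matrix(0, 5, 21)

first  <- seq(2, 40, by = 2)   # columns 2, 4, ..., 40
second <- first + 1            # columns 3, 5, ..., 41

# One vectorised statement fills columns 2..21 for every row at once
averagedreplicates[, 2:21] <- (zz[, first] + zz[, second]) / 2
```

Column 1 of `averagedreplicates` is left untouched here, since the original script does not show how it is filled.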

 

The only caveat to the advice of always doing this sort of work in matrices 
is that repairing factors afterwards can be a little time consuming (developer 
time).  But in terms of code run time it is absolutely essential to use the 
right data structure for the job.
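
The factor problem arises because a matrix holds a single type, so `as.matrix` on a data frame with a factor column yields a character matrix. One way to sidestep most of the repair, sketched here on a made-up data frame (`d`, `grp`, `x` and `y` are hypothetical names), is to convert only the numeric columns and leave the factor where it is:

```r
d <- data.frame(grp = factor(c("a", "b", "a")),
                x = c(1, 2, 3), y = c(3, 4, 5))

is.character(as.matrix(d))          # TRUE: the whole matrix became character

num <- sapply(d, is.numeric)        # logical mask of the numeric columns
m <- as.matrix(d[, num])            # numeric matrix holding x and y only
for (i in seq_len(nrow(m))) {
    m[i, "x"] <- (m[i, "x"] + m[i, "y"]) / 2
}
d[, num] <- m                       # write results back; grp is untouched
```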

 

Hope this is of assistance,

Jeremiah Rounds

  
 
> Date: Mon, 8 Jun 2009 15:45:40 +0000
> From: amitrh...@yahoo.co.uk
> To: r-help@r-project.org
> Subject: [R] help to speed up loops in r
> 
> 
> Hi
> i am using a script which involves the following loop. It attempts to reduce 
> a data frame(zz) of 95000 * 41 down to a data frame (averagedreplicates) of 
> 95000 * 21 by averaging the replicate values as you can see in the script 
> below. This script however is very slow (2days). Any suggestions to speed it 
> up. 
> 
> NB I have also tried using rowMeans rather than adding the 2 values and 
> dividing by 2. (same problem)
> 
> 
> 
> 
> #SCRIPT STARTS
> for (i in 1:length(averagedreplicates[,1]))
> #for (i in 1:dim(averagedreplicates)[1])
> {
> cat(i,'\n')
> 
> 
> #calculates Meanss
> #Sample A
> averagedreplicates[i,2] <- (zz[i,2] + zz[i,3])/2
> averagedreplicates[i,3] <- (zz[i,4] + zz[i,5])/2
> averagedreplicates[i,4] <- (zz[i,6] + zz[i,7])/2
> averagedreplicates[i,5] <- (zz[i,8] + zz[i,9])/2
> averagedreplicates[i,6] <- (zz[i,10] + zz[i,11])/2
> 
> #Sample B
> averagedreplicates[i,7] <- (zz[i,12] + zz[i,13])/2
> averagedreplicates[i,8] <- (zz[i,14] + zz[i,15])/2
> averagedreplicates[i,9] <- (zz[i,16] + zz[i,17])/2
> averagedreplicates[i,10] <- (zz[i,18] + zz[i,19])/2
> averagedreplicates[i,11] <- (zz[i,20] + zz[i,21])/2
> 
> #Sample C
> averagedreplicates[i,12] <- (zz[i,22] + zz[i,23])/2
> averagedreplicates[i,13] <- (zz[i,24] + zz[i,25])/2
> averagedreplicates[i,14] <- (zz[i,26] + zz[i,27])/2
> averagedreplicates[i,15] <- (zz[i,28] + zz[i,29])/2
> averagedreplicates[i,16] <- (zz[i,30] + zz[i,31])/2
> 
> #Sample D
> averagedreplicates[i,17] <- (zz[i,32] + zz[i,33])/2
> averagedreplicates[i,18] <- (zz[i,34] + zz[i,35])/2
> averagedreplicates[i,19] <- (zz[i,36] + zz[i,37])/2
> averagedreplicates[i,20] <- (zz[i,38] + zz[i,39])/2
> averagedreplicates[i,21] <- (zz[i,40] + zz[i,41])/2
> }
> 
> 
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
