Other than reengineering the approach, one thing that helps is this: don't index rows of data frames inside loops... ever. It is actually faster to convert to a matrix, do the operations, and then convert back to a data frame if you have to.
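A minimal sketch of that round trip, assuming every column is numeric (as.matrix() on a data frame with factor or character columns gives a character matrix instead). The data frame df and its column names are made up for illustration:

```r
# Hypothetical small data frame; the names are for illustration only.
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(0, 0, 0))

m <- as.matrix(df)               # numeric data frame -> numeric matrix
for (i in seq_len(nrow(m))) {    # row-wise loop runs on the matrix
  m[i, "c"] <- (m[i, "a"] + m[i, "b"]) / 2
}
df2 <- as.data.frame(m)          # back to a data frame, column names kept
```

The loop body is unchanged from what you would write against the data frame; only the object being indexed is different.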
As an example, I have put your code in a function:

    foo = function(averagedreplicates, zz){
      # NOTE: dim(...)[2] is the number of columns (21 here), so this loop
      # visits only the first 21 rows; the original script loops over all
      # rows. The timings below are for the code as posted.
      iindex = 1:(dim(averagedreplicates)[2])
      for (i in iindex) {
        cat(i,'\n')

        #calculates means
        #Sample A
        averagedreplicates[i,2] <- (zz[i,2] + zz[i,3])/2
        averagedreplicates[i,3] <- (zz[i,4] + zz[i,5])/2
        averagedreplicates[i,4] <- (zz[i,6] + zz[i,7])/2
        averagedreplicates[i,5] <- (zz[i,8] + zz[i,9])/2
        averagedreplicates[i,6] <- (zz[i,10] + zz[i,11])/2

        #Sample B
        averagedreplicates[i,7] <- (zz[i,12] + zz[i,13])/2
        averagedreplicates[i,8] <- (zz[i,14] + zz[i,15])/2
        averagedreplicates[i,9] <- (zz[i,16] + zz[i,17])/2
        averagedreplicates[i,10] <- (zz[i,18] + zz[i,19])/2
        averagedreplicates[i,11] <- (zz[i,20] + zz[i,21])/2

        #Sample C
        averagedreplicates[i,12] <- (zz[i,22] + zz[i,23])/2
        averagedreplicates[i,13] <- (zz[i,24] + zz[i,25])/2
        averagedreplicates[i,14] <- (zz[i,26] + zz[i,27])/2
        averagedreplicates[i,15] <- (zz[i,28] + zz[i,29])/2
        averagedreplicates[i,16] <- (zz[i,30] + zz[i,31])/2

        #Sample D
        averagedreplicates[i,17] <- (zz[i,32] + zz[i,33])/2
        averagedreplicates[i,18] <- (zz[i,34] + zz[i,35])/2
        averagedreplicates[i,19] <- (zz[i,36] + zz[i,37])/2
        averagedreplicates[i,20] <- (zz[i,38] + zz[i,39])/2
        averagedreplicates[i,21] <- (zz[i,40] + zz[i,41])/2
      }
      return(averagedreplicates)
    }

I then make matrix and data.frame versions of things similar in size to what you are working with:

    zz.as.m = matrix(runif(95000*41),95000,41)
    zz.as.df = as.data.frame(zz.as.m)
    ar.as.m = matrix(0,95000,21)
    ar.as.df = as.data.frame(ar.as.m)

And we can time the matrix versions:

    start = Sys.time()
    x = foo(ar.as.m,zz.as.m)
    stop = Sys.time()
    stop-start
    # .06 seconds for me

And on the data frame versions?

    #using the data frame versions
    start = Sys.time()
    x = foo(ar.as.df,zz.as.df)
    stop = Sys.time()
    stop-start
    # 31 seconds for me

For me, the data frame version takes about 516 times as long as the matrix version to do the same work.
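As for the "reengineering" mentioned at the top, one possible sketch is to drop the row loop entirely and average the column pairs in one vectorized step. This is not what was timed above; the names n, zz, ar, first and second are only for illustration, and the toy size is an assumption to keep it quick to run:

```r
set.seed(1)                          # reproducible toy data
n  <- 1000                           # small stand-in for the 95000 rows
zz <- matrix(runif(n * 41), n, 41)   # same layout as zz.as.m above
ar <- matrix(0, n, 21)               # same layout as ar.as.m above

first  <- seq(2, 40, by = 2)         # columns 2, 4, ..., 40
second <- first + 1                  # columns 3, 5, ..., 41

# All 20 pairwise averages at once; column 1 of ar is left untouched,
# just as in the loop.
ar[, 2:21] <- (zz[, first] + zz[, second]) / 2

# Spot-check against the pairings used in the loop
stopifnot(isTRUE(all.equal(ar[, 2],  (zz[, 2]  + zz[, 3])  / 2)),
          isTRUE(all.equal(ar[, 21], (zz[, 40] + zz[, 41]) / 2)))
```

Column k of ar (for k from 2 to 21) receives the mean of columns 2(k-1) and 2(k-1)+1 of zz, which matches the pairing written out line by line in the loop.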
People say "never use loops in R", and I wish they wouldn't say it like that, because it distracts from the fact that looping in R is sometimes quite reasonably fast. And sometimes... like when you are indexing rows of a data frame, it is horrible. These are the little things I learned combing through my Master's project for speed. The only caveat of always doing this sort of work in matrices is that it can cost some developer time to repair factors afterwards. But in terms of code run time it is absolutely essential to use the right data structure for the job.

Hope this is of assistance,
Jeremiah Rounds

> Date: Mon, 8 Jun 2009 15:45:40 +0000
> From: amitrh...@yahoo.co.uk
> To: r-help@r-project.org
> Subject: [R] help to speed up loops in r
>
>
> Hi
> i am using a script which involves the following loop. It attempts to reduce
> a data frame(zz) of 95000 * 41 down to a data frame (averagedreplicates) of
> 95000 * 21 by averaging the replicate values as you can see in the script
> below. This script however is very slow (2days). Any suggestions to speed it
> up.
>
> NB I have also tried using rowMeans rather than adding the 2 values and
> dividing by 2.
> (same problem)
>
>
> #SCRIPT STARTS
> for (i in 1:length(averagedreplicates[,1]))
> #for (i in 1:dim(averagedreplicates)[1])
> {
> cat(i,'\n')
>
>
> #calculates Meanss
> #Sample A
> averagedreplicates[i,2] <- (zz[i,2] + zz[i,3])/2
> averagedreplicates[i,3] <- (zz[i,4] + zz[i,5])/2
> averagedreplicates[i,4] <- (zz[i,6] + zz[i,7])/2
> averagedreplicates[i,5] <- (zz[i,8] + zz[i,9])/2
> averagedreplicates[i,6] <- (zz[i,10] + zz[i,11])/2
>
> #Sample B
> averagedreplicates[i,7] <- (zz[i,12] + zz[i,13])/2
> averagedreplicates[i,8] <- (zz[i,14] + zz[i,15])/2
> averagedreplicates[i,9] <- (zz[i,16] + zz[i,17])/2
> averagedreplicates[i,10] <- (zz[i,18] + zz[i,19])/2
> averagedreplicates[i,11] <- (zz[i,20] + zz[i,21])/2
>
> #Sample C
> averagedreplicates[i,12] <- (zz[i,22] + zz[i,23])/2
> averagedreplicates[i,13] <- (zz[i,24] + zz[i,25])/2
> averagedreplicates[i,14] <- (zz[i,26] + zz[i,27])/2
> averagedreplicates[i,15] <- (zz[i,28] + zz[i,29])/2
> averagedreplicates[i,16] <- (zz[i,30] + zz[i,31])/2
>
> #Sample D
> averagedreplicates[i,17] <- (zz[i,32] + zz[i,33])/2
> averagedreplicates[i,18] <- (zz[i,34] + zz[i,35])/2
> averagedreplicates[i,19] <- (zz[i,36] + zz[i,37])/2
> averagedreplicates[i,20] <- (zz[i,38] + zz[i,39])/2
> averagedreplicates[i,21] <- (zz[i,40] + zz[i,41])/2
> }
>
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.