On Sun, 2007-09-16 at 08:46 -0700, kevinchang wrote:
> Hey everyone,
>
> The code I wrote executes correctly but is stalled seriously. Is there a
> way to hasten execution without coming up with a brand new algorithm
> ?please help. Thanks a lot for your time.
>
>
> #a simplified version of the code

The simple thing to do first is to pre-allocate your storage. When you do:

    c <- NA

you have a vector of length 1. Then, in the loop, you extend c by 1 on each
iteration. To do this, R has to copy c and then replace it. If you set c to
be the correct size in the first place, R doesn't have to do all this
copying and replacing, and is much faster as a result.

I have modified your script as follows:

    a <- c("superman", "xman", "spiderman", "wolfman", "mansuper", "manspider")
    ## uncomment the below to test how it scales
    #a <- rep(a, 150000)
    b <- sapply(a, function(.srt) {
        paste(sort(strsplit(.srt, '')[[1]]), collapse = "")
    })
    ## store number of iterations we will do
    n.loop <- length(b)
    ## use this to allocate storage space for c; fill it with NA so that
    ## the is.na() filter after the loop still drops the unused slots
    c <- rep(NA_character_, n.loop)
    for(i in seq(along = c)) {
        if(length(which(b == b[i])) > 1)
            c[i] <- b[i]
    }
    c <- c[!is.na(c)]

When I timed this using system.time(), with a now being a vector of 900000
strings (a repeated 150000 times in this case), I got the following
timings:

       user  system elapsed
    121.752   0.341 122.712

So 121 seconds on my laptop with 2GB of RAM is not bad for a problem of
this size.

Some further comments. Don't use 'c' as a variable name. It won't overwrite
the c() function, but it is a bit confusing to use objects with the same
names as functions. Second, *space out your code* - what you wrote is very
difficult for a human to parse - you'll find it easier to see mistakes etc.
if you spread stuff out a bit.

HTH

G

>
> a<-c("superman" , "xman" , "spiderman" ,"wolfman" ,"mansuper","manspider" )
> b<-sapply(a,function(.srt){paste(sort(strsplit(.srt,'')[[1]]),
> collapse="")})
> c<-NA
> for(i in 1:length(b)) {
> if(length(which(b==b[i]))>1)
> c[i]<-b[i]
> }
> c<-c[!is.na(c)]
> # But if my get the volumne of "a" up to about 150000 words , the loop will
> work incredibly slowly.
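
To make the pre-allocation point concrete, here is a minimal sketch that puts the "grow as you go" loop and the pre-allocated loop side by side on the same six-word example. The helper names (sort_letters, grow_result, prealloc_result) are hypothetical and used only for illustration; they are not part of the original script. How large a gap system.time() shows will depend on your R version and machine, so treat the timings as something to measure yourself rather than a promise.

```r
## Hypothetical helpers for illustration only (not from the original script).

## Sort the letters of one string, as in the original sapply() call.
sort_letters <- function(s) paste(sort(strsplit(s, "")[[1]]), collapse = "")

## Version 1: start with a length-1 vector and extend it inside the loop.
grow_result <- function(b) {
    res <- NA
    for (i in seq_along(b)) {
        if (length(which(b == b[i])) > 1)
            res[i] <- b[i]
    }
    res[!is.na(res)]
}

## Version 2: allocate the full-length vector once, filled with NA.
prealloc_result <- function(b) {
    res <- rep(NA_character_, length(b))
    for (i in seq_along(b)) {
        if (length(which(b == b[i])) > 1)
            res[i] <- b[i]
    }
    res[!is.na(res)]
}

a <- c("superman", "xman", "spiderman", "wolfman", "mansuper", "manspider")
b <- sapply(a, sort_letters, USE.NAMES = FALSE)

## Both versions should return the same sorted-letter strings.
same <- identical(grow_result(b), prealloc_result(b))
print(same)

## Compare the two on a larger input (size chosen arbitrarily).
big <- rep(b, 500)
print(system.time(grow_result(big)))
print(system.time(prealloc_result(big)))
```

Note that both versions still call which(b == b[i]) on every pass, which scans the whole of b each time, so pre-allocation only removes the copying cost and not that per-iteration scan.
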
>

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson                 [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%