> On Dec 1, 2016, at 9:27 AM, Doran, Harold <hdo...@air.org> wrote:
>
> I am having tremendous success using the foreach function in the foreach
> package to send work out to multiple cores in order to reduce computational
> time.
>
> I am experimenting with which types of tasks benefit from running in parallel
> and which do not, so this is a bit of a learning experience by trial and
> error.
>
> One particular task I cannot seem to realize a benefit from (in terms of
> reduced time) is splitting or subsetting a large data frame. I realize there
> are other "fast" options like using data.table, but my current goal is to see
> whether this can benefit from multiple cores or not.
>
> So, a very small toy example of how I am approaching the "traditional" and
> "parallel" ways is as follows. My actual data is much, much larger, and it
> turns out the parallel version of doing it this way vis-à-vis the traditional
> way is unbelievably slow. Hence I'm not sure whether there is a good
> theoretical reason why such a task cannot run faster when sent out to
> multiple cores, or whether there is a user error that I need to better
> understand and correct.
>
> library(foreach)
> library(doParallel)
> registerDoParallel(cores=4)
>
> tmp <- data.frame(id = rep(1:200, each = 10), foo = rnorm(2000))
>
> # "traditional" way: split() groups the rows once
> ff1 <- split(tmp, tmp$id)
>
> # "parallel" way: every task scans all of tmp$id to find its rows
> myList <- unique(tmp$id)
> N <- length(myList)
> ff2 <- foreach(i = 1:N) %dopar% { tmp[which(tmp$id == myList[i]), ] }
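A serial baseline helps separate the cost of the subsetting itself from any parallel overhead. The sketch here uses base R only; the seed and the timing printout are illustrative additions, not part of the original post. It compares one split() call with the N which()-based subsets that the foreach loop performs, and checks that both approaches yield the same pieces.

```r
# Serial baseline (base R only): one split() call versus one full scan of
# tmp$id per id, as done inside the foreach loop.
set.seed(42)  # illustrative seed, not from the original post
tmp <- data.frame(id = rep(1:200, each = 10), foo = rnorm(2000))
ids <- unique(tmp$id)

# N full scans: each call compares every element of tmp$id with k
t_scan <- system.time(
  by_index <- lapply(ids, function(k) tmp[which(tmp$id == k), ])
)

# split() computes the grouping once, then extracts each piece by index
t_split <- system.time(
  by_split <- split(tmp, tmp$id)
)

# Same pieces in the same order (split() additionally names them by id)
stopifnot(identical(unname(by_split), by_index))

cat("scan:", t_scan[["elapsed"]], "s; split:", t_split[["elapsed"]], "s\n")
```

Timings depend on the machine, so none are quoted here. The gap between the two grows with the number of rows times the number of groups, and a parallel version of the scanning approach still performs all of those scans.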
I would have imagined that using split() to deliver separate data.frame
parcels to the iteration argument (your `i`) would be more sensible.
Otherwise you are sending the full data frame to each worker and then doing
the extraction N times rather than once. There's also a lot of checking done
by the data.frame extraction methods, and that cost is paid on every one of
those N calls. I also think you would want to avoid making reference to
objects "outside" the parallel function application:

# iterate over the pieces of ff1; iter() is from the iterators package,
# which doParallel loads
ff2 <- foreach(z = iter(ff1)) %dopar% { max(z$id) }

>
> Thanks,
> Harold
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
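For comparison, the same split-then-iterate idea can be sketched with base R's parallel package, swapped in here for foreach/doParallel. The two-worker PSOCK cluster, the seed, and the max(z$id) task are illustrative choices, not from the thread. Each task receives one pre-split data frame, and the worker function refers only to its own argument.

```r
# Sketch of "split first, then iterate over the pieces" using base R's
# parallel package in place of foreach/doParallel.
library(parallel)

set.seed(42)  # illustrative seed
tmp <- data.frame(id = rep(1:200, each = 10), foo = rnorm(2000))

# Split once in the master process: a named list of 200 small data frames
ff1 <- split(tmp, tmp$id)

# PSOCK cluster works on all platforms; 2 workers is an arbitrary choice
cl <- makeCluster(2)

# The function uses only its argument z, so only the pieces in ff1,
# not the whole of tmp, are sent to the workers
ff2 <- parLapply(cl, ff1, function(z) max(z$id))

stopCluster(cl)

str(head(ff2, 3))
```

By default parLapply() sends X to the workers in one chunk per worker rather than as one task per element, which further reduces per-task communication overhead compared with dispatching 200 separate tasks.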