On 08/06/2012 09:41 AM, Jie wrote:
After searching online, I found that clusterCall or foreach might be the
solution.

Rewrite your outer loop as an lapply; then on non-Windows systems use parallel::mclapply, or on Windows use makePSOCKcluster and parLapply. I ended up with

library(parallel)
library(MASS)
Maxi <- 10
Maxj <- 1000

doit <- function(i, Maxi, Maxj)
{
    ## initialization, not of interest
    Sigmahalf <- matrix(sample(10000, replace=TRUE),  100)
    Sigma <- t(Sigmahalf) %*% Sigmahalf
    x <- mvrnorm(n=Maxj, rep(0, 100), Sigma)
    xlist <- lapply(seq_len(nrow(x)), function(i, x) matrix(x[i,], 10), x)
    ## end of initialization

    fun <- function(x) {
        v <- eigen(x, symmetric=FALSE, only.values=TRUE)$values
        min(abs(v))
    }
    dd1 <- sapply(xlist, fun)
    dd2 <- dd1 + dd1 / sum(dd1)
    sum(dd1 * dd2)
}

> system.time(lapply(1:8, doit, Maxi, Maxj))
   user  system elapsed
  6.677   0.016   6.714
> system.time(mclapply(1:64, doit, Maxi, Maxj, mc.cores=8))
   user  system elapsed
 68.857   1.032  10.398
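For the Windows route mentioned above, a minimal sketch of the makePSOCKcluster / parLapply pattern might look like the following. The function work() and the worker count of 2 are hypothetical stand-ins chosen for illustration, not part of the code timed above; a function such as doit() that relies on MASS would also need the package loaded on each worker.

```r
library(parallel)

## Hypothetical stand-in for doit(): any function of the iteration index works
work <- function(i, n) {
    m <- matrix(rnorm(n * n), n)
    min(abs(eigen(m, symmetric=FALSE, only.values=TRUE)$values))
}

cl <- makePSOCKcluster(2)   # 2 workers; an arbitrary choice for this sketch
## If the worker function needs MASS, load it on every worker first:
## clusterEvalQ(cl, library(MASS))
res <- tryCatch(
    parLapply(cl, 1:8, work, n=10),   # extra arguments are passed on to work()
    finally = stopCluster(cl))        # always shut the workers down
length(res)
```

Unlike mclapply, the PSOCK workers are fresh R processes, so packages and any global objects the function depends on have to be loaded or exported explicitly.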

The extra arguments to eigen (symmetric=FALSE, only.values=TRUE) are important, as is avoiding unnecessary repeated calculations (the original code calls eigen twice on each matrix). The allocate-and-grow strategy (result.vec = numeric(); result.vec[i] <- ...) is very inefficient, because result.vec is copied in its entirety for each new value of i; it is better to preallocate and fill (result.vec = numeric(Maxi); result.vec[i] = ...) or to let lapply manage the allocation.
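The three allocation strategies can be compared side by side in a small sketch. The function f() and the length n are hypothetical placeholders for the real per-iteration work, and vapply is used here as one way of letting R manage the allocation; how large the speed difference is will depend on the R version and on n.

```r
n <- 1e4
f <- function(i) i^2    # hypothetical per-iteration computation

## allocate-and-grow: the vector is extended on every assignment
grow <- numeric()
for (i in seq_len(n)) grow[i] <- f(i)

## preallocate-and-fill: the vector has its final length from the start
fill <- numeric(n)
for (i in seq_len(n)) fill[i] <- f(i)

## let R manage the allocation; vapply also checks each result's type
managed <- vapply(seq_len(n), f, numeric(1))

identical(grow, fill) && identical(fill, managed)
```

All three produce the same vector; only the way the storage is obtained differs.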

Martin


Best wishes,
Jie

On Sun, Aug 5, 2012 at 10:23 PM, Jie <jimmycl...@gmail.com> wrote:

Dear All,

Suppose I have a program like the one below. The outer loop runs a simulation
(with randomly generated data); inside it there are several sapply() calls
(10 to 100) over the data, plus some other work, and these sapply() calls have
to run sequentially. Each sapply() call is not very computationally intensive
(a few seconds only), so one iteration of the outer loop takes minutes to finish.
I guess the better approach is to parallelize the outer loop rather than the
sapply() calls, but I have no idea how to modify the code. Here is a simple
example, with only two sapply() calls for simplicity. The logic inside the
sapply() calls is not important.
Thank you for your attention and suggestions.

library(parallel)
library(MASS)
result.seq=c()
Maxi <- 100
for (i in 1:Maxi)
{
    ## initialization, not of interest
    Sigmahalf <- matrix(sample(1:10000, size=10000, replace=TRUE), 100)
    Sigma <- t(Sigmahalf) %*% Sigmahalf
    ## the mean vector must have length 100 to match the 100 x 100 Sigma
    x <- mvrnorm(n=1000, rep(0, 100), Sigma)
    xlist <- list()
    for (j in 1:1000)
    {
        ## each row of x becomes a square 10 x 10 matrix, as eigen() requires
        xlist[[j]] <- list(X = matrix(x[j, ], 10))
    }
    ## end of initialization

    dd1 <- sapply(xlist, function(s) {min(abs((eigen(s$X))$values))})
    ##
    sumdd1=sum(dd1)
    for (j in 1:1000)
    {
        xlist[[j]]$dd1 <- dd1[j]/sumdd1
    }
    ## Assume dd2 and dd1 can not be combined in one sapply()
    dd2 <- sapply(xlist, function(s){min(abs((eigen(s$X))$values))+s$dd1})
    result.seq[i] <- sum(dd1*dd2)
}




______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
