The attached diff addresses the following issues in mclapply:
* mclapply coerces non-lists or objects (S3 or S4) to lists, but a list may not
  be an efficient representation, and coercion is not required if the object
  implements length, [, and [[ methods. (lapply must also work on the object,
  either through coercion to a list at the 'inner.do' level or through other
  means, e.g., promoting lapply to a generic and writing a method specialized
  for the object.) As written, someone wishing to use mclapply on an object
  not efficiently represented as a list would need to promote mclapply to a
  generic and then re-implement it as a method for their object, rather than
  re-using the existing code.

* mcparallel is not consistently invoked with a 'name' argument; the name seems
  to be superfluous to the code.

* Creating the variable 'schedule' makes a full copy of a potentially large
  object in the master process; delaying the subsetting until it is required
  in the inner.do function may (?) result in less copying.
Avoiding coercion to a list is similar to a suggestion for pvec.
https://stat.ethz.ch/pipermail/r-devel/2012-October/065097.html
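
As a rough illustration of the first and third points, here is a minimal
sketch (not part of the patch). The 'intrange' class, its constructor, and its
methods are hypothetical names invented for this example, and plain lapply
stands in for the forked workers so that the sketch runs on any platform. It
only assumes that the object implements length, [, and [[, and that the result
of [ is something lapply can iterate over.

```r
## Hypothetical compact class: stores only the end points of an integer
## range, never the values themselves.
intrange <- function(from, to)
    structure(list(from = as.integer(from), to = as.integer(to)),
              class = "intrange")

## The three methods the prescheduled code path relies on.
## (No bounds checking; this is a sketch.)
length.intrange <- function(x) unclass(x)$to - unclass(x)$from + 1L
"[.intrange" <- function(x, i) unclass(x)$from + as.integer(i) - 1L
"[[.intrange" <- function(x, i) unclass(x)$from + as.integer(i) - 1L

X <- intrange(1L, 10L)
cores <- 3L     # assumed to be <= length(X), as in mclapply

## Only the indices are computed up front, as with 'sindex' in mclapply.
sindex <- lapply(seq_len(cores),
                 function(i) seq(i, length(X), by = cores))

## The slice of X is taken only when the worker runs, mirroring
## 'S <- X[sindex[[core]]]' inside the child process.
inner.do <- function(core) {
    S <- X[sindex[[core]]]
    lapply(S, function(v) v^2)
}

## Stand-in for the forked children: run the workers sequentially.
chunks <- lapply(seq_len(cores), inner.do)

## Reassemble the results in the original order.
res <- vector("list", length(X))
for (core in seq_len(cores)) res[sindex[[core]]] <- chunks[[core]]
```

X is never coerced to a list here; only the per-core slices are materialized,
and only inside the worker function.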
Martin
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
Index: src/library/parallel/R/unix/mclapply.R
===================================================================
--- src/library/parallel/R/unix/mclapply.R (revision 61084)
+++ src/library/parallel/R/unix/mclapply.R (working copy)
@@ -47,15 +47,12 @@
}
}
on.exit(cleanup())
- ## Follow lapply
- if(!is.vector(X) || is.object(X)) X <- as.list(X)
if (!mc.preschedule) { # sequential (non-scheduled)
FUN <- match.fun(FUN)
if (length(X) <= cores) { # we can use one-shot parallel
jobs <- lapply(seq_along(X),
function(i) mcparallel(FUN(X[[i]], ...),
- name = names(X)[i],
mc.set.seed = mc.set.seed,
silent = mc.silent))
res <- mccollect(jobs)
@@ -117,8 +114,6 @@
if (cores < 2L) return(lapply(X = X, FUN = FUN, ...))
sindex <- lapply(seq_len(cores),
function(i) seq(i, length(X), by = cores))
- schedule <- lapply(seq_len(cores),
- function(i) X[seq(i, length(X), by = cores)])
ch <- list()
res <- vector("list", length(X))
names(res) <- names(X)
@@ -126,13 +121,13 @@
fin <- rep(FALSE, cores)
dr <- rep(FALSE, cores)
inner.do <- function(core) {
- S <- schedule[[core]]
f <- mcfork()
if (isTRUE(mc.set.seed)) mc.advance.stream()
if (inherits(f, "masterProcess")) { # this is the child process
on.exit(mcexit(1L, structure("fatal error in wrapper code", class="try-error")))
if (isTRUE(mc.set.seed)) mc.set.stream()
if (isTRUE(mc.silent)) closeStdout()
+ S <- X[sindex[[core]]]
sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE))
mcexit(0L)
}