Dear R wizards--- I have a wrapper on mclapply() that makes it a little easier for me to do multiprocessing. (Posting this may make life easier for other googlers.) I pass a data frame, a logical vector that tells me which rows should be recomputed, and the function, and I get back a vector or matrix of answers.

library(parallel)

mc.byselectrows <- function(data.in, recalclist, FUN, ...) {
  data.notdone <- data.in[recalclist, ]
  cat("[mc.byselectrows: ", nrow(data.notdone), "rows to be recomputed out of",
      nrow(data.in), "]\n", file = stderr())
  ## wrap FUN's result for one row as a matrix so the pieces can be rbind-ed
  FUN.ON.ROWS <- function(.index, ...) as.matrix(FUN(data.notdone[.index, ], ...))
  soln <- mclapply(as.list(seq_len(nrow(data.notdone))), FUN.ON.ROWS, ...)
  rv <- do.call("rbind", soln)  ## omits naming.
  if (ncol(rv) == 1) rv <- as.vector(rv)
  rv
}

## example use
d <- data.frame(id = 1:6, val = 11:16)
loc <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
v1 <- mc.byselectrows(d, loc, function(x) x[, 2]^2)
v2 <- mc.byselectrows(d, loc, function(x) cbind(x[, 2]^2, x[, 2]^3))

This works fine, except that I want to get NAs in the return positions that were not recalculated. Then I can write

newdata$y <- ifelse(is.na(olddata$y), mc.byselectrows(olddata, is.na(olddata$y), fun.calc.y), olddata$y)

I can do this very inelegantly, of course. I can merge recalclist into data.in and then write a loop that substitutes for the do.call to rbind. Yikes. Or I could handle the recalclist contingency inside FUN.ON.ROWS, but this is costly in terms of execution time.

Are there obvious solutions? Advice appreciated.

regards,

/iaw
----
Ivo Welch (ivo.we...@gmail.com)
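
One possible way to get the NA padding without a loop, offered only as a sketch: keep the original row numbers of the selected rows, run mclapply() over just those, and then assign the computed rows into a pre-allocated NA matrix. The name mc.byselectrows.na is hypothetical (it is not part of the code above), and the sketch assumes that FUN returns exactly one row of output per input row and that recalclist is a logical vector with no NAs. Note that mclapply() relies on forking, so it only runs in parallel on Unix-alikes.

```r
library(parallel)

## Hypothetical variant of mc.byselectrows() that returns NA for skipped rows.
mc.byselectrows.na <- function(data.in, recalclist, FUN, ...) {
  idx <- which(recalclist)  # original row numbers that need recomputing
  if (length(idx) == 0L) return(rep(NA, nrow(data.in)))  # nothing to do

  ## apply FUN to a single row, wrapped as a matrix so results can be rbind-ed
  FUN.ON.ROWS <- function(.index, ...)
    as.matrix(FUN(data.in[.index, , drop = FALSE], ...))

  soln <- mclapply(idx, FUN.ON.ROWS, ...)
  computed <- do.call("rbind", soln)

  ## one row per input row, all NA, then fill in only the recomputed rows
  rv <- matrix(NA, nrow = nrow(data.in), ncol = ncol(computed))
  rv[idx, ] <- computed
  if (ncol(rv) == 1) rv <- as.vector(rv)
  rv
}

d <- data.frame(id = 1:6, val = 11:16)
loc <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
v1 <- mc.byselectrows.na(d, loc, function(x) x[, 2]^2)
## v1 is c(121, 144, NA, 196, NA, 256)
v2 <- mc.byselectrows.na(d, loc, function(x) cbind(x[, 2]^2, x[, 2]^3))
```

Because the result has the same length (or number of rows) as the input, it lines up element by element with olddata$y in the ifelse() call, and FUN is still evaluated only on the selected rows.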