It is (at least for me) really unclear what the problem is, or how it's
related to mclapply. You say:

> this works fine, except that what I want to get NA's in the return
> positions that were not recalculated. then, I can write
>
> newdata$y <- ifelse ( is.na(olddata$y), mc.byselectrows( olddata,
> is.na(olddata$y), fun.calc.y ), olddata$y )

Why? Are you applying the function twice? Then why not simply call

    v1.1 <- mc.byselectrows( d, loc<1, function(x) x[,2]^2 )

the second time?
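If what you are after is a result with one slot per row of the input, and NA wherever nothing was recomputed, a minimal sketch could look like the following. It is not your function: mc.byselectrows.na and its mc.cores argument are names I made up for illustration. It assumes FUN returns a numeric scalar or a one-row matrix per input row, and it loads parallel (which provides mclapply in current R) rather than multicore.

```r
library(parallel)  # provides mclapply(); assumption: used here instead of multicore

## Hypothetical variant of mc.byselectrows(): same inputs, but the result has
## one position (or matrix row) per row of data.in, NA where recalclist is FALSE.
## Assumes FUN returns a numeric scalar or a one-row matrix for each input row.
mc.byselectrows.na <- function(data.in, recalclist, FUN, ..., mc.cores = 1L) {
  idx <- which(recalclist)                     # row numbers to recompute
  if (length(idx) == 0) return(rep(NA_real_, nrow(data.in)))

  FUN.ON.ROW <- function(.index, ...)
    as.matrix(FUN(data.in[.index, , drop = FALSE], ...))
  ## mc.cores = 1L runs serially everywhere; raise it on Unix-alikes
  soln <- mclapply(idx, FUN.ON.ROW, ..., mc.cores = mc.cores)
  part <- do.call("rbind", soln)               # one row per recomputed input row

  rv <- matrix(NA_real_, nrow = nrow(data.in), ncol = ncol(part))
  rv[idx, ] <- part                            # fill only the recomputed rows
  if (ncol(rv) == 1) rv <- as.vector(rv)
  rv
}

d   <- data.frame(id = 1:6, val = 11:16)
loc <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
v1  <- mc.byselectrows.na(d, loc, function(x) x[, 2]^2)
v2  <- mc.byselectrows.na(d, loc, function(x) cbind(x[, 2]^2, x[, 2]^3))
```

Because the result lines up with the rows of the input, it has the same length as olddata$y and could be dropped into the ifelse() call quoted above. Indexing by row number also sidesteps the row-name bookkeeping entirely.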
If the problem is in keeping track of which rows got calculated, why not
put back the row.names that get omitted after mclapply (probably a good
idea anyway):

    FUN.ON.ROWS <- function(.index, ...)
        as.matrix(FUN(data.notdone[.index,], ...))
    soln <- mclapply( as.list(1:nrow(data.notdone)) , FUN.ON.ROWS, ... )
    rv <- do.call("rbind", soln)  ## omits naming.
    if (ncol(rv)==1) {
        rv <- as.vector(rv)
        names(rv) <- row.names(data.notdone)
    } else rownames(rv) <- row.names(data.notdone)
    rv
    }

And finally, you don't even need row.names for c(v1,d[loc<1,2]).

Or am I missing something here?

BTW your code uses cat.stderr (is that a local function?) instead of cat,
and never loads multicore.

Cheers

On Mon, Mar 26, 2012 at 4:28 PM, ivo welch <ivo.we...@gmail.com> wrote:
> Dear R wizards---
>
> I have a wrapper on mclapply() that makes it a little easier for me to
> do multiprocessing. (Posting this may make life easier for other
> googlers.) I pass a data frame, a vector that tells me what rows
> should be recomputed, and the function; and I get back a vector or
> matrix of answers.
>
> d <- data.frame( id=1:6, val=11:16 )
> loc <- c(TRUE,TRUE,FALSE,TRUE,FALSE,TRUE)
> v1 <- mc.byselectrows( d, loc, function(x) x[,2]^2 )
> v2 <- mc.byselectrows(d, loc, function(x) cbind(x[,2]^2,x[,2]^3))
>
> mc.byselectrows <- function(data.in, recalclist, FUN, ...) {
>
>   data.notdone <- data.in[recalclist,]
>   cat.stderr("[mc.byselectrows: ", nrow(data.notdone),
>              "rows to be recomputed out of", nrow(data.in), "]\n")
>
>   FUN.ON.ROWS <- function(.index, ...)
>     as.matrix(FUN(data.notdone[.index,], ...))
>   soln <- mclapply( as.list(1:nrow(data.notdone)) , FUN.ON.ROWS, ... )
>   rv <- do.call("rbind", soln) ## omits naming.
>   if (ncol(rv)==1) rv <- as.vector(rv)
>   rv
> }
>
> this works fine, except that what I want to get NA's in the return
> positions that were not recalculated.
> then, I can write
>
> newdata$y <- ifelse ( is.na(olddata$y), mc.byselectrows( olddata,
> is.na(olddata$y), fun.calc.y ), olddata$y )
>
> I can do this very inelegantly, of course. I can merge recalclist
> into data.in and then write a loop that substitutes for the do.call to
> rbind. yikes. or I could do the recalclist contingency inside the
> FUN.ON.ROWS, but this is costly in terms of execution time. are there
> obvious solutions? advice appreciated.
>
> regards,
>
> /iaw
> ----
> Ivo Welch (ivo.we...@gmail.com)
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.