Hi Inaki,

> "Performant"... in terms of what. If the cost of copying the data
> predominates over the computation time, maybe you didn't need
> parallelization in the first place.
Performant in terms of speed. There's no copying in that example using
`mclapply`, so it is significantly faster than the alternatives. It is a
very simple and contrived example, but there are lots of applications
that depend on processing large data and benefit from parallelism: for
example, reading in large sequencing data with `Rsamtools` and checking
the sequences against a set of motifs.

> I don't see why mclapply could not be rewritten using PSOCK clusters.

Because it would be much slower.

> To implement copy-on-write, Linux overcommits virtual memory, and this
> is what causes scripts to break unexpectedly: everything works fine,
> until you change a small unimportant bit and... boom, out of memory.
> And in general, running forks in any GUI would cause things everywhere
> to break.

> I'm not sure how did you setup that, but it does complete. Or do you
> mean that you ran out of memory? Then try replacing "x" with, e.g.,
> "x+1" in your mclapply example and see what happens (hint: save your
> work first).

Yes, I meant that it ran out of memory on my desktop. I understand the
limits, and it is not perfect because of the GUI issue you mention, but
I don't see a better alternative in terms of speed.

Regards,
Travers

On Fri, Apr 12, 2019 at 3:45 PM Iñaki Ucar <iu...@fedoraproject.org> wrote:
>
> On Fri, 12 Apr 2019 at 21:32, Travers Ching <trave...@gmail.com> wrote:
> >
> > Just throwing my two cents in:
> >
> > I think removing/deprecating fork would be a bad idea for two reasons:
> >
> > 1) There are no performant alternatives
>
> "Performant"... in terms of what. If the cost of copying the data
> predominates over the computation time, maybe you didn't need
> parallelization in the first place.
>
> > 2) Removing fork would break existing workflows
>
> I don't see why mclapply could not be rewritten using PSOCK clusters.
> And as a side effect, this would enable those workflows on Windows,
> which doesn't support fork.
> > Even if replaced with something using the same interface (e.g., a
> > function that automatically detects variables to export as in the
> > amazing `future` package), the lack of copy-on-write functionality
> > would cause scripts everywhere to break.
>
> To implement copy-on-write, Linux overcommits virtual memory, and this
> is what causes scripts to break unexpectedly: everything works fine,
> until you change a small unimportant bit and... boom, out of memory.
> And in general, running forks in any GUI would cause things everywhere
> to break.
>
> > A simple example illustrating these two points:
> > `x <- 5e8; mclapply(1:24, sum, x, 8)`
> >
> > Using fork, `mclapply` takes 5 seconds. Using "psock", `clusterApply`
> > does not complete.
>
> I'm not sure how did you setup that, but it does complete. Or do you
> mean that you ran out of memory? Then try replacing "x" with, e.g.,
> "x+1" in your mclapply example and see what happens (hint: save your
> work first).
>
> --
> Iñaki Úcar

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
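[Editor's note: the fork-versus-PSOCK trade-off discussed above can be made concrete with a minimal sketch. This assumes a Unix-like machine with the base `parallel` package; the object size and worker counts here are illustrative, not the ones from the thread. With `mclapply`, forked children see `x` via copy-on-write; with a PSOCK cluster, `x` must be serialized and shipped to every worker, so each worker holds a full copy.]

```r
library(parallel)

# A large-ish read-only object shared by all workers
# (much smaller than the 5e8 figure in the thread, for a quick demo).
x <- rnorm(1e6)

# Fork-based backend (Unix only): children inherit x via copy-on-write,
# so nothing is serialized or copied up front.
res_fork <- mclapply(1:4, function(i) sum(x) + i, mc.cores = 2)

# Socket-based backend: x must be explicitly exported, i.e. serialized
# and sent over a socket to each worker process.
cl <- makePSOCKcluster(2)
clusterExport(cl, "x")
res_psock <- parLapply(cl, 1:4, function(i) sum(x) + i)
stopCluster(cl)

# Both backends compute the same result; they differ only in memory
# behavior and transfer cost.
stopifnot(identical(res_fork, res_psock))
```

For a scalar this difference is negligible, but as `x` grows the `clusterExport` step dominates PSOCK run time and memory, which is the cost the thread is arguing about.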