Henrik, the example from the post works just fine in CRAN R for me - the post was about homebrew build so it's conceivably a bug in their libraries. That's exactly why I was proposing a more general solution where you can simply define a function in user-space that will issue a warning or stop on fork, it doesn't have to be part of core R, there are other packages that use fork() as well, so what I proposed is much safer than hacking the parallel package.
Cheers, Simon > On Jan 10, 2020, at 10:58 AM, Henrik Bengtsson <henrik.bengts...@gmail.com> > wrote: > > The RStudio GUI was just one example. AFAIK, and please correct me if > I'm wrong, another example is where multi-threaded code is used in > forked processing and that's sometimes unstable. Yes another, which > might be multi-thread related or not, is > https://stat.ethz.ch/pipermail/r-devel/2018-September/076845.html: > > res <- parallel::mclapply(urls, function(url) { > download.file(url, basename(url)) > }) > > That was reported to fail on macOS with the default method="libcurl" > but not for method="curl" or method="wget". > > Further documentation is needed and would help but I don't believe > it's sufficient to solve everyday problems. The argument for > introducing an option/env var to disable forking is to give the end > user a quick workaround for newly introduced bugs. Neither the > develop nor the end user have full control of the R package stack, > which is always in flux. For instance, above mclapply() code might > have been in a package on CRAN and then all of a sudden > method="libcurl" became the new default in base R. The above > mclapply() code is now buggy on macOS, and not necessarily caught by > CRAN checks. The package developer might not notice this because they > are on Linux or Windows. It can take a very long time before this > problem is even noticed and even further before it is tracked down and > fixed. Similarly, as more and more code turn to native code and it > becomes easier and easier to implement multi-threading, more and more > of these bugs across package dependencies risk sneaking in the > backdoor wherever forked processing is in place. > > For the end user, but also higher-up upstream package developers, the > quickest workaround would be disable forking. If you're conservative, > you could even disable it all of your R processing. Being able to > quickly disable forking will also provide a mechanism for quickly > testing the hypothesis that forking is the underlying problem, i.e. > "Please retry with options(fork.allowed = FALSE)" will become handy > for troubleshooting. > > /Henrik > > On Fri, Jan 10, 2020 at 5:31 AM Simon Urbanek > <simon.urba...@r-project.org> wrote: >> >> If I understand the thread correctly this is an RStudio issue and I would >> suggest that the developers consider using pthread_atfork() so RStudio can >> handle forking as they deem fit (bail out with an error or make RStudio >> work). Note that in principle the functionality requested here can be >> easily implemented in a package so R doesn’t need to be modified. >> >> Cheers, >> Simon >> >> Sent from my iPhone >> >>>> On Jan 10, 2020, at 04:34, Tomas Kalibera <tomas.kalib...@gmail.com> wrote: >>>> >>>> On 1/10/20 7:33 AM, Henrik Bengtsson wrote: >>>> I'd like to pick up this thread started on 2019-04-11 >>>> (https://hypatia.math.ethz.ch/pipermail/r-devel/2019-April/077632.html). >>>> Modulo all the other suggestions in this thread, would my proposal of >>>> being able to disable forked processing via an option or an >>>> environment variable make sense? >>> >>> I don't think R should be doing that. There are caveats with using fork, >>> and they are mentioned in the documentation of the parallel package, so >>> people can easily avoid functions that use it, and this all has been >>> discussed here recently. >>> >>> If it is the case, we can expand the documentation in parallel package, add >>> a warning against the use of forking with RStudio, but for that I it would >>> be good to know at least why it is not working. From the github issue I >>> have the impression that it is not really known why, whether it could be >>> fixed, and if so, where. The same github issue reflects also that some >>> people want to use forking for performance reasons, and even with RStudio, >>> at least on Linux. Perhaps it could be fixed? Perhaps it is just some race >>> condition somewhere? >>> >>> Tomas >>> >>>> I've prototyped a working patch that >>>> works like: >>>>> options(fork.allowed = FALSE) >>>>> unlist(parallel::mclapply(1:2, FUN = function(x) Sys.getpid())) >>>> [1] 14058 14058 >>>>> parallel::mcmapply(1:2, FUN = function(x) Sys.getpid()) >>>> [1] 14058 14058 >>>>> parallel::pvec(1:2, FUN = function(x) Sys.getpid() + x/10) >>>> [1] 14058.1 14058.2 >>>>> f <- parallel::mcparallel(Sys.getpid()) >>>> Error in allowFork(assert = TRUE) : >>>> Forked processing is not allowed per option ‘fork.allowed’ or >>>> environment variable ‘R_FORK_ALLOWED’ >>>>> cl <- parallel::makeForkCluster(1L) >>>> Error in allowFork(assert = TRUE) : >>>> Forked processing is not allowed per option ‘fork.allowed’ or >>>> environment variable ‘R_FORK_ALLOWED’ >>>> The patch is: >>>> Index: src/library/parallel/R/unix/forkCluster.R >>>> =================================================================== >>>> --- src/library/parallel/R/unix/forkCluster.R (revision 77648) >>>> +++ src/library/parallel/R/unix/forkCluster.R (working copy) >>>> @@ -30,6 +30,7 @@ >>>> newForkNode <- function(..., options = defaultClusterOptions, rank) >>>> { >>>> + allowFork(assert = TRUE) >>>> options <- addClusterOptions(options, list(...)) >>>> outfile <- getClusterOption("outfile", options) >>>> port <- getClusterOption("port", options) >>>> Index: src/library/parallel/R/unix/mclapply.R >>>> =================================================================== >>>> --- src/library/parallel/R/unix/mclapply.R (revision 77648) >>>> +++ src/library/parallel/R/unix/mclapply.R (working copy) >>>> @@ -28,7 +28,7 @@ >>>> stop("'mc.cores' must be >= 1") >>>> .check_ncores(cores) >>>> - if (isChild() && !isTRUE(mc.allow.recursive)) >>>> + if (!allowFork() || (isChild() && !isTRUE(mc.allow.recursive))) >>>> return(lapply(X = X, FUN = FUN, ...)) >>>> ## Follow lapply >>>> Index: src/library/parallel/R/unix/mcparallel.R >>>> =================================================================== >>>> --- src/library/parallel/R/unix/mcparallel.R (revision 77648) >>>> +++ src/library/parallel/R/unix/mcparallel.R (working copy) >>>> @@ -20,6 +20,7 @@ >>>> mcparallel <- function(expr, name, mc.set.seed = TRUE, silent = >>>> FALSE, mc.affinity = NULL, mc.interactive = FALSE, detached = FALSE) >>>> { >>>> + allowFork(assert = TRUE) >>>> f <- mcfork(detached) >>>> env <- parent.frame() >>>> if (isTRUE(mc.set.seed)) mc.advance.stream() >>>> Index: src/library/parallel/R/unix/pvec.R >>>> =================================================================== >>>> --- src/library/parallel/R/unix/pvec.R (revision 77648) >>>> +++ src/library/parallel/R/unix/pvec.R (working copy) >>>> @@ -25,7 +25,7 @@ >>>> cores <- as.integer(mc.cores) >>>> if(cores < 1L) stop("'mc.cores' must be >= 1") >>>> - if(cores == 1L) return(FUN(v, ...)) >>>> + if(cores == 1L || !allowFork()) return(FUN(v, ...)) >>>> .check_ncores(cores) >>>> if(mc.set.seed) mc.reset.stream() >>>> with a new file src/library/parallel/R/unix/allowFork.R: >>>> allowFork <- function(assert = FALSE) { >>>> value <- Sys.getenv("R_FORK_ALLOWED") >>>> if (nzchar(value)) { >>>> value <- switch(value, >>>> "1"=, "TRUE"=, "true"=, "True"=, "yes"=, "Yes"= TRUE, >>>> "0"=, "FALSE"=,"false"=,"False"=, "no"=, "No" = FALSE, >>>> stop(gettextf("invalid environment variable value: %s==%s", >>>> "R_FORK_ALLOWED", value))) >>>> value <- as.logical(value) >>>> } else { >>>> value <- TRUE >>>> } >>>> value <- getOption("fork.allowed", value) >>>> if (is.na(value)) { >>>> stop(gettextf("invalid option value: %s==%s", "fork.allowed", value)) >>>> } >>>> if (assert && !value) { >>>> stop(gettextf("Forked processing is not allowed per option %s or >>>> environment variable %s", sQuote("fork.allowed"), >>>> sQuote("R_FORK_ALLOWED"))) >>>> } >>>> value >>>> } >>>> /Henrik >>>>> On Mon, Apr 15, 2019 at 3:12 AM Tomas Kalibera <tomas.kalib...@gmail.com> >>>>> wrote: >>>>> On 4/15/19 11:02 AM, Iñaki Ucar wrote: >>>>>> On Mon, 15 Apr 2019 at 08:44, Tomas Kalibera <tomas.kalib...@gmail.com> >>>>>> wrote: >>>>>>> On 4/13/19 12:05 PM, Iñaki Ucar wrote: >>>>>>>> On Sat, 13 Apr 2019 at 03:51, Kevin Ushey <kevinus...@gmail.com> wrote: >>>>>>>>> I think it's worth saying that mclapply() works as documented >>>>>>>> Mostly, yes. But it says nothing about fork's copy-on-write and memory >>>>>>>> overcommitment, and that this means that it may work nicely or fail >>>>>>>> spectacularly depending on whether, e.g., you operate on a long >>>>>>>> vector. >>>>>>> R cannot possibly replicate documentation of the underlying operating >>>>>>> systems. It clearly says that fork() is used and readers who may not >>>>>>> know what fork() is need to learn it from external sources. >>>>>>> Copy-on-write is an elementary property of fork(). >>>>>> Just to be precise, copy-on-write is an optimization widely deployed >>>>>> in most modern *nixes, particularly for the architectures in which R >>>>>> usually runs. But it is not an elementary property; it is not even >>>>>> possible without an MMU. >>>>> Yes, old Unix systems without virtual memory had fork eagerly copying. >>>>> Not relevant today, and certainly not for systems that run R, but indeed >>>>> people interested in OS internals can look elsewhere for more precise >>>>> information. >>>>> Tomas >>> >>> ______________________________________________ >>> R-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel >> >> ______________________________________________ >> R-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel > ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel