Henrik,

the example from the post works just fine in CRAN R for me - the post was about 
homebrew build so it's conceivably a bug in their libraries. That's exactly why 
I was proposing a more general solution where you can simply define a function 
in user-space that will issue a warning or stop on fork, it doesn't have to be 
part of core R, there are other packages that use fork() as well, so what I 
proposed is much safer than hacking the parallel package.

Cheers,
Simon
 


> On Jan 10, 2020, at 10:58 AM, Henrik Bengtsson <henrik.bengts...@gmail.com> 
> wrote:
> 
> The RStudio GUI was just one example.  AFAIK, and please correct me if
> I'm wrong, another example is where multi-threaded code is used in
> forked processing and that's sometimes unstable.  Yes another, which
> might be multi-thread related or not, is
> https://stat.ethz.ch/pipermail/r-devel/2018-September/076845.html:
> 
> res <- parallel::mclapply(urls, function(url) {
>  download.file(url, basename(url))
> })
> 
> That was reported to fail on macOS with the default method="libcurl"
> but not for method="curl" or method="wget".
> 
> Further documentation is needed and would help but I don't believe
> it's sufficient to solve everyday problems.  The argument for
> introducing an option/env var to disable forking is to give the end
> user a quick workaround for newly introduced bugs.  Neither the
> develop nor the end user have full control of the R package stack,
> which is always in flux.  For instance, above mclapply() code might
> have been in a package on CRAN and then all of a sudden
> method="libcurl" became the new default in base R.  The above
> mclapply() code is now buggy on macOS, and not necessarily caught by
> CRAN checks.  The package developer might not notice this because they
> are on Linux or Windows.  It can take a very long time before this
> problem is even noticed and even further before it is tracked down and
> fixed.   Similarly, as more and more code turn to native code and it
> becomes easier and easier to implement multi-threading, more and more
> of these bugs across package dependencies risk sneaking in the
> backdoor wherever forked processing is in place.
> 
> For the end user, but also higher-up upstream package developers, the
> quickest workaround would be disable forking.  If you're conservative,
> you could even disable it all of your R processing.  Being able to
> quickly disable forking will also provide a mechanism for quickly
> testing the hypothesis that forking is the underlying problem, i.e.
> "Please retry with options(fork.allowed = FALSE)" will become handy
> for troubleshooting.
> 
> /Henrik
> 
> On Fri, Jan 10, 2020 at 5:31 AM Simon Urbanek
> <simon.urba...@r-project.org> wrote:
>> 
>> If I understand the thread correctly this is an RStudio issue and I would 
>> suggest that the developers consider using pthread_atfork() so RStudio can 
>> handle forking as they deem fit (bail out with an error or make RStudio 
>> work).  Note that in principle the functionality requested here can be 
>> easily implemented in a package so R doesn’t need to be modified.
>> 
>> Cheers,
>> Simon
>> 
>> Sent from my iPhone
>> 
>>>> On Jan 10, 2020, at 04:34, Tomas Kalibera <tomas.kalib...@gmail.com> wrote:
>>>> 
>>>> On 1/10/20 7:33 AM, Henrik Bengtsson wrote:
>>>> I'd like to pick up this thread started on 2019-04-11
>>>> (https://hypatia.math.ethz.ch/pipermail/r-devel/2019-April/077632.html).
>>>> Modulo all the other suggestions in this thread, would my proposal of
>>>> being able to disable forked processing via an option or an
>>>> environment variable make sense?
>>> 
>>> I don't think R should be doing that. There are caveats with using fork, 
>>> and they are mentioned in the documentation of the parallel package, so 
>>> people can easily avoid functions that use it, and this all has been 
>>> discussed here recently.
>>> 
>>> If it is the case, we can expand the documentation in parallel package, add 
>>> a warning against the use of forking with RStudio, but for that I it would 
>>> be good to know at least why it is not working. From the github issue I 
>>> have the impression that it is not really known why, whether it could be 
>>> fixed, and if so, where. The same github issue reflects also that some 
>>> people want to use forking for performance reasons, and even with RStudio, 
>>> at least on Linux. Perhaps it could be fixed? Perhaps it is just some race 
>>> condition somewhere?
>>> 
>>> Tomas
>>> 
>>>> I've prototyped a working patch that
>>>> works like:
>>>>> options(fork.allowed = FALSE)
>>>>> unlist(parallel::mclapply(1:2, FUN = function(x) Sys.getpid()))
>>>> [1] 14058 14058
>>>>> parallel::mcmapply(1:2, FUN = function(x) Sys.getpid())
>>>> [1] 14058 14058
>>>>> parallel::pvec(1:2, FUN = function(x) Sys.getpid() + x/10)
>>>> [1] 14058.1 14058.2
>>>>> f <- parallel::mcparallel(Sys.getpid())
>>>> Error in allowFork(assert = TRUE) :
>>>> Forked processing is not allowed per option ‘fork.allowed’ or
>>>> environment variable ‘R_FORK_ALLOWED’
>>>>> cl <- parallel::makeForkCluster(1L)
>>>> Error in allowFork(assert = TRUE) :
>>>> Forked processing is not allowed per option ‘fork.allowed’ or
>>>> environment variable ‘R_FORK_ALLOWED’
>>>> The patch is:
>>>> Index: src/library/parallel/R/unix/forkCluster.R
>>>> ===================================================================
>>>> --- src/library/parallel/R/unix/forkCluster.R (revision 77648)
>>>> +++ src/library/parallel/R/unix/forkCluster.R (working copy)
>>>> @@ -30,6 +30,7 @@
>>>> newForkNode <- function(..., options = defaultClusterOptions, rank)
>>>> {
>>>> +    allowFork(assert = TRUE)
>>>>    options <- addClusterOptions(options, list(...))
>>>>    outfile <- getClusterOption("outfile", options)
>>>>    port <- getClusterOption("port", options)
>>>> Index: src/library/parallel/R/unix/mclapply.R
>>>> ===================================================================
>>>> --- src/library/parallel/R/unix/mclapply.R (revision 77648)
>>>> +++ src/library/parallel/R/unix/mclapply.R (working copy)
>>>> @@ -28,7 +28,7 @@
>>>>        stop("'mc.cores' must be >= 1")
>>>>    .check_ncores(cores)
>>>> -    if (isChild() && !isTRUE(mc.allow.recursive))
>>>> +    if (!allowFork() || (isChild() && !isTRUE(mc.allow.recursive)))
>>>>        return(lapply(X = X, FUN = FUN, ...))
>>>>    ## Follow lapply
>>>> Index: src/library/parallel/R/unix/mcparallel.R
>>>> ===================================================================
>>>> --- src/library/parallel/R/unix/mcparallel.R (revision 77648)
>>>> +++ src/library/parallel/R/unix/mcparallel.R (working copy)
>>>> @@ -20,6 +20,7 @@
>>>> mcparallel <- function(expr, name, mc.set.seed = TRUE, silent =
>>>> FALSE, mc.affinity = NULL, mc.interactive = FALSE, detached = FALSE)
>>>> {
>>>> +    allowFork(assert = TRUE)
>>>>    f <- mcfork(detached)
>>>>    env <- parent.frame()
>>>>    if (isTRUE(mc.set.seed)) mc.advance.stream()
>>>> Index: src/library/parallel/R/unix/pvec.R
>>>> ===================================================================
>>>> --- src/library/parallel/R/unix/pvec.R (revision 77648)
>>>> +++ src/library/parallel/R/unix/pvec.R (working copy)
>>>> @@ -25,7 +25,7 @@
>>>>    cores <- as.integer(mc.cores)
>>>>    if(cores < 1L) stop("'mc.cores' must be >= 1")
>>>> -    if(cores == 1L) return(FUN(v, ...))
>>>> +    if(cores == 1L || !allowFork()) return(FUN(v, ...))
>>>>    .check_ncores(cores)
>>>>    if(mc.set.seed) mc.reset.stream()
>>>> with a new file src/library/parallel/R/unix/allowFork.R:
>>>> allowFork <- function(assert = FALSE) {
>>>>   value <- Sys.getenv("R_FORK_ALLOWED")
>>>>   if (nzchar(value)) {
>>>>       value <- switch(value,
>>>>          "1"=, "TRUE"=, "true"=, "True"=, "yes"=, "Yes"= TRUE,
>>>>          "0"=, "FALSE"=,"false"=,"False"=, "no"=, "No" = FALSE,
>>>>           stop(gettextf("invalid environment variable value: %s==%s",
>>>>          "R_FORK_ALLOWED", value)))
>>>> value <- as.logical(value)
>>>>   } else {
>>>>       value <- TRUE
>>>>   }
>>>>   value <- getOption("fork.allowed", value)
>>>>   if (is.na(value)) {
>>>>       stop(gettextf("invalid option value: %s==%s", "fork.allowed", value))
>>>>   }
>>>>   if (assert && !value) {
>>>>     stop(gettextf("Forked processing is not allowed per option %s or
>>>> environment variable %s", sQuote("fork.allowed"),
>>>> sQuote("R_FORK_ALLOWED")))
>>>>   }
>>>>   value
>>>> }
>>>> /Henrik
>>>>> On Mon, Apr 15, 2019 at 3:12 AM Tomas Kalibera <tomas.kalib...@gmail.com> 
>>>>> wrote:
>>>>> On 4/15/19 11:02 AM, Iñaki Ucar wrote:
>>>>>> On Mon, 15 Apr 2019 at 08:44, Tomas Kalibera <tomas.kalib...@gmail.com> 
>>>>>> wrote:
>>>>>>> On 4/13/19 12:05 PM, Iñaki Ucar wrote:
>>>>>>>> On Sat, 13 Apr 2019 at 03:51, Kevin Ushey <kevinus...@gmail.com> wrote:
>>>>>>>>> I think it's worth saying that mclapply() works as documented
>>>>>>>> Mostly, yes. But it says nothing about fork's copy-on-write and memory
>>>>>>>> overcommitment, and that this means that it may work nicely or fail
>>>>>>>> spectacularly depending on whether, e.g., you operate on a long
>>>>>>>> vector.
>>>>>>> R cannot possibly replicate documentation of the underlying operating
>>>>>>> systems. It clearly says that fork() is used and readers who may not
>>>>>>> know what fork() is need to learn it from external sources.
>>>>>>> Copy-on-write is an elementary property of fork().
>>>>>> Just to be precise, copy-on-write is an optimization widely deployed
>>>>>> in most modern *nixes, particularly for the architectures in which R
>>>>>> usually runs. But it is not an elementary property; it is not even
>>>>>> possible without an MMU.
>>>>> Yes, old Unix systems without virtual memory had fork eagerly copying.
>>>>> Not relevant today, and certainly not for systems that run R, but indeed
>>>>> people interested in OS internals can look elsewhere for more precise
>>>>> information.
>>>>> Tomas
>>> 
>>> ______________________________________________
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> 

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to