On Sat, Jan 11, 2020 at 4:14 AM Simon Urbanek
<[email protected]> wrote:
[...]
> > In general, any code that calls the macOS system libraries might
> > crash. (Except for CoreFoundation, which seems to be fine, but AFAIR
> > there is no guarantee for that, either.)
> >
>
> That is not true, either. macOS itself is fork-safe (it is POSIX-certified
> after all), but libraries may or may not.
Right, so CoreFoundation should be fine. Well, as much as it can be.
[...]
> As I said, using mc* functions explicitly says that you are ok with forking,
> so you if you run code that is not fork-safe it is clearly a user error.
Just to clarify, this basically means that you cannot use mc* in
portable code. Even base R functions might fail, see e.g. the previous
crashes with libcurl.
> That's exactly why we have the long "Warning" section in the documentation.
> If you have suggestions for its improvements, please feel free to supply
> patches.
One suggestion would be to export the parallel::isChild(), so that we
can guard against crashes.
Best,
Gabor
> Cheers,
> Simon
>
>
> > You get crashes in the terminal as well, without multithreading. E.g.
> > the keyring package links for the Security library on macOS, so you
> > get:
> >
> > ❯ R --vanilla -q
> >> .libPaths("~/R/3.6")
> >> keyring::key_list()[1:2,]
> > service username
> > 1 CommCenter kEntitlementsUniqueIDCacheKey
> > 2 ids identity-rsa-public-key
> >> parallel::mclapply(1:10, function(i) keyring::key_list()[1:2,])
> >
> > *** caught segfault ***
> > address 0x110, cause 'memory not mapped'
> >
> > *** caught segfault ***
> > address 0x110, cause 'memory not mapped'
> >
> > AFAICT only Apple can do anything about this, and they won't.
> >
> > Gabor
> >
> >> That's exactly why I was proposing a more general solution where you can
> >> simply define a function in user-space that will issue a warning or stop
> >> on fork, it doesn't have to be part of core R, there are other packages
> >> that use fork() as well, so what I proposed is much safer than hacking the
> >> parallel package.
> >>
> >> Cheers,
> >> Simon
> >>
> >>
> >>
> >>> On Jan 10, 2020, at 10:58 AM, Henrik Bengtsson
> >>> <[email protected]> wrote:
> >>>
> >>> The RStudio GUI was just one example. AFAIK, and please correct me if
> >>> I'm wrong, another example is where multi-threaded code is used in
> >>> forked processing and that's sometimes unstable. Yes another, which
> >>> might be multi-thread related or not, is
> >>> https://stat.ethz.ch/pipermail/r-devel/2018-September/076845.html:
> >>>
> >>> res <- parallel::mclapply(urls, function(url) {
> >>> download.file(url, basename(url))
> >>> })
> >>>
> >>> That was reported to fail on macOS with the default method="libcurl"
> >>> but not for method="curl" or method="wget".
> >>>
> >>> Further documentation is needed and would help but I don't believe
> >>> it's sufficient to solve everyday problems. The argument for
> >>> introducing an option/env var to disable forking is to give the end
> >>> user a quick workaround for newly introduced bugs. Neither the
> >>> develop nor the end user have full control of the R package stack,
> >>> which is always in flux. For instance, above mclapply() code might
> >>> have been in a package on CRAN and then all of a sudden
> >>> method="libcurl" became the new default in base R. The above
> >>> mclapply() code is now buggy on macOS, and not necessarily caught by
> >>> CRAN checks. The package developer might not notice this because they
> >>> are on Linux or Windows. It can take a very long time before this
> >>> problem is even noticed and even further before it is tracked down and
> >>> fixed. Similarly, as more and more code turn to native code and it
> >>> becomes easier and easier to implement multi-threading, more and more
> >>> of these bugs across package dependencies risk sneaking in the
> >>> backdoor wherever forked processing is in place.
> >>>
> >>> For the end user, but also higher-up upstream package developers, the
> >>> quickest workaround would be disable forking. If you're conservative,
> >>> you could even disable it all of your R processing. Being able to
> >>> quickly disable forking will also provide a mechanism for quickly
> >>> testing the hypothesis that forking is the underlying problem, i.e.
> >>> "Please retry with options(fork.allowed = FALSE)" will become handy
> >>> for troubleshooting.
> >>>
> >>> /Henrik
> >>>
> >>> On Fri, Jan 10, 2020 at 5:31 AM Simon Urbanek
> >>> <[email protected]> wrote:
> >>>>
> >>>> If I understand the thread correctly this is an RStudio issue and I
> >>>> would suggest that the developers consider using pthread_atfork() so
> >>>> RStudio can handle forking as they deem fit (bail out with an error or
> >>>> make RStudio work). Note that in principle the functionality requested
> >>>> here can be easily implemented in a package so R doesn’t need to be
> >>>> modified.
> >>>>
> >>>> Cheers,
> >>>> Simon
> >>>>
> >>>> Sent from my iPhone
> >>>>
> >>>>>> On Jan 10, 2020, at 04:34, Tomas Kalibera <[email protected]>
> >>>>>> wrote:
> >>>>>>
> >>>>>> On 1/10/20 7:33 AM, Henrik Bengtsson wrote:
> >>>>>> I'd like to pick up this thread started on 2019-04-11
> >>>>>> (https://hypatia.math.ethz.ch/pipermail/r-devel/2019-April/077632.html).
> >>>>>> Modulo all the other suggestions in this thread, would my proposal of
> >>>>>> being able to disable forked processing via an option or an
> >>>>>> environment variable make sense?
> >>>>>
> >>>>> I don't think R should be doing that. There are caveats with using
> >>>>> fork, and they are mentioned in the documentation of the parallel
> >>>>> package, so people can easily avoid functions that use it, and this all
> >>>>> has been discussed here recently.
> >>>>>
> >>>>> If it is the case, we can expand the documentation in parallel package,
> >>>>> add a warning against the use of forking with RStudio, but for that I
> >>>>> it would be good to know at least why it is not working. From the
> >>>>> github issue I have the impression that it is not really known why,
> >>>>> whether it could be fixed, and if so, where. The same github issue
> >>>>> reflects also that some people want to use forking for performance
> >>>>> reasons, and even with RStudio, at least on Linux. Perhaps it could be
> >>>>> fixed? Perhaps it is just some race condition somewhere?
> >>>>>
> >>>>> Tomas
> >>>>>
> >>>>>> I've prototyped a working patch that
> >>>>>> works like:
> >>>>>>> options(fork.allowed = FALSE)
> >>>>>>> unlist(parallel::mclapply(1:2, FUN = function(x) Sys.getpid()))
> >>>>>> [1] 14058 14058
> >>>>>>> parallel::mcmapply(1:2, FUN = function(x) Sys.getpid())
> >>>>>> [1] 14058 14058
> >>>>>>> parallel::pvec(1:2, FUN = function(x) Sys.getpid() + x/10)
> >>>>>> [1] 14058.1 14058.2
> >>>>>>> f <- parallel::mcparallel(Sys.getpid())
> >>>>>> Error in allowFork(assert = TRUE) :
> >>>>>> Forked processing is not allowed per option ‘fork.allowed’ or
> >>>>>> environment variable ‘R_FORK_ALLOWED’
> >>>>>>> cl <- parallel::makeForkCluster(1L)
> >>>>>> Error in allowFork(assert = TRUE) :
> >>>>>> Forked processing is not allowed per option ‘fork.allowed’ or
> >>>>>> environment variable ‘R_FORK_ALLOWED’
> >>>>>> The patch is:
> >>>>>> Index: src/library/parallel/R/unix/forkCluster.R
> >>>>>> ===================================================================
> >>>>>> --- src/library/parallel/R/unix/forkCluster.R (revision 77648)
> >>>>>> +++ src/library/parallel/R/unix/forkCluster.R (working copy)
> >>>>>> @@ -30,6 +30,7 @@
> >>>>>> newForkNode <- function(..., options = defaultClusterOptions, rank)
> >>>>>> {
> >>>>>> + allowFork(assert = TRUE)
> >>>>>> options <- addClusterOptions(options, list(...))
> >>>>>> outfile <- getClusterOption("outfile", options)
> >>>>>> port <- getClusterOption("port", options)
> >>>>>> Index: src/library/parallel/R/unix/mclapply.R
> >>>>>> ===================================================================
> >>>>>> --- src/library/parallel/R/unix/mclapply.R (revision 77648)
> >>>>>> +++ src/library/parallel/R/unix/mclapply.R (working copy)
> >>>>>> @@ -28,7 +28,7 @@
> >>>>>> stop("'mc.cores' must be >= 1")
> >>>>>> .check_ncores(cores)
> >>>>>> - if (isChild() && !isTRUE(mc.allow.recursive))
> >>>>>> + if (!allowFork() || (isChild() && !isTRUE(mc.allow.recursive)))
> >>>>>> return(lapply(X = X, FUN = FUN, ...))
> >>>>>> ## Follow lapply
> >>>>>> Index: src/library/parallel/R/unix/mcparallel.R
> >>>>>> ===================================================================
> >>>>>> --- src/library/parallel/R/unix/mcparallel.R (revision 77648)
> >>>>>> +++ src/library/parallel/R/unix/mcparallel.R (working copy)
> >>>>>> @@ -20,6 +20,7 @@
> >>>>>> mcparallel <- function(expr, name, mc.set.seed = TRUE, silent =
> >>>>>> FALSE, mc.affinity = NULL, mc.interactive = FALSE, detached = FALSE)
> >>>>>> {
> >>>>>> + allowFork(assert = TRUE)
> >>>>>> f <- mcfork(detached)
> >>>>>> env <- parent.frame()
> >>>>>> if (isTRUE(mc.set.seed)) mc.advance.stream()
> >>>>>> Index: src/library/parallel/R/unix/pvec.R
> >>>>>> ===================================================================
> >>>>>> --- src/library/parallel/R/unix/pvec.R (revision 77648)
> >>>>>> +++ src/library/parallel/R/unix/pvec.R (working copy)
> >>>>>> @@ -25,7 +25,7 @@
> >>>>>> cores <- as.integer(mc.cores)
> >>>>>> if(cores < 1L) stop("'mc.cores' must be >= 1")
> >>>>>> - if(cores == 1L) return(FUN(v, ...))
> >>>>>> + if(cores == 1L || !allowFork()) return(FUN(v, ...))
> >>>>>> .check_ncores(cores)
> >>>>>> if(mc.set.seed) mc.reset.stream()
> >>>>>> with a new file src/library/parallel/R/unix/allowFork.R:
> >>>>>> allowFork <- function(assert = FALSE) {
> >>>>>> value <- Sys.getenv("R_FORK_ALLOWED")
> >>>>>> if (nzchar(value)) {
> >>>>>> value <- switch(value,
> >>>>>> "1"=, "TRUE"=, "true"=, "True"=, "yes"=, "Yes"= TRUE,
> >>>>>> "0"=, "FALSE"=,"false"=,"False"=, "no"=, "No" = FALSE,
> >>>>>> stop(gettextf("invalid environment variable value: %s==%s",
> >>>>>> "R_FORK_ALLOWED", value)))
> >>>>>> value <- as.logical(value)
> >>>>>> } else {
> >>>>>> value <- TRUE
> >>>>>> }
> >>>>>> value <- getOption("fork.allowed", value)
> >>>>>> if (is.na(value)) {
> >>>>>> stop(gettextf("invalid option value: %s==%s", "fork.allowed",
> >>>>>> value))
> >>>>>> }
> >>>>>> if (assert && !value) {
> >>>>>> stop(gettextf("Forked processing is not allowed per option %s or
> >>>>>> environment variable %s", sQuote("fork.allowed"),
> >>>>>> sQuote("R_FORK_ALLOWED")))
> >>>>>> }
> >>>>>> value
> >>>>>> }
> >>>>>> /Henrik
> >>>>>>> On Mon, Apr 15, 2019 at 3:12 AM Tomas Kalibera
> >>>>>>> <[email protected]> wrote:
> >>>>>>> On 4/15/19 11:02 AM, Iñaki Ucar wrote:
> >>>>>>>> On Mon, 15 Apr 2019 at 08:44, Tomas Kalibera
> >>>>>>>> <[email protected]> wrote:
> >>>>>>>>> On 4/13/19 12:05 PM, Iñaki Ucar wrote:
> >>>>>>>>>> On Sat, 13 Apr 2019 at 03:51, Kevin Ushey <[email protected]>
> >>>>>>>>>> wrote:
> >>>>>>>>>>> I think it's worth saying that mclapply() works as documented
> >>>>>>>>>> Mostly, yes. But it says nothing about fork's copy-on-write and
> >>>>>>>>>> memory
> >>>>>>>>>> overcommitment, and that this means that it may work nicely or fail
> >>>>>>>>>> spectacularly depending on whether, e.g., you operate on a long
> >>>>>>>>>> vector.
> >>>>>>>>> R cannot possibly replicate documentation of the underlying
> >>>>>>>>> operating
> >>>>>>>>> systems. It clearly says that fork() is used and readers who may not
> >>>>>>>>> know what fork() is need to learn it from external sources.
> >>>>>>>>> Copy-on-write is an elementary property of fork().
> >>>>>>>> Just to be precise, copy-on-write is an optimization widely deployed
> >>>>>>>> in most modern *nixes, particularly for the architectures in which R
> >>>>>>>> usually runs. But it is not an elementary property; it is not even
> >>>>>>>> possible without an MMU.
> >>>>>>> Yes, old Unix systems without virtual memory had fork eagerly copying.
> >>>>>>> Not relevant today, and certainly not for systems that run R, but
> >>>>>>> indeed
> >>>>>>> people interested in OS internals can look elsewhere for more precise
> >>>>>>> information.
> >>>>>>> Tomas
> >>>>>
> >>>>> ______________________________________________
> >>>>> [email protected] mailing list
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>>
> >>>> ______________________________________________
> >>>> [email protected] mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>
> >>
> >> ______________________________________________
> >> [email protected] mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> > ______________________________________________
> > [email protected] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel