Re: [Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()

2020-01-10 Thread Tomas Kalibera

On 1/10/20 7:33 AM, Henrik Bengtsson wrote:

I'd like to pick up this thread started on 2019-04-11
(https://hypatia.math.ethz.ch/pipermail/r-devel/2019-April/077632.html).
Modulo all the other suggestions in this thread, would my proposal of
being able to disable forked processing via an option or an
environment variable make sense?


I don't think R should be doing that. There are caveats with using fork, 
and they are mentioned in the documentation of the parallel package, so 
people can easily avoid functions that use it, and this all has been 
discussed here recently.


If that is the case, we can expand the documentation in the parallel package 
and add a warning against the use of forking with RStudio, but for that it 
would be good to know at least why it is not working. From the GitHub issue 
I have the impression that it is not really known why, whether it could be 
fixed, and if so, where. The same GitHub issue also reflects that some 
people want to use forking for performance reasons, even with RStudio, at 
least on Linux. Perhaps it could be fixed? Perhaps it is just some race 
condition somewhere?


Tomas


  I've prototyped a working patch that
works like:


> options(fork.allowed = FALSE)
> unlist(parallel::mclapply(1:2, FUN = function(x) Sys.getpid()))
[1] 14058 14058
> parallel::mcmapply(1:2, FUN = function(x) Sys.getpid())
[1] 14058 14058
> parallel::pvec(1:2, FUN = function(x) Sys.getpid() + x/10)
[1] 14058.1 14058.2
> f <- parallel::mcparallel(Sys.getpid())
Error in allowFork(assert = TRUE) :
   Forked processing is not allowed per option ‘fork.allowed’ or
environment variable ‘R_FORK_ALLOWED’
> cl <- parallel::makeForkCluster(1L)
Error in allowFork(assert = TRUE) :
   Forked processing is not allowed per option ‘fork.allowed’ or
environment variable ‘R_FORK_ALLOWED’

The patch is:

Index: src/library/parallel/R/unix/forkCluster.R
===
--- src/library/parallel/R/unix/forkCluster.R (revision 77648)
+++ src/library/parallel/R/unix/forkCluster.R (working copy)
@@ -30,6 +30,7 @@

  newForkNode <- function(..., options = defaultClusterOptions, rank)
  {
+allowFork(assert = TRUE)
  options <- addClusterOptions(options, list(...))
  outfile <- getClusterOption("outfile", options)
  port <- getClusterOption("port", options)
Index: src/library/parallel/R/unix/mclapply.R
===
--- src/library/parallel/R/unix/mclapply.R (revision 77648)
+++ src/library/parallel/R/unix/mclapply.R (working copy)
@@ -28,7 +28,7 @@
  stop("'mc.cores' must be >= 1")
  .check_ncores(cores)

-if (isChild() && !isTRUE(mc.allow.recursive))
+if (!allowFork() || (isChild() && !isTRUE(mc.allow.recursive)))
  return(lapply(X = X, FUN = FUN, ...))

  ## Follow lapply
Index: src/library/parallel/R/unix/mcparallel.R
===
--- src/library/parallel/R/unix/mcparallel.R (revision 77648)
+++ src/library/parallel/R/unix/mcparallel.R (working copy)
@@ -20,6 +20,7 @@

  mcparallel <- function(expr, name, mc.set.seed = TRUE, silent =
FALSE, mc.affinity = NULL, mc.interactive = FALSE, detached = FALSE)
  {
+allowFork(assert = TRUE)
  f <- mcfork(detached)
  env <- parent.frame()
  if (isTRUE(mc.set.seed)) mc.advance.stream()
Index: src/library/parallel/R/unix/pvec.R
===
--- src/library/parallel/R/unix/pvec.R (revision 77648)
+++ src/library/parallel/R/unix/pvec.R (working copy)
@@ -25,7 +25,7 @@

  cores <- as.integer(mc.cores)
  if(cores < 1L) stop("'mc.cores' must be >= 1")
-if(cores == 1L) return(FUN(v, ...))
+if(cores == 1L || !allowFork()) return(FUN(v, ...))
  .check_ncores(cores)

  if(mc.set.seed) mc.reset.stream()

with a new file src/library/parallel/R/unix/allowFork.R:

allowFork <- function(assert = FALSE) {
    value <- Sys.getenv("R_FORK_ALLOWED")
    if (nzchar(value)) {
        value <- switch(value,
            "1" = , "TRUE" = , "true" = , "True" = , "yes" = , "Yes" = TRUE,
            "0" = , "FALSE" = , "false" = , "False" = , "no" = , "No" = FALSE,
            stop(gettextf("invalid environment variable value: %s==%s",
                          "R_FORK_ALLOWED", value)))
        value <- as.logical(value)
    } else {
        value <- TRUE
    }
    value <- getOption("fork.allowed", value)
    if (is.na(value)) {
        stop(gettextf("invalid option value: %s==%s", "fork.allowed", value))
    }
    if (assert && !value) {
        stop(gettextf("Forked processing is not allowed per option %s or environment variable %s",
                      sQuote("fork.allowed"), sQuote("R_FORK_ALLOWED")))
    }
    value
}
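
A note on usage (a sketch, assuming the patch above is applied): the same
fallback can be triggered through the environment variable instead of the
option, which is convenient when the R code itself cannot be changed.  The
script name below is just a placeholder:

## Disable forking before R starts, e.g. in ~/.Renviron or from a shell:
##   R_FORK_ALLOWED=false Rscript my_script.R
## or, for the current session only:
Sys.setenv(R_FORK_ALLOWED = "false")

## mclapply() then falls back to a sequential lapply(), so both results
## come from the current process ...
unlist(parallel::mclapply(1:2, FUN = function(x) Sys.getpid()))
## ... while mcparallel() and makeForkCluster() raise the allowFork() error.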

/Henrik

On Mon, Apr 15, 2019 at 3:12 AM Tomas Kalibera  wrote:

On 4/15/19 11:02 AM, Iñaki Ucar wrote:

On Mon, 15 Apr 2019 at 08:44, Tomas Kalibera  wrote:

On 4/13/19 12:05 PM, Iñaki Ucar wrote:

On Sat, 13 Apr 2019 at 03:51, Kevin U

[Rd] Operator precedence of =, <- and ?

2020-01-10 Thread Konrad Rudolph
The documentation (help("Syntax")) gives the operator precedence of the
assignment operators and help, from highest to lowest, as:

   ‘<- <<-’           assignment (right to left)
   ‘=’                assignment (right to left)
   ‘?’                help (unary and binary)

If I understand correctly this implies that `a = b ? c` and `a <- b ? c`
should parse identically. Or, if using the unary version, `?a = b` and `?a
<- b` should parse identically.

However, as noted by Antoine Fabri on Stack Overflow [1], they have
different parses (on R 3.5.3 and 3.6.1, at least), which puts the
precedence of `?` *between* that of `<-` and `=`. In fact, src/main/gram.y
[2] appears to show the same precedence table as the documentation;
presumably the parser at some point rewrites the parse tree manually.

At any rate, should this be fixed in the documentation? Or is the
documentation “correct”, and there’s a bug in the parser (in some versions
of R)?

[1] <
https://stackoverflow.com/questions/1741820/51564252#comment105506343_51564252
>
[2] <
https://github.com/wch/r-source/blob/386c3a93cbcaf95017fa6ae52453530fb95149f4/src/main/gram.y#L384-L390
>
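
For anyone who wants to reproduce the comparison locally, here is a small
sketch (it only prints the two parse trees; no particular output is claimed
here):

## Compare how the parser groups the binary forms ...
e1 <- parse(text = "a <- b ? c", keep.source = FALSE)[[1]]
e2 <- parse(text = "a = b ? c",  keep.source = FALSE)[[1]]
str(as.list(e1))
str(as.list(e2))

## ... and the unary forms from the question.
str(as.list(parse(text = "?a <- b", keep.source = FALSE)[[1]]))
str(as.list(parse(text = "?a = b",  keep.source = FALSE)[[1]]))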

--
Konrad Rudolph

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Get memory address of an R data frame

2020-01-10 Thread Dirk Eddelbuettel


lille stor,

As a general rule, please do not cross-post.

And almost certainly do not simultaneously as you did here and on
https://stackoverflow.com/questions/59663174/get-memory-address-of-an-r-data-frame

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()

2020-01-10 Thread Simon Urbanek
If I understand the thread correctly this is an RStudio issue and I would 
suggest that the developers consider using pthread_atfork() so RStudio can 
handle forking as they deem fit (bail out with an error or make RStudio work).  
Note that in principle the functionality requested here can be easily 
implemented in a package so R doesn’t need to be modified.
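
To illustrate the package-level route (a wrapper sketch only, not the
pthread_atfork() approach, which would have to live at the C level; the
function name safe_mclapply and the option name allow.fork are made up for
this example):

## Hypothetical user-space guard: fall back to a sequential lapply()
## unless the made-up option "allow.fork" is TRUE.  No change to base R
## or to the parallel package is required.
safe_mclapply <- function(X, FUN, ...) {
  if (isTRUE(getOption("allow.fork", TRUE)))
    parallel::mclapply(X, FUN, ...)
  else
    lapply(X, FUN, ...)
}

## Usage:
options(allow.fork = FALSE)
unlist(safe_mclapply(1:2, function(x) Sys.getpid()))  # runs in this process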

Cheers,
Simon

Sent from my iPhone

>> On Jan 10, 2020, at 04:34, Tomas Kalibera  wrote:
>> 
>> On 1/10/20 7:33 AM, Henrik Bengtsson wrote:
>> I'd like to pick up this thread started on 2019-04-11
>> (https://hypatia.math.ethz.ch/pipermail/r-devel/2019-April/077632.html).
>> Modulo all the other suggestions in this thread, would my proposal of
>> being able to disable forked processing via an option or an
>> environment variable make sense?
> 
> I don't think R should be doing that. There are caveats with using fork, and 
> they are mentioned in the documentation of the parallel package, so people 
> can easily avoid functions that use it, and this all has been discussed here 
> recently.
> 
> If it is the case, we can expand the documentation in parallel package, add a 
> warning against the use of forking with RStudio, but for that it would be 
> good to know at least why it is not working. From the github issue I have the 
> impression that it is not really known why, whether it could be fixed, and if 
> so, where. The same github issue reflects also that some people want to use 
> forking for performance reasons, and even with RStudio, at least on Linux. 
> Perhaps it could be fixed? Perhaps it is just some race condition somewhere?
> 
> Tomas
> 
>> I've prototyped a working patch that
>> works like:
>>> options(fork.allowed = FALSE)
>>> unlist(parallel::mclapply(1:2, FUN = function(x) Sys.getpid()))
>> [1] 14058 14058
>>> parallel::mcmapply(1:2, FUN = function(x) Sys.getpid())
>> [1] 14058 14058
>>> parallel::pvec(1:2, FUN = function(x) Sys.getpid() + x/10)
>> [1] 14058.1 14058.2
>>> f <- parallel::mcparallel(Sys.getpid())
>> Error in allowFork(assert = TRUE) :
>>  Forked processing is not allowed per option ‘fork.allowed’ or
>> environment variable ‘R_FORK_ALLOWED’
>>> cl <- parallel::makeForkCluster(1L)
>> Error in allowFork(assert = TRUE) :
>>  Forked processing is not allowed per option ‘fork.allowed’ or
>> environment variable ‘R_FORK_ALLOWED’
>> The patch is:
>> Index: src/library/parallel/R/unix/forkCluster.R
>> ===
>> --- src/library/parallel/R/unix/forkCluster.R (revision 77648)
>> +++ src/library/parallel/R/unix/forkCluster.R (working copy)
>> @@ -30,6 +30,7 @@
>> newForkNode <- function(..., options = defaultClusterOptions, rank)
>> {
>> +allowFork(assert = TRUE)
>> options <- addClusterOptions(options, list(...))
>> outfile <- getClusterOption("outfile", options)
>> port <- getClusterOption("port", options)
>> Index: src/library/parallel/R/unix/mclapply.R
>> ===
>> --- src/library/parallel/R/unix/mclapply.R (revision 77648)
>> +++ src/library/parallel/R/unix/mclapply.R (working copy)
>> @@ -28,7 +28,7 @@
>> stop("'mc.cores' must be >= 1")
>> .check_ncores(cores)
>> -if (isChild() && !isTRUE(mc.allow.recursive))
>> +if (!allowFork() || (isChild() && !isTRUE(mc.allow.recursive)))
>> return(lapply(X = X, FUN = FUN, ...))
>> ## Follow lapply
>> Index: src/library/parallel/R/unix/mcparallel.R
>> ===
>> --- src/library/parallel/R/unix/mcparallel.R (revision 77648)
>> +++ src/library/parallel/R/unix/mcparallel.R (working copy)
>> @@ -20,6 +20,7 @@
>> mcparallel <- function(expr, name, mc.set.seed = TRUE, silent =
>> FALSE, mc.affinity = NULL, mc.interactive = FALSE, detached = FALSE)
>> {
>> +allowFork(assert = TRUE)
>> f <- mcfork(detached)
>> env <- parent.frame()
>> if (isTRUE(mc.set.seed)) mc.advance.stream()
>> Index: src/library/parallel/R/unix/pvec.R
>> ===
>> --- src/library/parallel/R/unix/pvec.R (revision 77648)
>> +++ src/library/parallel/R/unix/pvec.R (working copy)
>> @@ -25,7 +25,7 @@
>> cores <- as.integer(mc.cores)
>> if(cores < 1L) stop("'mc.cores' must be >= 1")
>> -if(cores == 1L) return(FUN(v, ...))
>> +if(cores == 1L || !allowFork()) return(FUN(v, ...))
>> .check_ncores(cores)
>> if(mc.set.seed) mc.reset.stream()
>> with a new file src/library/parallel/R/unix/allowFork.R:
>> allowFork <- function(assert = FALSE) {
>>value <- Sys.getenv("R_FORK_ALLOWED")
>>if (nzchar(value)) {
>>value <- switch(value,
>>   "1"=, "TRUE"=, "true"=, "True"=, "yes"=, "Yes"= TRUE,
>>   "0"=, "FALSE"=,"false"=,"False"=, "no"=, "No" = FALSE,
>>stop(gettextf("invalid environment variable value: %s==%s",
>>   "R_FORK_ALLOWED", value)))
>> value <- as.logi

Re: [Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()

2020-01-10 Thread Henrik Bengtsson
The RStudio GUI was just one example.  AFAIK, and please correct me if
I'm wrong, another example is when multi-threaded code is used in
forked processing, which is sometimes unstable.  Yet another, which
might or might not be multi-thread related, is
https://stat.ethz.ch/pipermail/r-devel/2018-September/076845.html:

res <- parallel::mclapply(urls, function(url) {
  download.file(url, basename(url))
})

That was reported to fail on macOS with the default method="libcurl"
but not for method="curl" or method="wget".

Further documentation is needed and would help, but I don't believe
it's sufficient to solve everyday problems.  The argument for
introducing an option/env var to disable forking is to give the end
user a quick workaround for newly introduced bugs.  Neither the
developer nor the end user has full control of the R package stack,
which is always in flux.  For instance, the above mclapply() code might
have been in a package on CRAN when, all of a sudden,
method="libcurl" became the new default in base R.  The above
mclapply() code is now buggy on macOS, and not necessarily caught by
CRAN checks.  The package developer might not notice this because they
are on Linux or Windows.  It can take a very long time before the
problem is even noticed, and longer still before it is tracked down and
fixed.  Similarly, as more and more code turns to native code and it
becomes easier and easier to implement multi-threading, more and more
of these bugs across package dependencies risk sneaking in through the
backdoor wherever forked processing is in place.

For the end user, but also for upstream package developers, the
quickest workaround would be to disable forking.  If you're
conservative, you could even disable it for all of your R processing.
Being able to quickly disable forking also provides a mechanism for
testing the hypothesis that forking is the underlying problem, i.e.
"Please retry with options(fork.allowed = FALSE)" will come in handy
for troubleshooting.
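
To make that troubleshooting workflow concrete (a sketch assuming the
fork.allowed option from the proposed patch; the URL is just a placeholder
input):

## Disable forking for the session and rerun the suspect code path.
options(fork.allowed = FALSE)

urls <- c("https://cran.r-project.org/doc/FAQ/R-FAQ.html")  # placeholder
res <- parallel::mclapply(urls, function(url) {
  download.file(url, basename(url))
})
## If it now succeeds, forking was likely the culprit; the option (or
## R_FORK_ALLOWED in ~/.Renviron) could then be set globally as a stopgap.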

/Henrik

On Fri, Jan 10, 2020 at 5:31 AM Simon Urbanek
 wrote:
>
> If I understand the thread correctly this is an RStudio issue and I would 
> suggest that the developers consider using pthread_atfork() so RStudio can 
> handle forking as they deem fit (bail out with an error or make RStudio 
> work).  Note that in principle the functionality requested here can be easily 
> implemented in a package so R doesn’t need to be modified.
>
> Cheers,
> Simon
>
> Sent from my iPhone
>
> >> On Jan 10, 2020, at 04:34, Tomas Kalibera  wrote:
> >>
> >> On 1/10/20 7:33 AM, Henrik Bengtsson wrote:
> >> I'd like to pick up this thread started on 2019-04-11
> >> (https://hypatia.math.ethz.ch/pipermail/r-devel/2019-April/077632.html).
> >> Modulo all the other suggestions in this thread, would my proposal of
> >> being able to disable forked processing via an option or an
> >> environment variable make sense?
> >
> > I don't think R should be doing that. There are caveats with using fork, 
> > and they are mentioned in the documentation of the parallel package, so 
> > people can easily avoid functions that use it, and this all has been 
> > discussed here recently.
> >
> > If it is the case, we can expand the documentation in parallel package, add 
> > a warning against the use of forking with RStudio, but for that it would 
> > be good to know at least why it is not working. From the github issue I 
> > have the impression that it is not really known why, whether it could be 
> > fixed, and if so, where. The same github issue reflects also that some 
> > people want to use forking for performance reasons, and even with RStudio, 
> > at least on Linux. Perhaps it could be fixed? Perhaps it is just some race 
> > condition somewhere?
> >
> > Tomas
> >
> >> I've prototyped a working patch that
> >> works like:
> >>> options(fork.allowed = FALSE)
> >>> unlist(parallel::mclapply(1:2, FUN = function(x) Sys.getpid()))
> >> [1] 14058 14058
> >>> parallel::mcmapply(1:2, FUN = function(x) Sys.getpid())
> >> [1] 14058 14058
> >>> parallel::pvec(1:2, FUN = function(x) Sys.getpid() + x/10)
> >> [1] 14058.1 14058.2
> >>> f <- parallel::mcparallel(Sys.getpid())
> >> Error in allowFork(assert = TRUE) :
> >>  Forked processing is not allowed per option ‘fork.allowed’ or
> >> environment variable ‘R_FORK_ALLOWED’
> >>> cl <- parallel::makeForkCluster(1L)
> >> Error in allowFork(assert = TRUE) :
> >>  Forked processing is not allowed per option ‘fork.allowed’ or
> >> environment variable ‘R_FORK_ALLOWED’
> >> The patch is:
> >> Index: src/library/parallel/R/unix/forkCluster.R
> >> ===
> >> --- src/library/parallel/R/unix/forkCluster.R (revision 77648)
> >> +++ src/library/parallel/R/unix/forkCluster.R (working copy)
> >> @@ -30,6 +30,7 @@
> >> newForkNode <- function(..., options = defaultClusterOptions, rank)
> >> {
> >> +allowFork(assert = TRUE)
> >> options <- addClusterOptions(options, list(...))
> >> 

Re: [Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()

2020-01-10 Thread Simon Urbanek
Henrik,

the example from the post works just fine in CRAN R for me - the post was 
about a homebrew build, so it is conceivably a bug in their libraries. That's 
exactly why I was proposing a more general solution where you can simply 
define a function in user space that will issue a warning or stop on fork. It 
doesn't have to be part of core R, and there are other packages that use 
fork() as well, so what I proposed is much safer than hacking the parallel 
package.

Cheers,
Simon
 


> On Jan 10, 2020, at 10:58 AM, Henrik Bengtsson  
> wrote:
> 
> The RStudio GUI was just one example.  AFAIK, and please correct me if
> I'm wrong, another example is where multi-threaded code is used in
> forked processing and that's sometimes unstable.  Yet another, which
> might be multi-thread related or not, is
> https://stat.ethz.ch/pipermail/r-devel/2018-September/076845.html:
> 
> res <- parallel::mclapply(urls, function(url) {
>  download.file(url, basename(url))
> })
> 
> That was reported to fail on macOS with the default method="libcurl"
> but not for method="curl" or method="wget".
> 
> Further documentation is needed and would help but I don't believe
> it's sufficient to solve everyday problems.  The argument for
> introducing an option/env var to disable forking is to give the end
> user a quick workaround for newly introduced bugs.  Neither the
> developer nor the end user has full control of the R package stack,
> which is always in flux.  For instance, above mclapply() code might
> have been in a package on CRAN and then all of a sudden
> method="libcurl" became the new default in base R.  The above
> mclapply() code is now buggy on macOS, and not necessarily caught by
> CRAN checks.  The package developer might not notice this because they
> are on Linux or Windows.  It can take a very long time before this
> problem is even noticed and even further before it is tracked down and
> fixed.   Similarly, as more and more code turns to native code and it
> becomes easier and easier to implement multi-threading, more and more
> of these bugs across package dependencies risk sneaking in the
> backdoor wherever forked processing is in place.
> 
> For the end user, but also higher-up upstream package developers, the
> quickest workaround would be to disable forking.  If you're conservative,
> you could even disable it for all of your R processing.  Being able to
> quickly disable forking will also provide a mechanism for quickly
> testing the hypothesis that forking is the underlying problem, i.e.
> "Please retry with options(fork.allowed = FALSE)" will become handy
> for troubleshooting.
> 
> /Henrik
> 
> On Fri, Jan 10, 2020 at 5:31 AM Simon Urbanek
>  wrote:
>> 
>> If I understand the thread correctly this is an RStudio issue and I would 
>> suggest that the developers consider using pthread_atfork() so RStudio can 
>> handle forking as they deem fit (bail out with an error or make RStudio 
>> work).  Note that in principle the functionality requested here can be 
>> easily implemented in a package so R doesn’t need to be modified.
>> 
>> Cheers,
>> Simon
>> 
>> Sent from my iPhone
>> 
 On Jan 10, 2020, at 04:34, Tomas Kalibera  wrote:
 
 On 1/10/20 7:33 AM, Henrik Bengtsson wrote:
 I'd like to pick up this thread started on 2019-04-11
 (https://hypatia.math.ethz.ch/pipermail/r-devel/2019-April/077632.html).
 Modulo all the other suggestions in this thread, would my proposal of
 being able to disable forked processing via an option or an
 environment variable make sense?
>>> 
>>> I don't think R should be doing that. There are caveats with using fork, 
>>> and they are mentioned in the documentation of the parallel package, so 
>>> people can easily avoid functions that use it, and this all has been 
>>> discussed here recently.
>>> 
>>> If it is the case, we can expand the documentation in parallel package, add 
> >>> a warning against the use of forking with RStudio, but for that it would 
>>> be good to know at least why it is not working. From the github issue I 
>>> have the impression that it is not really known why, whether it could be 
>>> fixed, and if so, where. The same github issue reflects also that some 
>>> people want to use forking for performance reasons, and even with RStudio, 
>>> at least on Linux. Perhaps it could be fixed? Perhaps it is just some race 
>>> condition somewhere?
>>> 
>>> Tomas
>>> 
 I've prototyped a working patch that
 works like:
> options(fork.allowed = FALSE)
> unlist(parallel::mclapply(1:2, FUN = function(x) Sys.getpid()))
 [1] 14058 14058
> parallel::mcmapply(1:2, FUN = function(x) Sys.getpid())
 [1] 14058 14058
> parallel::pvec(1:2, FUN = function(x) Sys.getpid() + x/10)
 [1] 14058.1 14058.2
> f <- parallel::mcparallel(Sys.getpid())
 Error in allowFork(assert = TRUE) :
 Forked processing is not allowed per option ‘fork.allowed’ or
 environment variable ‘R_FORK_ALLOWED’
> cl <- parallel::makeFo

Re: [Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()

2020-01-10 Thread Gábor Csárdi
On Fri, Jan 10, 2020 at 7:23 PM Simon Urbanek
 wrote:
>
> Henrik,
>
> the example from the post works just fine in CRAN R for me - the post was 
> about homebrew build so it's conceivably a bug in their libraries.

I think it works now, because Apple switched to a different SSL
library for libcurl. It usually crashes or fails on older macOS
versions, with the CRAN build of R as well.

It is not a bug in any library; it is just that macOS does not support
fork() without an immediate exec().

In general, any code that calls the macOS system libraries might
crash. (Except for CoreFoundation, which seems to be fine, but AFAIR
there is no guarantee for that, either.)

You get crashes in the terminal as well, without multithreading. E.g.
the keyring package links against the Security library on macOS, so you
get:

❯ R --vanilla -q
> .libPaths("~/R/3.6")
> keyring::key_list()[1:2,]
     service                      username
1 CommCenter kEntitlementsUniqueIDCacheKey
2        ids       identity-rsa-public-key
> parallel::mclapply(1:10, function(i) keyring::key_list()[1:2,])

 *** caught segfault ***
address 0x110, cause 'memory not mapped'

 *** caught segfault ***
address 0x110, cause 'memory not mapped'

AFAICT only Apple can do anything about this, and they won't.
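
(If you need something like the keyring example to run in parallel today,
one standard workaround, sketched below, is a PSOCK cluster, which launches
fresh R processes instead of forking:)

## No fork(): the workers are separate R processes, so macOS system
## libraries such as Security are not affected.
cl <- parallel::makePSOCKcluster(2)
res <- parallel::parLapply(cl, 1:10, function(i) keyring::key_list()[1:2, ])
parallel::stopCluster(cl)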

Gabor

> That's exactly why I was proposing a more general solution where you can 
> simply define a function in user-space that will issue a warning or stop on 
> fork, it doesn't have to be part of core R, there are other packages that use 
> fork() as well, so what I proposed is much safer than hacking the parallel 
> package.
>
> Cheers,
> Simon
>
>
>
> > On Jan 10, 2020, at 10:58 AM, Henrik Bengtsson  
> > wrote:
> >
> > The RStudio GUI was just one example.  AFAIK, and please correct me if
> > I'm wrong, another example is where multi-threaded code is used in
> > forked processing and that's sometimes unstable.  Yet another, which
> > might be multi-thread related or not, is
> > https://stat.ethz.ch/pipermail/r-devel/2018-September/076845.html:
> >
> > res <- parallel::mclapply(urls, function(url) {
> >  download.file(url, basename(url))
> > })
> >
> > That was reported to fail on macOS with the default method="libcurl"
> > but not for method="curl" or method="wget".
> >
> > Further documentation is needed and would help but I don't believe
> > it's sufficient to solve everyday problems.  The argument for
> > introducing an option/env var to disable forking is to give the end
> > user a quick workaround for newly introduced bugs.  Neither the
> > developer nor the end user has full control of the R package stack,
> > which is always in flux.  For instance, above mclapply() code might
> > have been in a package on CRAN and then all of a sudden
> > method="libcurl" became the new default in base R.  The above
> > mclapply() code is now buggy on macOS, and not necessarily caught by
> > CRAN checks.  The package developer might not notice this because they
> > are on Linux or Windows.  It can take a very long time before this
> > problem is even noticed and even further before it is tracked down and
> > fixed.   Similarly, as more and more code turns to native code and it
> > becomes easier and easier to implement multi-threading, more and more
> > of these bugs across package dependencies risk sneaking in the
> > backdoor wherever forked processing is in place.
> >
> > For the end user, but also higher-up upstream package developers, the
> > quickest workaround would be to disable forking.  If you're conservative,
> > you could even disable it for all of your R processing.  Being able to
> > quickly disable forking will also provide a mechanism for quickly
> > testing the hypothesis that forking is the underlying problem, i.e.
> > "Please retry with options(fork.allowed = FALSE)" will become handy
> > for troubleshooting.
> >
> > /Henrik
> >
> > On Fri, Jan 10, 2020 at 5:31 AM Simon Urbanek
> >  wrote:
> >>
> >> If I understand the thread correctly this is an RStudio issue and I would 
> >> suggest that the developers consider using pthread_atfork() so RStudio can 
> >> handle forking as they deem fit (bail out with an error or make RStudio 
> >> work).  Note that in principle the functionality requested here can be 
> >> easily implemented in a package so R doesn’t need to be modified.
> >>
> >> Cheers,
> >> Simon
> >>
> >> Sent from my iPhone
> >>
>  On Jan 10, 2020, at 04:34, Tomas Kalibera  
>  wrote:
> 
>  On 1/10/20 7:33 AM, Henrik Bengtsson wrote:
>  I'd like to pick up this thread started on 2019-04-11
>  (https://hypatia.math.ethz.ch/pipermail/r-devel/2019-April/077632.html).
>  Modulo all the other suggestions in this thread, would my proposal of
>  being able to disable forked processing via an option or an
>  environment variable make sense?
> >>>
> >>> I don't think R should be doing that. There are caveats with using f

Re: [Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()

2020-01-10 Thread Henrik Bengtsson
On Fri, Jan 10, 2020 at 11:23 AM Simon Urbanek
 wrote:
>
> Henrik,
>
> the example from the post works just fine in CRAN R for me - the post was 
> about homebrew build so it's conceivably a bug in their libraries.

Thanks for ruling that example out.

> That's exactly why I was proposing a more general solution where you can 
> simply define a function in user-space that will issue a warning or stop on 
> fork, it doesn't have to be part of core R, there are other packages that use 
> fork() as well, so what I proposed is much safer than hacking the parallel 
> package.

I think this is worth pursuing and will help improve and stabilize
things.  But issuing a warning or stopping on fork will not allow end
users to run the pipeline, or am I missing something?

I'm trying to argue that this is still a real problem that users and
developers run into on a regular basis.  Since parallel::mclapply() is
such a common and readily available solution, it is also low-hanging
fruit to make it possible for those forking functions to fall back to
sequential processing.  The only(*) way to achieve this fallback right
now is to run the same pipeline on MS Windows - I just think it would
be very useful to have the same fallback option available on Unix and
macOS.  Having this in base R could also serve as a standard for other
parallel/forking packages and implementations that wish to fall back
to sequential processing.

==> What would the disadvantages be of providing a mechanism/setting
for disabling forking in the parallel::mc*** API? <==

(*) One can also somewhat disable forking in 'parallel' by using
'cgroups' to limit the process to a single core (see also
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17641).  That will
handle code that uses mc.cores = parallel::detectCores(), of which
there is a lot.  I guess it will cause a run-time error ("'mc.cores'
must be >= 1") for the second most commonly used form, mc.cores =
parallel::detectCores() - 1, which is unfortunately also very common.
I find the use of hardcoded detectCores() unfortunate, but that is a
slightly different topic.  OTOH, if there were a standardized option
in R for disabling all types of parallel processing by forcing a
single core, one could imagine other parallel APIs implementing
fallbacks to sequential processing as well. (I'm aware that not all
use cases of async processing are about parallelization, so it might
not apply everywhere.)
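
As a concrete illustration of that difference (a sketch; the mc* functions
default to mc.cores = getOption("mc.cores", 2L), so only code that keeps
that default is affected):

## Forcing a single core via the option only helps code that honours the
## default mc.cores = getOption("mc.cores", 2L):
options(mc.cores = 1L)
parallel::mclapply(1:2, sqrt)    # respects the option; runs sequentially

## Code that hardcodes detectCores() bypasses the option entirely:
parallel::mclapply(1:2, sqrt, mc.cores = parallel::detectCores())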

Cheers,

Henrik

>
> Cheers,
> Simon
>
>
>
> > On Jan 10, 2020, at 10:58 AM, Henrik Bengtsson  
> > wrote:
> >
> > The RStudio GUI was just one example.  AFAIK, and please correct me if
> > I'm wrong, another example is where multi-threaded code is used in
> > forked processing and that's sometimes unstable.  Yet another, which
> > might be multi-thread related or not, is
> > https://stat.ethz.ch/pipermail/r-devel/2018-September/076845.html:
> >
> > res <- parallel::mclapply(urls, function(url) {
> >  download.file(url, basename(url))
> > })
> >
> > That was reported to fail on macOS with the default method="libcurl"
> > but not for method="curl" or method="wget".
> >
> > Further documentation is needed and would help but I don't believe
> > it's sufficient to solve everyday problems.  The argument for
> > introducing an option/env var to disable forking is to give the end
> > user a quick workaround for newly introduced bugs.  Neither the
> > developer nor the end user has full control of the R package stack,
> > which is always in flux.  For instance, above mclapply() code might
> > have been in a package on CRAN and then all of a sudden
> > method="libcurl" became the new default in base R.  The above
> > mclapply() code is now buggy on macOS, and not necessarily caught by
> > CRAN checks.  The package developer might not notice this because they
> > are on Linux or Windows.  It can take a very long time before this
> > problem is even noticed and even further before it is tracked down and
> > fixed.   Similarly, as more and more code turns to native code and it
> > becomes easier and easier to implement multi-threading, more and more
> > of these bugs across package dependencies risk sneaking in the
> > backdoor wherever forked processing is in place.
> >
> > For the end user, but also higher-up upstream package developers, the
> > quickest workaround would be to disable forking.  If you're conservative,
> > you could even disable it for all of your R processing.  Being able to
> > quickly disable forking will also provide a mechanism for quickly
> > testing the hypothesis that forking is the underlying problem, i.e.
> > "Please retry with options(fork.allowed = FALSE)" will become handy
> > for troubleshooting.
> >
> > /Henrik
> >
> > On Fri, Jan 10, 2020 at 5:31 AM Simon Urbanek
> >  wrote:
> >>
> >> If I understand the thread correctly this is an RStudio issue and I would 
> >> suggest that the developers consider using pthread_atfork() so RStudio can 
> >> handle forking as they deem fit (bail out 

Re: [Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()

2020-01-10 Thread Gábor Csárdi
On Fri, Jan 10, 2020 at 7:23 PM Simon Urbanek
 wrote:
>
> Henrik,
>
> the example from the post works just fine in CRAN R for me - the post was 
> about homebrew build so it's conceivably a bug in their libraries. That's 
> exactly why I was proposing a more general solution where you can simply 
> define a function in user-space that will issue a warning or stop on fork, it 
> doesn't have to be part of core R, there are other packages that use fork() 
> as well, so what I proposed is much safer than hacking the parallel package.

Btw. what other packages do fork() without an exec() call?

Gabor
[...]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()

2020-01-10 Thread Simon Urbanek
Henrik,

the whole point and only purpose of the mc* functions is to fork. That's what 
the multicore package was about, so if you don't want to fork, don't use mc* 
functions - they don't have any other purpose. I really fail to see the point - 
if you use mc* functions you're very explicitly asking for forking - so your 
argument is like saying that print() should have an option to not print 
anything - it just makes no sense. If you have code that is fork-incompatible, 
you clearly cannot use it in mcparallel - that's why there is a very explicit 
warning in the documentation. As I said, if you have some software that embeds 
R and has issues with forks, then that software should use pthread_atfork() 
to control the behavior.

Cheers,
Simon



> On Jan 10, 2020, at 3:34 PM, Henrik Bengtsson  
> wrote:
> 
> On Fri, Jan 10, 2020 at 11:23 AM Simon Urbanek
>  wrote:
>> 
>> Henrik,
>> 
>> the example from the post works just fine in CRAN R for me - the post was 
>> about homebrew build so it's conceivably a bug in their libraries.
> 
> Thanks for ruling that example out.
> 
>> That's exactly why I was proposing a more general solution where you can 
>> simply define a function in user-space that will issue a warning or stop on 
>> fork, it doesn't have to be part of core R, there are other packages that 
>> use fork() as well, so what I proposed is much safer than hacking the 
>> parallel package.
> 
> I think this is worth pursuing and will help improve and stabilize
> things.  But issuing a warning or stopping on fork will not allow end
> users to run the pipeline, or am I missing something?
> 
> I'm trying to argue that this is still a real problem that users and
> developers run into on a regular basis.  Since parallel::mclapply() is
> such a common and readily available solution it is also a low hanging
> fruit to make it possible to have those forking functions fall back to
> sequential processing.  The only(*) way to achieve this fall back
> right now is to run the same pipeline on MS Windows - I just think it
> would be very useful to have the same fallback option available on
> Unix and macOS.  Having this in base R could also serve as standard
> for other parallel/forking packages/implementations who also wish to
> have a fallback to sequential processing.
> 
> ==> What would the disadvantages be to provide a mechanism/setting for
> disabling forking in the parallel::mc*** API? <==
> 
> (*) One can also somewhat disable forking in 'parallel' by using
> 'cgroups' limiting the process to a single core (see also
> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17641).  That will
> handle code that uses mc.cores = parallel::detectCores(), which there
> is a lot of.  I guess it will cause run-time error (on 'mc.cores' must
> be >= 1) for code that uses the second most common used mc.cores =
> parallel::detectCores() - 1, which is unfortunately also very common.
> I find the use of hardcoded detectCores() unfortunate but that is a
> slightly different topic.  OTH, if there would a standardized option
> in R for disabling all types of parallel processing by forcing a
> single core, one could imagine other parallel APIs to implement
> fallbacks to sequential processing as well. (I'm aware that not all
> use cases of async processing is about parallelization, so it might
> not apply everywhere).
> 
> Cheers,
> 
> Henrik
> 
>> 
>> Cheers,
>> Simon
>> 
>> 
>> 
>>> On Jan 10, 2020, at 10:58 AM, Henrik Bengtsson  
>>> wrote:
>>> 
>>> The RStudio GUI was just one example.  AFAIK, and please correct me if
>>> I'm wrong, another example is where multi-threaded code is used in
>>> forked processing and that's sometimes unstable.  Yet another, which
>>> might be multi-thread related or not, is
>>> https://stat.ethz.ch/pipermail/r-devel/2018-September/076845.html:
>>> 
>>> res <- parallel::mclapply(urls, function(url) {
>>> download.file(url, basename(url))
>>> })
>>> 
>>> That was reported to fail on macOS with the default method="libcurl"
>>> but not for method="curl" or method="wget".
>>> 
>>> Further documentation is needed and would help but I don't believe
>>> it's sufficient to solve everyday problems.  The argument for
>>> introducing an option/env var to disable forking is to give the end
>>> user a quick workaround for newly introduced bugs.  Neither the
>>> developer nor the end user has full control of the R package stack,
>>> which is always in flux.  For instance, above mclapply() code might
>>> have been in a package on CRAN and then all of a sudden
>>> method="libcurl" became the new default in base R.  The above
>>> mclapply() code is now buggy on macOS, and not necessarily caught by
>>> CRAN checks.  The package developer might not notice this because they
>>> are on Linux or Windows.  It can take a very long time before this
>>> problem is even noticed and even further before it is tracked down and
>>> fixed.   Similarly, as more and more code turns to native code and it
>>> becomes easier and 

Re: [Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()

2020-01-10 Thread Simon Urbanek



> On Jan 10, 2020, at 3:10 PM, Gábor Csárdi  wrote:
> 
> On Fri, Jan 10, 2020 at 7:23 PM Simon Urbanek
>  wrote:
>> 
>> Henrik,
>> 
>> the example from the post works just fine in CRAN R for me - the post was 
>> about homebrew build so it's conceivably a bug in their libraries.
> 
> I think it works now, because Apple switched to a different SSL
> library for libcurl. It usually crashes or fails on older macOS
> versions, with the CRAN build of R as well.
> 

That is not true - Apple has not changed the SSL back-end for many years. The 
issue in that post is presumably in the homebrew version of SSL.

Cheers,
Simon


> It is not a bug in any library, it is just that macOS does not support
> fork() without an immediate exec().
> 
> In general, any code that calls the macOS system libraries might
> crash. (Except for CoreFoundation, which seems to be fine, but AFAIR
> there is no guarantee for that, either.)
> 
> You get crashes in the terminal as well, without multithreading. E.g.
> the keyring package links against the Security library on macOS, so you
> get:
> 
> ❯ R --vanilla -q
>> .libPaths("~/R/3.6")
>> keyring::key_list()[1:2,]
>service  username
> 1CommCenter kEntitlementsUniqueIDCacheKey
> 2   ids   identity-rsa-public-key
>> parallel::mclapply(1:10, function(i) keyring::key_list()[1:2,])
> 
> *** caught segfault ***
> address 0x110, cause 'memory not mapped'
> 
> *** caught segfault ***
> address 0x110, cause 'memory not mapped'
> 
> AFAICT only Apple can do anything about this, and they won't.
> 
> Gabor
> 
>> That's exactly why I was proposing a more general solution where you can 
>> simply define a function in user-space that will issue a warning or stop on 
>> fork, it doesn't have to be part of core R, there are other packages that 
>> use fork() as well, so what I proposed is much safer than hacking the 
>> parallel package.
>> 
>> Cheers,
>> Simon
>> 
>> 
>> 
>>> On Jan 10, 2020, at 10:58 AM, Henrik Bengtsson  
>>> wrote:
>>> 
>>> The RStudio GUI was just one example.  AFAIK, and please correct me if
>>> I'm wrong, another example is where multi-threaded code is used in
>>> forked processing and that's sometimes unstable.  Yet another, which
>>> might be multi-thread related or not, is
>>> https://stat.ethz.ch/pipermail/r-devel/2018-September/076845.html:
>>> 
>>> res <- parallel::mclapply(urls, function(url) {
>>> download.file(url, basename(url))
>>> })
>>> 
>>> That was reported to fail on macOS with the default method="libcurl"
>>> but not for method="curl" or method="wget".
>>> 
>>> Further documentation is needed and would help but I don't believe
>>> it's sufficient to solve everyday problems.  The argument for
>>> introducing an option/env var to disable forking is to give the end
>>> user a quick workaround for newly introduced bugs.  Neither the
>>> developer nor the end user has full control of the R package stack,
>>> which is always in flux.  For instance, above mclapply() code might
>>> have been in a package on CRAN and then all of a sudden
>>> method="libcurl" became the new default in base R.  The above
>>> mclapply() code is now buggy on macOS, and not necessarily caught by
>>> CRAN checks.  The package developer might not notice this because they
>>> are on Linux or Windows.  It can take a very long time before this
>>> problem is even noticed and even further before it is tracked down and
>>> fixed.   Similarly, as more and more code turns to native code and it
>>> becomes easier and easier to implement multi-threading, more and more
>>> of these bugs across package dependencies risk sneaking in the
>>> backdoor wherever forked processing is in place.
>>> 
>>> For the end user, but also higher-up upstream package developers, the
>>> quickest workaround would be to disable forking.  If you're conservative,
>>> you could even disable it for all of your R processing.  Being able to
>>> quickly disable forking will also provide a mechanism for quickly
>>> testing the hypothesis that forking is the underlying problem, i.e.
>>> "Please retry with options(fork.allowed = FALSE)" will become handy
>>> for troubleshooting.
>>> 
>>> /Henrik
>>> 
>>> On Fri, Jan 10, 2020 at 5:31 AM Simon Urbanek
>>>  wrote:
 
 If I understand the thread correctly this is an RStudio issue and I would 
 suggest that the developers consider using pthread_atfork() so RStudio can 
 handle forking as they deem fit (bail out with an error or make RStudio 
 work).  Note that in principle the functionality requested here can be 
 easily implemented in a package so R doesn’t need to be modified.
 
 Cheers,
 Simon
 
 Sent from my iPhone
 
>> On Jan 10, 2020, at 04:34, Tomas Kalibera  
>> wrote:
>> 
>> On 1/10/20 7:33 AM, Henrik Bengtsson wrote:
>> I'd like to pick up this thread started on 2019-04-11
>> (https://hypatia.m

Re: [Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()

2020-01-10 Thread Simon Urbanek



> On Jan 10, 2020, at 3:10 PM, Gábor Csárdi  wrote:
> 
> On Fri, Jan 10, 2020 at 7:23 PM Simon Urbanek
>  wrote:
>> 
>> Henrik,
>> 
>> the example from the post works just fine in CRAN R for me - the post was 
>> about homebrew build so it's conceivably a bug in their libraries.
> 
> I think it works now, because Apple switched to a different SSL
> library for libcurl. It usually crashes or fails on older macOS
> versions, with the CRAN build of R as well.
> 
> It is not a bug in any library, it is just that macOS does not support
> fork() without an immediate exec().
> 
> In general, any code that calls the macOS system libraries might
> crash. (Except for CoreFoundation, which seems to be fine, but AFAIR
> there is no guarantee for that, either.)
> 

That is not true, either. macOS itself is fork-safe (it is POSIX-certified 
after all), but libraries may or may not be. The rules are pretty clear - 
fork() shares open descriptors and only inherits the main thread (see the 
POSIX documentation for pthread_atfork() - it illustrates the issues nicely). 
So as a user of APIs it may be your responsibility to make sure things are 
handled properly - again, that's what pthread_atfork() is for. Most libraries 
don't allow duplicated fds or have rules about thread safety, so it is your 
responsibility in the package to abide by those rules if you want it to 
function after forking. Some libraries don't allow forking at all, e.g., JVMs 
cannot be forked (because they are too complex to make them fork-safe). In 
general, you cannot assume that (non-R) code is fork-safe unless it has been 
designed to be. That's why mcparallel() should only be used for pure R code 
(and even that with I/O limitations) and C code that is explicitly 
fork-safe. As I said, using the mc* functions explicitly says that you are ok 
with forking, so if you run code that is not fork-safe it is clearly a user 
error.

That's exactly why we have the long "Warning" section in the documentation. If 
you have suggestions for its improvements, please feel free to supply patches.

Cheers,
Simon


> You get crashes in the terminal as well, without multithreading. E.g.
> the keyring package links against the Security library on macOS, so you
> get:
> 
> ❯ R --vanilla -q
>> .libPaths("~/R/3.6")
>> keyring::key_list()[1:2,]
>service  username
> 1CommCenter kEntitlementsUniqueIDCacheKey
> 2   ids   identity-rsa-public-key
>> parallel::mclapply(1:10, function(i) keyring::key_list()[1:2,])
> 
> *** caught segfault ***
> address 0x110, cause 'memory not mapped'
> 
> *** caught segfault ***
> address 0x110, cause 'memory not mapped'
> 
> AFAICT only Apple can do anything about this, and they won't.
> 
> Gabor
> 
>> That's exactly why I was proposing a more general solution where you can 
>> simply define a function in user-space that will issue a warning or stop on 
>> fork, it doesn't have to be part of core R, there are other packages that 
>> use fork() as well, so what I proposed is much safer than hacking the 
>> parallel package.
>> 
>> Cheers,
>> Simon
>> 
>> 
>> 
>>> On Jan 10, 2020, at 10:58 AM, Henrik Bengtsson  
>>> wrote:
>>> 
>>> The RStudio GUI was just one example.  AFAIK, and please correct me if
>>> I'm wrong, another example is where multi-threaded code is used in
>>> forked processing and that's sometimes unstable.  Yet another, which
>>> might be multi-thread related or not, is
>>> https://stat.ethz.ch/pipermail/r-devel/2018-September/076845.html:
>>> 
>>> res <- parallel::mclapply(urls, function(url) {
>>> download.file(url, basename(url))
>>> })
>>> 
>>> That was reported to fail on macOS with the default method="libcurl"
>>> but not for method="curl" or method="wget".
>>> 
>>> Further documentation is needed and would help but I don't believe
>>> it's sufficient to solve everyday problems.  The argument for
>>> introducing an option/env var to disable forking is to give the end
>>> user a quick workaround for newly introduced bugs.  Neither the
>>> developer nor the end user has full control of the R package stack,
>>> which is always in flux.  For instance, above mclapply() code might
>>> have been in a package on CRAN and then all of a sudden
>>> method="libcurl" became the new default in base R.  The above
>>> mclapply() code is now buggy on macOS, and not necessarily caught by
>>> CRAN checks.  The package developer might not notice this because they
>>> are on Linux or Windows.  It can take a very long time before this
>>> problem is even noticed and even further before it is tracked down and
>>> fixed.   Similarly, as more and more code turns to native code and it
>>> becomes easier and easier to implement multi-threading, more and more
>>> of these bugs across package dependencies risk sneaking in the
>>> backdoor wherever forked processing is in place.
>>> 
>>> For the end user, but also higher-up upstre

Re: [Rd] add jsslogo.jpg to R sources?

2020-01-10 Thread Toby Hocking
Hi there, thanks for the feedback, sorry about the cross-posting, and that
makes sense given the nojss option, which I was not aware of.

On Wed, Jan 8, 2020 at 9:16 AM Achim Zeileis 
wrote:

> On Wed, 8 Jan 2020, Iñaki Ucar wrote:
>
> > On Wed, 8 Jan 2020 at 19:21, Toby Hocking  wrote:
> >>
> >> Hi R-core, I was wondering if somebody could please add jsslogo.jpg to
> the
> >> R sources? (as I reported yesterday in this bug)
> >>
> >> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17687
> >>
> >> R already includes jss.cls which is the document class file for Journal
> of
> >> Statistical Software. Actually, for the jss.cls file to be useful, it
> also
> >> requires jsslogo.jpg in order to compile JSS articles without error.
> >>
> >> This is an issue for me because I am writing a JSS paper that includes
> >> figures created using tikzDevice, which I am telling to use the jss
> class
> >> for computing metrics. On debian/ubuntu the R-src/share/texmf directory
> is
> >> copied to /usr/share/texmf/tex/latex/R, so tikzDevice is finding
> jss.cls in
> >> /usr/share/texmf/tex/latex/R/tex/latex/jss.cls but it is failing with a
> >> 'jsslogo not found' error -- the fix is to also include jsslogo.jpg in
> the
> >> R sources (in the same directory as jss.cls).
> >
> > Why don't you just include jsslogo.jpg in your working directory?
> > jss.cls is included in the R sources because there are many vignettes
> > with the JSS style, but always *without* the logo. The logo should
> > only be used for actual JSS publication, so I think that the R sources
> > are no place for it.
>
> Thanks, Iñaki, you are right. The motivation for including jss.cls and
> jss.bst in the R sources was to facilitate turning JSS papers into
> vignettes (see the FAQ at https://www.jstatsoft.org/pages/view/style)
> with
> \documentclass[nojss]{jss}. Before jss.cls/bst were shipped along with
> base R many packages shipped with their own copy which seemed like a waste
> of resources and source of confusion.
>
> When preparing new papers for submission in JSS you can also use the
> "nojss" option, this is also accepted by the journal.
>
> Hope that helps,
> Achim
>
> P.S.: Toby, if you plan on discussing such an issue anyway, I would
> recommend waiting with the bug report. Cross-posting on different channels
> is always a bit of a nuisance.

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel