Re: [Rd] segfault issue with parallel::mclapply and download.file() on Mac OS X

2018-10-04 Thread Tomas Kalibera



Thanks for the report, but unfortunately I cannot reproduce this on my system 
(neither on macOS nor on Linux, from the command line) to debug it. Did you 
run this in the command-line version of R?


I would not be surprised to see such a crash if this were executed from a 
multi-threaded application, say from some GUI or frontend that runs 
multiple threads, or from some other R session where a third-party 
library (curl?) had already started some threads. In such situations 
mcfork/mclapply is unsafe (?mcfork warns against GUIs and frontends, and 
I've now expanded that warning slightly) and it could not be fixed without 
being turned into something like parLapply(). parLapply() on a non-FORK 
cluster should work fine even with such applications; see the sketch below.
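
Something along these lines should be safe (a minimal sketch, assuming a 
default PSOCK cluster of two workers and reusing url_base/files from the 
report below):

library(parallel)

url_base <- "https://cloud.r-project.org/src/contrib/"
files <- c("A3_1.0.0.tar.gz", "ABC.RAP_0.9.0.tar.gz")

cl <- makeCluster(2)           # PSOCK workers: fresh R processes, no fork()
clusterExport(cl, "url_base")  # ship the variable to the workers
res <- parLapply(cl, files, function(s)
    download.file(paste0(url_base, s), s))
stopCluster(cl)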


Best
Tomas

On 09/19/2018 11:19 PM, Seth Russell wrote:

I have an lapply function call that I want to parallelize. Below is a very
simplified version of the code:

url_base <- "https://cloud.r-project.org/src/contrib/"
files <- c("A3_1.0.0.tar.gz", "ABC.RAP_0.9.0.tar.gz")
res <- parallel::mclapply(files, function(s)
    download.file(paste0(url_base, s), s))

Instead of downloading a couple of files in parallel, I get a segfault per
process with a 'memory not mapped' message. I've been working with Henrik
Bengtsson on resolving this issue and he recommended I send a message to
the R-devel mailing list.

Here's the output:

trying URL 'https://cloud.r-project.org/src/contrib/A3_1.0.0.tar.gz'
trying URL 'https://cloud.r-project.org/src/contrib/ABC.RAP_0.9.0.tar.gz'

  *** caught segfault ***
address 0x11575ba3a, cause 'memory not mapped'

  *** caught segfault ***
address 0x11575ba3a, cause 'memory not mapped'

Traceback:
 1: download.file(paste0(url_base, s), s)
 2: FUN(X[[i]], ...)
 3: lapply(X = S, FUN = FUN, ...)
 4: doTryCatch(return(expr), name, parentenv, handler)
 5: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 6: tryCatchList(expr, classes, parentenv, handlers)
 7: tryCatch(expr, error = function(e) {
        call <- conditionCall(e)
        if (!is.null(call)) {
            if (identical(call[[1L]], quote(doTryCatch)))
                call <- sys.call(-4L)
            dcall <- deparse(call)[1L]
            prefix <- paste("Error in", dcall, ": ")
            LONG <- 75L
            sm <- strsplit(conditionMessage(e), "\n")[[1L]]
            w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w")
            if (is.na(w))
                w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L], type = "b")
            if (w > LONG)
                prefix <- paste0(prefix, "\n  ")
        }
        else prefix <- "Error : "
        msg <- paste0(prefix, conditionMessage(e), "\n")
        .Internal(seterrmessage(msg[1L]))
        if (!silent && isTRUE(getOption("show.error.messages"))) {
            cat(msg, file = outFile)
            .Internal(printDeferredWarnings())
        }
        invisible(structure(msg, class = "try-error", condition = e))
    })
 8: try(lapply(X = S, FUN = FUN, ...), silent = TRUE)
 9: sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE))
10: FUN(X[[i]], ...)
11: lapply(seq_len(cores), inner.do)
12: parallel::mclapply(files, function(s) download.file(paste0(url_base,
        s), s))

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

Here's my sessionInfo()


sessionInfo()

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin16.7.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS/LAPACK: /usr/local/Cellar/openblas/0.3.3/lib/libopenblasp-r0.3.3.dylib

locale:
[1] en_US/en_US/en_US/C/en_US/en_US

attached base packages:
[1] parallel 

Re: [Rd] segfault issue with parallel::mclapply and download.file() on Mac OS X

2018-10-04 Thread Jeroen Ooms
On Thu, Oct 4, 2018 at 6:12 PM Tomas Kalibera wrote:
>
>
> Thanks for the report, but unfortunately I cannot reproduce on my system
> (either macOS nor Linux, from the command line) to debug. Did you run
> this in the command line version of R?

It depends on which version of macOS you are using, and specifically on
which TLS back-end curl has been configured with. When libcurl uses
DarwinSSL, it may crash when opening HTTPS connections in a fork, because
CoreFoundation is not fork-safe. OTOH, when using OpenSSL or LibreSSL for
TLS, you usually get away with forking (though it's still bad practice).

The standard version of libcurl that ships with macOS used DarwinSSL
(which relies on CoreFoundation) up to 10.12, but starting with 10.13
Apple switched to LibreSSL in order to support HTTP/2. See curl --version
or curl::curl_version() for your local configuration (a one-liner is shown
below). Don't count on this, though: Apple might switch back to the
fork-unsafe DarwinSSL once DarwinSSL supports ALPN, which is needed for
HTTP/2.
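
For instance, from R:

# The ssl_version field reports the TLS back-end libcurl was built with,
# e.g. "LibreSSL/2.0.20" vs. "SecureTransport" (DarwinSSL)
curl::curl_version()$ssl_version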

As Gabor already suggested, libcurl has built-in support for concurrent
connections. The curl package exposes this via the multi_add() function.
Not only is this safer than forking, it is also much faster, because it
takes advantage of HTTP keep-alive and, where supported, HTTP/2
multiplexing, which allows thousands of concurrent HTTPS requests to be
performed efficiently over a single TCP connection. A sketch follows below.
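
For example, a minimal sketch using curl_fetch_multi(), the package's
convenience layer over multi_add() (url_base and files as in the original
report; error handling kept deliberately simple):

library(curl)

url_base <- "https://cloud.r-project.org/src/contrib/"
files <- c("A3_1.0.0.tar.gz", "ABC.RAP_0.9.0.tar.gz")

# Queue one request per file; lapply gives each callback its own 's'
invisible(lapply(files, function(s) {
    curl_fetch_multi(paste0(url_base, s),
        done = function(res) writeBin(res$content, s),
        fail = function(msg) message("failed: ", s, " (", msg, ")"))
}))

multi_run()  # perform all queued transfers concurrently on the default pool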
