Hi Thibault,

mclapply has been designed to signal errors in two ways. Errors in user code are returned as special objects (of class "try-error") in the respective elements of the result list. All other errors (including a worker process being killed) are returned as NULL in the respective elements of the result list. To detect these errors reliably, one needs to implement FUN so that it never returns NULL normally (it also cannot return a raw vector). This is how mclapply was designed and implemented (and also mccollect, etc.). It may be surprising to see multiple NULL elements when a single process is killed, but this is expected with pre-scheduling when that process has been tasked to compute multiple elements.
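
As a minimal sketch of that detection pattern: the helper names classify_results, boxed and maybe_null below are made up for illustration and are not part of the parallel package. The idea is to wrap every value in a list so that FUN never returns a bare NULL, and then to label each element of the result by the two error signals described above.

```r
library(parallel)

# Hypothetical helper (not part of 'parallel'): label each element of an
# mclapply() result as "ok", "user_error" or "no_result".
classify_results <- function(res) {
  vapply(res, function(el) {
    if (is.null(el)) {
      "no_result"    # no value delivered, e.g. the worker process was killed
    } else if (inherits(el, "try-error")) {
      "user_error"   # an error was signalled in user code
    } else {
      "ok"
    }
  }, character(1))
}

# Hypothetical wrapper: box every value in a list so that the wrapped
# function never returns a bare NULL.
boxed <- function(FUN) function(...) list(value = FUN(...))

# Illustrative task only: legitimately returns NULL for even inputs.
maybe_null <- function(i) if (i %% 2 == 0) NULL else i

# Forking is only available on Unix-alikes, so the demo is guarded.
if (.Platform$OS.type == "unix") {
  res    <- mclapply(1:4, boxed(maybe_null), mc.cores = 2)
  status <- classify_results(res)
  # Unbox the values of the elements that were delivered without error.
  values <- lapply(res[status == "ok"], `[[`, "value")
}
```

With this boxing, a legitimate NULL arrives as list(value = NULL), so a NULL element in the result can only mean that no result was delivered. Note that with pre-scheduling, a failure in one worker can affect every element assigned to that worker, so several elements may share the same status.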

To make this API more user-friendly, I've added a warning that is now emitted when a job does not deliver a result (that is, when a vector element is NULL because of such an error). I've also made it more explicit in the documentation that NULL signals an error.

Best,
Tomas


On 07/26/2018 08:37 PM, Thibault Vatter wrote:
Hi,

I wondered about the behavior described in the following Stack Overflow
question:

https://stackoverflow.com/questions/20674538/mclapply-returns-null-randomly

More specifically, I would like to know if you ever considered the
suggestion made in the comments of the first answer, namely to somehow warn
the user if one of the processes has been killed by the out-of-memory
killer?

I am always surprised to see the random NULLs without message/warning/error
of any kind, and I think that it could be a useful feature to know whether
the function executed by mclapply returned a NULL or if the process was
killed for some reason.

In the following gist, I have an example of this (in this case non-random)
behavior:

https://gist.github.com/tvatter/2fcf3a9a99c256f9b9360f596b300715

For the record, I generate the list of NULLs in the 4th mclapply in the
gist above on a late-2013 MacBook Pro with macOS High Sierra and 16GB of
memory, and my sessionInfo() is:

R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin16.7.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.5.0 tools_3.5.0    yaml_2.1.19

------------------------------------------------------------
Thibault Vatter
Department of Statistics
Columbia University


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

