On 08/31/2018 03:13 PM, Gábor Csárdi wrote:
On Fri, Aug 31, 2018 at 2:51 PM Tomas Kalibera <tomas.kalib...@gmail.com> wrote:
[...]
kill(sig=0) is specified by POSIX but indeed as you say there is a race
condition due to PID-reuse.  In principle, detecting that a worker
process is still alive cannot be done correctly outside base R.
I am not sure why you think so.
To avoid the race with PID reuse, one needs access to signal handling,
to blocking signals, and to handling SIGCHLD. system/system2 and
mcparallel/mccollect in base R use these features, and the interaction
is still safe given their specific use there, but it would have to be
revisited if either of the two uses changes. These features cannot be
safely used outside of base R in contributed packages.

Tomas


At user-level I would probably consider some watchdog, e.g. the parallel
tasks would be repeatedly touching a file.
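
A minimal sketch of such a file-based watchdog, using only base R. The helper names (heartbeat_touch, heartbeat_alive) and the 30-second threshold are illustrative assumptions, not something proposed in this thread:

```r
# Worker side: refresh the modification time of a heartbeat file.
# (hypothetical helper; a worker would call it periodically)
heartbeat_touch <- function(path) {
  if (!file.exists(path)) file.create(path)
  Sys.setFileTime(path, Sys.time())
}

# Supervisor side: treat the worker as alive only if the file was
# touched within the last `max_age` seconds (assumed threshold).
heartbeat_alive <- function(path, max_age = 30) {
  mtime <- file.mtime(path)          # NA if the file does not exist
  if (is.na(mtime)) return(FALSE)
  age <- as.numeric(difftime(Sys.time(), mtime, units = "secs"))
  age <= max_age
}
```

This only shows that the worker made progress recently; a worker that is alive but stuck in a long computation would look dead unless the threshold is chosen generously.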
I am pretty sure that there are simpler and better solutions. E.g. one
would be to ask the worker process for its startup time (with as much
precision as possible) and then use the (pid, startup_time) pair as a
unique id.

With this you can check if the process is still running, by checking
that the pid exists, and that its startup time matches.

This is all very simple with the ps package, on Linux, macOS and Windows.

Gabor
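
The (pid, startup_time) idea above could be sketched with the ps package roughly as follows. The helper names process_id and process_alive are hypothetical, and the sketch assumes ps is installed and that ps_handle() or ps_create_time() signal an error for a pid that no longer exists:

```r
library(ps)

# Hypothetical helper: record the (pid, start time) pair of a process.
process_id <- function(pid = Sys.getpid()) {
  h <- ps_handle(pid)
  list(pid = ps_pid(h), created = ps_create_time(h))
}

# Hypothetical helper: TRUE only if a process with this pid exists
# and its start time equals the recorded one.
process_alive <- function(id) {
  created <- tryCatch(
    ps_create_time(ps_handle(id$pid)),
    error = function(e) NULL        # pid not found or not accessible
  )
  if (is.null(created)) return(FALSE)
  as.numeric(created) == as.numeric(id$created)
}
```

For example, id <- process_id() followed by process_alive(id) should be TRUE for the current session, while an id whose recorded start time does not match is reported as not alive even when the pid itself exists.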

In base R, one can do this correctly for forked processes via
mcparallel/mccollect, but not for PSOCK cluster workers, which are
based on system() (and I understand it would be a useful feature).

  > j <- mcparallel(Sys.sleep(1000))
  > mccollect(j, wait=FALSE)
NULL

# kill the child process

  > mccollect(j, wait=FALSE)
$`1542`
NULL

More details indeed in ?mcparallel. The key part is that the job must be
started as non-detached and, as soon as mccollect() collects it,
mccollect() must never be called on it again.

Tomas
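
One way to respect the "never collect twice" rule is to wrap the job in a small closure that remembers whether it has already been collected. This is only a sketch for Unix-alikes (mcparallel is not available on Windows); make_job_monitor is a hypothetical name, not part of the parallel package:

```r
library(parallel)

# Hypothetical wrapper: returns a polling function for one forked job.
# The function calls mccollect() at most until the job has been
# collected once, and afterwards only reports the cached result.
make_job_monitor <- function(job) {
  done <- FALSE
  result <- NULL
  function() {
    if (!done) {
      res <- mccollect(job, wait = FALSE)  # NULL while still running
      if (!is.null(res)) {
        done <<- TRUE
        result <<- res
      }
    }
    list(running = !done, result = result)
  }
}
```

Usage would look like poll <- make_job_monitor(mcparallel(Sys.sleep(1000))) followed by repeated calls to poll()$running. A child that was killed shows up as finished with a NULL result, as in the transcript above.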

I can get the PID of each cluster node by querying them for their
Sys.getpid(), e.g.

      pids <- parallel::clusterEvalQ(cl, Sys.getpid())

Is there a function in core R for testing whether a process with a
given PID exists or not? From trial'n'error, I found that on Linux:

    pid_exists <- function(pid) as.logical(tools::pskill(pid, signal = 0L))

returns TRUE for existing processes and FALSE otherwise, but I'm not
sure if I can trust this.  It's not a documented feature in
?tools::pskill, which also warns about 'signal' not being standardized
across OSes.

The other Linux alternative I can imagine is:

    pid_exists <- function(pid)
      system2("ps", args = c("--pid", pid), stdout = FALSE) == 0L

Can I expect this to work on macOS as well?  What about other *nix systems?

And, finally, what can be done on Windows?

I'm sure there are packages on CRAN that provide this, but I'd like
to keep dependencies at a minimum.

I appreciate any feedback. Thxs,

Henrik

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
