Thanks for your explanation. This makes a lot of sense! SIGINT handling is a blind spot for me, and this introduction looks perfect!

Best,
Jiefei

On Tue, Jul 20, 2021 at 4:31 PM Tomas Kalibera <tomas.kalib...@gmail.com> wrote:
>
> Hi Jiefei,
>
> when you run the cluster "automatically" in your terminal and press
> Ctrl-C in Unix, both the master and the worker processes get the SIGINT
> signal, because they belong to the same foreground process group. So you
> are also directly interrupting the worker process.
>
> When you run the cluster "manually", that is the master in one terminal
> window and the worker in another, they are in different process groups
> and if you press Ctrl-C in the terminal running the master, only the
> master will receive the SIGINT signal, not the worker.
>
> If you want to read more in the sources, look for SIGINT handling in R,
> the onintrEx() function, etc. A good source on signal handling is e.g.
> http://www.linusakesson.net/programming/tty/
>
> Best
> Tomas
>
> On 7/20/21 9:55 AM, Jiefei Wang wrote:
> > Hi all,
> >
> > I noticed this interesting problem a few days ago, but I cannot
> > find an answer for it. Say you have a long-running job in a cluster
> > made by the parallel package and you decide to stop the execution by
> > pressing Ctrl + C in the terminal or the stop button in RStudio for
> > some reason. After the interrupt, is the cluster still valid or not?
> > Below is a simple code example:
> >
> > library(parallel)
> > cl <- makeCluster(1)
> > ## run and interrupt it
> > parLapply(cl, 1, function(i){Sys.sleep(10);Sys.getpid()})
> > ## run another apply function to check the cluster status
> > parLapply(cl, 1, function(i)i)
> >
> > From my test result, the answer is yes. The worker is interrupted
> > immediately and the cluster is ready for the next command, but when I
> > create the worker manually, things seem different.
> >
> > library(parallel)
> > cl <- makeCluster(1, manual = TRUE)
> > ## run and interrupt it
> > parLapply(cl, 1, function(i){Sys.sleep(10);Sys.getpid()})
> > ## run another apply function to check the cluster status
> > parLapply(cl, 1, function(i)i)
> >
> > It seems like the worker does not know the manager has been
> > interrupted and still runs the current task. I have to wait for 10
> > seconds before I can get the result from the last line of the code, and
> > the return value is the PID from the first apply function.
> >
> > Both cases are reasonable, but it is surprising to see them at the
> > same time. I started to wonder how the user interrupt is handled, so I
> > looked at the code in the parallel package. However, it looks like
> > there is no related code: there is no try-catch statement in the
> > manager's code to handle the user interrupt, but the worker just
> > magically knows it should stop the current execution.
> >
> > I can see this behavior on both Windows and Ubuntu. It is kind of beyond
> > my knowledge, so I wonder if anyone can help me with it. Does the
> > cluster support the user interrupt? Why does the above code work in
> > one case and not in the other? Many thanks!
> >
> > Best,
> > Jiefei
> >
> > ______________________________________________
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel