On Sun, Feb 01, 2015 at 01:13:06AM +0000, Jason Vas Dias wrote:
> 1. An "invoker.sh" process runs a "job.sh" bash script in a separate
> process which runs a long-running (or non-terminating!)
> 'Simple Command' (not a shell "Job") (call it "nterm.sh").
>
> 2. After a while, the originator decides that the job has timed-out,
> and kills its process (the instance of bash running job.sh), and
> then exits.
>
> 3. The "long-command" nterm.sh process is left still running as an orphan,
> and would become a zombie if it tries to exit.
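Steps 1-3 can be reproduced with a minimal sketch like the one below. It assumes hypothetical stand-ins for job.sh and nterm.sh (written to a temporary directory, with "sleep 300" playing the non-terminating command) and a sleep that accepts fractional seconds. It kills only the job.sh process, then checks whether nterm.sh is still alive; on Linux it also reads the orphan's new parent PID from /proc.

```bash
#!/usr/bin/env bash
# Sketch reproducing steps 1-3 of the quoted scenario.
# Assumptions: job.sh and nterm.sh below are hypothetical stand-ins,
# "sleep 300" plays the non-terminating command, and sleep accepts
# fractional seconds (GNU, BusyBox and BSD sleep all do).

dir=$(mktemp -d)

# Hypothetical nterm.sh: records its own PID, then replaces itself with a
# long sleep (exec keeps the same PID).
cat > "$dir/nterm.sh" <<'EOF'
echo "$$" > "$1"
exec sleep 300
EOF

# Hypothetical job.sh: runs nterm.sh as an ordinary foreground simple command.
cat > "$dir/job.sh" <<'EOF'
bash "$1/nterm.sh" "$1/nterm.pid"
echo "nterm.sh finished"
EOF

# Step 1: the invoker runs job.sh in a separate process.
bash "$dir/job.sh" "$dir" &
job_pid=$!

# Wait until nterm.sh has started and written its PID.
until [ -s "$dir/nterm.pid" ]; do sleep 0.1; done
nterm_pid=$(cat "$dir/nterm.pid")

# Step 2: the job "times out"; the invoker kills job.sh, and only job.sh.
kill "$job_pid"
job_status=0
wait "$job_pid" || job_status=$?   # 143 = 128 + SIGTERM (15)

# Step 3: nterm.sh is still running, now as an orphan.
if kill -0 "$nterm_pid" 2>/dev/null; then
  orphan_alive=yes
else
  orphan_alive=no
fi
echo "job.sh exit status: $job_status; nterm.sh still running: $orphan_alive"

# On Linux, /proc shows which process adopted the orphan: normally init
# (PID 1), although a designated "subreaper" process can take that role.
if [ -r "/proc/$nterm_pid/status" ]; then
  new_ppid=$(awk '/^PPid:/ {print $2}' "/proc/$nterm_pid/status")
  echo "nterm.sh's parent is now PID $new_ppid"
fi

# Clean up so the sketch does not leave a stray sleep behind.
kill "$nterm_pid" 2>/dev/null
rm -rf "$dir"
```

The "echo nterm.sh finished" line in job.sh never runs, because job.sh dies while still waiting for its foreground command.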
If nterm.sh's parent has already exited, then nterm.sh gets "adopted" by init. When nterm.sh exits, init will wait() for it to harvest and discard the exit status, so it won't stay a zombie for any significant length of time (it'll only be a zombie for however long it takes init to call wait()).

That said, the real issue here is your step 2. If someone kills the shell that's managing a long-running job, but doesn't kill the long-running job itself, then you as the shell script developer have the opportunity to catch the signal and pass it along to the child process.

In theory, that's great. In practice, it only works if you launch the long-running child as a background job and then block yourself in a shell builtin (such as wait or read), so that you can catch the signal immediately, rather than whenever the long-running child finishes. While bash is waiting for a foreground command to complete, it does not run a trap until that command exits; a trapped signal that arrives during the wait builtin, by contrast, makes wait return immediately (with a status greater than 128), and the trap runs right away.

But that's the nature of shell script development. You just have to know these limitations and work around them in your script.

Simplistic example:

    #!/usr/bin/env bash
    set -m    # job control: the background job gets its own process group,
              # and "kill %1" then signals that whole group (every pipeline member)
    myjob() { some | long-running | pipeline | here; }   # placeholder pipeline
    trap 'kill %1; exit' TERM   # forward TERM to the job, then exit
    myjob &
    wait      # block in a builtin so the TERM trap runs immediately
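A rough way to observe the timing difference between a foreground long-running command and the background-plus-wait pattern is sketched below. It assumes sleep as a stand-in for the long-running pipeline, arbitrary 3-second and 0.5-second delays, and a sleep and date that accept "0.5" and "+%s". Each variant is a background subshell with a TERM trap that gets killed half a second after it starts. The second variant forwards the signal using the child's PID (and reaps it) instead of "kill %1", so it does not need set -m.

```bash
#!/usr/bin/env bash
# Sketch: how soon does a job script's TERM trap run?
# Assumptions: "sleep" stands in for the long-running pipeline, the 3-second
# and 0.5-second delays are arbitrary, and "sleep 0.5" and "date +%s" are
# supported (as they are with GNU, BusyBox and BSD tools).

pidfile=$(mktemp)

# Variant 1: long command in the foreground. The shell notes the TERM but
# does not run the trap until "sleep 3" has completed.
start=$(date +%s)
( trap 'exit 143' TERM
  sleep 3
  : ) &
job=$!
sleep 0.5               # let the job install its trap and start sleeping
kill "$job"
wait "$job"
fg_elapsed=$(( $(date +%s) - start ))

# Variant 2: long command in the background, job script blocked in "wait".
# The TERM interrupts "wait" at once; the trap forwards it to the child and
# reaps the child before exiting.
start=$(date +%s)
( sleep 300 &
  child=$!
  echo "$child" > "$pidfile"
  trap 'kill "$child"; wait "$child"; exit 143' TERM
  wait ) &
job=$!
sleep 0.5
kill "$job"
wait "$job"
bg_elapsed=$(( $(date +%s) - start ))

if kill -0 "$(cat "$pidfile")" 2>/dev/null; then
  bg_child_alive=yes
else
  bg_child_alive=no
fi
rm -f "$pidfile"

echo "foreground: job exited about ${fg_elapsed}s after launch"
echo "background + wait: job exited about ${bg_elapsed}s after launch; child still alive: $bg_child_alive"
```

On a typical system the first variant should take at least three seconds and the second well under two, with its child already gone by the time the job exits.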