Steven Pelley <stevenpel...@gmail.com> writes:
> wait -n
> fails to return for processes that terminate due to a signal prior to
> calling wait -n. Instead, it returns 127 with an error that the
> process id cannot be found. Calling wait <pid> (without -n) then
> returns its exit code (e.g., 143).

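For reference, here is a minimal sketch of the part of that scenario that can be reproduced reliably: a background job is killed by SIGTERM before "wait" is called, and "wait <pid>" still reports 143 (128 + 15). The "sleep 30" job and the one-second pause are illustrative assumptions, not taken from the original report. The "wait -n" half is deliberately left out, since its result is exactly what is in question and may differ between bash versions.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the subprocess in the report.
sleep 30 &
pid=$!

# Terminate the job with SIGTERM, then pause so that it has
# already exited by the time "wait" is called.
kill -TERM "$pid"
sleep 1

# "wait <pid>" (without -n) still returns the termination status.
# The "|| status=$?" form keeps the script alive under "set -e".
status=0
wait "$pid" || status=$?
echo "wait $pid returned $status"
```
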
My understanding is that this is how "wait" is expected to work, or at
least known to work, but mostly because that's how the *kernel* works.

"wait" without -n makes a system call which means "give me information
about a terminated subprocess". The termination (or perhaps
change-of-state) reports from subprocesses are queued up in the kernel
until the process retrieves them through "wait" system calls.

OTOH, "wait" with -n makes a system call which means "give me
information about my subprocess N".

In the first case, if the subprocess N has terminated, its report is
still queued and "wait" retrieves it. In the second case, if the
subprocess N has terminated, it doesn't exist and as the manual page
says "If id specifies a non-existent process or job, the return status
is 127."

What you're pointing out is that that creates a race condition when the
subprocess ends before the "wait". And it seems that the kernel has
enough information to tell "wait -n N", "process N doesn't exist, but
you do have a queued termination report for it". But it's not clear
that there's a way to ask the kernel for that information without
reading all the queued termination reports (and losing the ability to
return them for other "wait" calls).

Then again, I might be wrong.

Dale