Steven Pelley <stevenpel...@gmail.com> writes:
> wait -n
> fails to return for processes that terminate due to a signal prior to
> calling wait -n.  Instead, it returns 127 with an error that the
> process id cannot be found.  Calling wait <pid> (without -n) then
> returns its exit code (e.g., 143).

My understanding is that this is how "wait" is expected to work, or at
least known to work, but mostly because that's how the *kernel* works.

"wait" without -n makes a system call which means "give me information
about a terminated subprocess".  The termination (or perhaps
change-of-state) reports from subprocesses are queued up in the kernel
until the process retrieves them through "wait" system calls.

OTOH, "wait" with -n makes a system call which means "give me
information about my subprocess N".

In the first case, if the subprocess N has terminated, its report is
still queued and "wait" retrieves it.  In the second case, if the
subprocess N has terminated, it doesn't exist and as the manual page
says "If id specifies a non-existent process or job, the return status
is 127."
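
To make the first case concrete, here is a minimal sketch of my own (not taken from the original report) in which two background children have already exited before the shell asks for them by process id. The exit values 3 and 7 and the variable names are illustrative assumptions. The sketch does not tell you whether the status is being held by the kernel or by the shell's own bookkeeping, only that it is still retrievable by pid.

```shell
#!/usr/bin/env bash
# Two short-lived children with distinct exit statuses (values are arbitrary).
(exit 3) &
first=$!
(exit 7) &
second=$!

# Give both children time to terminate before we ask about them.
sleep 1

# Ask for each child by pid, deliberately out of start order.
wait "$second"
second_status=$?
wait "$first"
first_status=$?

echo "first=$first_status second=$second_status"
```

Each status is returned once for its own pid, regardless of the order in which the children finished.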

What you're pointing out is that this creates a race condition when the
subprocess ends before the "wait".  And it seems that the kernel has
enough information to tell "wait -n N", "process N doesn't exist, but
you do have a queued termination report for it".  But it's not clear
that there's a way to ask the kernel for that information without
reading all the queued termination reports (and losing the ability to
return them for other "wait" calls).
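
The situation in the report can be set up deterministically rather than left to a race. The following is a hedged sketch, assuming a child killed with SIGTERM and a one-second pause so that it is certainly gone before "wait" runs; the sleep durations and variable names are my own choices. It only exercises the "wait <pid>" half of the report, since what "wait -n" does at this point is exactly what is in dispute and may vary between bash versions.

```shell
#!/usr/bin/env bash
# Start a long-running child and remember its pid.
sleep 30 &
pid=$!

# Terminate it with SIGTERM (signal 15), then pause so the child has
# definitely ended before the shell waits for it.
kill -TERM "$pid"
sleep 1

# Waiting on the specific pid reports 128 + signal number.
wait "$pid"
status=$?
echo "wait $pid returned $status"
```

With SIGTERM the reported status is 128 + 15 = 143, which matches the example value quoted in the report.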

Then again, I might be wrong.

Dale
