On Mon, Mar 7, 2011 at 8:24 AM, Linus Torvalds <torva...@linux-foundation.org> wrote:
>
> and I think one reason why the race is hard to get rid of is simply
> that system call return is _the_ common point of signal handling in
> UNIX (technically, obviously any return to user space, but there are
> no appreciable interrupts etc going on, and there _are_ a lot of
> system calls). The above trace is one that my patch would have handled correctly (it has no EINTR).

Linux also ends up making this race easier to see probably because all
system calls that are interruptible by signals are all "greedy": they
try to do as much real work as possible, rather than return EINTR.

That means that if there is work pending (like characters in a tty
buffer for "read()", or a child that has exited for "wait()"), then
system calls under Linux (and likely all other Unixes too, but Linux
is the one I can guarantee works this way) will always do as much real
work as possible, and return that real work rather than return with
EINTR.

So the _common_ case will be:

 - the system call returns with success ("read a few characters" or
   "found this child")

 - but the signal handler will be executed immediately at the return
   point, so user space won't really even "see" the success before the
   signal handler is executed.

In other words, when you do

    waiting_for_child++;
    pid = WAITPID (-1, &status, waitpid_flags);
    waiting_for_child--;

even if the "waiting_for_child--" were to be compiled to be one single
instruction, and at the exact return point of the system call, the
signal handler would still happen right in between the system call
return and that instruction.

                 Linus
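
A minimal sketch of the pattern described above, for anyone who wants to watch it happen. It is not taken from the mail or from any patch: the names run_wait_demo, on_sigchld and seen_by_handler are made up for illustration, plain waitpid() stands in for the WAITPID macro, and the 200 ms child delay is only an assumption that gives the parent time to block in waitpid() first. The SIGCHLD handler records the value of waiting_for_child at the moment it runs.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Counter from the mail; the handler snapshot below is a hypothetical helper. */
static volatile sig_atomic_t waiting_for_child = 0;
static volatile sig_atomic_t seen_by_handler = -1;  /* -1: handler has not run */

static void on_sigchld(int sig)
{
    (void)sig;
    /* Record what the counter looked like when the signal was delivered. */
    seen_by_handler = waiting_for_child;
}

/*
 * Forks a child that exits after a short delay, then waits for it using the
 * ++ / waitpid / -- pattern. Stores waitpid's return value in *reaped and
 * returns the counter value the SIGCHLD handler observed (-1 if setup failed
 * or the handler never ran).
 */
int run_wait_demo(pid_t *reaped)
{
    struct sigaction sa;
    struct timespec delay = { 0, 200 * 1000 * 1000 };  /* 200 ms, an assumption */
    int status;
    pid_t child;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                /* deliberately no SA_RESTART */
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    seen_by_handler = -1;
    child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        /* Child: give the parent time to block in waitpid(), then exit. */
        nanosleep(&delay, NULL);
        _exit(0);
    }

    waiting_for_child++;
    *reaped = waitpid(-1, &status, 0);
    waiting_for_child--;            /* the handler has normally run before this */

    return seen_by_handler;
}
```

Called from a small main(), one would expect on Linux that *reaped holds the child's pid (success, not -1 with EINTR) while the return value is 1: the handler ran after the child was found but before the decrement. The result depends on timing, so treat it as an illustration, not a guarantee.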