On 07/25/2012 03:05 AM, Chet Ramey wrote:
> Bash assumes that there's a PID space at least as
> large as CHILD_MAX, and that the kernel will use all of it before reusing
> any PID in the space.  Posix says that shells must remember up to CHILD_MAX
> statuses of terminated asynchronous children (the description of `wait'),
> so implicitly the kernel is not allowed to reuse process IDs until it has
> exhausted CHILD_MAX PIDs.

What about grandchildren? They do count for the kernel, but not for the
top-level shell...

> The description of fork() doesn't mention this,
> however.  The Posix fork() requirement that the PID returned can't
> correspond to an existing process or process group is not sufficient to
> satisfy the requirement on `wait'.

OTOH, AFAICT, as long as a PID hasn't been waitpid()ed for, it isn't reused
by fork(). However, I'm unable to find that in the POSIX spec.

> Bash holds on to the status of all terminated processes, not just
> background ones, and only checks for the presence of a newly-forked PID
> in that list if the list size exceeds CHILD_MAX.  One of the results of
> defining RECYCLES_PIDS is that the check is performed on every created
> process.

What if the shell does not do waitpid(-1), but waitpid(known-child-PID)?
That would mean calling waitpid(synchronous-child-PID) immediately, and
waitpid(asynchronous-child-PID) upon some "wait $!" shell command,
falling back to waitpid(-1) only when no PID is passed to "wait".

> I'd be interested in knowing the value of CHILD_MAX (or even `ulimit -c')
> on the system where you're seeing this problem.

The AIX 6.1 I've debugged on has:

  #define CHILD_MAX 128
  #define _POSIX_CHILD_MAX 25
  sysconf(_SC_CHILD_MAX) = 1024

  $ ulimit -H -c -u
  core file size        (blocks, -c) unlimited
  max user processes            (-u) unlimited
  $ ulimit -S -c -u
  core file size        (blocks, -c) 1048575
  max user processes            (-u) unlimited

The Interix 6.1, where we see similar-looking stability problems, has:

  CHILD_MAX not defined
  #define _POSIX_CHILD_MAX 6
  sysconf(_SC_CHILD_MAX) = 512

  $ ulimit -H -c -u
  core file size        (blocks, -c) unlimited
  max user processes            (-u) 512
  $ ulimit -S -c -u
  core file size        (blocks, -c) unlimited
  max user processes            (-u) 512

> The case where last_made_pid is equal to last_pid is a problem only when
> the PID space is extremely small -- on the order of, say, 4 -- as long as
> the kernel behaves as described above.

I'm going to run this build job with 'truss -t kfork' again, to see whether
the kernel starts recycling PIDs after too small a number of distinct
PIDs...

Anyway - defining RECYCLES_PIDS for that AIX 6.1 has reduced the error count
for this one build job from ~37 to 0 out of 50 runs.

/haubi/
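
PS: As a rough cross-check that doesn't need truss, the recycling distance
can also be probed from a shell. This is only a minimal sketch and not
something taken from the build job: the function name
count_pids_until_reuse and the fork limit are hypothetical, chosen for
illustration. It sees only the PIDs of its own direct children, so
grandchildren and unrelated processes on the machine also use up PIDs and
shorten the distance it reports.

```shell
#!/bin/sh
# Hypothetical helper (not part of bash or of the build job):
# start short-lived background children one at a time and report how many
# were created before $! repeats a PID that was already seen.
# $1 bounds the number of forks so the loop always terminates.
count_pids_until_reuse() {
    max_forks=${1:-1000}
    seen=" "                      # space-separated list of PIDs seen so far
    n=0
    while [ "$n" -lt "$max_forks" ]; do
        : &                       # asynchronous child that exits at once
        pid=$!
        wait "$pid"               # reap it, so the kernel may reuse the PID
        case "$seen" in
            *" $pid "*)
                echo "PID $pid reused after $n forks"
                return 0
                ;;
        esac
        seen="$seen$pid "
        n=$((n + 1))
    done
    echo "no PID reuse within $max_forks forks"
    return 1
}

# Example run with an arbitrary limit; raise it to probe further.
count_pids_until_reuse 1000 || :
```

A reported distance well below CHILD_MAX would point at the kind of early
recycling that RECYCLES_PIDS is meant to cope with; seeing no reuse within
the limit proves nothing beyond that limit.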