In the last set of patches we sent to the list, we included a patch to
master.c to avoid losing track of child processes after a segfault. That
patch has a race condition, which we saw triggered under high load: a
child can be reaped before master has processed its
MASTER_SERVICE_UNAVAILABLE notification. Each time the race occurs,
master's process count ends up off by one, so the number of processes
grows without bound.

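The patch below fixes the problem by ignoring AVAILABLE and UNAVAILABLE
notifications unless the child entry still belongs to the pid that sent
them. With it, the number of processes on FastMail.FM has been stable
for the last few days: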

--- master/master.c     Thu May  9 19:36:03 2002
+++ master/master.c.new Thu May  9 19:35:21 2002
@@ -814,13 +814,17 @@

     switch (msg->message) {
     case MASTER_SERVICE_AVAILABLE:
-  c->is_available = 1;
-  s->ready_workers++;
+  if (c && c->pid == msg->service_pid) {
+    c->is_available = 1;
+    s->ready_workers++;
+  }
        break;

     case MASTER_SERVICE_UNAVAILABLE:
-  c->is_available = 0;
-  s->ready_workers--;
+  if (c && c->pid == msg->service_pid) {
+    c->is_available = 0;
+    s->ready_workers--;
+  }
        break;

     case MASTER_SERVICE_CONNECTION: