At 11:13 PM 5/13/2002 -0400, Michael Bacon wrote:
>Sounds like what we're running into at the moment, which appears to be the
>master processes ending up with an incorrect count of available workers.
>The problem occurs when a worker process dies while in the "available"
>state, and doesn't notify the master.  Jeremy Howard recently posted a
>patch which addresses this problem, by decrementing the "available
>workers" counter when receiving a SIGCLD, which strikes me as the right
>way to go.  However, his patch is for 2.1.3, and like you, we're using
>2.0.16 (the bleeding edge is a bad place
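
To make the bookkeeping problem Michael describes concrete, here is a toy model in Python. It is only a sketch under assumptions, not Cyrus code: the class, the method names, the `fix_count` switch, and the PID are all hypothetical, and no real signals are involved. The `worker_exited` call stands in for whatever the master does when it receives SIGCLD for a child.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceState:
    """Hypothetical bookkeeping a master keeps for one service (e.g. pop3d)."""
    ready_workers: int = 0                        # the "available workers" counter
    workers: dict = field(default_factory=dict)   # pid -> "available" or "busy"

    def worker_available(self, pid):
        # A worker reports that it is idle and can take a connection.
        if self.workers.get(pid) != "available":
            self.workers[pid] = "available"
            self.ready_workers += 1

    def worker_busy(self, pid):
        # A worker reports that it has picked up a connection.
        if self.workers.get(pid) == "available":
            self.ready_workers -= 1
        self.workers[pid] = "busy"

    def worker_exited(self, pid, fix_count=True):
        # Stand-in for the SIGCLD path: the child is gone.
        # With fix_count=True, a worker that died while "available"
        # is also removed from the counter.
        state = self.workers.pop(pid, None)
        if fix_count and state == "available":
            self.ready_workers -= 1

    def needs_spawn(self):
        # The master only forks a new worker when it believes none are idle.
        return self.ready_workers == 0


def demo(fix_count):
    svc = ServiceState()
    svc.worker_available(101)                     # one idle worker
    svc.worker_exited(101, fix_count=fix_count)   # it dies without notifying
    return svc.ready_workers, svc.needs_spawn()
```

Without the decrement, `demo(False)` leaves the counter at 1 with no live workers, so this model never forks a replacement. With it, `demo(True)` brings the counter back to 0 and the master knows it has to spawn. Whether that stale count is what actually produces the symptoms below is still a guess.
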

This is extremely interesting. Michael, do you find this happens at seemingly random times, though? We can go a week or two with no problems, and then bam, I get a 911. Of course, our volume is considerably lower than yours.

Another issue, and one that may differentiate our problems from yours (hopefully not, as you at least have a work-around), is that I can sometimes restart Cyrus, and even after a restart, no new connections are serviced. (They connect, but get no service.) I've found that when this happens Cyrus will often appear to work for a VERY short while, and then revert to the point where connections are accepted but no service (pop3d) responds. Shouldn't a restart completely fix the problem? If so, we may be fighting something different.

A reboot also doesn't always clear up the problem. Again, Cyrus will come up, but then fail shortly thereafter. What is really odd is that the problem just goes away after a few hours.

Regards, Dustin

---
Dustin Puryear <[EMAIL PROTECTED]>
UNIX and Network Consultant
http://members.telocity.com/~dpuryear
PGP Key available at http://www.us.pgp.net
In the beginning the Universe was created. This has been widely regarded as a bad move. - Douglas Adams