We had to limit the number of lmtpd processes and let sendmail do the queuing. We're on smaller hardware with far fewer accounts and messages per day, but on a PIII with 1 GB of RAM we found that 10-12 lmtpd processes was the sweet spot to consistently prevent deadlocks.
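
For reference, the cap goes on the lmtpd entry in the SERVICES section of cyrus.conf via maxchild. A rough sketch only: the service name and socket path below are assumptions for illustration (use whatever your existing entry has), and 12 is simply the top of the range that worked for us, not a tuned value for a larger site.

```
SERVICES {
  # Service name "lmtpunix" and the socket path are assumptions; keep your own.
  # maxchild caps how many lmtpd children master will run at once.
  lmtpunix  cmd="lmtpd" listen="/var/imap/socket/lmtp" prefork=0 maxchild=12
}
```

Once the cap is reached, further deliveries wait on the MTA side, which is what we mean by letting sendmail do the queuing.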

Rob Carter wrote:
>
> Gentlefolk,
>
> Does anyone have experiences they'd be willing to share with combatting
> deadlocks within a BDB 3.3 duplicate delivery database on a high-traffic Cyrus
> v2.1.16 (or earlier 2.1.x) server? We're running a 60,000+ user/1.2 million
> message/day Cyrus postoffice on an 8-way Solaris system, and recently, we've
> started running into increasingly frequent deadlock problems with the
> duplicate suppression database.
>
> The symptoms we're seeing are probably what you'd expect -- our cyrus.conf is
> set to allow up to 120 lmtpd children to run simultaneously, and when we hit a
> deadlock condition in the duplicate suppression database, we find that all 120
> of our running lmtpds lock up waiting for write locks in the database.
> "truss" shows them all stuck in "lwp_sema_wait()" calls. Inspection of the
> duplicate database after the fact sometimes shows corruption (usually null
> page pointers reported by db_verify), but sometimes shows nothing -- it's
> possible that we're seeing two different problems with the same end effect,
> but I suspect the database corruption is actually a side-effect of the
> deadlock problem...
>
> We've come up with a work-around that at least allows us to correct the
> situation without performing a master restart (with 4000+ simultaneous IMAPS
> connections, a master restart isn't something we can routinely do,
> unfortunately) -- renaming the duplicate delivery database and its log and
> __db* files and then kill -15'ing all the running lmtpds seems to get us back
> to a functional state with a fresh duplicate suppression database. We're up
> to seeing this happen a bit more than once a day now, though, and it's
> becoming seriously annoying.
>
> We're using the db3_nosync mechanism (with BDB version 3.3.11) for our dup
> suppression database -- one option we're strongly considering is switching to
> the regular "db3" mechanism (without the nosync option) to try to avoid the
> deadlocks, but we're a bit concerned about what that may do to lmtp
> throughput. Turning off duplicate suppression is...politically untenable...at
> this point...
>
> We've also considered running the db3 "db_deadlock" routine to periodically
> detect and try to correct deadlock conditions in the duplicate suppression
> database, but that's also somewhat scary -- it's unclear to us exactly what
> the behavior of an lmtpd awaiting a lock in the duplicate suppression database
> would be when its waiting lock got terminated by the db_deadlock daemon...
>
> Anyone have any experience or wisdom to share about either possible solution,
> or about other things that you've seen work in similar situations? At this
> point, upgrading to 2.2.x is on our radar, but probably not something we can
> approach before mid-semester (2-3 months out), so suggestions for solutions
> with Cyrus v2.1.x would be most appreciated...
>
> --Thanx much,
> --Rob Carter--
> ---
> Cyrus Home Page: http://asg.web.cmu.edu/cyrus
> Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
> List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html