On Fri, 20 Aug 2004, Rob Carter wrote:

Gentlefolk,

Does anyone have experiences they'd be willing to share with combatting deadlocks within a BDB 3.3 duplicate delivery database on a high-traffic Cyrus v2.1.16 (or earlier 2.1.x) server? We're running a 60,000+ user/1.2 million message/day Cyrus postoffice on an 8-way Solaris system, and recently, we've started running into increasingly frequent deadlock problems with the duplicate suppression database.

The symptoms we're seeing are probably what you'd expect -- our cyrus.conf is set to allow up to 120 lmtpd children to run simultaneously, and when we hit a deadlock condition in the duplicate suppression database, we find that all 120 of our running lmtpds lock up waiting for write locks in the database. "truss" shows them all stuck in "lwp_sema_wait()" calls. Inspection of the duplicate database after the fact sometimes shows corruption (usually null page pointers reported by db_verify), but sometimes shows nothing -- it's possible that we're seeing two different problems with the same end effect, but I suspect the database corruption is actually a side-effect of the deadlock problem...

We're not running on anything that hefty, which is probably why we haven't seen it. Likewise, I don't expect any hardware that beefy to fall into my lap for testing any time soon, which is somewhat unfortunate.


We've also considered running the db3 "db_deadlock" routine to periodically detect and try to correct deadlock conditions in the duplicate suppression database, but that's also somewhat scary -- it's unclear to us exactly what the behavior of an lmtpd awaiting a lock in the duplicate suppression database would be when its waiting lock got terminated by the db_deadlock daemon...
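
As far as I understand Berkeley DB's documented behavior, a lock request rejected by the deadlock detector does not hang or kill the process; the call returns DB_LOCK_DEADLOCK, and the caller is expected to abort its transaction and try again. Whether lmtpd in 2.1.x handles that return on every path is something I have not verified. Here is a minimal sketch of the retry pattern, in Python rather than the C that Cyrus uses. DeadlockVictim, run_with_retry and mark_delivered are made-up names standing in for the real calls:

```python
import random
import time


class DeadlockVictim(Exception):
    """Hypothetical stand-in for a DB_LOCK_DEADLOCK return from the database."""


def run_with_retry(operation, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Run operation(); if it is picked as a deadlock victim, back off and retry.

    operation is assumed to begin its own transaction and to abort it
    before raising DeadlockVictim, so each retry starts from a clean state.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except DeadlockVictim:
            if attempt == max_attempts:
                raise  # give up and let the caller report the failure
            # Randomized exponential backoff so the same lockers do not
            # collide again in lockstep.
            sleep(base_delay * (2 ** (attempt - 1)) * random.random())


# Usage: a fake duplicate-database write that is a victim twice, then succeeds.
attempts = {"count": 0}


def mark_delivered():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise DeadlockVictim()
    return "stored"


result = run_with_retry(mark_delivered, sleep=lambda seconds: None)
```

If the code waiting on the lock does not have a loop of this shape around it, a victim would see an error instead of a retry, which is the part worth checking in the source.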

It seems like it would be better to detect deadlocks and ascertain where and why, but I think I need to review some code before I could possibly have any useful suggestions in that vein.
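
To make "detect deadlocks and ascertain where" a bit more concrete: a deadlock is a cycle in the graph of who is waiting on whom. The sketch below assumes you have already collected that information somehow (the waits_for mapping and the lmtpd-N names are invented for illustration, not output from any real tool), and it only finds one cycle, not all of them:

```python
def find_wait_cycle(waits_for):
    """Return one list of lockers that wait on each other in a cycle, or None.

    waits_for maps each locker to the lockers holding locks it is waiting on.
    """
    path = []        # lockers on the current depth-first search path
    on_path = set()  # same lockers, for fast membership checks
    done = set()     # lockers already fully explored without finding a cycle

    def visit(locker):
        if locker in on_path:
            # We walked back onto our own path: that segment is the cycle.
            return path[path.index(locker):]
        if locker in done:
            return None
        path.append(locker)
        on_path.add(locker)
        for holder in waits_for.get(locker, ()):
            cycle = visit(holder)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(locker)
        done.add(locker)
        return None

    for locker in waits_for:
        cycle = visit(locker)
        if cycle:
            return cycle
    return None


# Usage: three processes waiting in a ring, plus one bystander stuck behind them.
example = {
    "lmtpd-1": ["lmtpd-2"],
    "lmtpd-2": ["lmtpd-3"],
    "lmtpd-3": ["lmtpd-1"],
    "lmtpd-4": ["lmtpd-1"],
}
cycle = find_wait_cycle(example)
```

Knowing which processes are in the ring, as opposed to merely queued behind it, would at least narrow down which code paths take the locks in conflicting order.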


---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
