Hi Larry,

Thanks for the reply. Comments interspersed below:

Lawrence Greenfield wrote:

> This seems to mean that Cyrus uses the old style text file to store the
> seen info rather than a Berkeley DB. Is there a particular reason this
> is defined this way by default? Is the Berkeley DB style seen format
> safe to use and/or preferred?
>
> The problem is that a large number of small btree files is
> inefficient. If you are interested in using Berkeley db for seen
> state (probably more efficient in the long run) look into
> "seen_bigdb.c".
>
> However, it's unlikely that the locking problem you're experiencing
> will go away by using Berkeley db---in fact, it will probably be worse
> since you won't be able to kill just one process---you'll have to kill
> everything and run recovery.

I agree with your assessment; the text files seem more efficient for this
purpose. I just got confused by the comments.

> [...]
> world, this would be fine, since locks appear to be released when
> processes terminate, in my world, this seems to cause a cascade failure
> where once a mailbox gets locked, eventually mail delivery dies when all
> of my deliver processes are hung up waiting for the user's mailbox to
> be unlocked. Is there any reason not to try to call flock in a
> non-blocking fashion with some limited number of retries (perhaps
> delayed with additional attempts tried at increasing intervals) until
> finally after 5 minutes or so we fail with an error. I was going to
> give this a try unless somebody already knows why this is a bad idea.
>
> This obviously can work, but it doesn't really help us find the root
> cause of the problem.

I realize this, but I may implement it anyway because it will make the
whole system a little more robust: a single programming error or deadlock
condition would no longer take out the server. We would still be able to
identify the problem process, since the affected user would still be unable
to get into their mail, and there would be far fewer processes to wade
through.
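
For illustration only, the non-blocking retry idea described above could look roughly like the sketch below. It is written in Python rather than the C that Cyrus uses, and it is not taken from the Cyrus source; the function name `lock_with_retries` and all of its parameters (`max_wait`, `first_delay`, `max_delay`) are hypothetical. It assumes the lock is taken with flock(2), as mentioned above, and that giving up after about five minutes is acceptable.

```python
import errno
import fcntl
import os
import time


def lock_with_retries(fd, max_wait=300.0, first_delay=0.5, max_delay=30.0):
    """Try to take an exclusive flock on fd without blocking indefinitely.

    Hypothetical helper: retries with increasing delays and returns
    True once the lock is held, or False if max_wait seconds pass first.
    """
    deadline = time.monotonic() + max_wait
    delay = first_delay
    while True:
        try:
            # LOCK_NB makes flock fail immediately instead of blocking.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except OSError as exc:
            # Anything other than "lock is held elsewhere" is a real error.
            if exc.errno not in (errno.EWOULDBLOCK, errno.EAGAIN):
                raise
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Wait, then double the delay up to max_delay for the next attempt.
        time.sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

A caller such as a delivery process would treat a False return as a temporary failure and report an error instead of hanging. As noted in the quoted reply, this only limits the damage; the process that holds the lock still has to be found.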

> Third, I suspect (after looking at the open files in use by the imap
> process after terminating an IMAP connection) that the problem may be
> exacerbated by, or related to process reuse, can anyone point me to how
> to disable this feature for imapd ?
>
> It's really crucial to understand what process is holding this lock.
> It might not be reused---it's probably blocked doing something else
> and never releasing the lock. imapds should never do something that
> will block for a long time while holding a lock but evidently
> something is wrong.
>
> Finding the process that holds the lock, doing a backtrace on it and
> (even better) figuring out how it gets into that state is really the
> crucial issue for fixing this problem.

I agree, and I will do everything I can to locate the process next time. I
thought I had done a backtrace on every process accessing files in the
user's directory (as identified by an lsof), but I must have missed the
guilty one. I also tried to work the problem the other way by looking at
/proc/locks, but there were too many entries to track down, and I did not
know an easy way to find the filename associated with an inode number.

The problem is that this is of course affecting our heavily used production
systems, so I usually have a limited window of time to debug the problem
before the users start to scream.

The thing I found funny in the lsof output was that even though I had a
process blocked trying to access the seen file, I had numerous open file
entries from later processes for a series of deleted files named
<username>.seen.NEW. I have not had a chance to go through the seen code
yet to see when it creates a .NEW file, but I was surprised to see this for
processes that were started later, thinking that they would have tried to
get a lock on the seen file first. (see below)

Will send more when I track down the process.
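
One possible way to go from a /proc/locks entry to a filename is sketched below in Python. It assumes the /proc/locks layout of current Linux kernels, where each line carries the lock type, the holder's pid and a `major:minor:inode` triple (device numbers in hex), and waiters are marked with `->`; older kernels may print extra fields. The function names `parse_locks_line` and `paths_for_lock` are hypothetical, and reading another user's /proc/<pid>/fd entries normally requires root.

```python
import os


def parse_locks_line(line):
    """Parse one /proc/locks line (modern layout assumed) into a dict."""
    fields = line.split()
    # Waiters are listed as "N: -> FLOCK ..." under the lock they wait on.
    blocked = len(fields) > 1 and fields[1] == "->"
    if blocked:
        del fields[1]
    major, minor, inode = fields[5].split(":")
    return {
        "blocked": blocked,
        "kind": fields[1],      # POSIX, FLOCK, ...
        "access": fields[3],    # READ or WRITE
        "pid": int(fields[4]),
        "dev": (int(major, 16), int(minor, 16)),
        "inode": int(inode),
    }


def paths_for_lock(lock, proc="/proc"):
    """Return the paths the lock's pid has open on the locked inode."""
    fd_dir = os.path.join(proc, str(lock["pid"]), "fd")
    paths = set()
    try:
        names = os.listdir(fd_dir)
    except OSError:
        return paths  # process is gone or we lack permission
    for name in names:
        link = os.path.join(fd_dir, name)
        try:
            st = os.stat(link)  # follows the fd link to the open file
            dev = (os.major(st.st_dev), os.minor(st.st_dev))
            if st.st_ino == lock["inode"] and dev == lock["dev"]:
                paths.add(os.readlink(link))
        except OSError:
            continue
    return paths
```

Running `parse_locks_line` over every line of /proc/locks and keeping only the entries whose `blocked` flag is false should narrow the list to lock holders; `paths_for_lock` then names the files each holder has locked.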

Thanks again for your help,

John

[root@romulan /root]# lsof | grep gost9796.seen
imapd  3098 cyrus  17u  REG  8,15  114  177106 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd  3098 cyrus  21u  REG  8,15  122  177253 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd  3216 cyrus  17u  REG  8,15  114  176458 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd  3217 cyrus  19u  REG  8,15  114  177533 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd  3376 cyrus  18u  REG  8,15  102  177534 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd  3466 cyrus  17u  REG  8,15  102  177536 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd  3485 cyrus  18u  REG  8,15  102  177535 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd  3496 cyrus  18u  REG  8,15  102  177537 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd  5523 cyrus  17u  REG  8,15  102  177424 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd  6063 cyrus  18u  REG  8,15  102  177250 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd  6108 cyrus  17u  REG  8,15  102  177540 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd  6150 cyrus  17u  REG  8,15  102  177541 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd  6161 cyrus  17u  REG  8,15  102  177542 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd  6722 cyrus  17u  REG  8,15  102  177543 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd  6744 cyrus  18u  REG  8,15  102  177464 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd  6766 cyrus  17u  REG  8,15  102  177545 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd  6877 cyrus  mem  REG  8,15  102  177546 /var/imap/user/g/gost9796.seen
imapd  6877 cyrus  18u  REG  8,15  102  177546 /var/imap/user/g/gost9796.seen