Hi Larry,

Thanks for the reply.  Comments interspersed below:

Lawrence Greenfield wrote:

>    This seems to mean that Cyrus uses the old style text file to store the
>    seen info rather than a Berkeley DB.   Is there a particular reason this
>    is defined this way by default?   Is the Berkeley DB style seen format
>    safe to use and/or preferred?
>
> The problem is that a large number of small btree files is
> inefficient.  If you are interested in using Berkeley db for seen
> state (probably more efficient in the long run) look into
> "seen_bigdb.c".
>
> However, it's unlikely that the locking problem you're experiencing
> will go away by using Berkeley db---in fact, it will probably be worse
> since you won't be able to kill just one process---you'll have to kill
> everything and run recovery.
>

I agree with your assessment; the text files seem more efficient for this
purpose.  I just got confused by the comments.


> [...]
>    world, this would be fine, since locks appear to be released when
>    processes terminate.  In my world, however, this seems to cause a
>    cascade failure: once a mailbox gets locked, mail delivery eventually
>    dies when all of my deliver processes are hung waiting for the user's
>    mailbox to be unlocked.  Is there any reason not to call flock() in a
>    non-blocking fashion with a limited number of retries (perhaps at
>    increasing intervals), finally failing with an error after 5 minutes
>    or so?  I was going to give this a try unless somebody already knows
>    why this is a bad idea.
>
> This obviously can work, but it doesn't really help us find the root
> cause of the problem.
>

I realize this, but I may implement it anyway because it will make the whole
system a little more robust: a single programming error or deadlock would no
longer take out the whole server.  We would still be able to identify the
problem process, since the affected user would still be unable to get into
their mail, and there would be far fewer processes to wade through.
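For reference, here is roughly what I have in mind, as a minimal sketch rather than a patch. The helper name open_locked_with_backoff() and its parameters are my own inventions, not anything in the Cyrus source, and it assumes Linux flock() semantics (with LOCK_NB the call fails with EWOULDBLOCK while someone else holds the lock).

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper, not Cyrus code: open path and take an exclusive
 * flock() on it without ever blocking indefinitely.  Instead of one
 * blocking flock(LOCK_EX), it polls with LOCK_NB, sleeping between
 * attempts and doubling the delay (from initial_ms up to max_delay_ms)
 * until about total_ms milliseconds have been spent waiting.
 *
 * Returns the locked descriptor, or -1 with errno set.  errno is
 * EWOULDBLOCK if someone else still held the lock when we gave up.
 */
int open_locked_with_backoff(const char *path, long initial_ms,
                             long max_delay_ms, long total_ms)
{
    long delay = initial_ms > 0 ? initial_ms : 1;
    long waited = 0;
    int saved;

    if (max_delay_ms < delay)
        max_delay_ms = delay;               /* keep the cap sane */

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    for (;;) {
        if (flock(fd, LOCK_EX | LOCK_NB) == 0)
            return fd;                      /* got the lock */
        if (errno != EWOULDBLOCK || waited >= total_ms)
            break;                          /* hard error, or out of time */

        /* Lock is busy: sleep, then double the delay up to the cap. */
        struct timespec ts;
        ts.tv_sec = delay / 1000;
        ts.tv_nsec = (delay % 1000) * 1000000L;
        nanosleep(&ts, NULL);
        waited += delay;
        delay = delay * 2 > max_delay_ms ? max_delay_ms : delay * 2;
    }

    saved = errno;                          /* close() may clobber errno */
    close(fd);
    errno = saved;
    return -1;
}
```

Something like open_locked_with_backoff(path, 100, 10000, 300000) would give up after roughly five minutes of waiting.  One caveat with polling is that a waiter can keep losing the race to other processes; the alternative is a blocking flock() interrupted by alarm(), which returns EINTR as long as the signal handler is installed without SA_RESTART.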

>    Third, I suspect (after looking at the open files in use by the imap
>    process after terminating an IMAP connection) that the problem may be
>    exacerbated by, or related to, process reuse.  Can anyone point me to
>    how to disable this feature for imapd?
>
> It's really crucial to understand what process is holding this lock.
> It might not be reused---it's probably blocked doing something else
> and never releasing the lock.  imapds should never do something that
> will block for a long time while holding a lock, but evidently
> something is wrong.
>
> Finding the process that holds the lock, doing a backtrace on it and
> (even better) figuring out how it gets into that state is really the
> crucial issue for fixing this problem.
>

I agree, and I will do everything I can to locate the process next time.  I
thought I had done a backtrace on every process accessing files in the user's
directory (as identified by lsof), but I must have missed the guilty one.  I
also tried to work the problem the other way by looking at /proc/locks, but
there were too many entries to track down, and I didn't know how to easily
find the filename associated with an inode number.  The problem is that this
is affecting our heavily used production systems, so I usually have a limited
window of time to debug before the users start to scream.
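Going the other way turns out to be easier: stat() the seen file to get its device and inode, then look for that major:minor:inode triple in /proc/locks.  A minimal sketch, assuming the line layout printed by current Linux kernels (hex major:minor, decimal inode, with the PID just before it); older kernels may format it differently, so compare against a real line first.  print_lock_holders() is a hypothetical name, and the second argument only exists so it can be pointed at a saved copy of the file.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>  /* major(), minor() */
#include <sys/types.h>

/*
 * Hypothetical helper: print every line of locks_file (normally
 * /proc/locks) whose major:minor:inode matches the file at path, and
 * return how many matched (-1 on error).  Lines look roughly like
 *   "1: FLOCK  ADVISORY  WRITE 6877 08:0f:177546 0 EOF"
 * and blocked waiters carry an extra "->" after the id, so we match on
 * the device:inode token rather than on a fixed column.
 */
int print_lock_holders(const char *path, const char *locks_file)
{
    struct stat st;
    char line[512], copy[512];
    int matches = 0;

    if (stat(path, &st) != 0)
        return -1;

    FILE *fp = fopen(locks_file, "r");
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        strcpy(copy, line);                 /* strtok() below mangles line */
        char *prev = NULL;
        for (char *tok = strtok(line, " \t\n"); tok != NULL;
             prev = tok, tok = strtok(NULL, " \t\n")) {
            unsigned int maj, min;
            unsigned long ino;
            int used = 0;
            if (sscanf(tok, "%x:%x:%lu%n", &maj, &min, &ino, &used) == 3
                && tok[used] == '\0'
                && maj == major(st.st_dev) && min == minor(st.st_dev)
                && ino == (unsigned long)st.st_ino) {
                /* The token just before major:minor:inode is the PID. */
                printf("pid %s: %s", prev ? prev : "?", copy);
                matches++;
                break;
            }
        }
    }
    fclose(fp);
    return matches;
}
```

Calling print_lock_holders("/var/imap/user/g/gost9796.seen", "/proc/locks") would list every PID holding or waiting for a lock on that file (lsof's device 8,15 shows up there as 08:0f).  Entries marked "->" are waiters queued behind the holder; the holder is the process to attach gdb to for a backtrace.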

The thing I found odd in the lsof output was that, even though I had a process
blocked trying to access the seen file, I had numerous open file entries from
later processes on a series of deleted <username>.seen.NEW files.  I have not
had a chance to go through the seen code yet to see when it creates a .NEW
file, but I was surprised to see this for processes that were started later,
since I thought they would have tried to get a lock on the seen file first
(see below).
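One generic way for a process to end up holding a file that lsof marks "(deleted)" is the write-a-temporary-file-then-rename() idiom.  Whether the seen code does exactly this is an assumption I have not checked, and which name lsof prints for such a file can depend on the kernel, so treat this only as an illustration.  demo_replaced_link_count() is a made-up name; it shows that a descriptor opened on one writer's file ends up on an inode with no remaining links once a later writer renames its own file over the same name.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Illustration only (hypothetical names, not Cyrus code): mimic two
 * writers that each create "<base>.NEW" and rename() it over <base>.
 * Writer A keeps its descriptor open.  Once writer B's rename replaces
 * <base>, A's inode has no directory entry left (st_nlink == 0), which
 * is what lsof reports as "(deleted)".  Returns A's final link count,
 * or -1 on error.
 */
int demo_replaced_link_count(const char *base)
{
    char tmp[4096];
    struct stat st;
    int fd_a = -1, fd_b = -1, rc = -1;

    snprintf(tmp, sizeof tmp, "%s.NEW", base);

    /* Writer A: create base.NEW, keep fd_a open, rename it into place. */
    fd_a = open(tmp, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd_a < 0 || rename(tmp, base) != 0)
        goto out;

    /* Writer B: same dance; its rename atomically replaces A's file. */
    fd_b = open(tmp, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd_b < 0 || rename(tmp, base) != 0)
        goto out;

    /* A's inode is still open but no longer has any name. */
    if (fstat(fd_a, &st) == 0)
        rc = (int)st.st_nlink;

out:
    if (fd_a >= 0)
        close(fd_a);
    if (fd_b >= 0)
        close(fd_b);
    return rc;
}
```

Under this idiom, every process that kept its old descriptor open would show up in lsof with a deleted entry, while only the most recent writer's file still has the real name.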

I will send more when I track down the process.  Thanks again for your help,

John

[root@romulan /root]# lsof | grep gost9796.seen
imapd      3098  cyrus   17u   REG       8,15      114    177106 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd      3098  cyrus   21u   REG       8,15      122    177253 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd      3216  cyrus   17u   REG       8,15      114    176458 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd      3217  cyrus   19u   REG       8,15      114    177533 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd      3376  cyrus   18u   REG       8,15      102    177534 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd      3466  cyrus   17u   REG       8,15      102    177536 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd      3485  cyrus   18u   REG       8,15      102    177535 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd      3496  cyrus   18u   REG       8,15      102    177537 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd      5523  cyrus   17u   REG       8,15      102    177424 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd      6063  cyrus   18u   REG       8,15      102    177250 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd      6108  cyrus   17u   REG       8,15      102    177540 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd      6150  cyrus   17u   REG       8,15      102    177541 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd      6161  cyrus   17u   REG       8,15      102    177542 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd      6722  cyrus   17u   REG       8,15      102    177543 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd      6744  cyrus   18u   REG       8,15      102    177464 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd      6766  cyrus   17u   REG       8,15      102    177545 /var/imap/user/g/gost9796.seen.NEW (deleted)
imapd      6877  cyrus  mem    REG       8,15      102    177546 /var/imap/user/g/gost9796.seen
imapd      6877  cyrus   18u   REG       8,15      102    177546 /var/imap/user/g/gost9796.seen



