I am running a locally compiled cyrus 2.0.16 on a Red Hat 7.1 system. Twice now one of my users (only one, but this one happens to be my boss!) has observed a weird, transient failure mode in which imapd hangs on any attempt to write to a folder. Here is an example, caught by ethereal:

    0000000c LIST "" inbox.Sent
    * LIST () "." "inbox.Sent"
    0000000c OK Completed (0.000 secs 2 calls)
    0000000d APPEND inbox.Sent {353}

That's it -- no word from imapd ever again. Here, for reference, is how it's supposed to look:

    00000006 LIST "" inbox.Sent
    * LIST () "." "inbox.Sent"
    00000006 OK Completed (0.000 secs 2 calls)
    00000007 APPEND inbox.Sent {355}
    + go ahead
    Date: Mon, 26 Nov 2001 22:25:29 -0800 (PST)...
    00000007 OK [APPENDUID 1001546367 273] Completed

But in the hanging mode, the client never gets a "+ go ahead" from imapd.

Once this behaviour starts, it occurs for any imapd process that my boss creates until the cyrus master process is killed and restarted. It does not occur for other users' imapd processes, even while it is occurring for my boss's. Mail delivered via lmtp continues to arrive normally in my boss's inbox, even while this is occurring. The last time this occurred (about 10 days ago), I reconstructed my boss's mailboxes and assumed that was that, but today it happened again.

My boss uses the same clients (netscape and pine) as most of my other users. The only thing unusual about his account is its size: 246 MB distributed over 299 folders (we have no quotas). I am wondering if sometimes an operation on a large file or directory might time out and leave cyrus in an inconsistent state.

One last piece of information: when setting up cyrus, I chose to ignore the documentation's advice to set the user, quota, and partition directories to update synchronously. The documentation implies that the only consequence of this is possible data loss during a hard shutdown, and since that has never happened to us (our server has a UPS and we are in a hospital with its own emergency power) I chose to accept that risk. I am now wondering if asynchronicity might have additional consequences.

That's the story. Any ideas? Can anyone suggest a way to get more information out of cyrus, e.g. an strace or ltrace of a running imapd process, so I can see what it is doing when it hangs?
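
In case it helps anyone reproduce this without a mail client, here is a minimal Python sketch of the exchange above: it logs in, sends the APPEND line, and reports whether the "+" continuation arrives before a timeout. Everything specific here is my assumption rather than anything cyrus provides: the function names are made up, it assumes the server accepts a plaintext LOGIN on port 143, and it assumes the user name and password are simple enough to send unquoted. It never sends the message body, so nothing should actually be appended.

```python
import socket


def build_append(tag: str, mailbox: str, message: bytes) -> bytes:
    """Build the APPEND command line announcing a literal of len(message) bytes."""
    return f"{tag} APPEND {mailbox} {{{len(message)}}}\r\n".encode("ascii")


def is_continuation(line: bytes) -> bool:
    """True if the server line is a continuation request such as '+ go ahead'."""
    return line.startswith(b"+")


def read_tagged(stream, tag: str) -> bytes:
    """Skip untagged lines; return the tagged reply (b'' if the server closed)."""
    prefix = tag.encode("ascii") + b" "
    while True:
        line = stream.readline()
        if not line or line.startswith(prefix):
            return line


def probe_append(host, user, password, mailbox="inbox.Sent", port=143, timeout=30.0):
    """Return 'ok', 'hang', 'refused' or 'login-failed' for one APPEND attempt.

    Hypothetical helper: a timeout at any stage (greeting, login or APPEND)
    is reported as 'hang'.
    """
    message = b"Subject: append probe\r\n\r\nprobe\r\n"  # only its length is sent
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rwb") as stream:
            try:
                stream.readline()  # server greeting
                stream.write(f"a1 LOGIN {user} {password}\r\n".encode("ascii"))
                stream.flush()
                if not read_tagged(stream, "a1").startswith(b"a1 OK"):
                    return "login-failed"
                stream.write(build_append("a2", mailbox, message))
                stream.flush()
                reply = stream.readline()  # expect '+ go ahead' here
            except socket.timeout:
                return "hang"
    # The connection is closed without sending the literal, so no message is stored.
    return "ok" if is_continuation(reply) else "refused"
```

Calling something like probe_append("mail.example.org", "someuser", "secret") (placeholder values) from cron for the affected account and for a healthy one would at least timestamp when the hang begins, which might make it easier to know which imapd process to attach strace to.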