Since I have been working on the 2.0.16 locking problem (per my previous e-mails), I have been taking a good long look at the seen locking behavior and I must admit confusion as to whether this is working as intended or if it is buggy.
First of all, I am reasonably convinced that the locking problem we have discussed previously is a combination of two things (at least on Linux).

There are clearly cases in the Cyrus code where Cyrus tries to lock the same file twice without unlocking it. For example, when I use the "select" command to access a mailbox that I have not touched before, the index_check function calls seen_lockread on line 439 of index.c to lock the file. Later in the same function (on line 487), it calls index_checkseen, which also calls seen_lockread. I suspect there are other places where this happens as well.

In all of my testing, a call to flock to lock the same file descriptor twice by the same process always works. But from the gdb backtraces and analysis I have done, every time an IMAP process locks up on me, it is trying to lock the seen file a second time when it already holds a lock. I suspect that there is some bug in version 2.2.19 of the Linux kernel I am using that causes second calls to flock on the same file descriptor to fail intermittently.

I am currently testing a workaround in lib/lock_flock.c that unlocks the file before it attempts to lock it. Fortunately, you cannot remove an advisory lock that was set by another process, and if you do not hold a lock, the call to flock to unlock the file does not return an error. This seems to be working, and I will post it to the list once I have more confidence in it.

I am not sure whether locking the file multiple times from the same process can be considered a bug, but this may not be the normal way flock is used, and it is possible that the Linux kernel has not been tested extensively in this way. I am curious whether all of the other folks with this problem are using Linux, or whether it also shows up on Solaris and other Unix flavors.

The second point is more directly tied to the topic of this message.
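For anyone who wants to check the flock behavior described above on their own system, here is a minimal sketch. It is written in Python only for brevity (Cyrus and lib/lock_flock.c are C), and it is not the actual patch. The helper names relock, try_lock and demo are made up for illustration. It assumes the documented flock(2) semantics: a repeated lock request on the same descriptor succeeds, a second open() of the same file contends for the lock, and unlocking a descriptor that holds no lock is not an error.

```python
import fcntl
import os
import tempfile


def relock(fd):
    """Hypothetical unlock-first workaround: release, then take an exclusive lock."""
    # LOCK_UN on a descriptor that holds no lock does not raise an error.
    fcntl.flock(fd, fcntl.LOCK_UN)
    fcntl.flock(fd, fcntl.LOCK_EX)


def try_lock(fd):
    """Return True if a non-blocking exclusive lock succeeds, False if it would block."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


def demo():
    fd, path = tempfile.mkstemp()
    # A second open() gives a separate open file description for the same file.
    other = os.open(path, os.O_RDWR)
    try:
        results = {}
        fcntl.flock(fd, fcntl.LOCK_EX)
        # Second lock request on the descriptor that already holds the lock.
        results["same_fd_twice"] = try_lock(fd)
        # The workaround path: unlock first, then lock again.
        relock(fd)
        # The other descriptor cannot take the lock while fd holds it.
        results["other_fd_blocked"] = not try_lock(other)
        # Unlocking via a descriptor that holds nothing leaves fd's lock in place.
        fcntl.flock(other, fcntl.LOCK_UN)
        results["lock_survives_foreign_unlock"] = not try_lock(other)
        return results
    finally:
        os.close(other)
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

One caveat with the unlock-first approach: the unlock and the following lock are two separate calls, so another process can take the lock in between. Anything read under the earlier lock should be treated as possibly stale after relocking.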
I agree with Heiki Kask that there are some problems in the seen code. Since very few clients initiate multiple connections, and since the odds of a message arriving while the seen file is being updated are very low, I think these problems are masked. The obvious evidence I have seen is that when I hit one of these locking problems, the seen file is locked with an advisory lock and yet it still gets updated and replaced when other IMAP processes access the mailbox. Clearly there are some points in the code (which I am trying to track down) that update the file without checking the advisory lock first.

I also have some confusion about how the seen system is supposed to work. The model Cyrus uses seems to be:

1. Read the file and cache it in a buffer to speed up subsequent accesses.
2. When you want to write the file, lock it, write out the new file as filename.NEW, lock the new file, and rename the new file back to the original name (replacing the original). Then unlock.
3. Subsequent reads always check whether the file has been replaced. If it has, read in the new file; otherwise use the cached buffer version for performance reasons.

Larry, please correct me if this explanation is wrong. Since this is abstracted out to allow the use of either Berkeley DB or flat files, it can get confusing to trace the calls from one function to the next.

Thanks,
John

Lawrence Greenfield wrote:
> Cyrus caches seen state in memory for a time before flushing it to
> disk. Generally this works quite well; I use Outlook Express and
> don't seem to have this problem, but perhaps I just don't do this
> exact sequence of clicks.
>
> It's possible to force Cyrus to synchronize seen state more quickly
> with the comparable loss of performance/scalability.
>
> Larry
>
> Date: Fri, 30 Nov 2001 01:31:37 +0200
> From: Heiki Kask <[EMAIL PROTECTED]>
> CC: [EMAIL PROTECTED]
>
> > When Cyrus-IMAP writes the seen state, it first makes a copy of cyrus.seen
> > to cyrus.seen.new(?). This allows other IMAP connections to read the seen
> > state from cyrus.seen, while the first connection is updating
> > cyrus.seen.new(?). When it finishes it moves the file to cyrus.seen.
>
> I see a contradiction here: Cyrus IMAP server is designed to support
> multiple connections but it does not work with a client that actually
> uses them.
>
> > one file (i.e. unix mbox format). This requires the UW-IMAP server to lock
> > the file when it is writing to it, which prevents other connections from
> > reading the file, until the lock is removed.
>
> Is it possible to force Cyrus IMAP server to use some kind of locking
> mechanism?
>
> For some unknown reason most users are running OE as a IMAP client and
> only way to solve their problems is to do something at the server side.
>
> I haven't familiar (yet :)) with internals of the Cyrus IMAP server
> design, so could somebody explain the background of the multiple
> connection support? This seems to be handy only for situations when same
> user is accessing imap folders using different clients.
>
> BTW, Thanks for your OE hints!
>
> heiki
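
P.S. The write-then-rename model outlined earlier in this message can be sketched roughly as follows. This is a Python illustration of the model as described, not the Cyrus code. SeenCache, replace_seen, and the use of (inode, mtime, size) to detect a replaced file are all assumptions made for the example.

```python
import fcntl
import os
import tempfile


class SeenCache:
    """Hypothetical reader: keep the file in memory, reread only if it was replaced."""

    def __init__(self, path):
        self.path = path
        self.stamp = None
        self.data = None

    def read(self):
        st = os.stat(self.path)
        # Assumed replacement check: a renamed-in file has a different inode.
        stamp = (st.st_ino, st.st_mtime_ns, st.st_size)
        if stamp != self.stamp:
            with open(self.path, "rb") as f:
                self.data = f.read()
            self.stamp = stamp
        return self.data


def replace_seen(path, new_data):
    """Hypothetical writer: lock, write path + '.NEW', lock it, rename, unlock."""
    with open(path, "rb") as old:
        fcntl.flock(old, fcntl.LOCK_EX)          # lock the current file
        tmp = path + ".NEW"
        with open(tmp, "wb") as new:
            fcntl.flock(new, fcntl.LOCK_EX)      # lock the new file
            new.write(new_data)
            new.flush()
            os.fsync(new.fileno())
            os.rename(tmp, path)                 # atomically replace the original
        # Closing both files releases both locks.


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    p = os.path.join(d, "cyrus.seen")
    with open(p, "wb") as f:
        f.write(b"old state")
    cache = SeenCache(p)
    print(cache.read())
    replace_seen(p, b"new state")
    print(cache.read())
```

Note that rename() never consults advisory locks, so any writer that skips the locking step can still replace a file that another process holds locked. A process that was waiting on the old file's lock also ends up holding a lock on a file that is no longer reachable by name, so it would need to reopen the path after acquiring the lock.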