Since I have been working on the 2.0.16 locking problem (per my previous e-mails),
I have been taking a good long look at the seen locking behavior, and I must admit I am
confused as to whether it is working as intended or is buggy.

First of all, I am reasonably convinced that the locking problem we have discussed
previously is a combination of two things (at least on Linux). There are clearly cases
in the Cyrus code where Cyrus tries to lock the same file twice without unlocking it.
For example, when I use the "select" command to access a mailbox that I have not
touched before, the index_check function calls seen_lockread on line 439 of index.c to
lock the file; later in the same function (on line 487), it calls index_checkseen,
which also calls seen_lockread. I suspect this happens in other places as well. In all
of my testing, a call to flock to lock the same file descriptor twice from the same
process always works, but the gdb backtraces and analysis I have done clearly show
that every time an IMAP process locks up on me, it is trying to lock the seen file a
second time while it already holds a lock. I suspect there is a bug in the 2.2.19
Linux kernel I am using that makes a second flock call on the same file descriptor
fail intermittently. I am currently testing a workaround in lib/lock_flock.c that
unlocks the file before it attempts to lock it. Fortunately, you cannot remove an
advisory lock that was set by another process, and if you don't hold a lock, calling
flock to unlock the file does not return an error. This seems to be working, and I
will post it to the list once I have more confidence in it.
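The workaround can be sketched roughly like this. It is not the actual patch to
lib/lock_flock.c: relock_exclusive() and open_and_lock() are hypothetical names, and
opening with O_CREAT is an assumption made only so the sketch runs on its own.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper, NOT the real lib/lock_flock.c code.
 * Drops any flock() lock this descriptor may already hold, then
 * requests an exclusive lock. Returns 0 on success, -1 with errno set.
 */
int relock_exclusive(int fd)
{
    /* Unlocking a descriptor that holds no lock is not an error. */
    if (flock(fd, LOCK_UN) == -1)
        return -1;

    /* Block until the exclusive lock is granted, retrying on signals. */
    while (flock(fd, LOCK_EX) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}

/*
 * Hypothetical stand-in for "open the seen file and lock it".
 * Returns the locked descriptor, or -1 with errno set on failure.
 */
int open_and_lock(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd == -1)
        return -1;
    if (relock_exclusive(fd) == -1) {
        int saved = errno;      /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

The tradeoff of unlocking first: between the LOCK_UN and the LOCK_EX another process
may grab the lock, so anything read from the seen file under the earlier lock could
be stale by the time the second lock is granted.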

I am not sure whether locking the file multiple times from the same process should be
considered a bug, but it may be that this is not the normal way flock is used, and the
Linux kernel may not have been tested extensively in this way. I am curious whether
all of the other folks with this problem are using Linux, or whether it also shows up
on Solaris and other Unix flavors.
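For comparing Linux with Solaris and other systems, a small probe along these lines
might help. It is a sketch, not Cyrus code, and the function names and test path are
made up. It checks two cases: locking one descriptor twice, and locking the same file
through a second open() in the same process. The second case is included because
flock() locks belong to the open file description rather than to the process, so if
any code path reopens the seen file while an earlier descriptor still holds the lock,
the process can end up waiting on itself. On some systems flock() may be built on top
of fcntl() locking, so results can differ between platforms.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Returns 0 if an exclusive flock() taken twice on the SAME descriptor
 * succeeds both times, -1 otherwise. LOCK_NB keeps the probe from hanging.
 */
int relock_same_fd(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    int rc = -1;

    if (fd == -1)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) == 0 &&
        flock(fd, LOCK_EX | LOCK_NB) == 0)
        rc = 0;
    close(fd);          /* closing the only descriptor releases the lock */
    return rc;
}

/*
 * Returns 1 if a second, independent open() of the same file in the SAME
 * process is refused the lock (EWOULDBLOCK), 0 if it is granted, -1 on error.
 */
int second_open_conflicts(const char *path)
{
    int fd1 = open(path, O_RDWR | O_CREAT, 0600);
    int fd2, rc = -1;

    if (fd1 == -1)
        return -1;
    fd2 = open(path, O_RDWR);
    if (fd2 == -1) {
        close(fd1);
        return -1;
    }
    if (flock(fd1, LOCK_EX | LOCK_NB) == 0) {
        if (flock(fd2, LOCK_EX | LOCK_NB) == 0)
            rc = 0;
        else if (errno == EWOULDBLOCK)
            rc = 1;
    }
    close(fd2);
    close(fd1);
    return rc;
}
```

On Linux the first probe is expected to return 0 and the second to return 1.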

The second point is more directly tied to the topic of this message. I agree with
Heiki Kask that there are some problems in the seen code. Since very few clients open
multiple connections, and the odds of a message arriving while the seen file is being
updated are very low, I think these problems are usually masked. The most obvious
evidence I have seen is that when one of these locking problems occurs, the seen file
is held with an advisory lock, and yet it still gets updated and replaced when other
IMAP processes access the mailbox. Clearly there are some points in the code (which I
am trying to track down) that update the file without checking the advisory lock
first.
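A small experiment shows why an advisory lock alone cannot stop the replacement. It
is a sketch with a made-up function name (replace_while_locked) and a hypothetical
path: one descriptor holds LOCK_EX on the file, while code that never calls flock()
writes path.NEW and renames it over the original. Neither write() nor rename()
consults flock() locks, and the lock stays with the old inode, so the file now at the
original path can be locked by anyone immediately.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Returns 0 if path.NEW could be renamed over a LOCKED path and the
 * replacement file is itself unlocked afterwards; -1 on any error.
 */
int replace_while_locked(const char *path)
{
    char newpath[4096];
    int n, locked_fd, new_fd, check_fd, rc = -1;

    n = snprintf(newpath, sizeof newpath, "%s.NEW", path);
    if (n < 0 || (size_t)n >= sizeof newpath)
        return -1;

    locked_fd = open(path, O_RDWR | O_CREAT, 0600);
    if (locked_fd == -1)
        return -1;
    if (flock(locked_fd, LOCK_EX) == -1)
        goto out;

    /* A "writer" that ignores the advisory lock entirely. */
    new_fd = open(newpath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (new_fd == -1)
        goto out;
    if (write(new_fd, "seen\n", 5) != 5) {
        close(new_fd);
        goto out;
    }
    close(new_fd);

    /* rename() does not check flock() locks, so this succeeds. */
    if (rename(newpath, path) == -1)
        goto out;

    /* The lock stayed with the old inode; the new file is unlocked. */
    check_fd = open(path, O_RDWR);
    if (check_fd == -1)
        goto out;
    if (flock(check_fd, LOCK_EX | LOCK_NB) == 0)
        rc = 0;
    close(check_fd);
out:
    close(locked_fd);
    return rc;
}
```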

I am also somewhat confused about how the seen system is supposed to work. The model
Cyrus uses seems to be:

Read the file and cache it in a buffer to speed up subsequent accesses.

When you want to write the file, lock it, write out the new file as filename.NEW,
lock the new file, and rename the new file back to the original name (replacing the
original). Then unlock.

Subsequent reads always check whether the file has been replaced; if it has, read in
the new file, otherwise use the cached buffer for performance.
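The three steps can be sketched roughly as follows. This illustrates the model as
described above, not the Cyrus seen code: replace_file(), remember_file(),
file_was_replaced() and struct file_id are invented names, and comparing the device
and inode numbers from stat() is one common way to detect that a file was replaced,
assumed here rather than taken from the Cyrus source.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Identity of the file a cached buffer was read from (hypothetical). */
struct file_id {
    dev_t dev;
    ino_t ino;
};

/* Step 1 helper: record which file the cache was filled from. */
int remember_file(const char *path, struct file_id *id)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;
    id->dev = st.st_dev;
    id->ino = st.st_ino;
    return 0;
}

/*
 * Step 2: lock the current file (via lock_fd), write path.NEW, lock it,
 * rename it over path, then unlock. Returns 0 on success, -1 on failure.
 */
int replace_file(int lock_fd, const char *path, const char *data, size_t len)
{
    char newpath[4096];
    int n, fd, rc = -1;

    n = snprintf(newpath, sizeof newpath, "%s.NEW", path);
    if (n < 0 || (size_t)n >= sizeof newpath)
        return -1;
    if (flock(lock_fd, LOCK_EX) == -1)
        return -1;

    fd = open(newpath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd != -1) {
        /* Keep the new file locked across the rename so a reader that
         * locks it right afterwards waits for the writer to finish. */
        if (flock(fd, LOCK_EX) == 0 &&
            write(fd, data, len) == (ssize_t)len &&
            fsync(fd) == 0 &&
            rename(newpath, path) == 0)
            rc = 0;
        else
            unlink(newpath);
        close(fd);              /* also releases the lock on the new file */
    }
    flock(lock_fd, LOCK_UN);
    return rc;
}

/*
 * Step 3: returns 1 if path now names a different file than the cached
 * one (so it must be re-read), 0 if the cache is current, -1 on error.
 */
int file_was_replaced(const char *path, const struct file_id *cached)
{
    struct file_id now;

    if (remember_file(path, &now) == -1)
        return -1;
    return now.dev != cached->dev || now.ino != cached->ino;
}
```

Because path.NEW is created while the original still exists, the two files cannot
share an inode, so the inode check reliably notices the rename.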

Larry, please correct me if this explanation is wrong. Since this is abstracted to
allow the use of either Berkeley DB or flat files, it can be confusing to trace the
calls from one function to the next.

Thanks,
John


Lawrence Greenfield wrote:

> Cyrus caches seen state in memory for a time before flushing it to
> disk.  Generally this works quite well; I use Outlook Express and
> don't seem to have this problem, but perhaps I just don't do this
> exact sequence of clicks.
>
> It's possible to force Cyrus to synchronize seen state more quickly
> with the comparable loss of performance/scalability.
>
> Larry
>
>    Date: Fri, 30 Nov 2001 01:31:37 +0200
>    From: Heiki Kask <[EMAIL PROTECTED]>
>    CC: [EMAIL PROTECTED]
>
>    > When Cyrus-IMAP writes the seen state, it first makes a copy of cyrus.seen
>    > to cyrus.seen.new(?).  This allows other IMAP connections to read the seen
>    > state from cyrus.seen, while the first connection is updating
>    > cyrus.seen.new(?).  When it finishes it moves the file to cyrus.seen.
>
>    I see a contradiction here: Cyrus IMAP server is designed to support
>    multiple connections but it does not work with a client that actually
>    uses them.
>
>    > one file (i.e. unix mbox format).  This requires the UW-IMAP server to lock
>    > the file when it is writing to it, which prevents other connections from
>    > reading the file, until the lock is removed.
>
>    Is it possible to force Cyrus IMAP server to use some kind of locking
>    mechanism?
>
>    For some unknown reason most users are running OE as an IMAP client, and
>    the only way to solve their problems is to do something on the server side.
>
>    I'm not familiar (yet :)) with the internals of the Cyrus IMAP server
>    design, so could somebody explain the background of the multiple
>    connection support? This seems to be handy only for situations when same
>    user is accessing imap folders using different clients.
>
>    BTW, Thanks for your OE hints!
>
>    heiki
