> What's the general feeling on the skiplist implementation used in conjunction with Sun and NetApp's NFS (we're locked into using this combination for various reasons)? Would you be more or less likely to trust it over db3?

In general, no part of Cyrus is guaranteed to work over NFS. If you're only accessing the NFS store from a single client, things have a much better chance of working, but I really don't know what semantics Sun's NFS client and NetApp's NFS filer guarantee with regard to mmap() and write(). If the combination doesn't support mmap() showing changes made by write() immediately (Cyrus tests for this in the configure script, but the configure script is probably not doodling on an NFS partition), you need to use map_nommap, which is very slow.
Berkeley db makes no guarantees of working over NFS. skiplist should work over NFS with a single client and map_nommap.
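If you want to know what your particular client/filer combination does, a small standalone check along these lines will tell you whether an existing mapping sees bytes written through the descriptor. This is only a sketch in the spirit of the configure-time test, not the actual Cyrus test, and the file name is just an example; run it with the file on the NFS mount you care about.

    /* Rough check of mmap()/write() coherence. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        const char *path = argc > 1 ? argv[1] : "maptest.tmp";
        int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (fd < 0) { perror("open"); return 2; }

        /* establish a 16-byte file and map it read-only */
        if (write(fd, "0123456789abcdef", 16) != 16) { perror("write"); return 2; }
        char *map = mmap(NULL, 16, PROT_READ, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 2; }

        /* overwrite the start of the file through the descriptor */
        if (pwrite(fd, "XXXX", 4, 0) != 4) { perror("pwrite"); return 2; }

        /* does the existing mapping see the new bytes? */
        int coherent = (memcmp(map, "XXXX", 4) == 0);
        printf("mmap sees write() immediately: %s\n", coherent ? "yes" : "no");

        munmap(map, 16);
        close(fd);
        unlink(path);
        return coherent ? 0 : 1;
    }

If it prints "no" on the NFS mount, you're in map_nommap territory.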
> Another question - it looks to me like I have to recompile to switch database types - is this true? The code looks like it would be flexible enough to allow a run-time config option to choose the method with very little modification?

It probably could be made a run-time option. Since you need to convert all of the different files, making it an easy run-time switch has never been a priority.
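A run-time switch would more or less come down to picking the backend out of a table of operations by the name given in the config file, instead of binding it at compile time. A rough sketch of the idea; the struct, backend names, and stub functions below are hypothetical and not the real cyrusdb interface:

    /* Illustrative run-time backend selection; not the real cyrusdb API. */
    #include <stdio.h>
    #include <string.h>

    struct db_backend {
        const char *name;
        int (*open)(const char *fname);
        int (*store)(const char *key, const char *val);
    };

    /* stubs standing in for cyrusdb_flat / skiplist / db3 */
    static int flat_open(const char *f)  { printf("flat: open %s\n", f); return 0; }
    static int flat_store(const char *k, const char *v) { (void)k; (void)v; return 0; }
    static int skip_open(const char *f)  { printf("skiplist: open %s\n", f); return 0; }
    static int skip_store(const char *k, const char *v) { (void)k; (void)v; return 0; }

    static const struct db_backend backends[] = {
        { "flat",     flat_open, flat_store },
        { "skiplist", skip_open, skip_store },
        { NULL, NULL, NULL }
    };

    /* pick the backend named in the config file instead of at compile time */
    static const struct db_backend *db_lookup(const char *name)
    {
        for (int i = 0; backends[i].name; i++)
            if (!strcmp(backends[i].name, name)) return &backends[i];
        return NULL;
    }

    int main(void)
    {
        const struct db_backend *db = db_lookup("skiplist");
        if (db) db->open("/var/imap/mailboxes.db");
        return 0;
    }

The conversion problem is the part this doesn't touch: each backend still has its own on-disk format, so switching names in the config file doesn't migrate the existing files.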
>> It would be possible to flush the seen state more often; it's just a question of how often and when other imapds should look for it.

> If the imapd can already cope with asynchronous events, I would flush the state after a second or two of inactivity from the client. Failing that, I would probably flush the state before replying to the client (yes, this would hurt performance, although probably not much, particularly if we skip the fsync()).

You can't skip the fsync(), because the fsync()s are what guarantee that the files will be in a consistent form if the system crashes. (The fsync()s are needed for ordering guarantees between operations. This is true for Berkeley db, skiplist, flat files, whatever.)
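The ordering constraint looks roughly like this in code: the seen-state update has to reach stable storage before the reply that depends on it goes out. This is only a sketch; flush_seen_state, handle_store, and the record format are made up for illustration and aren't the actual imapd code.

    /* Commit first, then reply; never the other way around. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static int flush_seen_state(int db_fd, const char *record)
    {
        if (write(db_fd, record, strlen(record)) < 0) return -1;
        /* the fsync() is the ordering guarantee: the data is on stable
         * storage before anything that depends on it happens */
        if (fsync(db_fd) < 0) return -1;
        return 0;
    }

    static void reply(int client_fd, const char *tag, const char *resp)
    {
        char buf[256];
        int n = snprintf(buf, sizeof(buf), "%s %s\r\n", tag, resp);
        write(client_fd, buf, n);
    }

    static void handle_store(int db_fd, int client_fd, const char *tag)
    {
        if (flush_seen_state(db_fd, "1:4 \\Seen\n") == 0)
            reply(client_fd, tag, "OK STORE completed");
        else
            reply(client_fd, tag, "NO STORE failed");
    }

    int main(void)
    {
        int db_fd = open("seen.db", O_WRONLY | O_CREAT | O_APPEND, 0600);
        if (db_fd < 0) { perror("open"); return 1; }
        handle_store(db_fd, STDOUT_FILENO, "a001");
        close(db_fd);
        return 0;
    }

Flushing after a couple of seconds of idle time only changes when that sequence runs; it doesn't let you drop the fsync() from it.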
> But this just fixes the OE problem - Cyrus would still have a problem (as far as I can see): all the other copies accessing that mailbox will still have their old seen files open (maybe using skiplist fixes this). The flat-file seen implementation needs to check to see if the file has been renamed under it (and do what?).

The flat file database layer (cyrusdb_flat) already knows how to do this at the appropriate time. The caching is being implemented in the seen layer (seen_db.c), not in the flat file implementation.
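One way a reader can notice that the file has been renamed under it is to compare the inode behind its open descriptor with the inode currently installed at the pathname, and reopen when they differ. This is just a sketch of the idea, not what cyrusdb_flat or seen_db.c actually does; the file name is an example.

    /* Reopen a file if someone has rename()d a new copy into place. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static int reopen_if_replaced(int fd, const char *path)
    {
        struct stat on_disk, held;

        if (stat(path, &on_disk) < 0 || fstat(fd, &held) < 0) return fd;

        if (on_disk.st_dev != held.st_dev || on_disk.st_ino != held.st_ino) {
            /* a new file has been renamed into place; switch to it */
            int newfd = open(path, O_RDWR);
            if (newfd >= 0) {
                close(fd);
                return newfd;
            }
        }
        return fd;
    }

    int main(void)
    {
        int fd = open("cyrus.seen", O_RDWR | O_CREAT, 0600);
        if (fd < 0) { perror("open"); return 1; }
        fd = reopen_if_replaced(fd, "cyrus.seen");
        close(fd);
        return 0;
    }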
> To be honest, the flat file seen implementation is way more complicated than I would have thought was worthwhile. My preference would be to not hold the file open, and simply re-write the whole file each time we updated it, renaming the replacement into place (to make the operation atomic - this is also the only synchronous operation). My experience has been that unix is quite happy doing naive things like this while the file remains small (say less than 10k).

Whenever there is a change, the flat file implementation does rewrite the entire file. The database layer holds the file open because it assumes that other operations (reads on other keys, things like that) will follow. Updates are very frequent, which is why the skiplist implementation can perform better.
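The rewrite-and-rename approach would look something like this sketch (the file name and record format are only examples): write the replacement, fsync() it as the one synchronous step, then rename() it into place so readers see either the old file or the new one, never a partial write.

    /* Atomically replace a small file by rewriting it and renaming
     * the replacement into place. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static int rewrite_seen_file(const char *path, const char *contents)
    {
        char tmp[1024];
        snprintf(tmp, sizeof(tmp), "%s.NEW", path);

        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0) return -1;

        ssize_t len = (ssize_t)strlen(contents);
        if (write(fd, contents, len) != len || fsync(fd) < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        close(fd);

        /* the atomic switch: readers never see a half-written file */
        if (rename(tmp, path) < 0) {
            unlink(tmp);
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        if (rewrite_seen_file("cyrus.seen", "user.fred 1:100,104\n") < 0)
            perror("rewrite_seen_file");
        return 0;
    }

A crash between the fsync() and the rename() just leaves the old file in place, which is why the fsync() of the temporary can be the only synchronous operation.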
However, updates can be an order of magnitude more frequent if we're going to write for every flag change. Cyrus is written with the expectation that you will have thousands of simultaneous clients working on tens or hundreds of thousands of mailboxes.
> I implemented a Postfix map that works this way - for lookups, it simply does a linear read/search of the file. For updates, it writes a new file and moves it into place. Generally this performed much better than more complex schemes such as the Sleepycat DBs - particularly when you consider memory footprint (this was on a machine with about 100k users, handling tens of messages per second).

It doesn't scale when there are frequent updates. That's why we have the database abstraction: so we can choose the file format that does the job most effectively. cyrusdb_flat does exactly this, and it works ok when you don't need frequent updates. Seen state has frequent updates.
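The lookup side of such a map is just a linear scan of key/value lines. A sketch of that half, with an illustrative file format and function name rather than actual Postfix code:

    /* Linear lookup in a flat "key value" map file. */
    #include <stdio.h>
    #include <string.h>

    /* returns 1 and fills 'val' if 'key' is found, 0 otherwise */
    static int flat_lookup(const char *path, const char *key,
                           char *val, size_t vallen)
    {
        FILE *f = fopen(path, "r");
        if (!f) return 0;

        char line[1024];
        int found = 0;
        while (fgets(line, sizeof(line), f)) {
            char *sep = strchr(line, ' ');
            if (!sep) continue;
            *sep = '\0';
            if (strcmp(line, key) == 0) {
                strncpy(val, sep + 1, vallen - 1);
                val[vallen - 1] = '\0';
                val[strcspn(val, "\n")] = '\0';   /* strip trailing newline */
                found = 1;
                break;
            }
        }
        fclose(f);
        return found;
    }

    int main(void)
    {
        char val[256];
        if (flat_lookup("users.map", "fred", val, sizeof(val)))
            printf("fred -> %s\n", val);
        return 0;
    }

Every lookup reads the file from the top, which is exactly the part that stops scaling once the file gets large or the update rate climbs.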
Larry