> What's the general feeling on the skiplist implementation used in conjunction with Sun and NetApp's NFS (we're locked into using this combination for various reasons)? Would you be more or less likely to trust it over db3?

In general, no part of Cyrus is guaranteed to work over NFS. If you're only accessing the NFS store from a single client, things have a much better chance of working, but I really don't know what semantics Sun's NFS client and NetApp's NFS filer guarantee with regard to mmap() and write(). If the combination doesn't support mmap() showing changes made by write() immediately (Cyrus tests for this in the configure script, but the configure script is probably not doodling on an NFS partition), you need to use map_nommap, which is very slow.
Berkeley db makes no guarantees of working over NFS. skiplist should work over NFS with a single client and map_nommap.
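If you want to know what your particular client/filer combination does, a small standalone check along these lines will tell you whether an existing mapping sees bytes written through the descriptor. This is only a sketch in the spirit of the configure-time test, not the actual Cyrus test, and the file name is just an example; run it with the file on the NFS mount you care about.

    /* Rough check of mmap()/write() coherence. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        const char *path = argc > 1 ? argv[1] : "maptest.tmp";
        int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (fd < 0) { perror("open"); return 2; }

        /* establish a 16-byte file and map it read-only */
        if (write(fd, "0123456789abcdef", 16) != 16) { perror("write"); return 2; }
        char *map = mmap(NULL, 16, PROT_READ, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 2; }

        /* overwrite the start of the file through the descriptor */
        if (pwrite(fd, "XXXX", 4, 0) != 4) { perror("pwrite"); return 2; }

        /* does the existing mapping see the new bytes? */
        int coherent = (memcmp(map, "XXXX", 4) == 0);
        printf("mmap sees write() immediately: %s\n", coherent ? "yes" : "no");

        munmap(map, 16);
        close(fd);
        unlink(path);
        return coherent ? 0 : 1;
    }

If it prints "no" on the NFS mount, you're in map_nommap territory.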
> Another question - it looks to me like I have to recompile to switch database types - is this true? The code looks like it would be flexible enough to allow a run-time config option to choose the method with very little modification?

It probably could be made a run-time option. Since you need to convert all of the different files, making it an easy run-time switch has never been a priority.
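A run-time switch would more or less come down to picking the backend out of a table of operations by the name given in the config file, instead of binding it at compile time. A rough sketch of the idea; the struct, backend names, and stub functions below are hypothetical and not the real cyrusdb interface:

    /* Illustrative run-time backend selection; not the real cyrusdb API. */
    #include <stdio.h>
    #include <string.h>

    struct db_backend {
        const char *name;
        int (*open)(const char *fname);
        int (*store)(const char *key, const char *val);
    };

    /* stubs standing in for cyrusdb_flat / skiplist / db3 */
    static int flat_open(const char *f)  { printf("flat: open %s\n", f); return 0; }
    static int flat_store(const char *k, const char *v) { (void)k; (void)v; return 0; }
    static int skip_open(const char *f)  { printf("skiplist: open %s\n", f); return 0; }
    static int skip_store(const char *k, const char *v) { (void)k; (void)v; return 0; }

    static const struct db_backend backends[] = {
        { "flat",     flat_open, flat_store },
        { "skiplist", skip_open, skip_store },
        { NULL, NULL, NULL }
    };

    /* pick the backend named in the config file instead of at compile time */
    static const struct db_backend *db_lookup(const char *name)
    {
        for (int i = 0; backends[i].name; i++)
            if (!strcmp(backends[i].name, name)) return &backends[i];
        return NULL;
    }

    int main(void)
    {
        const struct db_backend *db = db_lookup("skiplist");
        if (db) db->open("/var/imap/mailboxes.db");
        return 0;
    }

The conversion problem is the part this doesn't touch: each backend still has its own on-disk format, so switching names in the config file doesn't migrate the existing files.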
>> It would be possible to flush the seen state more often; it's just a question of how often and when other imapds should look for it.

> If the imapd can already cope with asynchronous events, I would flush the state after a second or two of inactivity from the client. Failing that, I would probably flush the state before replying to the client (yes, this would hurt performance, although probably not much, particularly if we skip the fsync()).

You can't skip the fsync(), because the fsync()s are what guarantee that the files will be in a consistent form if the system crashes. (The fsync()s are needed for ordering guarantees between operations. This is true for Berkeley db, skiplist, flat files, whatever.)
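The ordering constraint looks roughly like this in code: the seen-state update has to reach stable storage before the reply that depends on it goes out. This is only a sketch; flush_seen_state, handle_store, and the record format are made up for illustration and aren't the actual imapd code.

    /* Commit first, then reply; never the other way around. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static int flush_seen_state(int db_fd, const char *record)
    {
        if (write(db_fd, record, strlen(record)) < 0) return -1;
        /* the fsync() is the ordering guarantee: the data is on stable
         * storage before anything that depends on it happens */
        if (fsync(db_fd) < 0) return -1;
        return 0;
    }

    static void reply(int client_fd, const char *tag, const char *resp)
    {
        char buf[256];
        int n = snprintf(buf, sizeof(buf), "%s %s\r\n", tag, resp);
        write(client_fd, buf, n);
    }

    static void handle_store(int db_fd, int client_fd, const char *tag)
    {
        if (flush_seen_state(db_fd, "1:4 \\Seen\n") == 0)
            reply(client_fd, tag, "OK STORE completed");
        else
            reply(client_fd, tag, "NO STORE failed");
    }

    int main(void)
    {
        int db_fd = open("seen.db", O_WRONLY | O_CREAT | O_APPEND, 0600);
        if (db_fd < 0) { perror("open"); return 1; }
        handle_store(db_fd, STDOUT_FILENO, "a001");
        close(db_fd);
        return 0;
    }

Flushing after a couple of seconds of idle time only changes when that sequence runs; it doesn't let you drop the fsync() from it.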
> But this just fixes the OE problem - Cyrus would still have a problem (as far as I can see): all the other copies accessing that mailbox will still have their old seen files open (maybe using skiplist fixes this). The flat-file seen implementation needs to check to see if the file has been renamed under it (and do what?).

The flat file database layer (cyrusdb_flat) already knows how to do this at the appropriate time. The caching is being implemented in the seen layer (seen_db.c), not in the flat file implementation.
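One way a reader can notice that the file has been renamed under it is to compare the inode behind its open descriptor with the inode currently installed at the pathname, and reopen when they differ. This is just a sketch of the idea, not what cyrusdb_flat or seen_db.c actually does; the file name is an example.

    /* Reopen a file if someone has rename()d a new copy into place. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static int reopen_if_replaced(int fd, const char *path)
    {
        struct stat on_disk, held;

        if (stat(path, &on_disk) < 0 || fstat(fd, &held) < 0) return fd;

        if (on_disk.st_dev != held.st_dev || on_disk.st_ino != held.st_ino) {
            /* a new file has been renamed into place; switch to it */
            int newfd = open(path, O_RDWR);
            if (newfd >= 0) {
                close(fd);
                return newfd;
            }
        }
        return fd;
    }

    int main(void)
    {
        int fd = open("cyrus.seen", O_RDWR | O_CREAT, 0600);
        if (fd < 0) { perror("open"); return 1; }
        fd = reopen_if_replaced(fd, "cyrus.seen");
        close(fd);
        return 0;
    }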
> To be honest, the flat file seen implementation is way more complicated than I would have thought was worthwhile. My preference would be to not hold the file open, and simply re-write the whole file each time we updated it, renaming the replacement into place (to make the operation atomic - this is also the only synchronous operation). My experience has been that unix is quite happy doing naive things like this while the file remains small (say less than 10k).

Whenever there is a change, the flat file implementation does rewrite the entire file. The database layer holds the file open because it assumes that other operations (reads on other keys, things like that) will follow. Updates are very frequent, which is why the skiplist implementation can perform better.
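The rewrite-and-rename approach would look something like this sketch (the file name and record format are only examples): write the replacement, fsync() it as the one synchronous step, then rename() it into place so readers see either the old file or the new one, never a partial write.

    /* Atomically replace a small file by rewriting it and renaming
     * the replacement into place. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static int rewrite_seen_file(const char *path, const char *contents)
    {
        char tmp[1024];
        snprintf(tmp, sizeof(tmp), "%s.NEW", path);

        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0) return -1;

        ssize_t len = (ssize_t)strlen(contents);
        if (write(fd, contents, len) != len || fsync(fd) < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        close(fd);

        /* the atomic switch: readers never see a half-written file */
        if (rename(tmp, path) < 0) {
            unlink(tmp);
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        if (rewrite_seen_file("cyrus.seen", "user.fred 1:100,104\n") < 0)
            perror("rewrite_seen_file");
        return 0;
    }

A crash between the fsync() and the rename() just leaves the old file in place, which is why the fsync() of the temporary can be the only synchronous operation.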
However, updates can be an order of magnitude more frequent if we're going to write for every flag change. Cyrus is written with the expectation that you will have thousands of simultaneous clients working on tens or hundreds of thousands of mailboxes.
> I implemented a Postfix map that works this way - for lookups, it simply does a linear read/search of the file. For updates, it writes a new file and moves it into place. Generally this performed much better than more complex schemes such as the Sleepycat DBs - particularly when you consider memory footprint (this was on a machine with about 100k users, handling tens of messages per second).

It doesn't scale when there are frequent updates. That's why we have the database abstraction: so we can choose the file format that does the job most effectively. cyrusdb_flat does exactly this, and it works ok when you don't need frequent updates. Seen state has frequent updates.
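The lookup side of such a map is just a linear scan of key/value lines. A sketch of that half, with an illustrative file format and function name rather than actual Postfix code:

    /* Linear lookup in a flat "key value" map file. */
    #include <stdio.h>
    #include <string.h>

    /* returns 1 and fills 'val' if 'key' is found, 0 otherwise */
    static int flat_lookup(const char *path, const char *key,
                           char *val, size_t vallen)
    {
        FILE *f = fopen(path, "r");
        if (!f) return 0;

        char line[1024];
        int found = 0;
        while (fgets(line, sizeof(line), f)) {
            char *sep = strchr(line, ' ');
            if (!sep) continue;
            *sep = '\0';
            if (strcmp(line, key) == 0) {
                strncpy(val, sep + 1, vallen - 1);
                val[vallen - 1] = '\0';
                val[strcspn(val, "\n")] = '\0';   /* strip trailing newline */
                found = 1;
                break;
            }
        }
        fclose(f);
        return found;
    }

    int main(void)
    {
        char val[256];
        if (flat_lookup("users.map", "fred", val, sizeof(val)))
            printf("fred -> %s\n", val);
        return 0;
    }

Every lookup reads the file from the top, which is exactly the part that stops scaling once the file gets large or the update rate climbs.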
Larry