>Try using skiplist for the seen.db
>It doesn't really solve the problem but it masks it well enough.
>
>From my understanding, changing to skiplist really shouldn't change
>the visible behavior at all.  But I've been wrong before.

I'll try to test it here and let you know. My reading of the code
suggests it shouldn't change the specific problem I'm seeing.

What's the general feeling on the skiplist implementation used in
conjunction with Sun and NetApp's NFS (we're locked in to using this
combination for various reasons)? Would you be more or less likely to
trust it over db3?

Another question - it looks to me like I have to recompile to switch
database types - is this true? The code looks like it would be flexible
enough to allow a run-time config option to choose the method with very
little modification.

>It would be possible to flush the seen state more often; it's just a
>question of how often and when should other imapds look for it.

If the imapd already can cope with asynchronous events, I would flush
the state after a second or two of inactivity from the client. Failing
that, I would probably flush the state before replying to the client
(yes, this would hurt performance, although probably not much,
particularly if we skip the fsync()).

But this just fixes the OE problem - Cyrus would still have a problem
(as far as I can see): all the other copies accessing that mailbox will
still have their old seen files open (maybe using skiplist fixes this).
The flat-file seen implementation needs to check to see if the file has
been renamed under it (and do what?).

To be honest, the flat-file seen implementation is way more complicated
than I would have thought was worthwhile. My preference would be to not
hold the file open, and simply re-write the whole file each time we
update it, renaming the replacement into place (to make the operation
atomic - this is also the only synchronous operation). My experience
has been that unix is quite happy doing naive things like this while
the file remains small (say less than 10k).

I implemented a Postfix map that works this way - for lookups, it
simply does a linear read/search of the file. For updates, it writes a
new file and moves it into place.
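
To make the rewrite-and-rename scheme concrete, here is a minimal
sketch in Python. It is not the Postfix map code and not Cyrus code:
the "key value" per-line file layout, the function names lookup() and
update(), and the ".seen-" temp-file prefix are all made up for
illustration. It assumes POSIX rename semantics on a local filesystem
and does no locking, so two simultaneous writers would race (last one
wins).

```python
import os
import tempfile


def _read_entries(path):
    """Read the whole file into a dict; a missing file means no entries."""
    entries = {}
    try:
        with open(path, "r", encoding="utf-8") as f:
            for line in f:
                key, sep, value = line.rstrip("\n").partition(" ")
                if sep:  # skip blank or malformed lines
                    entries[key] = value
    except FileNotFoundError:
        pass
    return entries


def lookup(path, key):
    """Linear scan of the file, opened fresh each time (no held handle)."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            for line in f:
                k, sep, value = line.rstrip("\n").partition(" ")
                if sep and k == key:
                    return value
    except FileNotFoundError:
        pass
    return None


def update(path, key, value, sync=False):
    """Rewrite the whole file with key set to value, then rename into place."""
    entries = _read_entries(path)
    entries[key] = value

    # The temp file must live in the same directory as the target so the
    # rename never crosses a filesystem boundary.
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".seen-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            for k, v in entries.items():
                f.write(f"{k} {v}\n")
            if sync:  # optional: skipping fsync trades durability for speed
                f.flush()
                os.fsync(f.fileno())
        # Readers see either the old file or the new one, never a partial one.
        os.replace(tmp, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

Because lookup() reopens the file on every call, it never ends up
holding a handle to a file that has since been replaced, which is the
stale-handle situation described above. Keys containing spaces are not
handled by this layout.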
Generally, that Postfix map performed much better than more complex
schemes such as the Sleepycat DBs - particularly when you consider
memory footprint (this was on a machine with about 100k users, handling
tens of messages per second).

>I've never actually seen this problem happen whenever I've fooled around
>with OE so I've never looked at the code to figure out what to do.

I get the impression it's a specific OE usage pattern that triggers it.
I've had it described to me as "send a mail, click the <send/check>
button", which sounds common enough to me.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/