On Wed, 06 Nov 2019 at 12:44:08 +0100, Jonas Smedegaard wrote:
> It seems that interimap would do far better in such an extreme
> scenario, as it seems to synchronize small chunks at a time which would
> make it possible to continue-where-left-off in far smaller chunks than
> possible (or rather comprehensible to me) to do with either offlineimap
> or dsync.

Correct, the targeted use-case is when one wants to synchronize often,
meaning most of the time there are no changes to send over.
offlineimap (and `interimap --repair`) selects each mailbox, fetches all
message UIDs and flags, then locally checks for differences between the
local and remote lists.  This is slow, I/O-intensive, and causes a
significant amount of network traffic.  OTOH with QRESYNC-based
synchronization, it's enough to compare the output of the LIST command,
which is much shorter (two lines per mailbox vs. one line per message).
In practice the LIST/STATUS response for 100-150 mailboxes fits in a
single TLS record (max size 16kiB); for servers supporting the
COMPRESS=DEFLATE extension that number is typically over 100× higher
(it's also possible to compress at a lower layer, but the advantage of
doing it at the IMAP layer is that it gives more control over the
deflate dictionary; we for instance trigger flush points before
sending/receiving large messages, which likely wouldn't compress well
anyway).

When there are new changes, be it flag updates, new messages, or
deletions, only what's new since the last successful sync needs to be
replayed on the other side.  Thanks to that logic, interimap should be
able to scale to mailboxes containing many millions of messages; AFAICT
Dovecot is quite able to cope with these too, but the MUA might not.

> When I switched, I removed the local folder and had interimap synchronize
> it from scratch.  Or at least that is my belief: Does your forensics
> analysis above imply that I failed at that, and the local copy cannot
> have been initialized by interimap?

I see 3 states in your original bug report:

| local: Update last clean state for INBOX: (HIGHESTMODSEQ 165 UIDVALIDITY 1571588814 UIDNEXT 9598)

These are the numbers found in the database, and correspond to the state
of the local INBOX at the time it was last synchronized by interimap.
The numbers make sense for a mailbox that was recently created and
populated en masse: the low HIGHESTMODSEQ (a strictly increasing
sequence covering all modifications to the mailbox, incl. flag updates,
message creations and deletions) and a highest UID <9598 suggest there
are slightly under 10k messages in that mailbox.  Based on how Dovecot
allocates UIDVALIDITY, I guess the mailbox was created on Sun, 20 Oct
2019 at 18:26:54 +0200.

| local: S: * STATUS INBOX (UIDNEXT 290008 UIDVALIDITY 1571588814 HIGHESTMODSEQ 71671)

This is the *current* state of the local INBOX.  The new UIDNEXT and
HIGHESTMODSEQ values suggest (IMAP4rev1 mandates the sequences to be
strictly increasing, but in practice Dovecot uses consecutive numbers)
that within two weeks that mailbox saw 280k *new* messages (possibly now
deleted) and over 71k *new* changes (such as flag updates, and message
creation/deletion).  It's not impossible, but certainly suspicious.

I'd be interested to know the last good remote state, could you please
run:

    sqlite3 /path/to/interimap/database.db -header <<-EOF
        SELECT l.UIDVALIDITY   AS lUIDVALIDITY,
               l.UIDNEXT       AS lUIDNEXT,
               l.HIGHESTMODSEQ AS lHIGHESTMODSEQ,
               r.UIDVALIDITY   AS rUIDVALIDITY,
               r.UIDNEXT       AS rUIDNEXT,
               r.HIGHESTMODSEQ AS rHIGHESTMODSEQ
          FROM mailboxes m
          JOIN local  l ON m.idx = l.idx
          JOIN remote r ON m.idx = r.idx
         WHERE mailbox = 'INBOX';
        SELECT COUNT(*) AS messages
          FROM mailboxes NATURAL JOIN mapping
         WHERE mailbox = 'INBOX';
    EOF

and paste the output?  (The database file can normally be found under
‘${XDG_DATA_HOME:-~/.local/share}/interimap’, and the default filename
is the hostname of your remote server.)  We already know the values of
lUIDVALIDITY, lUIDNEXT, lHIGHESTMODSEQ and rUIDVALIDITY: respectively
1571588814, 9598, 165 and 1571588814.  I'm interested in rUIDNEXT and
rHIGHESTMODSEQ though.

I'd also like to see the output of

    doveadm mailbox status "messages uidnext uidvalidity highestmodseq firstsaved" INBOX

both locally and remotely.
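
For reference, the creation-date guess and the "280k messages / 71k
changes" figures above can be reproduced with a few lines of Python.
This is only a sketch: it assumes Dovecot seeds UIDVALIDITY with the Unix
time at mailbox creation, hard-codes the +0200 offset, and the helper
names `uidvalidity_to_date` and `state_delta` are made up for
illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: UIDVALIDITY was seeded with the Unix time of mailbox
# creation, so the value can be read back as a timestamp.
CEST = timezone(timedelta(hours=2))  # fixed +0200 offset, for illustration


def uidvalidity_to_date(uidvalidity: int, tz: timezone = CEST) -> str:
    """Interpret a UIDVALIDITY value as seconds since the Unix epoch."""
    moment = datetime.fromtimestamp(uidvalidity, tz)
    return moment.strftime("%a, %d %b %Y at %H:%M:%S %z")


def state_delta(saved: dict, current: dict) -> dict:
    """Growth of UIDNEXT and HIGHESTMODSEQ between two snapshots."""
    return {key: current[key] - saved[key] for key in ("UIDNEXT", "HIGHESTMODSEQ")}


# Last clean state from the database vs. current STATUS of the local INBOX.
saved = {"UIDVALIDITY": 1571588814, "UIDNEXT": 9598, "HIGHESTMODSEQ": 165}
current = {"UIDVALIDITY": 1571588814, "UIDNEXT": 290008, "HIGHESTMODSEQ": 71671}

print(uidvalidity_to_date(saved["UIDVALIDITY"]))
print(state_delta(saved, current))
```

The deltas are upper bounds on the number of new messages and changes,
since neither sequence is required to be gap-free.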

| remote: S: * STATUS INBOX (UIDNEXT 280739 UIDVALIDITY 1571588814 HIGHESTMODSEQ 74538)

This is the *current* state of the remote INBOX.  The UIDVALIDITY value
suggests the mailbox was also created on Sun, 20 Oct 2019 at 18:26:54
+0200 (just like the local INBOX; not impossible of course, but
suspicious), while the high UIDNEXT and HIGHESTMODSEQ values hint at a
much older mailbox (unless that mailbox is very active, again not
impossible but suspicious).

| remote: ERROR: UIDVALIDITY changed! (1571588814 != 1154884797) Need to invalidate the UID cache.

This indicates that the UIDVALIDITY of the remote INBOX used to be
1154884797 (suggesting it was created on Sun, 06 Aug 2006 at 19:19:57
+0200) when you first ran interimap to synchronize that INBOX.  AFAIK a
UIDVALIDITY change means one of three things:

 - the mailbox disappeared (deleted or renamed), then a mailbox with the
   same name was created (INBOX can't be deleted, but can be renamed)
 - the storage medium doesn't support persistent UIDs
 - dovecot's index files were corrupted

But the fact that the local and remote UIDVALIDITY values are identical
is really surprising.  Moreover, if it was a reset (options 1 and 2
above) the UIDNEXT and HIGHESTMODSEQ values would be lower.  (Or did you
get 280k new messages in the past two weeks?)  So I believe that leaves
option 3.

I was asking whether you ran dsync(1) *after* interimap because it does
indeed mess around with the index files (unlike interimap and other IMAP
solutions, it works at a lower level and does mirror mailbox states
incl. UIDVALIDITY, UIDNEXT values as well as UIDs).  I'm even able to
reproduce a similar conflict in a test harness:

 - interimap(1) is used to sync two mailboxes, bringing UIDVALIDITY,
   UIDNEXT and HIGHESTMODSEQ respectively to 2, 70, and 14 for the local
   mailbox, and 1, 81 and 78 for the remote one.
 - dsync(1) is used locally to synchronize the mailboxes; the mailbox
   states are reconciled, which overrides the UIDVALIDITY (apparently
   the lower one is raised to the higher one), and the UID values are
   invalidated.
 - interimap(1) is run again, complaining about the UIDVALIDITY change
   for the remote mailbox.  UIDVALIDITY, UIDNEXT and HIGHESTMODSEQ are
   now respectively 2, 79, and 202 for the local mailbox, and 2, 81 and
   202 for the remote one.

However the messages appear to be duplicated, with counts going from 66
to 132, so maybe it's not what you're seeing.

>> Now, how to fix this?  The easiest is to remove the database entry for
>> that mailbox: `interimap --target=database delete INBOX` (this won't
>> touch your mails, just the database) and then try to sync again.
>> However every message in the local INBOX will be copied remotely and
>> vice-versa, so if you have a lot of messages in both mailboxes this
>> will create a lot of duplicates :-(  This won't cause any data loss
>> though.
>
> That sounds like the least scary approach.  And then afterwards try do a
> "doveadm deduplicate".

Oh, I wasn't aware of `doveadm deduplicate`, thanks for the hint!  I
added a pointer in the documentation.  Note however that deduplication
by message GUIDs won't work, because interimap (like any IMAP client) is
not GUID-aware and dovecot will assign a new GUID on APPEND.  Removing
duplicates based on Message-Id values (flag ‘-m’) does work, but is less
reliable as two different messages might share the same Message-Id.

Unfortunately doveadm-deduplicate(1) doesn't have a flag to deduplicate
by RFC 5322 message content, but for simple mail stores like Maildir it
can be done manually at the file-system level:

    find /path/to/Maildir/{cur,new,tmp} -type f -print0 \
        | xargs -r0 sha256sum -z \
        | sort -z \
        | uniq -zd -w64 \
        | cut -z -b1-66 --complement >/tmp/duplicates

Use `xargs -r0 -a/tmp/duplicates rm -vf --` to remove duplicates (only
one per group, repeat until there are no duplicates left to remove).
Next time interimap runs it'll receive a “VANISHED (EARLIER)” response
from that server, and remove matching messages on the other one.

But first I'd like to understand why you ended up in this state, and
hopefully answering the questions I had above will help with finding a
plausible explanation.

>> Another thing which seems to suggest that there is a lower-level
>> synchronization running and causing cache corruption, is the huge
>> spike between the local INBOX's current state and the one from the
>> database (ie the last successful sync): UIDNEXT=9598 /
>> HIGHESTMODSEQ=165 vs. UIDNEXT=290008 / HIGHESTMODSEQ=71671.  While
>> sudden heavy-traffic mailboxes are certainly possible, it's still a
>> bit suspicious.
>
> Sorry, I don't understand what you are saying here.

I hope the above wall of text helps clarify this. :-)

> Could that perhaps be explained by my _previous_ use of dsync and then
> resyncing from scratch with interimap?  I mean, I guess that would show
> as high counts remote and low counts locally.

By restarting from scratch I guess you nuked the local mailbox?  Then
the synchronization solution used previously doesn't matter.  Indeed
this explains having much lower local state values than remote ones, but
doesn't explain the huge difference between the last “good” local values
and the current ones.  Let alone the UIDVALIDITY change.
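
For those who prefer a script to the shell pipeline given earlier, the
same content-hash idea can be sketched in Python.  This is an
illustration, not something interimap or doveadm ships: the function name
`find_duplicates` is made up, it only looks at the top level of cur/,
new/ and tmp/, and unlike `uniq -d` it reports every copy beyond the
first in one pass.

```python
import hashlib
import os
from collections import defaultdict
from typing import List


def find_duplicates(maildir: str) -> List[str]:
    """Return every file whose content was already seen in the Maildir."""
    by_digest = defaultdict(list)
    for sub in ("cur", "new", "tmp"):
        folder = os.path.join(maildir, sub)
        if not os.path.isdir(folder):
            continue
        # Sorted so that the copy that is kept is deterministic.
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            if not os.path.isfile(path):
                continue
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            by_digest[digest].append(path)
    # Keep the first file of each group, report the others as duplicates.
    return [path for paths in by_digest.values() for path in paths[1:]]
```

Calling `find_duplicates("/path/to/Maildir")` only lists candidates;
nothing is deleted, so the result can be reviewed before removing
anything.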

>> The first step is to identify what's causing the cache corruption as
>> it's likely to happen again otherwise, and that's beyond the scope of
>> interimap and other plain IMAP clients :-P  If it was indeed dsync(1)
>> causing the cache corruption, then it's possible to avoid duplicates:
>> you could run it once more so both sides are in sync (with matching/
>> overwritten UIDs, UIDVALIDITY, etc. values), then nuke *one* end along
>> with the database (using `interimap --target=database --target=local
>> delete INBOX`) and finally sync again.  I reckon it's a bit scary
>> though, so feel free to poke me over IRC if you'd like.
>
> I would prefer to _not_ go back to dsync but explore the use of
> interimap.

I'm glad to hear that, and that's not what I was suggesting :-P  I
meant, if the corruption was caused by a synchronization tool like
dsync, you could run it *once* more to bring the mailboxes in sync again
(they appear to have diverged at the moment), then nuke one end to start
fresh again with interimap.

> I suspect the corruption is caused by my use of notmuch with the
> synchronize_flags=true setting.

Do you also use notmuch remotely?  (It's the *remote* UIDVALIDITY value
that changed, not the local one.)  Or more exactly, do you ever access
the remote mailstore directly, not through the IMAPd?

Also for Maildir [0] Dovecot keeps the mapping between UIDs and
filenames in a ‘dovecot-uidlist’ file inside each mailbox directory, and
the UIDVALIDITY value can be found in the header of that file.  notmuch
and other MUAs have no business in this file, and I do hope they just
ignore it.

Also I doubt the fact that the local and remote UIDVALIDITY are now
identical is a coincidence.  My gut feeling is that either Dovecot's
internal file (‘dovecot-uidlist’) was copied as is, or the tool that
caused the corruption is able to parse it.
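
To check by hand what UIDVALIDITY each side has on disk, the header of
‘dovecot-uidlist’ can be decoded with a short Python sketch.  It assumes
the version-3 header layout described on the wiki page [0], i.e. a first
line of the form "3 V<uidvalidity> N<nextuid> G<guid>"; the function
name `parse_uidlist_header` and the sample line (including its GUID) are
made up for illustration.

```python
def parse_uidlist_header(line: str) -> dict:
    """Parse the first line of a version-3 'dovecot-uidlist' file."""
    version, *fields = line.split()
    if version != "3":
        raise ValueError(f"unsupported dovecot-uidlist version: {version}")
    header = {}
    for field in fields:
        key, value = field[0], field[1:]
        if key == "V":
            header["uidvalidity"] = int(value)
        elif key == "N":
            header["uidnext"] = int(value)
        elif key == "G":
            header["guid"] = value
        # Unknown extension fields are ignored.
    return header


# Hypothetical header carrying the values seen for the local INBOX.
sample = "3 V1571588814 N290008 G0123456789abcdef0123456789abcdef"
print(parse_uidlist_header(sample))
```

Comparing the parsed values of the local and remote files would show
whether one was copied verbatim from the other.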

> I remember at your talk in Curitiba there was a question afterwards
> about notmuch and imap tags - I would very _very_ much like to shift
> from local-only notmuch tags to distributed imap tags - but that's
> probably another big discussion...

Indeed :-)

-- 
Guilhem.

[0] https://wiki.dovecot.org/MailboxFormat/Maildir