Wesley,

Thanks for your response. This is precisely what we ended up doing. We've got a Perl script which walks LDAP for a user list and runs "sync_client -u <user>" for each account, trapping errors. This gave us a list of users to reconstruct. In a couple of cases, however, even that didn't remedy the situation, and there we resorted to rsync followed by sync_client cleanups.
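For anyone wanting to do the same, the per-user loop described above can be sketched roughly as follows. This is a Python illustration, not the Perl script itself: the LDAP walk is left out (the user list is simply passed in), the function names sync_user and find_failed_users are hypothetical, and it assumes sync_client is on the PATH and exits non-zero when a user fails to sync.

```python
import subprocess


def sync_user(user, command="sync_client"):
    """Run `sync_client -u <user>` and return (ok, detail).

    Assumption: a non-zero exit status means the sync failed.
    `command` is a hypothetical parameter so the binary path can be overridden.
    """
    try:
        result = subprocess.run(
            [command, "-u", user],
            capture_output=True,
            text=True,
        )
    except OSError as exc:
        # The binary is missing or not executable; treat it as a failure.
        return False, str(exc)
    return result.returncode == 0, result.stderr.strip()


def find_failed_users(users, sync=sync_user):
    """Sync each user in turn and collect the ones that fail.

    Returns a list of (user, detail) pairs. A failure for one user
    never stops the loop, so the whole list is always processed.
    """
    failed = []
    for user in users:
        ok, detail = sync(user)
        if not ok:
            failed.append((user, detail))
    return failed
```

The users returned by find_failed_users are the candidates for reconstruct before trying the sync again.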
Thanks again. Hopefully this message, with its subject line, will help future unfortunate users grepping the mailing list archives for ideas.

Cheers,
-nic

On 08/11/2014 08:46 AM, Wesley Craig wrote:
> So, sync server is crashing on the backend you're attempting to replicate
> back to. Probably the cyrus meta files were corrupted for mailboxes which
> were actively being written to when you had the array malfunction. To
> recover, I'd probably run sync client on each individual user to find which
> users are corrupted. Armed with the list, I'd reconstruct those users and
> try again.
>
> Ideally, you'd get crash reports that you could forward along, since cyrus
> really ought to be armored against this kind of corruption. After all, why
> else would you have failed over?
>
> :wes
>
> On 06 Aug 2014, at 16:03, Nic Bernstein <n...@onlight.com> wrote:
>
>> Friends,
>> We've got a simple Murder deployed, 2 front-ends, 1 mupdate-master, 1
>> backend and 1 replica. Recently, due to an array malfunction, the
>> back-end master took a powder, and we switched to the replica. Now
>> we're trying to recover the original master, and running into lots of
>> problems getting data to sync back.
>>
>> This is all with version 2.4.17-caldav-beta9, from Debian packages, on
>> Ubuntu 14.04 servers. For the record, the servers are KVM QEMU VMs, tho
>> I doubt that matters at all.
>>
>> We've got the roles reversed just fine with changes to the various
>> cyrus.conf and imapd.conf files, and are not worried about that being a
>> problem. Everything is working fine as far as
>> authentication/authorization, etc. It's just the replication that's fubar.
>>
>> We're seeing this sort of error in the logs on the (new) master side:
>> ...
>> Aug 6 18:21:28 mailbox.ia cyrus/sync_client[27000]: Promoting:
>> MAILBOX user.connie.yadda -> USER connie
>> Aug 6 18:21:28 mailbox.ia cyrus/sync_client[27000]: Promoting:
>> MAILBOX user.elly.Junk -> USER elly
>> Aug 6 18:21:28 mailbox.ia cyrus/sync_client[27000]: Error in
>> do_sync(): bailing out! Bad protocol
>> Aug 6 18:21:28 mailbox.ia cyrus/sync_client[27000]: Processing sync
>> log file /var/lib/imap/sync/log-27000 failed: Bad protocol
>>
>> And this on the (new) replica side:
>> Aug 6 18:20:37 mailbox.wi cyrus/syncserver[13158]: executed
>> Aug 6 18:20:37 mailbox.wi cyrus/syncserver[13158]: accepted connection
>> Aug 6 18:20:37 mailbox.wi cyrus/syncserver[13158]: cmdloop(): startup
>> Aug 6 18:20:37 mailbox.wi cyrus/syncserver[13158]: login:
>> mailbox.ia.occinc.com [192.168.220.24] mailproxy PLAIN User logged in
>> Aug 6 18:20:37 mailbox.wi cyrus/syncserver[13158]: created
>> decompress buffer of 4102 bytes
>> Aug 6 18:20:37 mailbox.wi cyrus/syncserver[13158]: created compress
>> buffer of 4102 bytes
>> Aug 6 18:20:59 mailbox.wi cyrus/syncserver[13158]: Repacking
>> mailbox user.ndlocate
>> Aug 6 18:21:05 mailbox.wi master[11811]: service syncserver pid
>> 13158 in BUSY state: terminated abnormally
>>
>> In some cases we've seen problems we believe are due to issues with a
>> particular user's mailbox, and have fixed those by blowing away the
>> user's mailbox hierarchy on the replica, rsync-ing it back over from the
>> master, and then doing a user-sync. But there are hundreds of users, so
>> that's not a practical general solution.
>>
>> The mailstore is currently about 130GB in size, and the master and
>> replica are in different data centers, with only about 3 or 4Mbps
>> available between them (depending upon time of day). This is fine in
>> the normal course of rolling replication, but makes simply
>> re-replication the entire thing a major pain, if that's the only option.
>>
>> So, what's causing this problem, and what's the best course of action to
>> recover from this sort of situation?
>>
>> Thanks in advance for your consideration,
>> -nic
>>
>> --
>> Nic Bernstein n...@onlight.com
>> Onlight, Inc. www.onlight.com
>> 219 N. Milwaukee St., Suite 2a v. 414.272.4477
>> Milwaukee, Wisconsin 53202
>>
>> ----
>> Cyrus Home Page: http://www.cyrusimap.org/
>> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
>> To Unsubscribe:
>> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus

--
Nic Bernstein n...@onlight.com
Onlight, Inc. www.onlight.com
219 N. Milwaukee St., Suite 2a v. 414.272.4477
Milwaukee, Wisconsin 53202