Wesley,
Thanks for your response.  This is precisely what we ended up doing. 
We've got a Perl script which walks LDAP for a user list, and runs
"sync_client -u <user>" for each account, trapping errors.  This gave us
a list of users to reconstruct.  In a couple of cases, however, even that
didn't remedy the situation, in which case we resorted to rsync followed by
sync_client cleanups.
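
For anyone finding this in the archives, the per-user loop described above can be sketched roughly as follows.  This is a minimal Python illustration, not the Perl script itself (which is not shown here).  It assumes the user list has already been pulled from LDAP by some other means, and that sync_client exits non-zero when a user fails to sync; the function names and the injectable `runner` argument are hypothetical, added only for illustration.

```python
import subprocess


def run_sync_client(user):
    """Run `sync_client -u <user>` and return (ok, detail).

    Assumption: a non-zero exit status means the sync failed for this user.
    """
    try:
        proc = subprocess.run(
            ["sync_client", "-u", user],
            capture_output=True,
            text=True,
        )
    except OSError as exc:
        # sync_client is missing or not executable on this host.
        return False, str(exc)
    return proc.returncode == 0, proc.stderr.strip()


def find_failed_users(users, runner=run_sync_client):
    """Sync each user in turn and collect the ones that fail.

    One bad mailbox does not stop the run; the returned list of
    (user, detail) pairs is the candidate list for reconstruct.
    """
    failed = []
    for user in users:
        ok, detail = runner(user)
        if not ok:
            failed.append((user, detail))
    return failed
```

Calling `find_failed_users(["connie", "elly"])` on the master would attempt each account separately and return only the accounts that failed, which are the ones to reconstruct before trying again.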

Thanks again.  Hopefully this message, with its subject line, will help
future unfortunate users grepping the mailing list archives for ideas.

Cheers,
    -nic

On 08/11/2014 08:46 AM, Wesley Craig wrote:
> So, sync server is crashing on the backend you're attempting to replicate 
> back to.  Probably the cyrus meta files were corrupted for mailboxes which 
> were actively being written to when you had the array malfunction.  To 
> recover, I'd probably run sync client on each individual user to find which 
> users are corrupted.  Armed with the list, I'd reconstruct those users and 
> try again.
>
> Ideally, you'd get crash reports that you could forward along, since cyrus 
> really ought to be armored against this kind of corruption.  After all, why 
> else would you have failed over?
>
> :wes
>
> On 06 Aug 2014, at 16:03, Nic Bernstein <n...@onlight.com> wrote:
>
>> Friends,
>> We've got a simple Murder deployed, 2 front-ends, 1 mupdate-master, 1
>> backend and 1 replica.  Recently, due to an array malfunction, the
>> back-end master took a powder, and we switched to the replica.  Now
>> we're trying to recover the original master, and running into lots of
>> problems getting data to sync back.
>>
>> This is all with version 2.4.17-caldav-beta9, from Debian packages, on
>> Ubuntu 14.04 servers.  For the record, the servers are KVM QEMU VMs, tho
>> I doubt that matters at all.
>>
>> We've got the roles reversed just fine with changes to the various
>> cyrus.conf and imapd.conf files, and are not worried about that being a
>> problem.  Everything is working fine as far as
>> authentication/authorization, etc.  It's just the replication that's fubar.
>>
>> We're seeing this sort of error in the logs on the (new) master side:
>>    ...
>>    Aug  6 18:21:28 mailbox.ia cyrus/sync_client[27000]:   Promoting: MAILBOX user.connie.yadda -> USER connie
>>    Aug  6 18:21:28 mailbox.ia cyrus/sync_client[27000]:   Promoting: MAILBOX user.elly.Junk -> USER elly
>>    Aug  6 18:21:28 mailbox.ia cyrus/sync_client[27000]: Error in do_sync(): bailing out! Bad protocol
>>    Aug  6 18:21:28 mailbox.ia cyrus/sync_client[27000]: Processing sync log file /var/lib/imap/sync/log-27000 failed: Bad protocol
>>
>> And this on the (new) replica side:
>>    Aug  6 18:20:37 mailbox.wi cyrus/syncserver[13158]: executed
>>    Aug  6 18:20:37 mailbox.wi cyrus/syncserver[13158]: accepted connection
>>    Aug  6 18:20:37 mailbox.wi cyrus/syncserver[13158]: cmdloop(): startup
>>    Aug  6 18:20:37 mailbox.wi cyrus/syncserver[13158]: login: mailbox.ia.occinc.com [192.168.220.24] mailproxy PLAIN User logged in
>>    Aug  6 18:20:37 mailbox.wi cyrus/syncserver[13158]: created decompress buffer of 4102 bytes
>>    Aug  6 18:20:37 mailbox.wi cyrus/syncserver[13158]: created compress buffer of 4102 bytes
>>    Aug  6 18:20:59 mailbox.wi cyrus/syncserver[13158]: Repacking mailbox user.ndlocate
>>    Aug  6 18:21:05 mailbox.wi master[11811]: service syncserver pid 13158 in BUSY state: terminated abnormally
>>
>> In some cases we've seen problems we believe are due to issues with a
>> particular user's mailbox, and have fixed those by blowing away the
>> user's mailbox hierarchy on the replica, rsync-ing it back over from the
>> master, and then doing a user-sync.  But there are hundreds of users, so
>> that's not a practical general solution. 
>>
>> The mailstore is currently about 130GB in size, and the master and
>> replica are in different data centers, with only about 3 or 4Mbps
>> available between them (depending upon time of day).  This is fine in
>> the normal course of rolling replication, but makes simply
>> re-replicating the entire thing a major pain, if that's the only option.
>>
>> So, what's causing this problem, and what's the best course of action to
>> recover from this sort of situation?
>>
>> Thanks in advance for your consideration,
>>    -nic
>>
>> -- 
>> Nic Bernstein                             n...@onlight.com
>> Onlight, Inc.                             www.onlight.com
>> 219 N. Milwaukee St., Suite 2a            v. 414.272.4477
>> Milwaukee, Wisconsin  53202
>>
>> ----
>> Cyrus Home Page: http://www.cyrusimap.org/
>> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
>> To Unsubscribe:
>> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus

-- 
Nic Bernstein                             n...@onlight.com
Onlight, Inc.                             www.onlight.com
219 N. Milwaukee St., Suite 2a            v. 414.272.4477
Milwaukee, Wisconsin  53202

