Re: replica recovery

Brian Scholl Tue, 27 Oct 2015 18:03:07 -0700

Both are excellent points and I will look to implement them.  Particularly I 
wonder if a respectable increase to the numRecordsToKeep param could solve this 
problem entirely.


Thanks!



> On Oct 27, 2015, at 20:50, Jeff Wartes <jwar...@whitepages.com> wrote:
> 
> 
> On the face of it, your scenario seems plausible. I can offer two pieces
> of info that may or may not help you:
> 
> 1. A write request to Solr will not be acknowledged until an attempt has
> been made to write to all relevant replicas. So, B won’t ever be missing
> updates that were applied to A, unless communication with B was disrupted
> somehow at the time of the update request. You can add a min_rf param to
> your write request, in which case the response will tell you how many
> replicas received the update, but it’s still up to your indexer client to
> decide what to do if that’s less than your replication factor.
> 
> See 
> https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+
> Tolerance for more info.
> 
> 2. There are two forms of replication. The usual thing is for the leader
> for each shard to write an update to all replicas before acknowledging the
> write itself, as above. If a replica is less than N docs behind the
> leader, the leader can replay those docs to the replica from its
> transaction log. If a replica is more than N docs behind though, it falls
> back to the replication handler recovery mode you mention, and attempts to
> re-sync the whole shard from the leader.
> The default N for this is 100, which is pretty low for a high-update-rate
> index. It can be changed by increasing the size of the transaction log,
> (via numRecordsToKeep) but be aware that a large transaction log size can
> delay node restart.
> 
> See 
> https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConf
> ig#UpdateHandlersinSolrConfig-TransactionLog for more info.
> 
> 
> Hope some of that helps, I don’t know a way to say
> delete-first-on-recovery.
> 
> 
> 
> On 10/27/15, 5:21 PM, "Brian Scholl" <bsch...@legendary.com> wrote:
> 
>> Whoops, in the description of my setup that should say 2 replicas per
>> shard.  Every server has a replica.
>> 
>> 
>>> On Oct 27, 2015, at 20:16, Brian Scholl <bsch...@legendary.com> wrote:
>>> 
>>> Hello,
>>> 
>>> I am experiencing a failure mode where a replica is unable to recover
>>> and it will try to do so forever.  In writing this email I want to make
>>> sure that I haven't missed anything obvious or missed a configurable
>>> option that could help.  If something about this looks funny, I would
>>> really like to hear from you.
>>> 
>>> Relevant details:
>>> - solr 5.3.1
>>> - java 1.8
>>> - ubuntu linux 14.04 lts
>>> - the cluster is composed of 1 SolrCloud collection with 100 shards
>>> backed by a 3 node zookeeper ensemble
>>> - there are 200 solr servers in the cluster, 1 replica per shard
>>> - a shard replica is larger than 50% of the available disk
>>> - ~40M docs added per day, total indexing time is 8-10 hours spread
>>> over the day
>>> - autoCommit is set to 15s
>>> - softCommit is not defined
>>>     
>>> I think I have traced the failure to the following set of events but
>>> would appreciate feedback:
>>> 
>>> 1. new documents are being indexed
>>> 2. the leader of a shard, server A, fails for any reason (java crashes,
>>> times out with zookeeper, etc)
>>> 3. zookeeper promotes the other replica of the shard, server B, to the
>>> leader position and indexing resumes
>>> 4. server A comes back online (typically 10s of seconds later) and
>>> reports to zookeeper
>>> 5. zookeeper tells server A that it is no longer the leader and to sync
>>> with server B
>>> 6. server A checks with server B but finds that server B's index
>>> version is different from its own
>>> 7. server A begins replicating a new copy of the index from server B
>>> using the (legacy?) replication handler
>>> 8. the original index on server A was not deleted so it runs out of
>>> disk space mid-replication
>>> 9. server A throws an error, deletes the partially replicated index,
>>> and then tries to replicate again
>>> 
>>> At this point I think steps 6  => 9 will loop forever
>>> 
>>> If the actual errors from solr.log are useful let me know, not doing
>>> that now for brevity since this email is already pretty long.  In a
>>> nutshell and in order, on server A I can find the error that took it
>>> down, the post-recovery instruction from ZK to unregister itself as a
>>> leader, the corrupt index error message, and then the (start - whoops,
>>> out of disk- stop) loop of the replication messages.
>>> 
>>> I first want to ask if what I described is possible or did I get lost
>>> somewhere along the way reading the docs?  Is there any reason to think
>>> that solr should not do this?
>>> 
>>> If my version of events is feasible I have a few other questions:
>>> 
>>> 1. What happens to the docs that were indexed on server A but never
>>> replicated to server B before the failure?  Assuming that the replica on
>>> server A were to complete the recovery process would those docs appear
>>> in the index or are they gone for good?
>>> 
>>> 2. I am guessing that the corrupt replica on server A is not deleted
>>> because it is still viable, if server B had a catastrophic failure you
>>> could pick up the pieces from server A.  If so is this a configurable
>>> option somewhere?  I'd rather take my chances on server B going down
>>> before replication finishes than be stuck in this state and have to
>>> manually intervene.  Besides, I have disaster recovery backups for
>>> exactly this situation.
>>> 
>>> 3. Is there anything I can do to prevent this type of failure?  It
>>> seems to me that if server B gets even 1 new document as a leader the
>>> shard will enter this state.  My only thought right now is to try to
>>> stop sending documents for indexing the instant a leader goes down but
>>> on the surface this solution sounds tough to implement perfectly (and it
>>> would have to be perfect).
>>> 
>>> If you got this far thanks for sticking with me.
>>> 
>>> Cheers,
>>> Brian
>>> 
>> 
>

Re: replica recovery

Reply via email to