Thanks Mark. The reason I asked is that I saw mentions of SolrCloud being resilient to split brain because it uses ZooKeeper. However, if my half brain understands what split brain is, then I think that's not a completely true claim, because one can get unlucky and end up with a SolrCloud cluster partitioned in such a way that one or even all partitions reject indexing (and update and deletion) requests because they do not have a complete copy of the index.
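To make that concrete, here is a minimal SolrJ-style sketch of what I mean (node URL, collection name and field names are made up for illustration): a client sends an add straight to a node whose partition has lost its ZK quorum or does not host every shard, and, if I understand Mark's description correctly, that add should be rejected and the client has to deal with it:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class UpdateDuringPartition {
    public static void main(String[] args) throws Exception {
        // Hypothetical node that ended up in a partition without a ZK quorum
        // (or without at least one replica of every shard).
        HttpSolrServer node = new HttpSolrServer("http://solr-node8:8983/solr/collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title_t", "sent while the cluster is partitioned");

        try {
            node.add(doc);   // per Mark's description this should be rejected...
            node.commit();
        } catch (Exception e) {
            // ...and the client only finds out by catching the failure
            // (the exact exception type depends on the SolrJ version).
            System.err.println("Update rejected: " + e.getMessage());
        }
    }
}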
In my example of a 10-node cluster that gets split into a 7-node and a 3-node partition, if neither partition ends up containing the full index (i.e. at least one copy of each shard), then neither partition will accept updates.

And here is one more Q:

* Imagine a client is adding documents and, for simplicity, imagine SolrCloud routes all these documents to the same shard, call it S.
* Imagine that both the 7-node and the 3-node partition end up with a complete index and thus both accept updates.
* This means that both the 7-node and the 3-node partition have at least one replica of shard S, let's call them S7 and S3.
* Now imagine the client sending documents for indexing happened to be sending them to 2 nodes, say in round-robin fashion.
* And imagine that each of these 2 nodes ended up in a different partition.

The client now keeps sending docs to these 2 nodes and both happily take and index documents in their own copies of S. To the client everything looks normal - all documents are getting indexed. But S7 and S3 are no longer the same - they contain different documents! Problem, no? (See the small SolrJ sketch of this round-robin situation in the P.S. below the quoted thread.)

What happens when somebody fixes the cluster and all nodes are back in the same 10-node cluster? What happens to S7 and S3? Wouldn't SolrCloud have to implement bi-directional synchronization to fix things and "unify" S7 and S3? And if there are updates and deletes involved, things get even messier.... :(

Otis
----
Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm


----- Original Message -----
> From: Mark Miller <markrmil...@gmail.com>
> To: solr-user <solr-user@lucene.apache.org>
> Cc:
> Sent: Friday, June 15, 2012 5:07 PM
> Subject: Re: SolrCloud and split-brain
>
> On Jun 15, 2012, at 3:21 PM, Otis Gospodnetic wrote:
>
>> Thanks Mark, will open an issue in a bit.
>>
>> But I think the following is the real meat of the Q about split brain and SolrCloud, especially when it comes to how indexing is handled during split brain:
>>
>>>> Does this work even when outside clients (apps for indexing or searching) send their requests directly to individual nodes?
>>>> Let's use the example from my email where we end up with 2 groups of nodes: 7-node group with 2 ZK nodes on the same network and 3-node group with 1 ZK node on the same network.
>>>
>>> The 3-node group with 1 ZK would not have a functioning zk - so it would stop accepting updates. If it could serve a complete view of the index, it would though, for searches.
>>
>> So in this case information in this 1 ZK node would tell the 3 Solr nodes whether they have all index data or if some shards are missing (i.e. were only on nodes in the other 7-node group)?
>> And if nodes figure out they don't have all index data they will reject search requests? Or will they accept and perform searches, but return responses that tell the client that the searched index was not complete?
>
> The 1 ZK node will not function, so the 3 Solr nodes will not accept updates.
>
> If there is one replica for each shard available, search will still work. I don't think partial results has been committed yet for distrib search. In that case, we will put something in the header to indicate a full copy of the index was not available. I think we can also add something in the header if we know we cannot talk to zookeeper to let the client know it could be seeing stale state.
> SmartClients that talked to zookeeper would see those nodes appear as down in zookeeper and stop trying to talk to them.
>
>>> The 7-node group would have a working ZK it could talk to, and it would continue to accept updates as long as a node for a shard for that hash range is up. It would also of course serve searches.
>>
>> Right, so if the node for the shard where a doc is supposed to go to is in that 3-node group, then the indexing request will be rejected. Is this correct?
>
> it depends on what is available - but you will need at least one replica for each shard available - eg your partition needs to have one copy of the index - otherwise updates are rejected if there are no nodes hosting a shard of the hash range. So if a replica made it into the larger partition, you will be fine - it will become the leader.
>
>> Otis
>> ----
>> Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm
>>
>> ----- Original Message -----
>>> From: Mark Miller <markrmil...@gmail.com>
>>> To: solr-user <solr-user@lucene.apache.org>
>>> Cc:
>>> Sent: Friday, June 15, 2012 2:22 PM
>>> Subject: Re: SolrCloud and split-brain
>>>
>>> On Jun 15, 2012, at 2:12 PM, Otis Gospodnetic wrote:
>>>
>>>> Makes sense. Do responses carry something to alert the client that "something is rotten in the state of cluster"?
>>>
>>> No, I don't think so - we should probably add that to the header similar to how I assume partial results will work.
>>>
>>> Feel free to fire up a JIRA issue for that.
>>>
>>> - Mark Miller
>>> lucidimagination.com
>
> - Mark Miller
> lucidimagination.com
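P.S. Here is the round-robin scenario from above as a small SolrJ sketch (node URLs and collection name are made up; whether both sides really keep accepting the adds is exactly my question):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class RoundRobinDuringSplitBrain {
    public static void main(String[] args) throws Exception {
        // Hypothetical nodes that landed on opposite sides of the partition,
        // each hosting a replica of shard S (S7 and S3 above).
        SolrServer[] nodes = {
            new HttpSolrServer("http://node-on-7-node-side:8983/solr/collection1"),
            new HttpSolrServer("http://node-on-3-node-side:8983/solr/collection1")
        };

        for (int i = 0; i < 100; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            // Round-robin: even docs go to one side, odd docs to the other.
            // If both partitions keep accepting updates, every add() looks
            // successful to the client, yet S7 and S3 silently diverge.
            nodes[i % 2].add(doc);
        }
        // A ZooKeeper-aware client (a "SmartClient" in Mark's terms) would instead
        // see the unreachable nodes marked as down in ZK and stop sending to them.
    }
}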