Sure, ZK by design does not support a two-node/two-location setup. But still, 
users may want or need to deploy that,
and my question was whether there are smart ways to make such a setup as 
painless as possible in case of failure.

Take the example of DC1: 3xZK and DC2: 2xZK again. And then DC1 goes BOOM.
Without active intervention, DC2 would be read-only.
What if the Ops personnel in DC2 could then, with a single script/command, 
instruct DC2 to take over the “master” role:
- Add a 3rd ZK in DC2 to the two existing ones, reconfigure, and let them sync up.
- Rolling restart of Solr nodes with new ZK_HOST string
Of course, they would then also need to make sure that DC1 does not boot up 
again before a compatible change has been made there too.
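
To make the numbers concrete, here is a minimal sketch of the majority rule 
Shawn describes below. The function names are my own invention for 
illustration, not anything from the ZooKeeper or Solr APIs, and it assumes 
every server is a voting member (no observers).

```python
def quorum_size(total_servers: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return total_servers // 2 + 1


def has_quorum(total_servers: int, reachable: int) -> bool:
    """True if the reachable servers can still form a majority."""
    return reachable >= quorum_size(total_servers)


# Original ensemble: 3 ZK in DC1 + 2 ZK in DC2 = 5 voting members.
print(quorum_size(5))    # majority needed out of 5
print(has_quorum(5, 3))  # DC2 lost, the 3 servers in DC1 remain
print(has_quorum(5, 2))  # DC1 lost, the 2 servers in DC2 remain

# After DC2 is reconfigured as its own 3-node ensemble.
print(quorum_size(3))    # majority needed out of 3
print(has_quorum(3, 3))  # all 3 DC2 servers up
```

With five members the majority is three, so the two DC2 servers alone cannot 
form a quorum. Once DC2 is reconfigured as a three-member ensemble, the 
majority drops to two and DC2 can accept writes again.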

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 23 May 2017, at 18:56, Shawn Heisey <elyog...@elyograg.org> wrote:
> 
> On 5/23/2017 10:12 AM, Susheel Kumar wrote:
>> Hi Jan, FYI - Since last year, I have been running a Solr 6.0 cluster in a 
>> lower environment with 6 shards/replica in dc1 & 6 shards/replica in dc2 
>> (each shard replicated across data centers) with 3 ZK in dc1 and 2 ZK in 
>> dc2. (I didn't have a 3rd data center available for ZK, so I went with only 
>> 2 data centers in the above configuration.) So far, no issues. It's been 
>> running fine: indexing, replicating data, serving queries, etc. So in my 
>> test, setting up a single cluster across two zones/data centers works 
>> without any issue when there is no or very minimal latency (in my case 
>> around 30ms one way)
> 
> With that setup, if dc2 goes down, you're all good, but if dc1 goes down, 
> you're not.
> 
> There aren't enough ZK servers in dc2 to maintain quorum when dc1 is 
> unreachable, and SolrCloud is going to go read-only.  Queries would most 
> likely work, but you would not be able to change the indexes at all.
> 
> ZooKeeper with N total servers requires int((N/2)+1) servers to be 
> operational to maintain quorum.  This means that with five total servers, 
> three must be operational and able to talk to each other, or ZK cannot 
> guarantee that there is no split-brain, so quorum is lost.
> 
> ZK in two data centers will never be fully fault-tolerant. There is no 
> combination of servers that will work properly.  You must have three data 
> centers for a geographically fault-tolerant cluster.  Solr would be optional 
> in the third data center.  ZK must be installed in all three.
> 
> Thanks,
> Shawn
> 
