Interesting about the ZooKeeper quorum problem. What if we were to run three 
ZooKeepers in our primary data center and four in the backup data center? If we 
failed over, we wouldn't have a quorum, but we could kill one of the ZooKeepers 
to restore a quorum, couldn't we? And if we did extend the SolrCloud cluster 
into a second data center, wouldn't queries against the cluster sometimes be 
routed to the second data center?

Unfortunately we do generally need near real time, as our search index is under 
constant update, although we could tolerate updates being delayed for a while. 
We feed the search index from the contents of a queue. But we definitely cannot 
bring Solr down to re-establish the SolrCloud cluster.

I will look into Flume and see what it offers us as well.

Thanks for the input!

Darrell


-----Original Message-----
From: Daniel Collins [mailto:danwcoll...@gmail.com] 
Sent: Monday, February 03, 2014 4:16 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud multiple data center support

Option a) doesn't really work out of the box, *if you need NRT support*.
The main reason (for us at least) is the ZK ensemble and maintaining quorum. 
If you have a single ensemble, say 3 ZKs in one DC and 2 in another, then if 
you lose DC 2, you lose 2 ZKs and the rest are fine.  But if you lose the main 
DC that has 3 ZKs, you lose quorum.  Searches will be ok, but if you run an 
NRT setup, your updates will all stall until you get another ZK started (and 
restart the whole SolrCloud cluster so the nodes know about that new ZK).
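
To make that concrete (hostnames made up): a Solr 4.x node learns its ensemble 
from the static zkHost system property at startup, so bringing a replacement ZK 
into the ensemble means bouncing every Solr node with the expanded connection 
string, something like:

  # every Solr node must be restarted to learn of the new zk4
  java -DzkHost=zk1:2181,zk2:2181,zk3:2181,zk4:2181 -jar start.jar

which is exactly the "restart the whole cluster" pain described above.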

For us, availability is more important than consistency, so we currently have 2 
independent setups: 1 ZK ensemble and SolrCloud per DC.  We already had an 
indexing system that serviced both DCs, so we didn't need something like Flume.  
We also have external systems that handle routing to some extent, so we can 
route "locally" to each Cloud and not have to worry about cross-DC traffic.

One solution to that is to have a 3rd DC with a few instances in it, say 
another 2 ZKs. That would take your total ensemble to 7, and you can lose 3 
whilst still maintaining quorum.  Since ZK is relatively light-weight, that 3rd 
"Data Centre" doesn't have to be as robust, or contain Solr replicas; it's just 
a place to house 1 or 2 machines for holding ZKs.  We will probably migrate to 
this kind of setup soon as it ticks more of our boxes.
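
A sketch of what the zoo.cfg server list for that 3-DC, 7-node ensemble might 
look like (hostnames are placeholders; the usual tickTime/dataDir/clientPort 
settings are omitted, and the file is identical on every ZK node):

  # Quorum is 4 of 7, so any single DC can be lost.
  # DC 1 (main)
  server.1=zk1.dc1.example.com:2888:3888
  server.2=zk2.dc1.example.com:2888:3888
  server.3=zk3.dc1.example.com:2888:3888
  # DC 2 (backup)
  server.4=zk1.dc2.example.com:2888:3888
  server.5=zk2.dc2.example.com:2888:3888
  # DC 3 (light-weight tie-breaker site)
  server.6=zk1.dc3.example.com:2888:3888
  server.7=zk2.dc3.example.com:2888:3888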

One other option, in ZK trunk (but not yet in a release), is the ability to 
dynamically reconfigure ZK ensembles 
(https://issues.apache.org/jira/browse/ZOOKEEPER-107).  That would give you the 
ability to create new ZK instances in the event of a DC failure and 
reconfigure the SolrCloud without having to reload everything. That would help 
to some extent.
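
For reference, the reconfig command on trunk looks something like the 
following in zkCli (the syntax may still change before it's released, and the 
hostname is hypothetical):

  # add a replacement participant, then drop the dead server
  reconfig -add server.8=zknew.dc3.example.com:2888:3888:participant;2181
  reconfig -remove 3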

If you don't need NRT, then the solution is somewhat easier, as you don't have 
to worry as much about ZK quorum; a single ZK ensemble across DCs might be 
sufficient for you in that case.
