Interesting point about the ZooKeeper quorum problem. What if we were to run three ZooKeepers in our primary data center and four in the backup data center? If we failed over, we wouldn't have a quorum, but we could kill one of the ZooKeepers to restore a quorum, couldn't we? And if we did extend the SolrCloud cluster into a second data center, wouldn't queries against the cluster sometimes be routed to the second data center?
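For reference, the quorum arithmetic works out as follows: ZooKeeper needs a strict majority of the configured ensemble size, not of whichever servers happen to be alive, so stopping a live server never restores quorum. A quick sketch of the 3 + 4 split:

    # Quorum math for the proposed 3 + 4 split across two data centers.
    # ZooKeeper requires a strict majority of the CONFIGURED ensemble
    # size, regardless of how many servers are currently reachable.
    def quorum(configured):
        return configured // 2 + 1

    total = 3 + 4             # 3 ZKs in the primary DC, 4 in the backup
    need = quorum(total)      # 4

    print(total - 3 >= need)  # True:  losing the primary leaves 4 alive, quorum holds
    print(total - 4 >= need)  # False: losing the backup leaves 3 alive, quorum lost
    # Killing one of the survivors only lowers the live count further;
    # it does not shrink the configured ensemble, so it cannot restore quorum.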
Unfortunately, we do generally need near real time, as our search index is under constant update, although we could afford for updates to be delayed for a while. We feed the search index from the contents of a queue. But we definitely cannot bring Solr down to re-establish SolrCloud's cluster. I will look into Flume and see what it offers us as well.

Thanks for the input!

Darrell

-----Original Message-----
From: Daniel Collins [mailto:danwcoll...@gmail.com]
Sent: Monday, February 03, 2014 4:16 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud multiple data center support

Option a) doesn't really work out of the box *if you need NRT support*. The main reason (for us, at least) is the ZK ensemble and maintaining quorum. If you have a single ensemble, say 3 ZKs in one DC and 2 in another, then if you lose DC 2, you lose 2 ZKs and the rest are fine. But if you lose the main DC that has the 3 ZKs, you lose quorum. Searches will be OK, but if you are an NRT setup, your updates will all stall until you get another ZK started (and reload the whole SolrCloud cluster to give the Solr nodes the address of that new ZK).

For us, availability is more important than consistency, so we currently have two independent setups: one ZK ensemble and one SolrCloud per DC. We already had an indexing system that serviced multiple DCs, so we didn't need something like Flume. We also have external systems that handle routing to some extent, so we can route "locally" to each cloud and not have to worry about cross-DC traffic.

One solution to the quorum problem is to have a third DC with a few instances in it, say another 2 ZKs. That takes your total ensemble to 7, and you can lose 3 servers whilst still maintaining quorum (the arithmetic is sketched at the end of this message). Since ZK is relatively lightweight, that third "data centre" doesn't have to be as robust, or contain Solr replicas; it's just a place to house one or two machines for holding ZKs. We will probably migrate to this kind of setup soon, as it ticks more of our boxes.

Another option, in ZK trunk but not yet in a release, is the ability to dynamically reconfigure ZK ensembles (https://issues.apache.org/jira/browse/ZOOKEEPER-107). That would give you the ability to create new ZK instances in the event of a DC failure and reconfigure the SolrCloud cluster without having to reload everything, which would help to some extent.

If you don't need NRT, the solution is somewhat easier, as you don't have to worry as much about ZK quorum; a single ZK ensemble across DCs might be sufficient for you in that case.
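To put numbers on the third-DC suggestion: with 3 + 2 + 2 = 7 configured servers the quorum is 4, and no single DC holds more than 3 of them, so losing any one DC leaves a working ensemble. A minimal sketch (the DC names are just placeholders):

    # Checking the third-DC layout suggested above: 3 + 2 + 2 = 7 ZKs.
    def quorum(configured):
        return configured // 2 + 1

    dcs = {"main": 3, "backup": 2, "third": 2}   # hypothetical DC names
    total = sum(dcs.values())                    # 7 servers, quorum(7) == 4

    for lost, size in dcs.items():
        alive = total - size
        verdict = "quorum holds" if alive >= quorum(total) else "quorum lost"
        print(f"lose {lost} ({size} ZKs): {alive}/{total} alive -> {verdict}")
    # Every single-DC failure leaves at least 4 of 7 servers alive, so the
    # ensemble really can tolerate losing 3 servers, as claimed above.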