On 2/4/2014 10:14 PM, Darrell Burgan wrote:
> Interesting about the Zookeeper quorum problem. What if we were to run three
> Zookeepers in our primary data center and four in the backup data center. If
> we failed over, we wouldn't have a quorum, but we could kill one of the
> Zookeepers to restore a quorum, couldn't we? If we did extend the SolrCloud
> cluster into a second data center, wouldn't queries against the cluster be
> routed to the second data center sometimes?
If you have seven zookeeper servers in your ensemble, at least four of them must be operational to have quorum. With N instances, int(N/2)+1 of them need to be running. To restore quorum after a data center outage takes out half your ensemble, you would need to reconfigure each surviving instance so that its configured ensemble has fewer servers, then restart all the surviving ZK instances. I have no idea what would happen when the down data center is restored, but to get it working right, you'd have to reconfigure and restart again.

Zookeeper simply isn't designed to deal with data center failure in a two-center scenario. You can have a workable solution if you have at least three data centers and you assume that you won't ever have a situation where more than one goes down. I don't know that you can make that assumption, of course.

If you have replicas for one collection in two data centers, SolrCloud will direct queries to all of the replicas, meaning that some of them will have high latency. There is currently no logic to specify or prefer "local" replicas.

Right now the only viable solution with two data centers is independent SolrCloud installs that are kept up to date independently. I've never looked at Flume. My indexing program updates multiple independent copies of the index. All my servers are in the same location, but it would theoretically work with multiple locations too.

Thanks,
Shawn
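To make the majority-quorum math concrete, here's a small Python sketch (the helper names are mine, not ZooKeeper's) showing why a 3/4 split across two data centers can't survive losing the larger site:

```python
# Majority-quorum arithmetic for a ZooKeeper ensemble of N servers.
# Quorum requires strictly more than half the ensemble: int(N/2) + 1.

def quorum_size(ensemble_size: int) -> int:
    """Minimum number of live servers needed for quorum."""
    return ensemble_size // 2 + 1

def has_quorum(live_servers: int, ensemble_size: int) -> bool:
    """True if the live servers form a majority of the configured ensemble."""
    return live_servers >= quorum_size(ensemble_size)

# Seven-server ensemble split 3 (primary DC) / 4 (backup DC):
print(quorum_size(7))       # 4 servers required for quorum
print(has_quorum(4, 7))     # backup DC alone: True, quorum survives
print(has_quorum(3, 7))     # primary DC alone: False, no quorum
```

Note that killing one of the three survivors doesn't help: quorum is computed against the *configured* ensemble size, so 3 of 7 (or 2 of 7) is still a minority until every surviving server is reconfigured to a smaller ensemble and restarted.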