On 2/4/2014 10:14 PM, Darrell Burgan wrote:
> Interesting about the Zookeeper quorum problem. What if we were to run three 
> Zookeepers in our primary data center and four in the backup data center. If 
> we failed over, we wouldn't have a quorum, but we could kill one of the 
> Zookeepers to restore a quorum, couldn't we? If we did extend the SolrCloud 
> cluster into a second data center, wouldn't queries against the cluster be 
> routed to the second data center sometimes? 

If you have seven zookeeper servers in your ensemble, at least four of
them must be operational to have quorum.  With N instances, int(N/2)+1
of them need to be running.  In order to restore quorum when a data
center outage takes out half or more of your ensemble, you would need
to reconfigure each surviving instance so that its config lists only
the surviving servers, then restart all the ZK instances.  I have no
idea what would happen when the down data center is restored, but to
get it working right, you'd have to reconfigure and restart again.
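A quick sketch of the arithmetic, using the 3 + 4 split from your question
(hypothetical numbers, just illustrating the int(N/2)+1 rule):

```python
def quorum(n):
    """Minimum live servers a ZooKeeper ensemble of n needs: int(n/2)+1."""
    return n // 2 + 1

total = 3 + 4                 # 3 ZK in the primary DC, 4 in the backup DC
print(quorum(total))          # 4 -- at least four of seven must be up

# If the backup DC (4 servers) goes down, only the 3 primary servers survive:
survivors = 3
print(survivors >= quorum(total))   # False -- no quorum, ensemble is down
```

Note that killing one of the survivors doesn't help: quorum is computed
from the configured ensemble size, not from how many happen to be running,
which is why you'd have to reconfigure and restart instead.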

Zookeeper simply isn't designed to deal with data center failure in a
two-center scenario.  You can have a workable solution if you have at
least three data centers and you assume that you won't ever have a
situation where more than one goes down.  I don't know that you can make
that assumption, of course.
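To see why three data centers work, here's the same arithmetic for a
hypothetical 3/2/2 spread of seven servers -- losing any single data
center still leaves a majority:

```python
def quorum(n):
    """Minimum live servers a ZooKeeper ensemble of n needs: int(n/2)+1."""
    return n // 2 + 1

# Hypothetical seven-node ensemble spread across three data centers.
layout = {"dc1": 3, "dc2": 2, "dc3": 2}
total = sum(layout.values())            # 7, so quorum is 4

for down_dc, lost in layout.items():
    survivors = total - lost
    print(down_dc, "down:", survivors >= quorum(total))
# dc1 down: True   (4 survivors)
# dc2 down: True   (5 survivors)
# dc3 down: True   (5 survivors)
```

Lose any two data centers, though, and at most five servers are gone,
leaving two or three survivors -- below quorum either way.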

If you have replicas for one collection in two data centers, SolrCloud
will direct queries to all of the replicas, meaning that some of them
will have high latency.  There is currently no logic to specify or
prefer "local" replicas.

Right now the only viable solution with two data centers is independent
SolrCloud installs that are kept up to date independently.

I've never looked at Flume.  My indexing program will update multiple
independent copies of the index.  All my servers are in the same
location, but it would theoretically work with multiple locations too.

Thanks,
Shawn
