Let's say I am primarily interested in ensuring there is a DR copy of the
search index replicated to the remote data center, that I do not want the Solr
instances in the remote data center to be part of the SolrCloud cluster, and
that I am willing to accept some downtime bringing up a Solr cluster in the
remote data center if we have to use it. Can I use the old HTTP-based
replication from a remote slave against one of the SolrCloud servers to
accomplish that?
Primary Data Center
3 x Zookeeper
2 x Solr (clustered via SolrCloud)
1 x collection
1 x shard
Remote Data Center
1 x Solr (configured as standalone replication slave against one of the
primary data center Solr servers)
Would this work to at least get the data to the remote data center in a
reliable way?
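The setup I have in mind would look roughly like this in the standalone
slave's solrconfig.xml, pointed at one of the SolrCloud nodes (hostnames,
port, core name, and poll interval below are placeholders, not our actual
configuration):

```xml
<!-- Remote DC slave: pull the index from a primary-DC SolrCloud node -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- URL of the core on one of the primary data center Solr servers -->
    <str name="masterUrl">http://primary-solr1:8983/solr/collection1</str>
    <!-- How often the slave polls the master for a newer index version -->
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```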
Thanks,
Darrell
-----Original Message-----
From: Shawn Heisey [mailto:[email protected]]
Sent: Wednesday, February 05, 2014 12:39 AM
To: [email protected]
Subject: Re: SolrCloud multiple data center support
On 2/4/2014 10:14 PM, Darrell Burgan wrote:
> Interesting about the Zookeeper quorum problem. What if we were to run three
> Zookeepers in our primary data center and four in the backup data center? If
> we failed over, we wouldn't have a quorum, but we could kill one of the
> Zookeepers to restore a quorum, couldn't we? If we did extend the SolrCloud
> cluster into a second data center, wouldn't queries against the cluster be
> routed to the second data center sometimes?
If you have seven zookeeper servers in your ensemble, at least four of them
must be operational to have quorum. With N instances, int(N/2)+1 of them need
to be running. To restore quorum after a data center outage takes out more
than half of your ensemble, you would need to reconfigure each surviving
instance so that its ensemble listed fewer servers, then restart all the ZK
instances.
I have no idea what would happen when the down data center is restored, but to
get it working right, you'd have to reconfigure and restart again.
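To make the arithmetic concrete, here is a minimal sketch of the int(N/2)+1
quorum rule and why a 3/4 split across two data centers fails either way
(the function name is mine, not anything from ZooKeeper itself):

```python
def quorum_size(n):
    # ZooKeeper needs a strict majority: int(N/2) + 1 servers up.
    return n // 2 + 1

# Seven-server ensemble split 3 (primary DC) / 4 (backup DC):
# quorum_size(7) == 4, so losing the backup DC leaves only 3 servers
# and quorum is lost; losing the primary DC leaves 4 and quorum holds,
# but you cannot choose which data center fails.
```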
Zookeeper simply isn't designed to deal with data center failure in a two
center scenario. You can have a workable solution if you have at least three
data centers and you assume that you won't ever have a situation where more
than one goes down. I don't know that you can make that assumption, of course.
If you have replicas for one collection in two data centers, SolrCloud will
direct queries to all of the replicas, meaning that some of them will have high
latency. There is currently no logic to specify or prefer "local" replicas.
Right now the only viable solution with two data centers is independent
SolrCloud installs that are kept up to date independently.
I've never looked at Flume. My indexing program will update multiple
independent copies of the index. All my servers are in the same location, but
it would theoretically work with multiple locations too.
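The idea of updating multiple independent copies can be sketched like this
(URLs and document fields are made-up placeholders; the key point is that a
failure against one cluster doesn't stop updates to the others, and the
sender is injectable so the fan-out logic can be exercised without a live
Solr):

```python
import json
from urllib.request import Request, urlopen

# Hypothetical update endpoints, one per independent SolrCloud install.
SOLR_UPDATE_URLS = [
    "http://dc1-solr:8983/solr/collection1/update",
    "http://dc2-solr:8983/solr/collection1/update",
]

def index_everywhere(docs, urls=SOLR_UPDATE_URLS, send=None):
    """POST the same batch of documents to every independent cluster."""
    if send is None:
        def send(url, body):
            req = Request(url, data=body,
                          headers={"Content-Type": "application/json"})
            urlopen(req).read()
    body = json.dumps(docs).encode("utf-8")
    results = []
    for url in urls:
        # One cluster being down must not block updates to the others.
        try:
            send(url, body)
            results.append((url, True))
        except Exception:
            results.append((url, False))
    return results
```

The clusters can drift briefly if one send fails, so some reconciliation
(e.g. retrying failed batches) would still be needed.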
Thanks,
Shawn