Here's what we've decided to do. All updates and deletes for our collections 
will no longer be applied directly to SolrCloud via SolrJ. Instead, they will 
become messages on a topic that go through a RabbitMQ exchange, where an agent 
in each data center subscribes to the topic with a queue specific to its data 
center. We will run each agent as a separate webapp inside the same Tomcat 
instance that hosts Solr itself, on each of our Solr servers. As messages 
arrive, the agent receives them and then uses SolrJ to apply them directly to 
SolrCloud.
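To make the flow concrete, here is a minimal sketch of the kind of message envelope the publisher and agents might share. All names here (`IndexMessage`, the field names, the JSON layout) are hypothetical illustrations, not something from our actual implementation; the RabbitMQ and SolrJ calls are indicated in comments only.

```java
import java.util.StringJoiner;

// Hypothetical envelope for an update/delete operation. The publisher would
// serialize this to the RabbitMQ message body; the agent would deserialize it
// and apply it to SolrCloud via SolrJ.
public class IndexMessage {
    enum Op { UPDATE, DELETE }

    final Op op;
    final String collection;
    final String docId;
    final String fieldsJson; // document body for updates, null for deletes

    IndexMessage(Op op, String collection, String docId, String fieldsJson) {
        this.op = op;
        this.collection = collection;
        this.docId = docId;
        this.fieldsJson = fieldsJson;
    }

    /** Serialize to a simple JSON string for the message body. */
    String toJson() {
        StringJoiner j = new StringJoiner(",", "{", "}");
        j.add("\"op\":\"" + op + "\"");
        j.add("\"collection\":\"" + collection + "\"");
        j.add("\"docId\":\"" + docId + "\"");
        if (fieldsJson != null) j.add("\"fields\":" + fieldsJson);
        return j.toString();
    }

    public static void main(String[] args) {
        IndexMessage del = new IndexMessage(Op.DELETE, "products", "sku-42", null);
        System.out.println(del.toJson());
        // Publisher side (not shown): channel.basicPublish(exchange, routingKey,
        //   props, del.toJson().getBytes()) with the RabbitMQ Java client.
        // Agent side (not shown): parse the JSON, then call SolrJ's add(...) for
        //   UPDATE or deleteById(...) for DELETE against the local SolrCloud.
    }
}
```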

The key is RabbitMQ's ability to deliver the same message to multiple queues 
that subscribe to the same topic. If each data center sets up a single queue 
bound to the correct topic, both data centers will receive all the update and 
delete messages, and will update their indexes accordingly. The net result is 
two completely separate SolrCloud clusters, each with 2 Solr servers and 3 
ZooKeepers, which are all kept up to date in near lockstep.
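The fan-out behavior we're relying on can be illustrated with a tiny in-memory stand-in for the exchange (an assumption-laden sketch: exact binding keys only, no `*`/`#` wildcard matching, and real RabbitMQ queues are of course brokered, not `ArrayDeque`s):

```java
import java.util.*;

// Toy in-memory model of topic fan-out: every queue bound to a routing key
// gets its own copy of each published message, which is how two data centers
// can each consume the full update/delete stream independently.
public class TopicFanout {
    private final Map<String, List<Deque<String>>> bindings = new HashMap<>();

    /** Bind a new queue to a binding key, as each data center's agent would. */
    Deque<String> bindQueue(String bindingKey) {
        Deque<String> queue = new ArrayDeque<>();
        bindings.computeIfAbsent(bindingKey, k -> new ArrayList<>()).add(queue);
        return queue;
    }

    /** Publish: every queue bound to the routing key receives a copy. */
    void publish(String routingKey, String message) {
        for (Deque<String> q : bindings.getOrDefault(routingKey, new ArrayList<>())) {
            q.addLast(message);
        }
    }

    public static void main(String[] args) {
        TopicFanout exchange = new TopicFanout();
        Deque<String> dc1Queue = exchange.bindQueue("index.updates");
        Deque<String> dc2Queue = exchange.bindQueue("index.updates");

        exchange.publish("index.updates", "delete sku-42");

        // Both data centers receive the same message independently.
        System.out.println("dc1: " + dc1Queue.pollFirst());
        System.out.println("dc2: " + dc2Queue.pollFirst());
    }
}
```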

We're planning on using this capability both to provide a hot disaster 
recovery backup in a remote data center and to provide distributed 
active/active search indexes across many data centers. As long as every 
update/delete message goes into the same federated RabbitMQ exchange, all data 
centers will receive the update/delete messages through their own queues and 
keep their indexes up to date independently.

We're also talking to the folks at DataStax about their commercial product, 
which seems to layer Solr atop the Cassandra distributed data store. This might 
provide an even more elegant solution than what we're doing. But that is a bit 
further down the road.

Thanks for the help,
Darrell Burgan



-----Original Message-----
From: Darrell Burgan 
Sent: Wednesday, February 05, 2014 6:48 PM
To: solr-user@lucene.apache.org
Subject: RE: SolrCloud multiple data center support

Let's say I am primarily interested in ensuring there is a DR copy of the 
search index replicated to the remote data center, but I do not want the Solr 
instances in the remote data center to be part of the SolrCloud cluster, and I 
am willing to accept some downtime in bringing up a Solr cluster in the remote 
data center if we have to use it. Can I use the old HTTP-based replication from 
a remote slave against one of the SolrCloud servers to accomplish that?

Primary Data Center
        3 x Zookeeper
        2 x Solr (clustered via SolrCloud)
        1 x collection
        1 x shard

Remote Data Center
        1 x Solr (configured as standalone replication slave against one of the 
primary data center Solr servers)

Would this work to at least get the data to the remote data center in a 
reliable way?
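For reference, the legacy slave side of that setup would be configured in the remote core's solrconfig.xml along these lines (host name, core name, and poll interval below are placeholders, not our actual values). One caveat to confirm: the masterUrl pins the slave to one specific node, so if that node goes down, or the leader moves in the SolrCloud cluster, the slave keeps polling the same URL until it is repointed.

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- Placeholder: point at one specific core on a primary-DC Solr node -->
    <str name="masterUrl">http://solr-primary-1:8983/solr/collection1</str>
    <!-- Poll for index changes every 60 seconds (hh:mm:ss) -->
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```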

Thanks,
Darrell
