Re: Replicating Between Solr Clouds

Toby Lazar Wed, 05 Mar 2014 05:42:14 -0800

Unless Solr is your system of record, aren't you already replicating your 
source data across the WAN?  If so, could you load Solr in colo B from your 
colo B data source?  You may be duplicating some indexing work, but at least 
your colo B Solr would be more closely in sync with your colo B data.

Toby

-----Original Message-----
From: Tim Potter <tim.pot...@lucidworks.com>
Date: Wed, 5 Mar 2014 02:51:21 
To: solr-user@lucene.apache.org<solr-user@lucene.apache.org>
Reply-To: solr-user@lucene.apache.org
Subject: RE: Replicating Between Solr Clouds

Unfortunately, there is no out-of-the-box solution for this at the moment. 

In the past, I solved this using a couple of different approaches, which 
weren't all that elegant but served the purpose and were simple enough to allow 
the ops folks to set up monitors and alerts if things didn't work.

1) use DIH's Solr entity processor to pull data from one Solr to another, see: 
http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor

This only works if you store all fields, which in my use case was OK because I 
also did lots of partial document updates, which likewise required me to store 
all fields.
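
To illustrate why every field has to be stored, here is a minimal Python 
sketch of the same pull-and-reindex idea. It is not DIH or the 
SolrEntityProcessor itself, just a standalone script that pages through the 
source with /select and posts the returned documents to the target. The core 
URLs, the "id" sort field, and the helper names are assumptions for 
illustration. Only stored fields come back from /select, so anything that is 
indexed but not stored would be lost in the copy.

```python
import json
import urllib.parse
import urllib.request


def select_url(source_core, start, rows):
    """Build a paged match-all query; assumes the uniqueKey field is 'id'."""
    query = urllib.parse.urlencode(
        {"q": "*:*", "fl": "*", "sort": "id asc", "wt": "json",
         "start": start, "rows": rows})
    return f"{source_core.rstrip('/')}/select?{query}"


def strip_internal(doc):
    """Drop _version_ so the target assigns its own version on add."""
    return {k: v for k, v in doc.items() if k != "_version_"}


def fetch_page(url):
    """Return the stored-field documents from one page of results."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]["docs"]


def post_docs(target_core, docs):
    """Send a JSON array of documents to the target's update handler."""
    req = urllib.request.Request(
        f"{target_core.rstrip('/')}/update?wt=json",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        resp.read()


def copy_all(source_core, target_core, rows=500,
             fetch=fetch_page, post=post_docs):
    """Page through the source and re-add each page to the target."""
    start = copied = 0
    while True:
        docs = fetch(select_url(source_core, start, rows))
        if not docs:
            return copied
        post(target_core, [strip_internal(d) for d in docs])
        copied += len(docs)
        start += rows
```

The sketch does not issue a commit, so the caller would still need to commit 
on the target. Paging with start/rows also gets slower as the offset grows, 
so this is only reasonable for modest index sizes.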

2) use the replication handler's snapshot support to create snapshots on a 
regular basis and then move the files over the network

This one works but required the use of read and write aliases and two 
collections on the remote (slave) data center so that I could rebuild my write 
collection from the snapshots and then update the aliases to point the reads at 
the updated collection. Work on an automated backup/restore solution is 
planned, see https://issues.apache.org/jira/browse/SOLR-5750, but if you need 
something sooner, you can write a backup driver using SolrJ that uses 
CloudSolrServer to get the address of all the shard leaders, initiate the 
backup command on each leader, poll the replication details handler for 
snapshot completion on each shard, and then ship the files across the network. 
Obviously, this isn't a solution for NRT multi-homing ;-)
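
The backup-driver steps above can be sketched roughly as follows. This is a 
Python outline rather than SolrJ, so it does not look up leaders itself: the 
list of leader core URLs is assumed to be supplied by the caller (for example 
from the cluster state that CloudSolrServer reads). The function names are 
hypothetical, and the shape of the "details" response (a "backup" entry with 
a "status" field, either as a map or as a flat key/value list) is an 
assumption that should be checked against your Solr version.

```python
import json
import time
import urllib.parse
import urllib.request


def replication_url(core_url, command, **params):
    """Build a replication-handler URL for one core."""
    query = urllib.parse.urlencode(
        {"command": command, "wt": "json", **params})
    return f"{core_url.rstrip('/')}/replication?{query}"


def http_get_json(url):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def backup_status(details):
    """Extract the backup status from a details response (assumed shape)."""
    backup = details.get("details", {}).get("backup")
    if isinstance(backup, list):  # flat [key, value, key, value, ...]
        backup = dict(zip(backup[0::2], backup[1::2]))
    return (backup or {}).get("status")


def backup_all(leader_urls, fetch=http_get_json,
               poll_seconds=5, max_polls=120):
    """Start a backup on every leader, then poll until all report success."""
    for url in leader_urls:
        fetch(replication_url(url, "backup"))
    pending = set(leader_urls)
    for _ in range(max_polls):
        for url in sorted(pending):
            details = fetch(replication_url(url, "details"))
            if backup_status(details) == "success":
                pending.discard(url)
        if not pending:
            return True
        time.sleep(poll_seconds)
    return False
```

A real driver would also need to tell a fresh snapshot from an older one, for 
instance by comparing completion timestamps, before shipping the snapshot 
directories across the network.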

Lastly, these aren't the only ways to go about this, just wanted to share some 
high-level details about what has worked.

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com

________________________________________
From: perdurabo <robert_par...@volusion.com>
Sent: Tuesday, March 04, 2014 1:04 PM
To: solr-user@lucene.apache.org
Subject: Replicating Between Solr Clouds

We are looking to set up a highly available failover site across a WAN for our
SolrCloud instance.  The main production instance is at colo center A and
consists of a 3-node ZooKeeper ensemble managing configs for a 4-node
SolrCloud running Solr 4.6.1.  We only have one collection among the 4 cores
and there are two shards in the collection, one master node and one replica
node for each shard.  Our search and indexing services address the Solr
cloud through a load balancer VIP, not a compound API call.

Anyway, the Solr wiki explains fairly well how to replicate single node Solr
collections, but I do not see an obvious way for replicating a SolrCloud's
indices over a WAN to another SolrCloud.  I need a SolrCloud in a second
data center to be able to replicate both shards of the collection from the
first data center over a WAN.  It needs to replicate from the load balancer
VIP, which round-robins across all four nodes/2 shards for high availability,
rather than from a single named server of the SolrCloud.

I've searched high and low for a white paper or some discussion of how to do
this and haven't found anything.  Any ideas?

Thanks in advance.

