Unless Solr is your system of record, aren't you already replicating your source data across the WAN? If so, could you load Solr in colo B from your colo B data source? You may be duplicating some indexing work, but at least your colo B Solr would be more closely in sync with your colo B data.
Toby

-----Original Message-----
From: Tim Potter <tim.pot...@lucidworks.com>
Date: Wed, 5 Mar 2014 02:51:21
To: solr-user@lucene.apache.org
Reply-To: solr-user@lucene.apache.org
Subject: RE: Replicating Between Solr Clouds

Unfortunately, there is no out-of-the-box solution for this at the moment. In the past, I solved this using a couple of different approaches, which weren't all that elegant but served the purpose and were simple enough to allow the ops folks to set up monitors and alerts if things didn't work.

1) Use DIH's Solr entity processor to pull data from one Solr to another, see: http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor

This only works if you store all fields, which in my use case was OK because I also did lots of partial document updates, which also required me to store all fields.

2) Use the replication handler's snapshot support to create snapshots on a regular basis and then move the files over the network.

This one works, but it required read and write aliases and two collections in the remote (slave) data center, so that I could rebuild my write collection from the snapshots and then update the aliases to point the reads at the updated collection.

Work on an automated backup/restore solution is planned, see https://issues.apache.org/jira/browse/SOLR-5750, but if you need something sooner, you can write a backup driver using SolrJ that uses CloudSolrServer to get the address of all the shard leaders, initiates the backup command on each leader, polls the replication details handler for snapshot completion on each shard, and then ships the files across the network. Obviously, this isn't a solution for NRT multi-homing ;-)

Lastly, these aren't the only ways to go about this; I just wanted to share some high-level details about what has worked.

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com

________________________________________
From: perdurabo <robert_par...@volusion.com>
Sent: Tuesday, March 04, 2014 1:04 PM
To: solr-user@lucene.apache.org
Subject: Replicating Between Solr Clouds

We are looking to set up a highly available failover site across a WAN for our SolrCloud instance. The main production instance is at colo center A and consists of a 3-node ZooKeeper ensemble managing configs for a 4-node SolrCloud running Solr 4.6.1. We only have one collection among the 4 cores, and there are two shards in the collection, with one master node and one replica node for each shard. Our search and indexing services address the SolrCloud through a load balancer VIP, not a compound API call.

Anyway, the Solr wiki explains fairly well how to replicate single-node Solr collections, but I do not see an obvious way to replicate a SolrCloud's indices over a WAN to another SolrCloud. I need a SolrCloud in another data center to be able to replicate both shards of the collection in the other data center over a WAN. It needs to be able to replicate from the load balancer VIP, which round-robins across all four nodes/2 shards for high availability, rather than from a single named server of the SolrCloud.

I've searched high and low for a white paper or some discussion of how to do this and haven't found anything. Any ideas?

Thanks in advance.
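Tim's second approach depends on a read alias and a write alias that flip between two collections in the remote data center. A minimal sketch of that flip, assuming the Collections API CREATEALIAS action (which replaces an alias that already exists with the same name). The class name, base URL, alias names and collection names are all hypothetical, and restoring the shipped snapshots into the rebuilt collection is not shown.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

/**
 * Hypothetical helper for the read/write alias flip: once the write collection has been
 * rebuilt from shipped snapshots, point the read alias at it and the write alias at the
 * other (now stale) collection, which becomes the next rebuild target.
 */
public class AliasSwap {
    private final String solrBase;   // assumed, e.g. "http://solr-b:8983/solr"
    private final String readAlias;  // hypothetical alias queried by search clients
    private final String writeAlias; // hypothetical alias used by the rebuild job

    public AliasSwap(String solrBase, String readAlias, String writeAlias) {
        this.solrBase = solrBase;
        this.readAlias = readAlias;
        this.writeAlias = writeAlias;
    }

    /** Collections API call that creates (or re-points) a single alias. */
    public String createAliasUrl(String alias, String collection) {
        return solrBase + "/admin/collections?action=CREATEALIAS"
                + "&name=" + URLEncoder.encode(alias, StandardCharsets.UTF_8)
                + "&collections=" + URLEncoder.encode(collection, StandardCharsets.UTF_8);
    }

    /** Calls to issue once 'rebuilt' is current: reads go to it, writes go to 'other'. */
    public List<String> swapUrls(String rebuilt, String other) {
        return List.of(createAliasUrl(readAlias, rebuilt), createAliasUrl(writeAlias, other));
    }

    /** Issues each call with a plain GET and fails on any non-200 response. */
    public static void send(List<String> urls) throws IOException {
        for (String url : urls) {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            try {
                int code = conn.getResponseCode();
                if (code != 200) {
                    throw new IOException("HTTP " + code + " for " + url);
                }
            } finally {
                conn.disconnect();
            }
        }
    }
}
```

Flipping aliases rather than reconfiguring clients lets search traffic keep using one name while the other collection is being rebuilt.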
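Tim's suggested backup driver (look up each shard leader, trigger a backup there, poll the replication details until the snapshot completes, then ship the files) might look roughly like the following. This is a stdlib-only sketch, not SolrJ code. It assumes the leader core URLs have already been looked up, for example through CloudSolrServer's cluster state as Tim suggests. The BackupDriver class, its HttpGet hook and the completion test are hypothetical. The command=backup and command=details requests are ReplicationHandler commands, but the assumption that a finished snapshot shows up as a new snapshotCompletedAt value in the details output should be checked against your Solr version. Copying the snapshot directories across the WAN is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical driver: back up every shard leader and wait for each snapshot to finish. */
public class BackupDriver {

    /** Hypothetical hook so the HTTP call can be swapped out (e.g. in tests). */
    public interface HttpGet {
        String get(String url) throws IOException;
    }

    // Assumption: a finished backup is reported as a "snapshotCompletedAt" entry in the details output.
    private static final Pattern COMPLETED_AT =
            Pattern.compile("\"snapshotCompletedAt\"\\s*[,:]\\s*\"([^\"]*)\"");

    private final HttpGet http;

    public BackupDriver(HttpGet http) {
        this.http = http;
    }

    /** Plain JDK implementation of the HTTP hook. */
    public static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    /** ReplicationHandler URL for a core, e.g. command "backup" or "details". */
    public static String replicationUrl(String coreUrl, String command) {
        String base = coreUrl.endsWith("/") ? coreUrl : coreUrl + "/";
        return base + "replication?command=" + command + "&wt=json";
    }

    /** Extracts the last snapshot completion time from a details response, or null. */
    public static String completedAt(String detailsJson) {
        Matcher m = COMPLETED_AT.matcher(detailsJson);
        return m.find() ? m.group(1) : null;
    }

    /**
     * Triggers a backup on each leader core URL (keyed by shard name), then polls each
     * leader's details until its snapshot timestamp changes or the timeout expires.
     * Returns the new snapshot timestamp per shard.
     */
    public Map<String, String> backupAll(Map<String, String> leaderCoreUrls,
                                         long pollMillis, long timeoutMillis)
            throws IOException, InterruptedException {
        Map<String, String> before = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : leaderCoreUrls.entrySet()) {
            String coreUrl = e.getValue();
            // Remember the previous snapshot time so an old backup is not mistaken for this one.
            before.put(e.getKey(), completedAt(http.get(replicationUrl(coreUrl, "details"))));
            http.get(replicationUrl(coreUrl, "backup"));
        }
        Map<String, String> done = new LinkedHashMap<>();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (done.size() < leaderCoreUrls.size()) {
            for (Map.Entry<String, String> e : leaderCoreUrls.entrySet()) {
                String shard = e.getKey();
                if (done.containsKey(shard)) continue;
                String now = completedAt(http.get(replicationUrl(e.getValue(), "details")));
                if (now != null && !now.equals(before.get(shard))) {
                    done.put(shard, now);
                }
            }
            if (done.size() < leaderCoreUrls.size()) {
                if (System.currentTimeMillis() > deadline) {
                    throw new IOException("Timed out waiting for snapshots; finished so far: " + done.keySet());
                }
                Thread.sleep(pollMillis);
            }
        }
        return done; // next step (not shown): copy the snapshot directories across the WAN
    }
}
```

Injecting the HTTP call keeps the polling logic testable without a live cluster; against real servers you would pass BackupDriver::fetch.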