Making two indexing calls, one to each cluster, works until one system is
not available. Then they are out of sync.

You might want to put the updates into a persistent message queue, then have 
both systems indexed from that queue.
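
A rough sketch of that idea in Java, in case it helps. This is only an
illustration under assumptions: UpdateLog is a made-up in-memory stand-in for
whatever persistent queue you choose, the consumer names are arbitrary, and the
sinks are plain lists where a real setup would make SolrJ calls with one client
per cluster. The point it shows is that each cluster keeps its own position in
the queue, so a site that is down simply falls behind and catches up later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical in-memory stand-in for a persistent, ordered update queue. */
class UpdateLog {
    private final List<String> entries = new ArrayList<>();
    // One read position per consumer (here: one per cluster).
    private final Map<String, Integer> offsets = new HashMap<>();

    synchronized void append(String update) {
        entries.add(update);
    }

    /**
     * Feeds pending updates to one consumer, in order. The offset advances
     * only after the sink accepts an update, so a failed send is retried on
     * the next call. Returns how many updates were delivered in this call.
     * (Holding the lock during the send is a simplification for the sketch.)
     */
    synchronized int drain(String consumer, Predicate<String> sink) {
        int start = offsets.getOrDefault(consumer, 0);
        int pos = start;
        while (pos < entries.size() && sink.test(entries.get(pos))) {
            pos++;
        }
        offsets.put(consumer, pos);
        return pos - start;
    }

    /** Number of updates this consumer has not yet applied. */
    synchronized int lag(String consumer) {
        return entries.size() - offsets.getOrDefault(consumer, 0);
    }
}

class DualSiteIndexingSketch {
    public static void main(String[] args) {
        UpdateLog log = new UpdateLog();
        List<String> primary = new ArrayList<>(); // stands in for cluster 1
        List<String> warm = new ArrayList<>();    // stands in for cluster 2
        boolean[] warmIsUp = {false};

        log.append("add doc1");
        log.append("add doc2");

        // Primary is up and takes everything; the warm site is down and
        // rejects every update, so its offset does not move.
        log.drain("primary", primary::add);
        log.drain("warm", u -> warmIsUp[0] && warm.add(u));
        System.out.println("warm lag while down: " + log.lag("warm"));

        // Warm site comes back and replays what it missed, in order.
        warmIsUp[0] = true;
        log.drain("warm", u -> warmIsUp[0] && warm.add(u));
        System.out.println("in sync: " + primary.equals(warm));
    }
}
```

The application only ever appends to the queue, so user-facing response time
does not depend on either cluster being reachable.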

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Feb 9, 2016, at 1:49 PM, Upayavira <u...@odoko.co.uk> wrote:
> 
> There is a Cross Datacenter replication feature in the works - not sure
> of its status.
> 
> In lieu of that, I'd simply have two copies of your indexing code -
> index everything simultaneously into both clusters.
> 
> There is, of course, a risk that the two get out of sync, so you might
> want to find some ways to identify/manage that.
> 
> Upayavira
> 
> On Tue, Feb 9, 2016, at 08:43 PM, tedsolr wrote:
>> I have a Solr Cloud cluster (v5.2.1) using a Zookeeper ensemble in my
>> primary data center. I am now trying to plan for disaster recovery with
>> an available warm site. I have read (many times) the disaster recovery
>> section in the Apache ref guide. I suppose I don't fully understand it.
>> 
>> What I'd like to know is the best way to sync up the existing data, and
>> the best way to keep that data in sync. Assume that the warm site is an
>> exact copy (not at the network level) of the production cluster - so the
>> same servers with the same config. All servers are virtual. The use case
>> is the active cluster goes down and cannot be repaired, so the warm site
>> would become the active site. This is a manual process that takes many
>> hours to accomplish (I just need to fit Solr into this existing process,
>> I can't change the process :).
>> 
>> I expect that rsync can be used initially to copy the collection data
>> folders and the zookeeper data and transaction log folders. So after
>> verifying Solr/ZK is functional after the install, shut it down and
>> perform the copy. This may sound slow but my production index size
>> is < 100GB. Is this approach reasonable?
>> 
>> So now to keep the warm site in sync, I could use rsync on a scheduled
>> basis but I assume there's a better way. The ref guide says to send all
>> indexing requests to the second cluster at the same time they are sent
>> to the active cluster. I use SolrJ for all requests. So would this
>> entail using a second CloudSolrClient instance that only knows about
>> the second cluster? Seems reasonable but I don't want to lengthen the
>> response time for the users. Is this just a software problem to work
>> out (separate thread)? Or is there a SolrJ solution (async calls)?
>> 
>> Thanks!!
>> 
>> 
>> 
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/replicate-indexing-to-second-site-tp4256240.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
