Making two indexing calls, one to each cluster, works until one of the systems is unavailable. Then the two are out of sync.
You might want to put the updates into a persistent message queue, then have both systems indexed from that queue.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 9, 2016, at 1:49 PM, Upayavira <u...@odoko.co.uk> wrote:
>
> There is a Cross Datacenter replication feature in the works - not sure
> of its status.
>
> In lieu of that, I'd simply have two copies of your indexing code -
> index everything simultaneously into both clusters.
>
> There are, of course, risks that both get out of sync, so you might
> want to find some ways to identify/manage that.
>
> Upayavira
>
> On Tue, Feb 9, 2016, at 08:43 PM, tedsolr wrote:
>> I have a Solr Cloud cluster (v5.2.1) using a Zookeeper ensemble in my
>> primary data center. I am now trying to plan for disaster recovery
>> with an available warm site. I have read (many times) the disaster
>> recovery section in the Apache ref guide. I suppose I don't fully
>> understand it.
>>
>> What I'd like to know is the best way to sync up the existing data,
>> and the best way to keep that data in sync. Assume that the warm site
>> is an exact copy (not at the network level) of the production cluster
>> - so the same servers with the same config. All servers are virtual.
>> The use case is the active cluster goes down and cannot be repaired,
>> so the warm site would become the active site. This is a manual
>> process that takes many hours to accomplish (I just need to fit Solr
>> into this existing process, I can't change the process :).
>>
>> I expect that rsync can be used initially to copy the collection data
>> folders and the zookeeper data and transaction log folders. So after
>> verifying Solr/ZK is functional after the install, shut it down and
>> perform the copy. This may sound slow but my production index size is
>> < 100GB. Is this approach reasonable?
>>
>> So now to keep the warm site in sync, I could use rsync on a scheduled
>> basis but I assume there's a better way. The ref guide says to send
>> all indexing requests to the second cluster at the same time they are
>> sent to the active cluster. I use SolrJ for all requests. So would
>> this entail using a second CloudSolrClient instance that only knows
>> about the second cluster? Seems reasonable but I don't want to
>> lengthen the response time for the users. Is this just a software
>> problem to work out (separate thread)? Or is there a SolrJ solution
>> (async calls)?
>>
>> Thanks!!
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/replicate-indexing-to-second-site-tp4256240.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
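
For what it's worth, here is a rough sketch of the persistent-queue idea from the top of this thread. It is not Solr or SolrJ code. It is plain Python with a file standing in for the queue, and every name in it (UpdateLog, ClusterConsumer, the send callables, the offset files) is made up for illustration. In a real setup the send function would presumably wrap an indexing client for one cluster, and the log would be a proper message broker. The point is only that each cluster reads the same log at its own pace and remembers its own position, so an outage at one site delays that site instead of losing updates.

```python
import json
import os
import tempfile


class UpdateLog:
    """Append-only, file-backed log of index updates (one JSON doc per line)."""

    def __init__(self, path):
        self.path = path
        open(self.path, "a").close()  # create the file if it does not exist yet

    def append(self, doc):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(doc) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the update to disk before returning

    def read_from(self, offset):
        with open(self.path, "r", encoding="utf-8") as f:
            lines = f.readlines()
        return [json.loads(line) for line in lines[offset:]]


class ClusterConsumer:
    """Replays the log into one cluster and records how far it has got."""

    def __init__(self, log, offset_path, send):
        self.log = log
        self.offset_path = offset_path
        # Hypothetical callable: indexes one document into one cluster and
        # raises ConnectionError when that cluster cannot be reached.
        self.send = send

    def _load_offset(self):
        try:
            with open(self.offset_path) as f:
                return int(f.read().strip() or 0)
        except FileNotFoundError:
            return 0  # nothing delivered to this cluster yet

    def _save_offset(self, offset):
        tmp = self.offset_path + ".tmp"
        with open(tmp, "w") as f:
            f.write(str(offset))
        os.replace(tmp, self.offset_path)  # atomic rename on the same filesystem

    def drain(self):
        """Send every pending update; stop at the first connection failure."""
        offset = self._load_offset()
        for doc in self.log.read_from(offset):
            try:
                self.send(doc)
            except ConnectionError:
                break  # cluster is down: keep the offset, retry on the next drain
            offset += 1
            self._save_offset(offset)
        return offset


def demo():
    workdir = tempfile.mkdtemp()
    log = UpdateLog(os.path.join(workdir, "updates.log"))
    primary, warm = [], []        # stand-ins for the two clusters
    warm_state = {"up": False}    # the warm site starts out unreachable

    def send_primary(doc):
        primary.append(doc)

    def send_warm(doc):
        if not warm_state["up"]:
            raise ConnectionError("warm site unreachable")
        warm.append(doc)

    to_primary = ClusterConsumer(log, os.path.join(workdir, "primary.offset"), send_primary)
    to_warm = ClusterConsumer(log, os.path.join(workdir, "warm.offset"), send_warm)

    for i in range(3):
        log.append({"id": str(i)})

    to_primary.drain()
    to_warm.drain()               # fails on the first doc, offset stays at 0
    warm_state["up"] = True
    to_warm.drain()               # catches up from where it left off
    return primary, warm


if __name__ == "__main__":
    print(demo())
```

Note that this gives at-least-once delivery: if a consumer dies after sending a document but before saving its offset, that document is sent again on the next drain. That should be harmless as long as every document carries a unique key, since re-adding a document with the same key replaces the earlier copy. The user-facing request only has to pay for the append to the log, not for either indexing call.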