Updating two systems in parallel gets you into two-phase commit, instantly: if an update lands on one cluster and fails on the other, the two are out of sync. So you need a persistent pool of updates that both clusters pull from.
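
A rough sketch of what a persistent pool of updates could look like, assuming an append-only file as the pool and one saved offset per cluster. The names UpdateLog, ClusterConsumer and apply_update are made up for illustration, and plain dicts stand in for the two Solr clusters; no SolrJ or Solr API is used here. A consumer only advances its offset after an update has been applied, so a cluster that is down catches up later. Delivery is at-least-once, so this assumes updates are safe to replay.

```python
import json
import os
import tempfile


class UpdateLog:
    """Append-only, file-backed pool of updates (one JSON object per line)."""

    def __init__(self, path):
        self.path = path
        open(self.path, "a").close()  # create the file if it is missing

    def append(self, update):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(update) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the update to disk before returning

    def read_from(self, offset):
        """Return every update at position >= offset."""
        with open(self.path, encoding="utf-8") as f:
            lines = f.readlines()
        return [json.loads(line) for line in lines[offset:]]


class ClusterConsumer:
    """Pulls updates for one cluster and remembers how far it has got."""

    def __init__(self, name, log, apply_update, offset_dir):
        self.log = log
        self.apply_update = apply_update  # hypothetical callback that indexes one update
        self.offset_path = os.path.join(offset_dir, name + ".offset")

    def offset(self):
        try:
            with open(self.offset_path, encoding="utf-8") as f:
                return int(f.read().strip() or 0)
        except FileNotFoundError:
            return 0  # nothing consumed yet

    def _save_offset(self, value):
        tmp = self.offset_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            f.write(str(value))
        os.replace(tmp, self.offset_path)  # atomic swap of the offset file

    def pull(self):
        """Apply pending updates in order; stop at the first failure."""
        position = self.offset()
        applied = 0
        for update in self.log.read_from(position):
            try:
                self.apply_update(update)
            except Exception:
                break  # offset stays put, so the next pull retries this update
            position += 1
            applied += 1
            self._save_offset(position)
        return applied


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    log = UpdateLog(os.path.join(workdir, "updates.log"))
    active, warm = {}, {}  # stand-ins for the two clusters

    consumers = [
        ClusterConsumer("active", log, lambda u: active.update({u["id"]: u}), workdir),
        ClusterConsumer("warm", log, lambda u: warm.update({u["id"]: u}), workdir),
    ]
    log.append({"id": "1", "title": "first doc"})
    log.append({"id": "2", "title": "second doc"})
    for consumer in consumers:
        consumer.pull()
    print(active == warm)
```

The indexing code only ever writes to the pool, so user-facing response time does not depend on either cluster, and neither cluster has to be reachable at the moment of the write.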

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Feb 9, 2016, at 4:15 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>
> On 2/9/2016 1:43 PM, tedsolr wrote:
>> I expect that rsync can be used initially to copy the collection data
>> folders and the zookeeper data and transaction log folders. So after
>> verifying Solr/ZK is functional after the install, shut it down and perform
>> the copy. This may sound slow but my production index size is < 100GB. Is
>> this approach reasonable?
>>
>> So now to keep the warm site in sync, I could use rsync on a scheduled basis
>> but I assume there's a better way. The ref guide says to send all indexing
>> requests to the second cluster at the same time they are sent to the active
>> cluster. I use SolrJ for all requests. So would this entail using a second
>> CloudSolrClient instance that only knows about the second cluster? Seems
>> reasonable but I don't want to lengthen the response time for the users. Is
>> this just a software problem to work out (separate thread)? Or is there a
>> SolrJ solution (async calls)?
>
> The way I would personally handle keeping both systems in sync at the
> moment would be to modify my indexing system to update both systems in
> parallel. That likely would involve a second CloudSolrClient instance.
>
> There's a new feature called "Cross Data Center Replication" but as far
> as I know, it is only available in development versions, and has not
> been made available in any released version of Solr.
>
> http://yonik.com/solr-cross-data-center-replication/
>
> This new feature may become available in 6.0 or a later 6.x release. I
> do not have any concrete information about the expected release date for
> 6.0.
>
> Thanks,
> Shawn