On 2/9/2016 1:43 PM, tedsolr wrote:
> I expect that rsync can be used initially to copy the collection data
> folders and the zookeeper data and transaction log folders. So after
> verifying Solr/ZK is functional after the install, shut it down and perform
> the copy. This may sound slow but my production index size is < 100GB. Is
> this approach reasonable?
>
> So now to keep the warm site in sync, I could use rsync on a scheduled basis
> but I assume there's a better way. The ref guide says to send all indexing
> requests to the second cluster at the same time they are sent to the active
> cluster. I use SolrJ for all requests. So would this entail using a second
> CloudSolrClient instance that only knows about the second cluster? Seems
> reasonable but I don't want to lengthen the response time for the users. Is
> this just a software problem to work out (separate thread)? Or is there a
> SolrJ solution (async calls)?
For now, the way I would personally keep both systems in sync is to modify
the indexing system so that it updates both systems in parallel. That would
likely involve a second CloudSolrClient instance.

There's a new feature called "Cross Data Center Replication", but as far as I
know it is only available in development versions and has not been included
in any released version of Solr:

http://yonik.com/solr-cross-data-center-replication/

This feature may become available in 6.0 or a later 6.x release. I do not
have any concrete information about the expected release date for 6.0.

Thanks,
Shawn
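A minimal sketch of the "update both systems in parallel" approach, written
in plain Java so it runs without SolrJ. The names IndexTarget and DualIndexer
are hypothetical. In a real program each IndexTarget would wrap its own
CloudSolrClient pointed at that cluster's ZooKeeper ensemble. The standby
update runs on a background thread so the caller only waits for the active
cluster, which is the "separate thread" option from the question.

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical abstraction over one Solr cluster. In SolrJ, an implementation
 * would wrap a CloudSolrClient dedicated to that cluster.
 */
interface IndexTarget {
    void add(Map<String, Object> doc) throws Exception;
}

/** Sends every update to the active cluster and, in the background, to the warm standby. */
class DualIndexer implements AutoCloseable {
    private final IndexTarget active;
    private final IndexTarget standby;
    // One background thread applies standby updates in submission order.
    private final ExecutorService standbyExecutor = Executors.newSingleThreadExecutor();

    DualIndexer(IndexTarget active, IndexTarget standby) {
        this.active = active;
        this.standby = standby;
    }

    /**
     * Queues the standby update first so both clusters are updated in parallel,
     * then indexes into the active cluster on the caller's thread. The caller
     * blocks only on the active cluster; the returned Future reports whether
     * the standby update succeeded.
     */
    Future<?> index(Map<String, Object> doc) throws Exception {
        Future<?> standbyResult = standbyExecutor.submit(() -> {
            standby.add(doc);
            return null;
        });
        active.add(doc);
        return standbyResult;
    }

    /** Lets queued standby updates finish (up to 30 seconds) before shutting down. */
    @Override
    public void close() throws InterruptedException {
        standbyExecutor.shutdown();
        standbyExecutor.awaitTermination(30, TimeUnit.SECONDS);
    }
}
```

The returned Future should not be ignored in practice: if a standby update
fails and nobody logs or retries it, the warm site silently drifts out of
sync with the active cluster.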