On 1/24/2015 10:56 PM, Dan Davis wrote:
> When I polled the various projects already using Solr at my organization, I
> was greatly surprised that none of them were using Solr replication,
> because they had talked about "replicating" the data.
>
> But we are not Pinterest, and do not expect to be taking in changes one
> post at a time (at least the engineers don't - just wait until it's used for
> a CRUD app that wants full-text search on a description field!). Still,
> rsync can be very, very fast with the right options (-W for gigabit
> ethernet, and maybe -S for sparse files). I've clocked it at 48 MB/s over
> GigE previously.
>
> Does anyone have any numbers for how fast Solr replication goes, and what
> to do to tune it?
>
> I'm not enthusiastic to give up recently tested cluster stability for a
> home-grown mess, but I am interested in numbers that are out there.
Numbers are included on the Solr replication wiki page, both in graph and numeric form. Gathering these numbers must have been pretty easy -- before the HTTP replication made it into Solr, Solr used to contain an rsync-based implementation.

http://wiki.apache.org/solr/SolrReplication#Performance_numbers

Other data on that wiki page discusses the replication config. There's not a lot to tune.

I run a redundant non-SolrCloud index myself through a different method -- my indexing program indexes each index copy completely independently. There is no replication. This separation allows me to upgrade any component, or change any part of solrconfig or schema, on either copy of the index without affecting the other copy at all. With replication, if something is changed on the master or the slave, you might find that the slave no longer works, because it will be handling an index created by different software or a different config.

Thanks,
Shawn
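
For readers who want to picture the "index each copy independently" approach, here is a minimal sketch. It is not the indexing program mentioned above, which is not shown in this thread. It assumes each index copy exposes Solr's JSON update handler at an `/update` URL that accepts a JSON array of documents; the function names, host names, and core name are hypothetical.

```python
import json
from urllib.request import Request, urlopen


def post_json(url, payload):
    """POST a JSON payload to one Solr update URL and return the HTTP status."""
    req = Request(url, data=payload, headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=30) as resp:
        return resp.status


def index_to_all_copies(docs, update_urls, send=post_json):
    """Send the same batch of documents to every index copy, one at a time.

    Each copy is handled on its own: a failure on one copy is recorded and
    does not stop the batch from being sent to the remaining copies.
    """
    payload = json.dumps(list(docs)).encode("utf-8")
    results = {}
    for url in update_urls:
        try:
            results[url] = send(url, payload)
        except OSError as exc:  # urllib's URLError and HTTPError subclass OSError
            results[url] = exc
    return results


if __name__ == "__main__":
    # Hypothetical hosts and core name; replace with your own.
    copies = [
        "http://solr-a.example.com:8983/solr/mycore/update?commit=true",
        "http://solr-b.example.com:8983/solr/mycore/update?commit=true",
    ]
    print(index_to_all_copies([{"id": "1", "description": "hello"}], copies))
```

Because no index files are ever copied between servers, each copy can run a different Solr version, solrconfig, or schema while the other is left untouched. The cost is that the indexing program has to track and retry failures per copy itself.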