On 1/24/2015 10:56 PM, Dan Davis wrote:
> When I polled the various projects already using Solr at my organization, I
> was greatly surprised that none of them were using Solr replication,
> because they had talked about "replicating" the data.
> 
> But we are not Pinterest, and do not expect to be taking in changes one
> post at a time (at least the engineers don't - just wait until it's used for
> a CRUD app that wants full-text search on a description field!).  Still,
> rsync can be very, very fast with the right options (-W for gigabit
> ethernet, and maybe -S for sparse files).  I've clocked it at 48 MB/s over
> GigE previously.
> 
> Does anyone have any numbers for how fast Solr replication goes, and what
> to do to tune it?
> 
> I'm not eager to give up recently tested cluster stability for a
> home-grown mess, but I am interested in numbers that are out there.

Numbers are included on the Solr replication wiki page, both in graph
and numeric form.  Gathering these numbers must have been pretty easy --
before the HTTP replication made it into Solr, Solr used to contain an
rsync-based implementation.

http://wiki.apache.org/solr/SolrReplication#Performance_numbers

The rest of that wiki page covers the replication config.  There's
not a lot to tune.

I run a redundant non-SolrCloud index myself using a different method
-- my indexing program updates each copy of the index independently.
There is no replication.  This separation allows me to upgrade any
component, or change any part of solrconfig or schema, on either copy of
the index without affecting the other copy at all.  With replication, if
something is changed on the master or the slave, you might find that the
slave no longer works, because it will be handling an index created by
different software or a different config.
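
A minimal Python sketch of the dual-indexing idea described above, for
illustration only.  It is not the actual indexing program.  The core URLs
are hypothetical, and it assumes each copy accepts JSON documents on its
/update handler:

```python
import json
import urllib.request

# Hypothetical core URLs; substitute your own hosts and core names.
TARGETS = [
    "http://solr-a.example.com:8983/solr/mycore",
    "http://solr-b.example.com:8983/solr/mycore",
]


def post_json(core_url, docs):
    """POST a batch of documents to one core's JSON update handler."""
    req = urllib.request.Request(
        core_url + "/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status


def index_to_all(docs, targets=TARGETS, send=post_json):
    """Send the same batch to every copy, one copy at a time.

    A failure on one copy is recorded and does not stop the others.
    """
    results = {}
    for url in targets:
        try:
            send(url, docs)
            results[url] = "ok"
        except Exception as exc:  # record the failure and carry on
            results[url] = "failed: %s" % exc
    return results
```

Calling index_to_all([{"id": "1", "description": "example"}]) would send
the batch to both copies and return a per-copy status.  Because neither
copy ever pulls index files from the other, each one can run its own Solr
version and its own config.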

Thanks,
Shawn
