Thanks!

On Sunday, January 25, 2015, Erick Erickson <erickerick...@gmail.com> wrote:
> @Shawn: Cool table, thanks!
>
> @Dan:
> Just to throw a different spin on it, if you migrate to SolrCloud, then
> this question becomes moot, as the raw documents are sent to each of the
> replicas so you very rarely have to copy the full index. Kind of a tradeoff
> between constant load because you're sending the raw documents around
> whenever you index and peak usage when the index replicates.
>
> There are a bunch of other reasons to go to SolrCloud, but you know your
> problem space best.
>
> FWIW,
> Erick
>
> On Sun, Jan 25, 2015 at 9:26 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>
> > On 1/24/2015 10:56 PM, Dan Davis wrote:
> > > When I polled the various projects already using Solr at my
> > > organization, I was greatly surprised that none of them were using
> > > Solr replication, because they had talked about "replicating" the data.
> > >
> > > But we are not Pinterest, and do not expect to be taking in changes one
> > > post at a time (at least the engineers don't - just wait until it's
> > > used for a CRUD app that wants full-text search on a description
> > > field!). Still, rsync can be very, very fast with the right options
> > > (-W for gigabit ethernet, and maybe -S for sparse files). I've clocked
> > > it at 48 MB/s over GigE previously.
> > >
> > > Does anyone have any numbers for how fast Solr replication goes, and
> > > what to do to tune it?
> > >
> > > I'm not enthusiastic about giving up recently tested cluster stability
> > > for a home-grown mess, but I am interested in numbers that are out there.
> >
> > Numbers are included on the Solr replication wiki page, both in graph
> > and numeric form. Gathering these numbers must have been pretty easy --
> > before the HTTP replication made it into Solr, Solr used to contain an
> > rsync-based implementation.
> >
> > http://wiki.apache.org/solr/SolrReplication#Performance_numbers
> >
> > Other data on that wiki page discusses the replication config. There's
> > not a lot to tune.
> >
> > I run a redundant non-SolrCloud index myself through a different method
> > -- my indexing program indexes each index copy completely independently.
> > There is no replication. This separation allows me to upgrade any
> > component, or change any part of solrconfig or schema, on either copy of
> > the index without affecting the other copy at all. With replication, if
> > something is changed on the master or the slave, you might find that the
> > slave no longer works, because it will be handling an index created by
> > different software or a different config.
> >
> > Thanks,
> > Shawn
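
For anyone reading this in the archive: the "index each copy independently" approach Shawn describes could look roughly like the sketch below. This is not Shawn's actual indexing program, only a minimal illustration under assumptions. The function names and the per-copy core URLs are hypothetical, and it assumes each copy accepts a JSON array of documents on its /update handler. The point is that each copy gets its own update request and a failure on one copy does not stop the other from being indexed.

```python
import json
import urllib.request


def build_update_request(core_url, docs):
    """Build a POST to one core's JSON update handler, committing right away."""
    return urllib.request.Request(
        url=core_url.rstrip("/") + "/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post(request):
    """Default sender: perform the HTTP call and discard the response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        response.read()


def index_all_copies(core_urls, docs, send=post):
    """Send the same documents to every copy, one request per copy.

    Returns {core_url: None on success, or the error message on failure},
    so a copy that is down or being upgraded does not block the others.
    """
    results = {}
    for core_url in core_urls:
        try:
            send(build_update_request(core_url, docs))
            results[core_url] = None
        except OSError as exc:  # urllib's URLError and HTTPError derive from OSError
            results[core_url] = str(exc)
    return results
```

Usage would be something like index_all_copies(["http://copy-a:8983/solr/core1", "http://copy-b:8983/solr/core1"], docs), with both host names made up for the example. A real indexer would also need to retry or re-feed the copy that failed so the two indexes do not drift apart.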