On 4/12/2013 6:45 AM, Victor Ruiz wrote:
As you can read, in the end it was due to a failure in the Solr
master-slave replication, and now I don't know if we should think about
migrating to SolrCloud, since Solr master-slave replication seems not to
fit our requirements:

* index size: ~20 million documents, ~9GB
* ~1200 updates/min
* ~10000 queries/min (distributed over 2 slaves), using MoreLikeThis,
RealTimeGet, TermVectorComponent, and SearchHandler

I would be grateful if anyone could help me answer these questions:

* Would it be advisable to migrate to SolrCloud? Would it have an impact
on replication performance?
* In that case, which would perform better: maintaining a full copy of
the index on every server, or using shard servers?
* How many shards and replicas would you advise for ensuring high
availability?

The fact that your replication is producing a corrupt index suggests that your network, your server hardware, or your software install is unreliable. The TCP protocol used for all Solr communication (and for the Internet in general) has error detection and retransmission. I'm not saying that replication can't have bugs, but usually those bugs result in replication not working; they don't typically cause index corruption.
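If you want to confirm whether a slave's index really is corrupt, Lucene ships with a CheckIndex tool you can run against the index directory. A minimal sketch; the jar filename and index path below are assumptions, adjust them for your install, and make sure nothing is writing to the index while it runs:

    # jar version and index path are assumptions for your install
    java -cp lucene-core-4.2.1.jar org.apache.lucene.index.CheckIndex /path/to/solr/data/index

It prints per-segment diagnostics and reports any corruption it finds.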

I see a previous message where you say everything is on the same LAN with gigabit ethernet. There are a lot of things that can go wrong with gigabit. At the physical layer: using cat5 cable instead of cat5e or cat6 can lead to problems. You could have a bad cable, or the RJ45 connectors could be badly crimped. If you are using patch panels, they may be bad or only rated for cat5.

At layer 2, you can have duplex mismatches, common when one side is hard-set to full duplex and the other side is left at auto or is a dumb switch that can't be changed. Even with these problems, you won't usually see data corruption unless the hardware or OS is also faulty.
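You can check for speed/duplex mismatches and link-level errors from the server side with ethtool. The interface name (eth0) is an assumption, substitute your own:

    # interface name is an assumption; substitute your own
    ethtool eth0        # check the negotiated Speed and Duplex
    ethtool -S eth0     # driver statistics; look for rx/tx error counters

Check the switch port's settings as well; both ends need to agree.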

One fairly common cause of data corruption in network communication is buggy firmware on the network card, especially with some Broadcom chips. Upgrading to the latest firmware will usually fix these problems.

Now for your questions: SolrCloud doesn't use replication during normal operation. When you index, each update is sent to every replica of the relevant shard, so the indexing happens on all replicas in parallel.

Replication does sometimes get used by SolrCloud, but only if a replica goes down and there's not enough information in the transaction log to reconstruct recent updates when it comes back up.
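To make that concrete, here is a minimal SolrJ (Solr 4.x) sketch of how a client sends updates in SolrCloud; the ZooKeeper addresses and collection name are placeholders, not anything from your setup. The point is that the cluster, not old-style replication, gets the document onto every replica:

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CloudIndexExample {
        public static void main(String[] args) throws Exception {
            // ZooKeeper ensemble and collection name are placeholders
            CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
            server.setDefaultCollection("mycollection");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "example-1");
            doc.addField("title", "SolrCloud indexing example");

            // The update goes to the correct shard leader, which forwards it
            // to every replica of that shard; each replica indexes it locally.
            server.add(doc);
            server.commit();

            server.shutdown();
        }
    }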

As for whether or not to use shards: that's really up to you. Solr should have no trouble with a single-shard 9GB index containing 20 million documents, as long as you give enough memory to the Java heap and leave 8GB or so for the OS to cache the index. That means you want 12-16GB of RAM in each server. If Solr is not the only thing running on the hardware, you'd want more.
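To put rough numbers on that sizing (the heap figure is an assumption; tune it for your actual usage):

    OS disk cache for the ~9GB index:   ~8GB
    Java heap for Solr (assumption):    ~4-6GB
    OS and everything else:             ~1GB
    -----------------------------------------
    total per server:                   ~13-15GB, hence the 12-16GB advice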

For the update and query volume you have described (roughly 20 updates and 170 queries per second), having plenty of RAM and lots of CPU cores will be critical.

Thanks,
Shawn
