On 4/12/2013 6:45 AM, Victor Ruiz wrote:
As you can read, in the end it was due to a failure in the Solr
master-slave replication, and now I don't know if we should think about
migrating to SolrCloud, since Solr master-slave replication seems not to
fit our requirements:

* index size: ~20 million documents, ~9GB
* ~1200 updates/min
* ~10000 queries/min (distributed over 2 slaves), using MoreLikeThis,
RealTimeGet, TermVectorComponent, and SearchHandler

I would be grateful if anyone could help me answer these questions:

* Would it be advisable to migrate to SolrCloud? Would it have an impact
on replication performance?
* In that case, which would perform better: maintaining a full copy of
the index on every server, or using shard servers?
* How many shards and replicas would you advise for ensuring high
availability?

The fact that your replication is producing a corrupt index suggests that your network, your server hardware, or your software install is unreliable. The TCP protocol used for all Solr communication (and for the Internet in general) has error detection and retransmission. I'm not saying that replication can't have bugs, but usually those bugs result in replication not working; they don't typically cause index corruption.
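If you want to confirm whether a slave's index really is corrupt, Lucene ships with a CheckIndex tool you can run against the index directory. A minimal sketch; the jar filename and index path below are assumptions, adjust them for your install, and make sure nothing is writing to the index while it runs:

    # jar version and index path are assumptions for your install
    java -cp lucene-core-4.2.1.jar org.apache.lucene.index.CheckIndex /path/to/solr/data/index

It prints per-segment diagnostics and reports any corruption it finds.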

I see a previous message where you say everything is on the same LAN with gigabit ethernet. There are a lot of things that can go wrong with gigabit. At the physical layer: using cat5 cable instead of cat5e or cat6 can lead to problems. You could have a bad cable, or the RJ45 connectors could be badly crimped. If you are using patch panels, they may be bad or only rated for cat5.

At layer 2, you can have duplex mismatches, common when one side is hard-set to full duplex and the other side is left at auto or is a dumb switch that can't be changed. Even with these problems, you won't usually see data corruption unless the hardware or OS is also faulty.
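You can check for speed/duplex mismatches and link-level errors from the server side with ethtool. The interface name (eth0) is an assumption, substitute your own:

    # interface name is an assumption; substitute your own
    ethtool eth0        # check the negotiated Speed and Duplex
    ethtool -S eth0     # driver statistics; look for rx/tx error counters

Check the switch port's settings as well; both ends need to agree.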

One fairly common cause of data corruption in network communication is buggy firmware on the network card, especially with some Broadcom chips. Upgrading to the latest firmware will usually fix these problems.

Now for your questions: SolrCloud doesn't use replication during normal operation. When you index, each update is sent to every replica of the relevant shard, so the indexing happens on all replicas in parallel.

Replication does sometimes get used by SolrCloud, but only if a replica goes down and there's not enough information in the transaction log to reconstruct recent updates when it comes back up.
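To make that concrete, here is a minimal SolrJ (Solr 4.x) sketch of how a client sends updates in SolrCloud; the ZooKeeper addresses and collection name are placeholders, not anything from your setup. The point is that the cluster, not old-style replication, gets the document onto every replica:

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CloudIndexExample {
        public static void main(String[] args) throws Exception {
            // ZooKeeper ensemble and collection name are placeholders
            CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
            server.setDefaultCollection("mycollection");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "example-1");
            doc.addField("title", "SolrCloud indexing example");

            // The update goes to the correct shard leader, which forwards it
            // to every replica of that shard; each replica indexes it locally.
            server.add(doc);
            server.commit();

            server.shutdown();
        }
    }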

As for whether or not to use shards: that's really up to you. Solr should have no trouble with a single-shard 9GB index containing 20 million documents, as long as you give enough memory to the Java heap and leave 8GB or so for the OS to cache the index. That means you want 12-16GB of RAM in each server. If Solr is not the only thing running on the hardware, you'd want more.
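To put rough numbers on that sizing (the heap figure is an assumption; tune it for your actual usage):

    OS disk cache for the ~9GB index:   ~8GB
    Java heap for Solr (assumption):    ~4-6GB
    OS and everything else:             ~1GB
    -----------------------------------------
    total per server:                   ~13-15GB, hence the 12-16GB advice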

For the update and query volume you have described (roughly 20 updates and 170 queries per second), having plenty of RAM and lots of CPU cores will be critical.

Thanks,
Shawn
