On 4/12/2013 6:45 AM, Victor Ruiz wrote:
As you can read, in the end it was due to a failure in the Solr master-slave
replication, and now I don't know whether we should think about migrating to
SolrCloud, since Solr master-slave replication seems not to fit our
requirements:
* index size: ~20 million documents, ~9GB
* ~1200 updates/min
* ~10000 queries/min (distributed over 2 slaves), using MoreLikeThis,
RealTimeGet, TermVectorComponent, and SearchHandler
I would be grateful if anyone could help me answer these questions:
* Would it be advisable to migrate to SolrCloud? Would it have an impact on
replication performance?
* In that case, which would perform better: maintaining a copy of the index
on every server, or using shard servers?
* How many shards and replicas would you advise to ensure high
availability?
The fact that your replication is producing a corrupt index suggests
that your network, your server hardware, or your software install is
unreliable. The TCP protocol used for all Solr communication (as well
as the Internet in general) has error detection and retransmissions.
I'm not saying that replication can't have bugs, but those bugs usually
result in replication not working; they don't typically cause index
corruption.
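For context, the master-slave replication being discussed is configured
through the ReplicationHandler in solrconfig.xml. A minimal sketch follows;
the hostname, core name, poll interval, and confFiles list are illustrative,
not taken from this thread:

```xml
<!-- On the master: publish the index after each commit -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On each slave: poll the master for new index versions -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master:8983/solr/core1/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

The slave only copies index segment files it doesn't already have, which is
why a corrupt result usually points at the transfer path or storage rather
than the replication logic itself.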
I see a previous message where you say everything is on the same LAN
with gigabit ethernet. There are a lot of things that can go wrong with
gigabit. At the physical layer: Using cat5 cable instead of cat5e or
cat6 can lead to problems. You could have a bad cable, or the RJ45
connectors could be badly crimped. If you are using patch panels, they
may be bad or only rated for cat5. At layer 2, you can have duplex
mismatches, common when one side is hard-set to full duplex and the
other side is left at auto or is a dumb switch that can't be changed.
Even if you have these problems, they still won't usually cause data
corruption unless the hardware or OS is also faulty.
One somewhat common example of a problem that can cause data corruption
in network communication is buggy firmware on the network card,
especially with Broadcom chips. Upgrading to the latest firmware will
usually fix these problems.
Now for your questions: SolrCloud doesn't use replication during normal
operation. When you index, the indexing happens on all replicas in
parallel.
Replication does sometimes get used by SolrCloud, but only if a replica
goes down and there's not enough information in the transaction log to
reconstruct recent updates when it comes back up.
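That transaction log is the updateLog in solrconfig.xml. A minimal sketch of
enabling it, with an optional knob for how many recent updates each log file
retains (the value 500 here is only an example; the default is 100):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <!-- Where the tlog files live; defaults to the data directory -->
    <str name="dir">${solr.ulog.dir:}</str>
    <!-- How many recent updates a replica can replay on recovery
         before falling back to a full index copy from the leader -->
    <int name="numRecordsToKeep">500</int>
  </updateLog>
</updateHandler>
```

If a returning replica is further behind than the log can replay, SolrCloud
falls back to copying the whole index from the leader, which is the one case
where old-style replication still runs.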
As for whether or not to use shards: that's really up to you. Solr
should have no trouble with a single-shard 9GB index that has 20 million
documents, as long as you give enough memory to the java heap and have
8GB or so left over for the OS to cache the index. That means you want
to have 12-16GB of RAM in each server. If Solr is not the only thing
running on the hardware, then you'd want more RAM.
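The sizing rule above is simple arithmetic: Java heap plus enough leftover
RAM for the OS to cache the whole index, plus whatever else runs on the box.
A small sketch (the heap sizes are assumptions for illustration, not a
recommendation from this thread):

```python
def recommended_ram_gb(index_size_gb, heap_gb, other_services_gb=0):
    """Rule of thumb: Java heap + page cache for the full index + anything
    else running on the same hardware."""
    return heap_gb + index_size_gb + other_services_gb

# A 9 GB index with a hypothetical 4-6 GB heap lands in the 13-15 GB range,
# consistent with the 12-16 GB suggestion above.
low = recommended_ram_gb(9, 4)
high = recommended_ram_gb(9, 6)
print(low, high)
```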
For the update and query volume you have described, having plenty of RAM
and lots of CPU cores will be critical.
Thanks,
Shawn