On 8/4/2019 8:53 AM, Kaminski, Adi wrote:
Erick - thanks a lot for answering and sharing the article below, it's very 
helpful!

I have another follow-up question - assuming we have 400 vCPUs across our 
SolrCloud cluster nodes, would it be better to have 400 shards with 
replication factor 2, or 200 shards with replication factor 4? Which makes 
better use of the CPUs - shards or their replicas?

Why so many shards? When I think of sharding an index, it's typically somewhere between 2 and 10 shards. In my opinion, the only use case for over-sharding is for situations where the index will be growing at a significant rate. As it grows, you can add servers and spread the shards out.

I think the best rule of thumb for the number of shards is "the smallest number that will get the job done. One if possible."

A key piece of information for this decision is the query rate. If there is a lot of spare CPU capacity and the query rate is REALLY low, multiple shards on the same machine might increase performance. But as the query rate increases, performance will decrease when multiple shards are on each machine, and it will become important for one machine to handle one shard replica.
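
To put rough numbers on the two layouts in the question, here is a back-of-the-envelope sketch. It assumes the 400 vCPUs from the question and counts each shard replica as one Solr core; the function names are made up for illustration and are not part of any Solr API.

```python
def total_cores(num_shards, replication_factor):
    """Each replica of each shard is one Solr core on some node."""
    return num_shards * replication_factor


def cores_per_vcpu(num_shards, replication_factor, vcpus):
    """Average number of Solr cores competing for each vCPU."""
    return total_cores(num_shards, replication_factor) / vcpus


VCPUS = 400  # figure taken from the question

layouts = {
    "400 shards x RF 2": (400, 2),
    "200 shards x RF 4": (200, 4),
}

for name, (shards, rf) in layouts.items():
    # A distributed query is sent to one replica of every shard,
    # so the shard count is also the per-query fan-out.
    print(name,
          "| cores:", total_cores(shards, rf),
          "| cores per vCPU:", cores_per_vcpu(shards, rf, VCPUS),
          "| sub-requests per query:", shards)
```

Both layouts come out to the same number of cores, so on this count neither one spreads across the CPUs better than the other. What differs is how many shards every single query has to touch, which is why fewer shards is usually the better starting point.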

Thanks,
Shawn
