On 8/4/2019 8:53 AM, Kaminski, Adi wrote:
Erick - thanks a lot for answering and sharing the article below, it's very
helpful!

I have another follow-up question - assuming we have 400 vCPUs across our
SolrCloud cluster nodes, would it be better to have 400 shards with
replication factor 2, or 200 shards with replication factor 4? Which makes
better use of the CPUs - shards or their replicas?

Why so many shards? When I think of sharding an index, I typically think
of somewhere between 2 and 10 shards. In my opinion, the only use case
for over-sharding is an index that will be growing at a significant rate.
As it grows, you can add servers and spread the shards out.

I think the best rule of thumb for the number of shards is "the smallest
number that will get the job done. One if possible."
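
To make that rule of thumb concrete, here is a minimal Python sketch. It
assumes you have already found, by testing on your own hardware, how many
documents one shard can hold while still meeting your latency goals. The
function name and the numbers are hypothetical, not anything Solr provides.

```python
def smallest_shard_count(total_docs, max_docs_per_shard):
    """Smallest shard count that keeps every shard at or under the tested limit.

    max_docs_per_shard is an assumption: a limit you measured yourself,
    not a value that Solr reports.
    """
    if total_docs < 0 or max_docs_per_shard <= 0:
        raise ValueError("total_docs must be >= 0 and max_docs_per_shard > 0")
    # Integer ceiling division, with a floor of one shard.
    return max(1, (total_docs + max_docs_per_shard - 1) // max_docs_per_shard)


print(smallest_shard_count(50_000_000, 100_000_000))   # 1
print(smallest_shard_count(450_000_000, 100_000_000))  # 5
```

If the whole index fits within the tested limit, the answer is one shard,
and extra capacity goes into replicas instead.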

A key piece of information for this decision is the query rate. If
there is a lot of spare CPU capacity and the query rate is REALLY low,
multiple shards on the same machine might increase performance. But as
the query rate increases, performance will decrease when multiple shards
are on each machine, and it will become important for each machine to
handle only one shard replica.
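
As a rough back-of-the-envelope comparison of the two layouts from the
question, here is a Python sketch. It relies on the fact that a distributed
query is sent to one replica of every shard. The query rate of 50 per
second is a made-up number, and the model ignores indexing load, caching,
and the cost of merging the shard responses.

```python
def layout_stats(num_shards, replication_factor, total_cpus, queries_per_sec):
    """Simple load figures for a SolrCloud layout (illustrative model only)."""
    total_replicas = num_shards * replication_factor
    return {
        # Every replica is a separate core that must live on some node.
        "total_replicas": total_replicas,
        "replicas_per_cpu": total_replicas / total_cpus,
        # Each query fans out to one replica of every shard.
        "shard_requests_per_query": num_shards,
        "shard_requests_per_sec": num_shards * queries_per_sec,
    }


# The two layouts from the question; 50 queries/sec is an assumed rate.
for shards, rf in [(400, 2), (200, 4)]:
    print(shards, rf, layout_stats(shards, rf, total_cpus=400, queries_per_sec=50))
```

In this model both layouts put 800 replicas on 400 vCPUs, so neither one
uses the CPUs "better" by itself. What differs is the fan-out: every query
costs 400 shard requests in the first layout and 200 in the second.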

Thanks,
Shawn