On 6/4/2018 4:36 PM, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:
We have a collection (one shard, two replicas, currently running Solr 6.6) which sometimes becomes
unresponsive on the non-leader node. It is 214 gigabytes, and we were wondering whether there is a
rule of thumb for how large to allow a core to grow before sharding. I have a reference in my notes
from the 2015 Solr conference in Austin: "baseline: no more than 100 million docs/shard"
and "ideal shard-to-memory ratio: if at all possible the index should fit into RAM, but beyond
that it gets really specific really fast". That was several versions ago, though, so I wanted
to ask whether these suggestions have been updated.
In a word, no.
It is impossible to give generic advice. One person may have very good
performance with 300 million docs in a single index. Another may have
terrible performance with half a million docs per shard. It depends on
a lot of things, including but not limited to the specs of the servers
you use, exactly what is in your index, how you have configured Solr,
and the nature of your queries.
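If you want to see where you stand against the fit-in-RAM heuristic mentioned above, here is a
minimal sketch (not something official, and the localhost URL is a placeholder for your own
deployment) that reads each core's on-disk index size from Solr's CoreAdmin STATUS API and
compares it to the machine's physical RAM. Note that Lucene relies on the operating system's
page cache rather than the JVM heap to cache index data, so total machine RAM is the relevant
number, not -Xmx:

    # Sketch: compare on-disk index size to system RAM via CoreAdmin STATUS.
    # The Solr base URL is hypothetical; adjust for your deployment.
    import json
    import os
    import urllib.request

    SOLR_URL = "http://localhost:8983/solr"  # placeholder

    def core_index_sizes(solr_url):
        """Return {core_name: (numDocs, sizeInBytes)} from CoreAdmin STATUS."""
        url = solr_url + "/admin/cores?action=STATUS&wt=json"
        with urllib.request.urlopen(url) as resp:
            status = json.load(resp)["status"]
        return {name: (info["index"]["numDocs"], info["index"]["sizeInBytes"])
                for name, info in status.items()}

    if __name__ == "__main__":
        # Total physical RAM (Unix); the OS page cache is what holds index data.
        ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        for core, (docs, size) in core_index_sizes(SOLR_URL).items():
            print(f"{core}: {docs} docs, {size / 2**30:.1f} GiB on disk "
                  f"({size / ram:.0%} of {ram / 2**30:.0f} GiB RAM)")

Numbers like these only tell you how far your index is from fitting in the page cache; whether
that actually matters for you still comes down to testing with your own data and queries.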
Thanks,
Shawn