One thing that I forgot to mention is that my clients can aggregate by any field in the schema with limit=-1. This is not a problem for 99% of the fields, but 2 or 3 of them are URLs. URLs have very high cardinality, and one of the reasons for sharding the collections is to lower the memory footprint on each node, so that no node blows up, and to do the final merge on a big machine.

"Should a collection grow past whatever threshold you determine, you can always split it."

Every time I run the SPLITSHARD command, it fails in a different way. IMHO, right now Solr doesn't have an efficient way to rebalance a collection's shards.

"And yes, more logistics on your part as one size no longer fits all"

The key point of this deployment is to reduce the amount of management as much as possible. Solr has improved cluster management a lot compared with the 4.x releases. Even so, it remains difficult to manage a big cluster without custom tools.

Solr continues to improve with each version, and I have seen issues covering a lot of nice stuff, like SOLR-9735 and SOLR-9241.

--

/Yago Riveiro

On 26 Dec 2016 22:10 +0000, Toke Eskildsen <t...@statsbiblioteket.dk>, wrote:
> Yago Riveiro <yago.rive...@gmail.com> wrote:
> > My cluster holds more than 10B documents stored in 15T.
> >
> > The size of my collections is variable but I have collections with 800M
> > documents distributed over the 12 nodes, the amount of documents per shard
> > is ~66M and indeed the performance is good.
>
> The math supports Erick's point about over-sharding. On average you have:
> 15 TB/ 1200 collections / 12 shards ~= 1GB / shard.
> 10B docs / 1200 collections / 12 shards ~= 700K docs/shard
>
> While your 12 shards fits well with your large collections, such as the one
> you described above, they are a very poor match for your average collection.
> Assuming your collections behave roughly the same way as each other, your
> average and smaller than average collections would be much better off with
> just 1 shard (and 2 replicas). That eliminates the overhead of distributed
> search-requests (for that collection) and lowers your overall shard-count
> significantly. Should a collection grow past whatever threshold you
> determine, you can always split it.
>
> Better performance, lower hardware requirements, more manageable shard
> amount. And yes, more logistics on your part as one size no longer fits all.
>
> - Toke Eskildsen
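
For reference, the per-shard averages quoted above can be reproduced with a quick sketch. It assumes decimal units (1 TB = 10^12 bytes) and that every one of the 1200 collections has 12 shards, as in the quoted message; the helper name `per_shard` is only illustrative.

```python
# Figures taken from the thread; decimal units are an assumption.
TOTAL_BYTES = 15 * 10**12      # "15T" read as 15 TB
TOTAL_DOCS = 10 * 10**9        # "more than 10B documents"
COLLECTIONS = 1200
SHARDS_PER_COLLECTION = 12


def per_shard(total, collections, shards):
    """Average amount per shard when every collection has `shards` shards."""
    return total / (collections * shards)


avg_gb = per_shard(TOTAL_BYTES, COLLECTIONS, SHARDS_PER_COLLECTION) / 10**9
avg_docs = per_shard(TOTAL_DOCS, COLLECTIONS, SHARDS_PER_COLLECTION)

# The large collection mentioned in the thread: 800M documents over 12 shards.
big_shard_docs = 800 * 10**6 / SHARDS_PER_COLLECTION

print(f"average shard size: {avg_gb:.2f} GB")
print(f"average docs per shard: {avg_docs:,.0f}")
print(f"docs per shard, 800M collection: {big_shard_docs / 10**6:.1f}M")
```

Under those assumptions the average shard holds about 1 GB and roughly 694K documents, while a shard of the 800M-document collection holds about 66.7M documents, which is the gap the over-sharding argument rests on.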