Yago Riveiro <yago.rive...@gmail.com> wrtoe: > My cluster holds more than 10B documents stored in 15T. > > The size of my collections is variable but I have collections with 800M > documents distributed over the 12 nodes, the amount of documents per shard > is ~66M and indeed the performance is good.
The math supports Erick's point about over-sharding. On average you have: 15 TB/ 1200 collections / 12 shards ~= 1GB / shard. 10B docs / 1200 collections / 12 shards ~= 700K docs/shard While your 12 shards fits well with your large collections, such as the one you described above, they are a very poor match for your average collection. Assuming your collections behave roughly the same way as each other, your average and smaller than average collections would be much better off with just 1 shard (and 2 replicas). That eliminates the overhead of distributed search-requests (for that collection) and lowers your overall shard-count significantly. Should a collection grow past whatever threshold you determine, you can always split it. Better performance, lower hardware requirements, more manageable shard amount. And yes, more logistics on your part as one size no longer fits all. - Toke Eskildsen