Yago Riveiro <yago.rive...@gmail.com> wrote:
> My cluster holds more than 10B documents stored in 15T.
> 
> The size of my collections is variable but I have collections with 800M
> documents distributed over the 12 nodes, the amount of documents per shard
> is ~66M and indeed the performance is good.

The math supports Erick's point about over-sharding. On average you have:
15 TB / 1200 collections / 12 shards ~= 1 GB/shard
10B docs / 1200 collections / 12 shards ~= 700K docs/shard
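
For anyone who wants to redo the arithmetic with their own numbers, here is a small sketch in Python. The figures are the ones from this thread; the decimal units (1 TB = 10^12 bytes) and the variable names are my own assumptions.

```python
# Cluster-wide figures quoted in this thread.
# Assumption: decimal units, i.e. 1 TB = 10**12 bytes and 1 GB = 10**9 bytes.
TOTAL_BYTES = 15 * 10**12
TOTAL_DOCS = 10 * 10**9
COLLECTIONS = 1200
SHARDS_PER_COLLECTION = 12

# Every collection uses the same shard count, so the total is a plain product.
total_shards = COLLECTIONS * SHARDS_PER_COLLECTION

# Average size and document count of a single shard.
avg_shard_gb = TOTAL_BYTES / total_shards / 10**9
avg_shard_docs = TOTAL_DOCS / total_shards

print(f"{total_shards} shards in total")
print(f"~{avg_shard_gb:.2f} GB per shard")
print(f"~{avg_shard_docs:,.0f} docs per shard")
```

These are averages only; with collection sizes as variable as yours, the real distribution will be heavily skewed.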

While 12 shards fit your large collections well, such as the one you
described above, they are a very poor match for your average collection.
Assuming your collections all behave roughly the same way, your average and
smaller-than-average collections would be much better off with just 1 shard
(and 2 replicas). That eliminates the overhead of distributed search requests
for those collections and lowers your overall shard count significantly.
Should a collection grow past whatever threshold you determine, you can
always split it.
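
One way to turn such a threshold into a per-collection shard count is sketched below. This is only an illustration: the function name and the 70M docs/shard limit are hypothetical (picked because ~66M docs/shard reportedly performs well for you), and it assumes documents are spread evenly across shards.

```python
import math

def shards_for(doc_count: int, max_docs_per_shard: int) -> int:
    """Smallest shard count that keeps the average shard at or below the limit.

    Assumes documents are distributed evenly across the shards.
    """
    return max(1, math.ceil(doc_count / max_docs_per_shard))

# Hypothetical threshold, not a recommendation: measure your own setup.
MAX_DOCS_PER_SHARD = 70_000_000

large = shards_for(800_000_000, MAX_DOCS_PER_SHARD)   # the 800M-document collection
average = shards_for(700_000, MAX_DOCS_PER_SHARD)     # an average collection

print(large, average)
```

With that limit, the 800M-document collection keeps its 12 shards, while an average collection of about 700K documents needs only 1.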

The result: better performance, lower hardware requirements and a more
manageable shard count. And yes, more logistics on your part, as one size no
longer fits all.

- Toke Eskildsen
