One thing I forgot to mention is that my clients can aggregate by any field 
in the schema with limit=-1. This is not a problem for 99% of the fields, but 
2 or 3 of them are URLs. URLs have very high cardinality, and one of the 
reasons for sharding collections is to lower the memory footprint so the 
nodes don't blow up, leaving only the final merge to a big machine.
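To illustrate why the final merge is the expensive part, here is a toy model (not Solr's actual faceting code) of an unlimited (limit=-1) aggregation across shards. It assumes each shard returns all of its buckets and a single node sums them. The function name and example URLs are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping


def merge_shard_facets(shard_counts: Iterable[Mapping[str, int]]) -> Counter:
    """Sum per-shard bucket counts into one global result (toy model of limit=-1)."""
    merged: Counter = Counter()
    for counts in shard_counts:
        # Counter.update with a mapping adds counts, so a URL seen on
        # several shards ends up with the sum of its per-shard counts.
        merged.update(counts)
    return merged


# Hypothetical per-shard results for a high-cardinality URL field.
shard_a = {"https://a.example/": 3, "https://b.example/": 1}
shard_b = {"https://b.example/": 2, "https://c.example/": 5}

merged = merge_shard_facets([shard_a, shard_b])

# Each shard only had to hold its own distinct URLs, but the merging
# node holds the union of all of them.
print(len(shard_a), len(shard_b), len(merged))
```

Sharding spreads the per-shard work, but the number of distinct values the merging node must hold is the union across all shards. That is why the memory for the final merge does not shrink with more shards.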

"Should a collection grow past whatever threshold you determine, you can always 
split it."

Every time I run the SPLITSHARD command, it fails in a different way. IMHO, 
right now Solr doesn't have an efficient way to rebalance a collection's 
shards.

"And yes, more logistics on your part as one size no longer fits all."

The key goal of this deployment is to reduce management as much as possible. 
Solr has improved cluster management a lot compared with the 4.x releases. 
Even so, it remains difficult to manage a big cluster without custom tools.

Solr continues to improve with each version, and I have seen issues with a lot 
of nice stuff, like SOLR-9735 and SOLR-9241.

--

/Yago Riveiro

On 26 Dec 2016 22:10 +0000, Toke Eskildsen <t...@statsbiblioteket.dk>, wrote:
> Yago Riveiro <yago.rive...@gmail.com> wrote:
> > My cluster holds more than 10B documents stored in 15T.
> >
> > The size of my collections is variable but I have collections with 800M
> > documents distributed over the 12 nodes; the number of documents per shard
> > is ~66M, and indeed the performance is good.
>
> The math supports Erick's point about over-sharding. On average you have:
> 15 TB / 1200 collections / 12 shards ~= 1 GB/shard
> 10B docs / 1200 collections / 12 shards ~= 700K docs/shard
>
> While your 12 shards fit well with your large collections, such as the one 
> you described above, they are a very poor match for your average collection. 
> Assuming your collections behave roughly the same way as each other, your 
> average and smaller than average collections would be much better off with 
> just 1 shard (and 2 replicas). That eliminates the overhead of distributed 
> search-requests (for that collection) and lowers your overall shard-count 
> significantly. Should a collection grow past whatever threshold you 
> determine, you can always split it.
>
> Better performance, lower hardware requirements, more manageable shard 
> count. And yes, more logistics on your part as one size no longer fits all.
>
> - Toke Eskildsen
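The averages in the quoted estimate can be double-checked by redoing the division. This is a minimal sketch that assumes decimal terabytes (10**12 bytes), the 1200-collection count used above, and one shard per node. The constant and function names are illustrative only, and the averages ignore any skew between collections.

```python
# Back-of-the-envelope shard sizing using the figures from this thread.
# Assumptions: decimal TB, 1200 collections, 12 shards per collection.

TOTAL_BYTES = 15 * 10**12       # "15T" of stored data
TOTAL_DOCS = 10 * 10**9         # "more than 10B documents"
COLLECTIONS = 1200              # collection count used in the estimate
SHARDS_PER_COLLECTION = 12      # one shard per node, 12 nodes


def average_per_shard(total: float, collections: int, shards: int) -> float:
    """Return the average share of `total` held by a single shard."""
    return total / (collections * shards)


bytes_per_shard = average_per_shard(TOTAL_BYTES, COLLECTIONS, SHARDS_PER_COLLECTION)
docs_per_shard = average_per_shard(TOTAL_DOCS, COLLECTIONS, SHARDS_PER_COLLECTION)

# The large collection described: 800M docs spread over 12 shards.
large_docs_per_shard = 800 * 10**6 / SHARDS_PER_COLLECTION

print(f"avg size/shard:   {bytes_per_shard / 10**9:.2f} GB")
print(f"avg docs/shard:   {docs_per_shard:,.0f}")
print(f"large coll/shard: {large_docs_per_shard:,.0f}")
```

This gives about 1.04 GB and roughly 694K documents per average shard, versus about 66.7M documents per shard for the 800M-document collection. That is a gap of nearly 100x, which is the mismatch the quoted message points out.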
