Re: Solr Cloud with 5 servers cluster failed due to Leader out of memory

Erick Erickson Thu, 04 Aug 2016 22:02:52 -0700

The fact that all the shards have the same leader is somewhat of a red
herring. Until you get hundreds of shards (perhaps across a _lot_ of
collections), the additional load on the leaders is hard to measure.
If you really see this as a problem, consider the BALANCESHARDUNIQUE
and REBALANCELEADERS Collection API commands, see:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-BalanceSliceUnique


That said, your OOM errors indicate you simply have too many Solr
collections doing too many things with too little memory.

bq: All other non-leader server are relying on Leader to finish the
new document index.

This is not the case. The indexing is done independently on all
replicas. What's _probably_ happening here is that the leaders are
spinning off threads to pass the data on to the replicas and you're
running so close to the heap limit that spinning up those threads is
pushing you to OOM errors.

And, if my hypothesis is true you'll soon run into problems on the
non-leaders as you index more and more documents to your collections.
Consider some serious effort in terms of determining your hardware/JVM
needs, see: 
https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best,
Erick

Re: Solr Cloud with 5 servers cluster failed due to Leader out of memory

Reply via email to