Hi Guys,

Me again. :)

We have 5 Solr servers:
01-04 running Solr 4.10 plus the ZooKeeper service;
05 running ZooKeeper only.

JVM max heap is set to 10G.

We have around 20 collections; each collection has 4 shards, and each shard has 
4 replicas spread across the 4 Solr servers.

Unfortunately, most of the time all the shards have the same leader (e.g. Solr 
server 01).
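One thing I have been meaning to try (an assumption on my part -- I believe Solr 4.10 added the BALANCESHARDUNIQUE and REBALANCELEADERS commands to the Collections API, please check the 4.10 ref guide) is to spread the preferredLeader property across the nodes and then ask Solr to move leadership accordingly, something like:

```python
# Sketch only; hostname is hypothetical, and the two API actions are my
# assumption about Solr 4.10's Collections API -- verify before relying on it.
try:
    from urllib.parse import urlencode   # Python 3
except ImportError:
    from urllib import urlencode         # Python 2

SOLR = "http://solr01:8983/solr"         # hypothetical admin host


def balance_leaders_urls(collection):
    """Return the two Collections API calls needed to spread the leaders
    of `collection` instead of leaving them all on one node."""
    # Step 1: distribute the preferredLeader replica property evenly.
    spread = SOLR + "/admin/collections?" + urlencode({
        "action": "BALANCESHARDUNIQUE",
        "collection": collection,
        "property": "preferredLeader",
    })
    # Step 2: ask Solr to make the preferred leaders the actual leaders.
    rebalance = SOLR + "/admin/collections?" + urlencode({
        "action": "REBALANCELEADERS",
        "collection": collection,
    })
    return spread, rebalance


# In real use you would urlopen() each URL, once per collection (~20 here).
for url in balance_leaders_urls("collection1"):
    print(url)
```

That way an indexing burst is at least spread over four leaders instead of hammering Solr 01 alone.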

Now, if we add a lot of documents, Solr 01 (the leader of all shards) 
eventually throws an OutOfMemoryError in the Tomcat log and the service goes 
down (although port 8983 still responds to telnet).
At that point I checked the logs on Solr02, Solr03 and Solr04: they were full 
of "Connection time out" errors, and within another 2 minutes all three of 
those servers went down as well!

My feeling is that when a lot of documents are being pushed in, the leader is 
busy indexing and also forwarding the updates to the other (non-leader) 
replicas, and those replicas rely on the leader to finish each new document. 
At some point Solr01 (the leader) runs out of memory and gives up, but the 
non-leader servers are still waiting for it to respond. The whole SolrCloud 
cluster breaks down from there.... No more requests are served.

A couple of thoughts:
1. If the leader goes down, it should go down completely (dead, not 
half-alive), so the other servers can run an election and choose a new leader. 
That would at least avoid bringing down the whole cluster. Am I right?
2. Apparently we should not push too many documents to Solr at once. How do 
you guys handle this? Set a limit somewhere?
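On point 2, the only thing I can think of is doing it on the client side: batch the documents and cap how many batches are in flight, so the leader never sees an unbounded stream. A rough sketch (all names and numbers are made up, and `send` stands in for whatever real update call you use, e.g. an HTTP POST to /update):

```python
# Client-side throttling sketch: fixed-size batches plus a cap on
# in-flight batches. BATCH_SIZE and MAX_PENDING are guesses to tune.
from collections import deque

BATCH_SIZE = 500    # docs per update request
MAX_PENDING = 4     # never queue up more than this many batches


def batched(docs, size=BATCH_SIZE):
    """Yield docs in fixed-size batches instead of one huge request."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def index_all(docs, send):
    """Push batches through `send`, keeping at most MAX_PENDING queued.
    `send` is your real update call; here it is just a callable."""
    pending = deque()
    for batch in batched(docs):
        if len(pending) >= MAX_PENDING:
            pending.popleft()   # in real code: block until oldest batch acks
        pending.append(send(batch))


# Example: 1200 docs go out as three requests of 500, 500 and 200.
sent = []
index_all(range(1200), lambda b: sent.append(len(b)))
print(sent)
```

It doesn't stop an OOM outright, but it keeps the leader's update queue bounded instead of letting the clients race ahead of it.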

Thanks,
Tim



