Hi,

I have a Solr 6.2.1 SolrCloud setup with 5 nodes hosting close to 3000 collections, each with one shard and three replicas. It looks like when nodes crash, the overseer queue can grow out of control until ZooKeeper no longer works correctly. This looks very much like SOLR-5961 (https://issues.apache.org/jira/browse/SOLR-5961). The only solution seems to be to delete the overseer queue entries. If I notice the problem too late and ZooKeeper is no longer working correctly, then setting jute.maxbuffer to a larger value makes it possible to clear the entries again, as also described here: https://cwiki.apache.org/confluence/display/CURATOR/TN4.
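
To give a rough idea of when the queue becomes too large to list, here is a back-of-the-envelope sketch. It rests on assumptions that are not measured on my cluster: queue entries are named like "qn-0000000000", each name costs a 4-byte length prefix plus its UTF-8 bytes in the reply, and headers and other framing are ignored. The function names are purely illustrative. The only value taken from the ZooKeeper documentation is the default jute.maxbuffer of 0xfffff bytes.

```python
# Rough estimate only: ignores the reply header and any other framing.
DEFAULT_JUTE_MAXBUFFER = 0xFFFFF      # ZooKeeper's documented default, just under 1 MB
LENGTH_PREFIX_BYTES = 4               # assumption: 4-byte length prefix per child name
SAMPLE_CHILD_NAME = "qn-0000000000"   # assumption: typical overseer queue entry name


def children_payload_bytes(num_children, name=SAMPLE_CHILD_NAME):
    """Approximate size in bytes of the child-name list for num_children entries."""
    return num_children * (LENGTH_PREFIX_BYTES + len(name.encode("utf-8")))


def max_children_within(buffer_bytes=DEFAULT_JUTE_MAXBUFFER, name=SAMPLE_CHILD_NAME):
    """Largest number of entries whose name list still fits into buffer_bytes."""
    return buffer_bytes // (LENGTH_PREFIX_BYTES + len(name.encode("utf-8")))


if __name__ == "__main__":
    # How many entries fit under the default buffer, under the assumptions above.
    print(max_children_within())
    # Approximate buffer needed to list one million entries.
    print(children_payload_bytes(1_000_000))
```

Under these assumptions only about 61,000 entries fit into the default buffer, and listing a million entries would need roughly 17 MB, so jute.maxbuffer would have to be set well above that before the queue can be listed and cleared.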

Is there some way to prevent the overseer from running amok?

regards,
Hendrik
