Hi,
I have a Solr 6.2.1 SolrCloud setup with 5 nodes hosting close to 3000
collections, each with one shard and three replicas. It looks like when
nodes crash, the overseer queue can go wild and keep growing until
ZooKeeper no longer works correctly. This looks very much like SOLR-5961
(https://issues.apache.org/jira/browse/SOLR-5961). The only solution
seems to be to delete the overseer queue entries. If I notice the
problem too late and ZK is no longer working correctly, setting
jute.maxbuffer lets me clear the entries again, as also described here:
https://cwiki.apache.org/confluence/display/CURATOR/TN4
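To get a feel for how many queue entries it takes before listing them runs into a buffer limit, here is a rough back-of-the-envelope sketch. It is my own approximation, not taken from Solr or ZooKeeper code: it assumes each queue child is named "qn-" plus a 10-digit sequence number, and that a children listing is serialized as a 4-byte count followed by a 4-byte length prefix plus the name bytes per child. Reply headers are ignored, and the class and method names are hypothetical.

```java
// Hypothetical helper: rough size estimate for listing overseer queue children.
class OverseerQueueSizeEstimate {
    // Assumption: child names are "qn-" followed by a 10-digit sequence suffix.
    static final int NAME_LENGTH = "qn-".length() + 10;

    // Approximate serialized size in bytes of a listing of n children:
    // a 4-byte vector count, then per child a 4-byte length prefix plus the name.
    // Reply headers are deliberately ignored.
    static long childrenListBytes(long n) {
        return 4 + n * (4 + NAME_LENGTH);
    }

    // Largest number of children whose listing does not exceed bufferLimit bytes.
    static long maxChildren(long bufferLimit) {
        return (bufferLimit - 4) / (4 + NAME_LENGTH);
    }

    public static void main(String[] args) {
        long limit = 0xfffff; // example limit in bytes, passed in as a parameter
        System.out.println(maxChildren(limit));
    }
}
```

The limit is a parameter because the effective value depends on how jute.maxbuffer is set on the ZooKeeper server and client; plugging in 0xfffff bytes, for example, gives 61680 entries.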
Is there some way to prevent the overseer from running amok?
regards,
Hendrik