We’re seeing something similar to what Ryan reported, i.e. a massively clogged overseer queue that gets so bad it brings down our Solr nodes. I tried “rmr”ing the entire /overseer/queue, but it keeps failing with “Node does not exist: /overseer/queue/qn-00000######”. To continue, I have to create the node it complained about and then execute “rmr /overseer/queue” again, until it stumbles upon another node that doesn’t exist. Rinse, wash, repeat…
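For anyone hitting the same thing, here is a rough sketch of a workaround: delete the queue entries one at a time and ignore the ones that have already vanished, instead of recreating them by hand. It assumes the "Node does not exist" error comes from something else (presumably the Overseer) removing entries while the recursive delete is running, which I have not confirmed. `drain_queue`, `FakeZk` and the local `NoNodeError` are hypothetical names for illustration; with a real client such as kazoo you would pass a `KazooClient` and `kazoo.exceptions.NoNodeError` instead.

```python
class NoNodeError(Exception):
    """Stand-in for the client's 'node does not exist' error (assumption)."""


class FakeZk:
    """In-memory stand-in for a ZooKeeper client, for illustration only."""

    def __init__(self, children, vanishing=()):
        self.children = set(children)
        # Entries that are still listed but are gone by the time we delete them.
        self.vanishing = set(vanishing)

    def get_children(self, path):
        return sorted(self.children | self.vanishing)

    def delete(self, path):
        name = path.rsplit("/", 1)[-1]
        if name in self.vanishing:
            self.vanishing.discard(name)
            raise NoNodeError(path)
        if name not in self.children:
            raise NoNodeError(path)
        self.children.discard(name)


def drain_queue(zk, queue_path="/overseer/queue", missing=NoNodeError):
    """Delete every child of queue_path, skipping entries that no longer exist.

    Returns a (deleted, skipped) pair of counts.
    """
    deleted = skipped = 0
    for child in zk.get_children(queue_path):
        try:
            zk.delete(f"{queue_path}/{child}")
            deleted += 1
        except missing:
            # Someone else removed this entry first; nothing left to do for it.
            skipped += 1
    return deleted, skipped


if __name__ == "__main__":
    zk = FakeZk(["qn-0000000001", "qn-0000000002"], vanishing=["qn-0000000003"])
    print(drain_queue(zk))
```

The queue node itself is left in place; only its children are removed. Against a live cluster, new entries can arrive while this runs, so it may need to be repeated.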
This is w/ Solr 4.7.1 and ZooKeeper 3.4.6.

--
James Hardwick

On Thursday, May 1, 2014 at 10:25 AM, Mark Miller wrote:

> What version are you running? This was fixed in a recent release. It can
> happen if you hit add core with the defaults on the admin page in older
> versions.
>
> --
> Mark Miller
> about.me/markrmiller
>
> On May 1, 2014 at 11:19:54 AM, ryan.cooke (ryan.co...@gmail.com) wrote:
>
> I saw an overseer queue clogged as well due to a bad message in the queue.
> Unfortunately this went unnoticed for a while until there were 130K messages
> in the overseer queue. Since it was a production system we were not able to
> simply stop everything and delete all Zookeeper data, so we manually deleted
> messages by issuing commands directly through the zkCli.sh tool. After all
> the messages had been cleared, some nodes were in the wrong state (e.g.
> 'down' when should have been 'active'). Restarting the 'down' or 'recovery
> failed' nodes brought the whole cluster back to a stable and healthy state.
>
> Since it can take some digging to determine backlog in the overseer queue,
> some of the symptoms we saw were:
> Overseer throwing an exception like "Path must not end with / character"
> Random nodes throwing an exception like "ClusterState says we are the
> leader, but locally we don't think so"
> Bringing up new replicas time out when attempting to fetch shard id
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/overseer-queue-clogged-tp4047878p4134129.html
> Sent from the Solr - User mailing list archive at Nabble.com.