We’re seeing something similar to what Ryan reported, i.e. a massively clogged
overseer queue that gets so bad it brings down our Solr nodes. I tried “rmr”ing
the entire /overseer/queue, but it keeps failing with “Node does not exist:
/overseer/queue/qn-00000######”. To continue, I have to create the node it
complained about and run “rmr /overseer/queue” again, until it stumbles upon
another node that doesn’t exist: rinse, repeat…
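One plausible reason for the “Node does not exist” errors is that rmr stops at the first child that has disappeared. Another process, presumably the Overseer itself, may be deleting queue entries between the moment rmr lists the children and the moment it tries to delete them. The sketch below deletes the entries one at a time and skips any that have already vanished. It assumes a ZooKeeper client that offers `get_children(path)` and `delete(path)`, which kazoo's `KazooClient` does. The function name `drain_queue` is hypothetical. The sketch leaves /overseer/queue itself in place instead of deleting the parent.

```python
from typing import Iterable, Protocol, Type


class ZkClient(Protocol):
    """The two calls this sketch needs (kazoo's KazooClient provides both)."""

    def get_children(self, path: str) -> Iterable[str]: ...

    def delete(self, path: str) -> None: ...


def drain_queue(client: ZkClient, parent: str,
                no_node_error: Type[BaseException]) -> int:
    """Delete every child of `parent`, skipping children that vanish mid-run.

    Hypothetical helper. Returns the number of znodes this call actually deleted.
    """
    deleted = 0
    for name in sorted(client.get_children(parent)):
        try:
            client.delete(f"{parent}/{name}")
            deleted += 1
        except no_node_error:
            # Already removed by someone else (presumably the Overseer): skip it
            # instead of aborting the whole run, which is what rmr appears to do.
            continue
    return deleted
```

With kazoo, you would call this on a started `KazooClient` and pass `kazoo.exceptions.NoNodeError` as the exception type. If Solr nodes keep enqueuing messages, one pass will not leave the queue empty. With a backlog as large as the 130K messages Ryan mentions, listing all children in a single call may also run into ZooKeeper's `jute.maxbuffer` size limit, which is about 1 MB by default.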


This is with Solr 4.7.1 and ZooKeeper 3.4.6.

--  
James Hardwick


On Thursday, May 1, 2014 at 10:25 AM, Mark Miller wrote:

> What version are you running? This was fixed in a recent release. In older
> versions, it can happen if you click Add Core on the admin page and accept
> the default values.
>  
> --  
> Mark Miller
> about.me/markrmiller
>  
> On May 1, 2014 at 11:19:54 AM, ryan.cooke (ryan.co...@gmail.com) wrote:
>  
> I also saw a clogged overseer queue, caused by a bad message in the queue.
> Unfortunately this went unnoticed until there were 130K messages in the
> overseer queue. Since it was a production system, we could not simply stop
> everything and delete all ZooKeeper data, so we deleted messages manually by
> issuing commands directly through the zkCli.sh tool. After all the messages
> had been cleared, some nodes were in the wrong state (e.g. 'down' when they
> should have been 'active'). Restarting the 'down' or 'recovery failed' nodes
> brought the whole cluster back to a stable and healthy state.
>  
> Since it can take some digging to detect a backlog in the overseer queue,
> here are some of the symptoms we saw:
> - The Overseer throwing an exception like "Path must not end with / character"
> - Random nodes throwing an exception like "ClusterState says we are the
>   leader, but locally we don't think so"
> - New replicas timing out while attempting to fetch their shard id
>  
>  
>  
> --  
> View this message in context: 
> http://lucene.472066.n3.nabble.com/overseer-queue-clogged-tp4047878p4134129.html
>   
> Sent from the Solr - User mailing list archive at Nabble.com.
>  
>  
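After the queue was cleared, Ryan found replicas stuck in 'down' or 'recovery failed'. The rough sketch below spots those replicas in the cluster state. It relies on two assumptions. First, it assumes the Solr 4.x layout of /clusterstate.json: collection → "shards" → shard → "replicas" → replica, with each replica carrying "state" and "node_name". Second, it assumes 'recovery_failed' is the stored spelling of the 'recovery failed' state. `find_stale_replicas` and `StaleReplica` are hypothetical names. The JSON text would come from something like `get /clusterstate.json` in zkCli.sh.

```python
import json
from typing import List, NamedTuple


class StaleReplica(NamedTuple):
    collection: str
    shard: str
    replica: str
    node_name: str
    state: str


# Assumed Solr 4.x state strings for replicas that may need a restart.
SUSPECT_STATES = frozenset({"down", "recovery_failed"})


def find_stale_replicas(clusterstate_json: str) -> List[StaleReplica]:
    """List replicas whose published state is 'down' or 'recovery_failed'.

    Hypothetical helper; the key names follow the assumed clusterstate.json layout.
    """
    state = json.loads(clusterstate_json)
    stale = []
    for coll_name, coll in sorted(state.items()):
        for shard_name, shard in sorted(coll.get("shards", {}).items()):
            for replica_name, replica in sorted(shard.get("replicas", {}).items()):
                replica_state = replica.get("state", "")
                if replica_state in SUSPECT_STATES:
                    stale.append(StaleReplica(coll_name, shard_name, replica_name,
                                              replica.get("node_name", ""),
                                              replica_state))
    return stale
```

The `node_name` of each hit indicates which Solr node to restart. The sketch does not cross-check /live_nodes, so a replica on a node that is genuinely offline will also be listed.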


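The 130K-message backlog went unnoticed for a while, so a cheap periodic check on queue depth might have caught it earlier. The minimal sketch below assumes a client whose `exists(path)` returns either a stat object with a `numChildren` field or None if the znode is missing. kazoo's `KazooClient.exists` returns a `ZnodeStat` shaped like this. Reading `numChildren` avoids listing every queue entry. `overseer_backlog_alert` and the threshold value are hypothetical; a sensible threshold depends on the cluster.

```python
from typing import Any, Optional, Protocol


class StatClient(Protocol):
    """exists() returns a stat object with `numChildren`, or None if absent."""

    def exists(self, path: str) -> Optional[Any]: ...


def overseer_backlog_alert(client: StatClient, threshold: int,
                           path: str = "/overseer/queue") -> Optional[str]:
    """Return an alert message if the queue holds more than `threshold` entries.

    Hypothetical helper; a missing queue znode is treated as an empty queue.
    """
    stat = client.exists(path)
    depth = stat.numChildren if stat is not None else 0
    if depth > threshold:
        return f"{path} has {depth} queued messages (threshold {threshold})"
    return None
```

From the shell, `stat /overseer/queue` in zkCli.sh reports the same numChildren value.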