Strange - we hardened that loop in 4.1 - so I'm not sure what happened here.

Can you take a stack dump on the Overseer node and check whether an Overseer 
thread is still running? Or just post the dump?
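
If the dump is large, something like the following could help narrow it down. 
This is only a rough sketch: it assumes the dump was saved as text in the usual 
jstack layout (a quoted thread name on the header line, stack frames below, a 
blank line between threads). The function name find_threads and the thread 
names in the usage note are made up for illustration; I'm not claiming the 
actual Overseer thread is named that way.

```python
import re

# Header line of a thread in jstack-style output starts with the quoted name,
# e.g.:  "some-thread-name" daemon prio=10 tid=0x... nid=0x... runnable
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')


def find_threads(dump_text, needle="overseer"):
    """Return (name, lines) for each thread whose header or frames mention needle.

    The match is case-insensitive, so a thread is reported if either its name
    or any class in its stack contains the needle.
    """
    threads = []
    current = None
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            # Start collecting a new thread block.
            current = (header.group("name"), [line])
            threads.append(current)
        elif not line.strip():
            # A blank line ends the current thread block.
            current = None
        elif current is not None:
            current[1].append(line)

    needle = needle.lower()
    return [
        (name, lines)
        for name, lines in threads
        if any(needle in l.lower() for l in lines)
    ]
```

For example, after saving the output of jstack for the Solr process to a file, 
`find_threads(open("dump.txt").read())` returns only the thread blocks that 
mention "overseer", which should make it easy to see whether such a thread 
exists and what it is blocked on.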

To recover, you should be able to just restart the Overseer node and have 
another node take over as Overseer - it should pick up processing the queue.

Any logs you might be able to share could be useful too.

- Mark

On Mar 15, 2013, at 7:51 PM, Gary Yngve <gary.yn...@gmail.com> wrote:

> Also, looking at overseer_elect, everything looks fine.  node is valid and
> live.
> 
> 
> On Fri, Mar 15, 2013 at 4:47 PM, Gary Yngve <gary.yn...@gmail.com> wrote:
> 
>> Sorry, should have specified.  4.1
>> 
>> 
>> 
>> 
>> On Fri, Mar 15, 2013 at 4:33 PM, Mark Miller <markrmil...@gmail.com> wrote:
>> 
>>> What Solr version? 4.0, 4.1, 4.2?
>>> 
>>> - Mark
>>> 
>>> On Mar 15, 2013, at 7:19 PM, Gary Yngve <gary.yn...@gmail.com> wrote:
>>> 
>>>> my solr cloud has been running fine for weeks, but about a week ago, it
>>>> stopped dequeueing from the overseer queue, and now there are thousands of
>>>> tasks on the queue, most of which look like
>>>> 
>>>> {
>>>> "operation":"state",
>>>> "numShards":null,
>>>> "shard":"shard3",
>>>> "roles":null,
>>>> "state":"recovering",
>>>> "core":"production_things_shard3_2",
>>>> "collection":"production_things",
>>>> "node_name":"10.31.41.59:8883_solr",
>>>> "base_url":"http://10.31.41.59:8883/solr"}
>>>> 
>>>> i'm trying to create a new collection through collection API, and
>>>> obviously, nothing is happening...
>>>> 
>>>> any suggestion on how to fix this?  drop the queue in zk?
>>>> 
>>>> how could it have gotten into this state in the first place?
>>>> 
>>>> thanks,
>>>> gary
>>> 
>>> 
>> 
