Re: no servers hosting shard

2014-01-07 Thread patrick conant
After a full bounce of Tomcat, I'm now getting a new exception (below).  I
can browse the Zookeeper config in the Solr admin UI, and can confirm that
there's a node for '/collections/customerOrderSearch/leaders/shard2', but
no node for '/collections/customerOrderSearch/leaders/shard1'.  Still, any
ideas or guidance on how to recover would be appreciated.  We've restarted
all three zookeeper instances and both Solr instances, but that hasn't made
any appreciable difference.
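
For anyone doing the same check by hand, here is a minimal sketch of it in Python. It assumes you already have a list of znode paths (for example copied out of the admin UI tree view); the function name `shards_missing_leader` and the idea of passing the expected shard names in are illustrative assumptions, not part of Solr.

```python
def shards_missing_leader(znode_paths, collection, shards):
    """Return the expected shards that have no node under
    /collections/<collection>/leaders/ in the given path listing."""
    prefix = f"/collections/{collection}/leaders/"
    present = set()
    for path in znode_paths:
        # Normalise so that a missing leading slash does not hide a node.
        normalised = "/" + path.strip().strip("/")
        if normalised.startswith(prefix):
            # First path component after the prefix is the shard name.
            present.add(normalised[len(prefix):].split("/", 1)[0])
    return [shard for shard in shards if shard not in present]


# Paths as seen in this thread: shard2 has a leader node, shard1 does not.
paths = ["/collections/customerOrderSearch/leaders/shard2"]
print(shards_missing_leader(paths, "customerOrderSearch", ["shard1", "shard2"]))
# prints ['shard1']
```

The function only inspects a listing you supply; it does not talk to ZooKeeper itself.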

--p.




2014-01-07 10:06:14,980 [coreLoadExecutor-4-thread-1] ERROR org.apache.solr.core.CoreContainer - null:org.apache.solr.common.cloud.ZooKeeperException:
at org.apache.solr.core.ZkContainer.registerInZk(ZkContainer.java:309)
at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:556)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:365)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: Error getting leader from zk for shard shard1
at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:864)
at org.apache.solr.cloud.ZkController.register(ZkController.java:773)
at org.apache.solr.cloud.ZkController.register(ZkController.java:723)
at org.apache.solr.core.ZkContainer.registerInZk(ZkContainer.java:286)
... 11 more
Caused by: org.apache.solr.common.SolrException: Could not get leader props
at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:911)
at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:875)
at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:839)
... 14 more
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/customerOrderSearch/leaders/shard1
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:252)
at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:249)
at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)
at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:249)
at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:889)
... 16 more
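
The last "Caused by" entry in a trace like this is usually the one worth reading first. Here it names the missing znode. As a rough sketch, the chain can be pulled out of a pasted log with a few lines of Python; `cause_chain` is a hypothetical helper written for this illustration, and it assumes message lines never begin with "at " or "...".

```python
def cause_chain(trace):
    """Return the 'Caused by:' entries of a Java stack trace, outermost first.
    Message lines wrapped by a mail client are joined back onto their entry."""
    causes = []
    in_message = False
    for raw in trace.splitlines():
        line = raw.strip()
        if line.startswith("Caused by:"):
            causes.append(line[len("Caused by:"):].strip())
            in_message = True
        elif line == "at" or line.startswith(("at ", "...")):
            # Stack frames (or "... N more") end the message of the current cause.
            in_message = False
        elif in_message and line:
            # Continuation of a wrapped exception message.
            causes[-1] += " " + line
    return causes


sample = """\
Caused by: org.apache.solr.common.SolrException: Could not get leader props
at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:911)
... 14 more
Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
KeeperErrorCode = NoNode for /collections/customerOrderSearch/leaders/shard1
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
"""

# The root cause is the last entry in the chain.
print(cause_chain(sample)[-1])
```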



On Tue, Jan 7, 2014 at 9:57 AM, patrick conant wrote:

> In our Solr instance we have two shards each running on two servers.  The
> server that was the leader for one of the shards ran into a problem, and
> when we restarted the service, Solr is no longer electing a leader for the
> shard.
>
> The stack traces from the logs are below, and the 'Cloud Dump' from the
> Solr console is attached.  We're running Solr 4.4.0.  Any guidance on how
> to recover from this?  Restarting or redeploying the service doesn't seem
> to make any difference.
>
> Thanks,
> Pat.
>
>
> 2014-01-07 00:00:10,754 [http-8080-62] ERROR org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: no servers hosting shard:
> at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149)
> at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> at java.lang.Thread.run(Thread.java:662)
>
> 2014-01-07 09:38:33,701 [http-8080-21] ERROR org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: No registered leader was found, collection:customerOrderSearch slice:shard1
> at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:487)
> at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:470)
>  at
> org.

Re: no servers hosting shard

2014-01-07 Thread patrick conant
We found a way to recover.  This sequence allowed everything to start up
successfully.

- Stop all Solr instances
- Stop all Zookeeper instances
- Start all Zookeeper instances
- Start Solr instances one at a time.

Restarting the first Solr instance took several minutes, but the subsequent
instances started up much more quickly.
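
For reference, the ordering above can be sketched as a small Python routine. This is only an illustration under stated assumptions: `stop`, `start` and `is_up` are hypothetical callables you would supply for your own environment, the node names are made up, and waiting for each instance to report up before starting the next is one reading of "one at a time", not something Solr requires.

```python
import time


def wait_until(check, timeout_s, poll_s, sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True, or raise TimeoutError after timeout_s."""
    deadline = clock() + timeout_s
    while not check():
        if clock() >= deadline:
            raise TimeoutError("service did not come up in time")
        sleep(poll_s)


def recover(zk_nodes, solr_nodes, stop, start, is_up, timeout_s=900.0, poll_s=5.0):
    """Stop Solr, stop ZooKeeper, start ZooKeeper, then start Solr one node at a time."""
    for node in solr_nodes:
        stop(node)
    for node in zk_nodes:
        stop(node)
    for node in zk_nodes:
        start(node)
    for node in zk_nodes:
        # Assumption: wait for the whole ensemble before touching Solr.
        wait_until(lambda n=node: is_up(n), timeout_s, poll_s)
    for node in solr_nodes:
        start(node)
        # Assumption: the next Solr node starts only once this one reports up.
        wait_until(lambda n=node: is_up(n), timeout_s, poll_s)


# Dry run with placeholder names: record the actions instead of performing them.
log = []
recover(
    zk_nodes=["zk1", "zk2", "zk3"],
    solr_nodes=["solr1", "solr2"],
    stop=lambda n: log.append(f"stop {n}"),
    start=lambda n: log.append(f"start {n}"),
    is_up=lambda n: True,
)
print("\n".join(log))
```

The generous default timeout reflects that the first Solr instance took several minutes to come up.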

Cheers,
Pat.




