Re: 6.4.0 collection leader election and recovery issues

Ravi Solr Thu, 02 Feb 2017 06:24:19 -0800

When I try to roll back from 6.4.0 to my original version, 6.0.1, it now
throws a different error. Now I can't go forward to 6.4.0, nor can I roll back to 6.0.1.


Could not load codec 'Lucene62'.  Did you forget to add lucene-backward-codecs.jar?
    at org.apache.lucene.index.SegmentInfos.readCodec(SegmentInfos.java:429)
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:349)
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:284)

I hope this doesn't cost me dearly. Any ideas, at least, on how to roll back
safely?

Thanks

Ravi Kiran Bhaskar

On Thu, Feb 2, 2017 at 4:52 AM, Ravi Solr <ravis...@gmail.com> wrote:

> Following up on my previous email, the intermittent server unavailability
> seems to be linked to the interaction between Solr and ZooKeeper. Can
> somebody help me understand what this error means and how to recover from
> it?
>
> 2017-02-02 09:44:24.648 ERROR (recoveryExecutor-3-thread-16-processing-n:xx.xxx.xxx.xxx:1234_solr x:clicktrack_shard1_replica4 s:shard1 c:clicktrack r:core_node3) [c:clicktrack s:shard1 r:core_node3 x:clicktrack_shard1_replica4] o.a.s.c.RecoveryStrategy Error while trying to recover. core=clicktrack_shard1_replica4:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer/queue/qn-
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>     at org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkClient.java:391)
>     at org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkClient.java:388)
>     at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
>     at org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:388)
>     at org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:244)
>     at org.apache.solr.cloud.ZkController.publish(ZkController.java:1215)
>     at org.apache.solr.cloud.ZkController.publish(ZkController.java:1128)
>     at org.apache.solr.cloud.ZkController.publish(ZkController.java:1124)
>     at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:334)
>     at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
>     at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
>
> Thanks
>
> Ravi Kiran Bhaskar
>
> On Thu, Feb 2, 2017 at 2:27 AM, Ravi Solr <ravis...@gmail.com> wrote:
>
>> Hello,
>>
>> Yesterday I upgraded from 6.0.1 to 6.4.0, and it has been 12 straight
>> hours of debugging! Can somebody kindly help me out of this misery?
>>
>> I have a setup with 8 single-shard collections, each with 3 replicas. As
>> soon as I updated the configs and started the servers, one of my
>> collections got stuck with no leader. I restarted Solr to no avail, and I
>> also tried to force a leader via the Collections API; that didn't work
>> either. I also see that, from time to time, multiple Solr nodes go down at
>> the same time, and only a restart resolves the issue.
>>
>> The error snippets are shown below:
>>
>> 2017-02-02 01:43:42.785 ERROR (recoveryExecutor-3-thread-6-processing-n:10.128.159.245:9001_solr x:clicktrack_shard1_replica1 s:shard1 c:clicktrack r:core_node1) [c:clicktrack s:shard1 r:core_node1 x:clicktrack_shard1_replica1] o.a.s.c.RecoveryStrategy Error while trying to recover. core=clicktrack_shard1_replica1:org.apache.solr.common.SolrException: No registered leader was found after waiting for 4000ms , collection: clicktrack slice: shard1
>>
>> solr.log.9:2017-02-02 01:43:41.336 INFO  (zkCallback-4-thread-29-processing-n:10.128.159.245:9001_solr) [   ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/clicktrack/state.json] for collection [clicktrack] has occurred - updating... (live nodes size: [1])
>> solr.log.9:2017-02-02 01:43:42.224 INFO  (zkCallback-4-thread-29-processing-n:10.128.159.245:9001_solr) [   ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/clicktrack/state.json] for collection [clicktrack] has occurred - updating... (live nodes size: [1])
>> solr.log.9:2017-02-02 01:43:43.767 INFO  (zkCallback-4-thread-23-processing-n:10.128.159.245:9001_solr) [   ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/clicktrack/state.json] for collection [clicktrack] has occurred - updating... (live nodes size: [1])
>>
>>
>> Suspecting the worst, I backed up the index, renamed the collection's
>> data folder, and restarted the servers. This time the collection got a
>> proper leader. So is my index really corrupted? The Solr UI showed live
>> nodes, just like the logs, but without any leader. Even after the leader
>> issue was somewhat alleviated by renaming the data folder and letting Solr
>> create a new one, my servers still went down a couple of times.
>>
>> I am not all that well versed in ZooKeeper. Is there any trick to make
>> ZooKeeper pick a leader and be happy? Has anybody had Solr/ZooKeeper
>> issues with 6.4.0?
>>
>> Thanks
>>
>> Ravi Kiran Bhaskar
>>
>
>
