Hi,

Multiple Solr 6.5.1 clouds/collections went down this weekend at around the same
time; they share the same ZK quorum. The nodes stayed up but did not rejoin the
cluster (they did not find or reconnect to ZK).

This is what the log told us:

2017-05-06 18:58:34.893 WARN  (zkCallback-5-thread-9-processing-n:idx6.example.org:8983_solr) [   ] o.a.s.c.c.ConnectionManager Watcher org.apache.solr.common.cloud.ConnectionManager@4f97bdad name: ZooKeeperConnection Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search got event WatchedEvent state:Disconnected type:None path:null path: null type: None
2017-05-06 18:58:34.893 WARN  (zkCallback-5-thread-9-processing-n:idx6.example.org:8983_solr) [   ] o.a.s.c.c.ConnectionManager zkClient has disconnected
2017-05-06 18:58:35.001 WARN  (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) [c:search s:shard2 r:core_node6 x:search_shard2_replica3] o.a.s.c.c.ConnectionManager Watcher org.apache.solr.common.cloud.ConnectionManager@c226cc name: ZooKeeperConnection Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search got event WatchedEvent state:Disconnected type:None path:null path: null type: None
2017-05-06 18:58:35.010 WARN  (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) [c:search s:shard2 r:core_node6 x:search_shard2_replica3] o.a.s.c.c.ConnectionManager zkClient has disconnected
2017-05-06 18:58:45.360 WARN  (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [   ] o.a.s.c.c.ConnectionManager Watcher org.apache.solr.common.cloud.ConnectionManager@4f97bdad name: ZooKeeperConnection Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search got event WatchedEvent state:Expired type:None path:null path: null type: None
2017-05-06 18:58:45.360 WARN  (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [   ] o.a.s.c.c.ConnectionManager Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...
2017-05-06 18:58:45.380 WARN  (OverseerStateUpdate-97740792370385619-idx6.example.org:8983_solr-n_0000000558) [   ] o.a.s.c.Overseer Solr cannot talk to ZK, exiting Overseer main queue loop
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer/queue
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
        at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:339)
        at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:336)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
        at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:336)
        at org.apache.solr.cloud.DistributedQueue.fetchZkChildren(DistributedQueue.java:308)
        at org.apache.solr.cloud.DistributedQueue.firstChild(DistributedQueue.java:285)
        at org.apache.solr.cloud.DistributedQueue.firstElement(DistributedQueue.java:393)
        at org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:159)
        at org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:137)
        at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:180)
        at java.lang.Thread.run(Thread.java:745)
2017-05-06 18:58:45.381 WARN  (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [   ] o.a.s.c.c.DefaultConnectionStrategy Connection expired - starting a new one...
2017-05-06 18:58:45.382 ERROR (OverseerExitThread) [   ] o.a.s.c.Overseer could not read the data
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer_elect/leader
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:356)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:353)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
        at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:353)
        at org.apache.solr.cloud.Overseer$ClusterStateUpdater.checkIfIamStillLeader(Overseer.java:287)
        at java.lang.Thread.run(Thread.java:745)
2017-05-06 18:58:46.453 WARN  (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) [c:search s:shard2 r:core_node6 x:search_shard2_replica3] o.a.s.c.c.ConnectionManager Watcher org.apache.solr.common.cloud.ConnectionManager@c226cc name: ZooKeeperConnection Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search got event WatchedEvent state:Expired type:None path:null path: null type: None
2017-05-06 18:58:46.453 WARN  (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) [c:search s:shard2 r:core_node6 x:search_shard2_replica3] o.a.s.c.c.ConnectionManager Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...
2017-05-06 18:58:46.460 WARN  (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) [c:search s:shard2 r:core_node6 x:search_shard2_replica3] o.a.s.c.c.DefaultConnectionStrategy Connection expired - starting a new one...
2017-05-06 18:58:53.599 ERROR (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [   ] o.a.s.c.ZkController :org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /live_nodes/idx6.example.org:8983_solr
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
        at org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:526)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:523)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:466)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:453)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:430)
        at org.apache.solr.cloud.ZkController.createEphemeralLiveNode(ZkController.java:823)
        at org.apache.solr.cloud.ZkController.access$600(ZkController.java:120)
        at org.apache.solr.cloud.ZkController$1.command(ZkController.java:340)
        at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:168)
        at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:57)
        at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:142)
        at org.apache.solr.common.cloud.SolrZkClient$3.lambda$process$0(SolrZkClient.java:268)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

2017-05-06 18:58:53.599 ERROR (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [   ] o.a.s.c.c.DefaultConnectionStrategy Reconnect to ZooKeeper failed:org.apache.solr.common.cloud.ZooKeeperException:
        at org.apache.solr.cloud.ZkController$1.command(ZkController.java:392)
        at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:168)
        at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:57)
        at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:142)
        at org.apache.solr.common.cloud.SolrZkClient$3.lambda$process$0(SolrZkClient.java:268)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /live_nodes/idx6.example.org:8983_solr
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
        at org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:526)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:523)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:466)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:453)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:430)
        at org.apache.solr.cloud.ZkController.createEphemeralLiveNode(ZkController.java:823)
        at org.apache.solr.cloud.ZkController.access$600(ZkController.java:120)
        at org.apache.solr.cloud.ZkController$1.command(ZkController.java:340)
        ... 10 more
2017-05-06 18:58:53.600 WARN  (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [   ] o.a.s.c.c.DefaultConnectionStrategy Reconnect to ZooKeeper failed
2017-05-06 18:58:57.052 ERROR (qtp1873653341-14807) [   ] o.a.s.h.RequestHandlerBase org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/search/state.json
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:356)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:353)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
        at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:353)
        at org.apache.solr.common.cloud.ZkStateReader.fetchCollectionState(ZkStateReader.java:1110)
        at org.apache.solr.common.cloud.ZkStateReader.forceUpdateCollection(ZkStateReader.java:321)
        at org.apache.solr.handler.admin.PrepRecoveryOp.execute(PrepRecoveryOp.java:102)
        at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:370)
        at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:388)
        at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:174)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
        at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:748)
        at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:729)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:510)

After that we occasionally see:

2017-05-06 18:58:59.079 ERROR (qtp1873653341-14989) [   ] o.a.s.s.HttpSolrCall null:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/search/state.json
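
The reconnect failed on the NodeExistsException: after the session expired,
Solr tried to create the ephemeral /live_nodes/idx6.example.org:8983_solr node
and found one already there, so the node never re-registered. ZooKeeper deletes
a session's ephemeral nodes once that session ends. One possible explanation
(an assumption; the log does not show who owned the existing node) is that the
expired session's entry had not been removed yet when the new session tried to
create it. A toy, in-memory sketch of that race follows; EphemeralModel and its
methods are hypothetical and are not the ZooKeeper or Solr API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of ZooKeeper ephemeral znodes, NOT the real ZooKeeper API.
// Each ephemeral path records the session id that owns it.
class EphemeralModel {
    private final Map<String, Long> ephemerals = new HashMap<>();

    // Returns false if the path already exists (the NodeExists case in the log).
    boolean createEphemeral(String path, long sessionId) {
        return ephemerals.putIfAbsent(path, sessionId) == null;
    }

    // Once an expiry is processed, every ephemeral owned by that session is removed.
    void expireSession(long sessionId) {
        ephemerals.values().removeIf(owner -> owner == sessionId);
    }

    // Owning session of a path, or null if the path does not exist.
    Long owner(String path) {
        return ephemerals.get(path);
    }

    public static void main(String[] args) {
        EphemeralModel zk = new EphemeralModel();
        String live = "/live_nodes/idx6.example.org:8983_solr";
        long oldSession = 1L;
        long newSession = 2L;

        zk.createEphemeral(live, oldSession);
        // Assumed race: the new session registers before the old node is cleaned up.
        boolean created = zk.createEphemeral(live, newSession);
        System.out.println("created=" + created + " owner=" + zk.owner(live));

        // After the old session's expiry is processed, the create succeeds.
        zk.expireSession(oldSession);
        created = zk.createEphemeral(live, newSession);
        System.out.println("created=" + created + " owner=" + zk.owner(live));
    }
}
```

Running main prints created=false owner=1 and then created=true owner=2: the
create fails while the stale node exists and succeeds once it is gone. Whether
that is what happened on idx6 is not something the log above can confirm.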

We executed a hard Solr restart to get stuff back up. Is this a known issue?

Thanks,
Markus
