Hi,
We are using SolrCloud with Solr 4.10.4.
This past week we encountered a problem where all of our servers
disconnected from the ZooKeeper cluster.
That by itself might be OK; the problem is that after reconnecting to
ZooKeeper, both replicas of every collection appear to have no leader and
are stuck in some kind of deadlock for a few minutes.

From what we understand:
One of the replicas assumes it will be the leader and at some point starts
to wait on leaderVoteWait, which is 3 minutes by default.
The other replica is stuck in this part of the code for a few minutes:
        at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:957)
        at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:921)
        at org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1521)
        at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:392)

It looks like replica 1 is waiting for a leader to be registered in
ZooKeeper, while replica 2 is waiting for replica 1
(in org.apache.solr.cloud.ShardLeaderElectionContext.waitForReplicasToComeUp).

We have 100 collections distributed across 3 pairs of Solr nodes. Each
collection has one shard with 2 replicas.
As far as I understand from the code and logs, the collections are
registered synchronously, one after another, which means that we may have
to wait 3 minutes * number of collections for the whole cluster to come up.
It could be more than an hour!
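
To make that estimate concrete, here is a rough sketch of the arithmetic. It
assumes (we have not verified this) that the collections are spread evenly
over the 3 node pairs, that a node registers its cores strictly one after
another, and that every core waits the full leaderVoteWait. The names in it
are made up for illustration only.

```python
import math

# Figures taken from the description above.
LEADER_VOTE_WAIT_MIN = 3   # default leaderVoteWait, in minutes
COLLECTIONS = 100          # one shard each, 2 replicas per shard
NODE_PAIRS = 3             # each pair hosts both replicas of its collections


def worst_case_minutes(cores_on_node, wait_min=LEADER_VOTE_WAIT_MIN):
    """Upper bound if every core on a node waits the full timeout in sequence."""
    return cores_on_node * wait_min


# Assumption: collections are distributed evenly across the node pairs,
# so each node hosts roughly a third of the cores.
cores_per_node = math.ceil(COLLECTIONS / NODE_PAIRS)

print(cores_per_node, "cores per node ->",
      worst_case_minutes(cores_per_node), "minutes worst case")
```

Under these assumptions a single node would need well over an hour, and if
the distribution is uneven the busiest node determines the total.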



1. We thought about lowering leaderVoteWait to solve the problem, but we
are not sure what the risk is.

2. The following thread is very similar to our case:
http://qnalist.com/questions/4812859/waitforleadertoseedownstate-when-leader-is-down.
Does anybody know if it is indeed a bug and if there's a related JIRA issue?

3. I see this in the logs before the reconnection: "Client session timed out,
have not heard from server in 48865ms for sessionid 0x44efbb91b5f0001,
closing socket connection and attempting reconnect". Does it mean that
Solr and ZooKeeper were disconnected for almost 50 seconds?


Thanks in advance for your kind answer
