Which version of Solr is having this problem?

On Tue, Mar 28, 2023 at 10:47 AM Pierre Salagnac <pierre.salag...@gmail.com>
wrote:

> Hello everyone,
> I'm investigating issues where a replica ends up with no leader, and I
> wonder whether these specific cases have already been discussed somewhere.
>
> More specifically, in the code, I (with the help of my colleagues)
> identified two gaps where we exit the leader election process and never
> return to it. Both of them happen when the election ephemeral node is
> dropped because the ZooKeeper session expired.
>
> First one, in class LeaderElector:
> - we log *"Our node is no longer in line to be leader"*
> - and immediately return
>
> Second one, in class
> - we log *"Will not register as leader because it seems the election is no
> longer taking place."*
> - and immediately return
>
> In both cases, we explicitly check that our sequential node still exists in
> the election. The first case calls zkClient.getChildren(...) and then
> validates the results, while the second case catches a NoNodeException.
> Unless I'm missing something, the node never rejoins this election. Since
> we aborted, any other nodes can still become the leader for this shard.
> But if there are none (and we are still up), we can never become the leader.
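
To make the pattern described above concrete, here is a minimal in-memory sketch. It is not Solr or ZooKeeper code: the class ElectionSketch and all of its methods are hypothetical stand-ins, and the list of children only imitates the election znode. It contrasts a check that returns once our node is gone with one that rejoins the election.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical in-memory model of an election; no name here comes from Solr.
class ElectionSketch {
    // Stand-in for the children of the election znode.
    final List<String> children = new CopyOnWriteArrayList<>();
    private int nextSeq = 0;
    String myNode;

    // Stand-in for creating our ephemeral sequential node.
    void joinElection() {
        myNode = "n_" + (nextSeq++);
        children.add(myNode);
    }

    // Stand-in for a session expiry that drops our ephemeral node.
    void expireSession() {
        children.remove(myNode);
    }

    // Pattern from the message: the node is missing, so we just return.
    // Nothing puts us back into the election afterwards.
    boolean checkAndAbort() {
        return children.contains(myNode);
    }

    // One possible alternative (an assumption, not a proposed patch):
    // rejoin the election when our node has disappeared.
    boolean checkAndRejoin() {
        if (!children.contains(myNode)) {
            joinElection();
        }
        return children.contains(myNode);
    }

    public static void main(String[] args) {
        ElectionSketch replica = new ElectionSketch();
        replica.joinElection();
        replica.expireSession();
        System.out.println("still in line after abort check: " + replica.checkAndAbort());
        System.out.println("still in line after rejoin check: " + replica.checkAndRejoin());
    }
}
```

With a single replica, the first check leaves the election empty for good, which is the "no leader" situation from the message; the second leaves one candidate in line.
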
>
>
> Taking a step back, it seems to me error handling in the leader election
> code is messy. There are a large number of catch blocks. Some of them
> trigger a retry of the election while some of them don't.
>
> Have these issues already been discussed?
> Thanks
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
