My reply might be a little surprising; maybe I hit "send" too quickly. Of course one should invest in building more consensus; maybe the idea isn't fully understood; maybe the concerns aren't fully understood. But consensus isn't so much a state that is achieved or not; it comes in shades of gray. Many people may stay silent or never follow up with a response of any kind. In the end, no technical change is voted on; there is just the potential for a veto. Announcing concluding intentions ("I'm about to go do XYZ") is an opportunity for a veto to be expressed.
On Tue, Dec 19, 2023 at 11:19 AM David Smiley <dsmi...@apache.org> wrote:

> You may be surprised at what can be accomplished without "consensus" :-).
> Vetoes are the blocker. If you/anyone are convinced enough: put forth a
> proposal of what you are going to do, get feedback, say you are going to
> do it (in spite of concerns, but obviously try to address them!), and go
> for it.
>
> On Tue, Dec 19, 2023 at 10:45 AM Ilan Ginzburg <ilans...@gmail.com> wrote:
>
>> The message by Pierre is about fixing existing code.
>>
>> The leader-on-demand approach doesn't seem to be a short-term solution
>> in any case, and there wasn't really a consensus around the proposal.
>>
>> Ilan
>>
>> On Tue, Dec 19, 2023 at 4:16 PM David Smiley <dsmi...@apache.org> wrote:
>>
>> > I would be more in favor of going back to the drawing board on leader
>> > election than making incremental improvements. Go back to first
>> > principles. The clarity just isn't there to be maintained; I don't
>> > trust it.
>> >
>> > Coincidentally, I sent a message to the Apache Curator users list
>> > yesterday to inquire about leader prioritization:
>> > https://lists.apache.org/thread/lmm30qpm17cjf4b93jxv0rt3bq99c0sb
>> > I suspect the "users" list is too low-activity to be useful for the
>> > Curator project; I'm going to try elsewhere.
>> >
>> > For shards, there doesn't even need to be a "leader election" recipe,
>> > because there are no shard leader threads that always need to be
>> > thinking/doing stuff, unlike the Overseer. It could be more
>> > demand-driven (assign a leader on demand when it needs to be
>> > re-assigned), and thus also scale better to many shards.
>> > Some of my ideas on this:
>> > https://lists.apache.org/thread/kowcp2ftc132pq0y38g9736m0slchjg7
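As a rough illustration of the demand-driven idea above (this is not Solr's actual code; the class, the leader path, and getOrClaimLeader are all hypothetical), leadership can be claimed lazily with a single atomic ephemeral-znode create through a Curator client. Exactly one contender's create succeeds, the znode disappears if the winner's session dies, and nothing runs at all until some operation actually needs a leader:

import org.apache.curator.framework.CuratorFramework;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;

import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Solr code: leadership is claimed on demand,
// the first time an operation notices the shard has no leader.
public class OnDemandLeaderClaim {
  private final CuratorFramework client;
  private final String leaderPath;      // a per-shard leader znode (illustrative)
  private final String myCoreNodeName;

  public OnDemandLeaderClaim(CuratorFramework client, String leaderPath,
                             String myCoreNodeName) {
    this.client = client;
    this.leaderPath = leaderPath;
    this.myCoreNodeName = myCoreNodeName;
  }

  // Returns the current leader, claiming leadership if the slot is empty.
  // The ephemeral znode is the single source of truth: exactly one create
  // succeeds, and it vanishes when the leader's session dies, so the next
  // caller that needs a leader simply re-runs the claim.
  public String getOrClaimLeader() throws Exception {
    while (true) {
      try {
        // Atomic claim: only one contender can create the ephemeral node.
        client.create()
            .withMode(CreateMode.EPHEMERAL)
            .forPath(leaderPath, myCoreNodeName.getBytes(StandardCharsets.UTF_8));
        return myCoreNodeName;          // we won
      } catch (KeeperException.NodeExistsException e) {
        try {
          return new String(client.getData().forPath(leaderPath),
              StandardCharsets.UTF_8);  // someone else won; read who
        } catch (KeeperException.NoNodeException gone) {
          // Leader died between our create and our read; retry the claim.
        }
      }
    }
  }
}

A real implementation would of course also have to check replica eligibility (live, sufficiently up to date) before claiming; this sketch skips that entirely.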
>> > On Mon, Dec 18, 2023 at 11:33 AM Pierre Salagnac
>> > <pierre.salag...@gmail.com> wrote:
>> >
>> > > We recently had a couple of issues with production clusters because
>> > > of race conditions in shard leader election. By race condition here,
>> > > I mean within a single node. I'm not discussing how leader election
>> > > is distributed across multiple Solr nodes, but how multiple threads
>> > > in a single Solr node conflict with each other.
>> > >
>> > > Overall, when two threads (on the same server) concurrently join
>> > > leader election for the same replica, the outcome is unpredictable.
>> > > It may end with two nodes thinking they are the leader, or with no
>> > > leader at all.
>> > > I identified two scenarios, but maybe there are more:
>> > >
>> > > 1. The Zookeeper session expires while an election is already in
>> > > progress. When we re-create the Zookeeper session, we re-register
>> > > all the cores and join elections for all of them. If an election is
>> > > already in progress or is triggered for any reason, we can have two
>> > > threads on the same Solr server node running leader election for
>> > > the same core.
>> > >
>> > > 2. The REJOINLEADERELECTION command is received twice concurrently
>> > > for the same core. This scenario is much easier to reproduce with an
>> > > external client. It occurs for us because we have customizations
>> > > that use this command.
>> > >
>> > > The code for leader election hasn't changed much in a while, and I
>> > > don't understand the full history behind it. I wonder whether
>> > > multithreading was already discussed and/or taken into account. The
>> > > code has a "TODO: can we even get into this state?" that makes me
>> > > think this issue was already reproduced but not fully
>> > > solved/understood.
>> > > Since this code makes many calls to Zookeeper, I don't think we can
>> > > just "synchronize" it with mutual exclusions, as these calls involve
>> > > the network and can be incredibly slow when something bad happens.
>> > > We don't want any thread to be blocked by another waiting for a
>> > > remote call to complete.
>> > >
>> > > I would like to get some opinions about making this code more robust
>> > > to concurrency. Unless the main opinion is "no, this code should
>> > > actually be single-threaded!", I can give it a try.
>> > >
>> > > Thanks
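On the robustness question Pierre raises: one possible direction, sketched under the constraint he states (no thread may block on another's slow Zookeeper call), is a per-core non-blocking guard, so a concurrent duplicate election attempt bails out immediately instead of racing. ElectionGuard and tryRunElection below are hypothetical names, not existing Solr classes:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, not an existing Solr class: at most one thread runs
// the election sequence per core, and contenders never block waiting on it.
public class ElectionGuard {
  // One flag per core name; true while an election attempt is in flight.
  private final Map<String, AtomicBoolean> inFlight = new ConcurrentHashMap<>();

  // Runs the election body for the core unless one is already in flight.
  // A concurrent duplicate (a second REJOINLEADERELECTION, or a
  // session-expiry re-registration racing a normal election) returns
  // false immediately instead of racing.
  public boolean tryRunElection(String coreName, Runnable electionBody) {
    AtomicBoolean flag =
        inFlight.computeIfAbsent(coreName, c -> new AtomicBoolean(false));
    if (!flag.compareAndSet(false, true)) {
      return false;               // another thread is electing this core
    }
    try {
      electionBody.run();         // slow Zookeeper calls happen here, but
                                  // no other thread is blocked waiting
    } finally {
      flag.set(false);            // allow future election attempts
    }
    return true;
  }
}

A caller that gets false may still need the election's outcome, so a fuller version would coalesce the duplicate request (for example, set a "rerun needed" flag that the in-flight thread checks before clearing the guard) rather than simply dropping it.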