Re: multithreading in leader election

Pierre Salagnac Tue, 19 Dec 2023 10:38:39 -0800

Thanks for your answers.

> The message by Pierre is regarding fixing existing code.


Definitely. Here I want to fix some gaps in the current mechanism for
leader election, which is, in my opinion, much less work than a full
rework with a different approach.

I will file a Jira ticket for this and will try later to give more
technical details on the possible solution (I don't have them yet! :-) )

On Tue, Dec 19, 2023 at 18:19, Gus Heck <gus.h...@gmail.com> wrote:

> Well we're always operating on consensus, just sometimes it's lazy
> consensus. If the sentiment in the community is unclear, we (should)
> clarify with a vote before committing... Ideally it wouldn't get to the
> point of a veto. At least that's my understanding.
>
> If Pierre comes up with a patch to fix a threading issue we should consider
> it. If there's a competing patch that should be considered too. If there's
> no alternate proposal developed enough to create a patch and it looks
> technically sound, it should go in.
>
> May the best patch win.
>
> On Tue, Dec 19, 2023 at 11:46 AM David Smiley <dsmi...@apache.org> wrote:
>
> > My reply might be a little surprising; maybe I hit "send" too quickly.  Of
> > course one should work to invest in getting more consensus; maybe the idea
> > isn't fully understood; maybe the concerns aren't fully understood.  But
> > consensus isn't so much a state that is achieved or not; it's shades of
> > gray.  Many people can be silent or not follow up with a response of any
> > kind.  In the end, no technical change is voted on, there is just the
> > potential for a veto.  Announcing concluding intentions (I'm about to go do
> > XYZ) is an opportunity for a veto to be expressed.
> >
> > On Tue, Dec 19, 2023 at 11:19 AM David Smiley <dsmi...@apache.org> wrote:
> >
> > > You may be surprised at what can be accomplished without "consensus" :-).
> > > Vetoes are the blocker.  If you/anyone are convinced enough and put forth a
> > > proposal of what you are going to do, get feedback, and say you are going
> > > to do it (in spite of concerns but obviously try to address them!), go for
> > > it.
> > >
> > > On Tue, Dec 19, 2023 at 10:45 AM Ilan Ginzburg <ilans...@gmail.com> wrote:
> > >
> > >> The message by Pierre is regarding fixing existing code.
> > >>
> > >> The leader on demand doesn't seem to be a short term solution in any case,
> > >> and there wasn't really a consensus around the proposal.
> > >>
> > >> Ilan
> > >>
> > >> On Tue, Dec 19, 2023 at 4:16 PM David Smiley <dsmi...@apache.org> wrote:
> > >>
> > >> > I would be more in favor of going back to the drawing board on leader
> > >> > election than incremental improvements.  Go back to first principles.  The
> > >> > clarity just isn't there to be maintained.  I don't trust it.
> > >> >
> > >> > Coincidentally I sent a message to the Apache Curator users list yesterday
> > >> > to inquire about leader prioritization:
> > >> > https://lists.apache.org/thread/lmm30qpm17cjf4b93jxv0rt3bq99c0sb
> > >> > I suspect the "users" list is too low activity to be useful for the Curator
> > >> > project; I'm going to try elsewhere.
> > >> >
> > >> > For shards, there doesn't even need to be a "leader election" recipe
> > >> > because there are no shard leader threads that always need to be
> > >> > thinking/doing stuff, unlike the Overseer.  It could be more demand-driven
> > >> > (assign leader on-demand if needs to be re-assigned), and thus be more
> > >> > scalable as well for many shards.
> > >> > Some of my ideas on this:
> > >> > https://lists.apache.org/thread/kowcp2ftc132pq0y38g9736m0slchjg7
> > >> >
> > >> > On Mon, Dec 18, 2023 at 11:33 AM Pierre Salagnac <pierre.salag...@gmail.com> wrote:
> > >> >
> > >> > > We recently had a couple of issues with production clusters because of race
> > >> > > conditions in shard leader election. By race condition here, I mean within a
> > >> > > single node. I'm not discussing how leader election is distributed
> > >> > > across multiple Solr nodes, but how multiple threads in a single Solr node
> > >> > > conflict with each other.
> > >> > >
> > >> > > Overall, when two threads (on the same server) concurrently join
> > >> > > leader election for the same replica, the outcome is unpredictable. It may
> > >> > > end in two nodes thinking they are the leader or not having any leader at
> > >> > > all.
> > >> > > I identified two scenarios, but maybe there are more:
> > >> > >
> > >> > > 1. Zookeeper session expires while an election is already in progress.
> > >> > > When we re-create the Zookeeper session, we re-register all the cores, and
> > >> > > join elections for all of them. If an election is already in progress or is
> > >> > > triggered for any reason, we can have two threads on the same Solr server
> > >> > > node running leader election for the same core.
> > >> > >
> > >> > > 2. Command REJOINLEADERELECTION is received twice concurrently for the same
> > >> > > core.
> > >> > > This scenario is much easier to reproduce with an external client. It
> > >> > > occurs for us since we have customizations using this command.
> > >> > >
> > >> > >
> > >> > > The code for leader election hasn't changed much for a while, and I don't
> > >> > > understand the full history behind it. I wonder whether multithreading was
> > >> > > already discussed and/or taken into account. The code has a "TODO:
> > >> > > can we even get into this state?" that makes me think this issue was already
> > >> > > reproduced but not fully solved/understood.
> > >> > > Since this code has many calls to Zookeeper, I don't think we can just
> > >> > > "synchronize" it with mutual exclusions, as these calls that involve the
> > >> > > network can be incredibly slow when something bad happens. We don't want
> > >> > > any thread to be blocked by another waiting for a remote call to complete.
> > >> > >
> > >> > > I would like to get some opinions about making this code more robust to
> > >> > > concurrency. Unless the main opinion is "no, this code should actually be
> > >> > > single-threaded!", I can give it a try.
> > >> > >
> > >> > > Thanks
> > >> > >
> > >> >
> > >>
> > >
> >
>
>
> --
> http://www.needhamsoftware.com (work)
> https://a.co/d/b2sZLD9 (my fantasy fiction book)
>
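
The single-node race described in the original message (two threads joining
the election for the same core, with slow ZooKeeper calls that rule out a plain
lock) can be illustrated with a small sketch. This is not Solr code and not the
fix Pierre has in mind: ElectionGuard, joinElection, and the Runnable standing
in for the election are hypothetical names. It assumes that a late caller may
hand its request to the thread already running the election instead of waiting
for it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper (not part of Solr): lets at most one thread per core run
 * the election, without making other threads wait on its remote calls.
 */
final class ElectionGuard {
  private static final int IDLE = 0;
  private static final int RUNNING = 1;
  private static final int RERUN_REQUESTED = 2;

  // One small state machine per core name.
  private final ConcurrentHashMap<String, AtomicInteger> states = new ConcurrentHashMap<>();

  /**
   * Runs the election for the core unless another thread already does.
   * Returns true if this thread ran it, false if the request was handed
   * over to the thread that is currently running it.
   */
  boolean joinElection(String coreName, Runnable election) {
    AtomicInteger state = states.computeIfAbsent(coreName, k -> new AtomicInteger(IDLE));
    while (true) {
      if (state.compareAndSet(IDLE, RUNNING)) {
        break; // this thread now owns the election for the core
      }
      if (state.compareAndSet(RUNNING, RERUN_REQUESTED)) {
        return false; // the owner will run once more after its current attempt
      }
      if (state.get() == RERUN_REQUESTED) {
        return false; // a rerun is already pending and covers this request
      }
      // The state changed between the checks above; try again.
    }
    do {
      try {
        election.run(); // slow remote calls happen here, with no lock held
      } catch (RuntimeException e) {
        state.set(IDLE); // do not leave the core stuck in RUNNING
        throw e;
      }
      // Leave if nobody asked for a rerun; otherwise take the rerun and loop.
    } while (!state.compareAndSet(RUNNING, IDLE)
        && state.compareAndSet(RERUN_REQUESTED, RUNNING));
    return true;
  }
}
```

With this shape, a second REJOINLEADERELECTION for the same core, or a
re-registration after session expiry, would return false immediately, and the
owning thread would run the election one more time after its current attempt.
Whether a rerun is the right reaction in Solr is left open here; the sketch
only shows that per-core serialization does not require blocking callers.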
