Pierre's message is about fixing the existing code.

Leader-on-demand doesn't seem to be a short-term solution in any case,
and there wasn't really a consensus around that proposal.

Ilan

On Tue, Dec 19, 2023 at 4:16 PM David Smiley <dsmi...@apache.org> wrote:

> I would be more in favor of going back to the drawing board on leader
> election than incremental improvements.  Go back to first principles.  The
> clarity just isn't there to be maintained.  I don't trust it.
>
> Coincidentally I sent a message to the Apache Curator users list yesterday
> to inquire about leader prioritization:
> https://lists.apache.org/thread/lmm30qpm17cjf4b93jxv0rt3bq99c0sb
> I suspect the "users" list is too low activity to be useful for the Curator
> project; I'm going to try elsewhere.
>
> For shards, there doesn't even need to be a "leader election" recipe
> because there are no shard leader threads that always need to be
> thinking/doing stuff, unlike the Overseer.  It could be more demand-driven
> (assign a leader on demand when one needs to be re-assigned), and thus be
> more scalable as well for many shards.
> Some of my ideas on this:
> https://lists.apache.org/thread/kowcp2ftc132pq0y38g9736m0slchjg7
>
> On Mon, Dec 18, 2023 at 11:33 AM Pierre Salagnac <pierre.salag...@gmail.com> wrote:
>
> > We recently had a couple of issues with production clusters because of
> > race conditions in shard leader election. By race condition here, I mean
> > within a single node. I'm not discussing how leader election is
> > distributed across multiple Solr nodes, but how multiple threads in a
> > single Solr node conflict with each other.
> >
> > Overall, when two threads (on the same server) concurrently join
> > leader election for the same replica, the outcome is unpredictable. It
> > may end in two nodes thinking they are the leader or not having any
> > leader at all.
> > I identified two scenarios, but maybe there are more:
> >
> > 1. ZooKeeper session expires while an election is already in progress.
> > When we re-create the ZooKeeper session, we re-register all the cores
> > and join elections for all of them. If an election is already in progress
> > or is triggered for any reason, we can have two threads on the same Solr
> > server node running leader election for the same core.
> >
> > 2. Command REJOINLEADERELECTION is received twice concurrently for the
> > same core.
> > This scenario is much easier to reproduce with an external client. It
> > occurs for us since we have customizations using this command.
> >
> >
> > The code for leader election hasn't changed much for a while, and I don't
> > understand the full history behind it. I wonder whether multithreading
> > was already discussed and/or taken into account. The code has a "TODO:
> > can we even get into this state?" that makes me think this issue was
> > already reproduced but not fully solved/understood.
> > Since this code has many calls to ZooKeeper, I don't think we can just
> > "synchronize" it with mutual exclusion, as these calls involve the
> > network and can be incredibly slow when something bad happens. We don't
> > want any thread to be blocked by another waiting for a remote call to
> > complete.
> >
> > I would like to get some opinions about making this code more robust to
> > concurrency. Unless the main opinion is "no, this code should actually be
> > single-threaded!", I can give it a try.
> >
> > Thanks
> >
>
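
For illustration only, here is a minimal sketch of one way to keep two
threads on the same node from running an election for the same core at the
same time, without making the second thread wait on slow ZooKeeper calls.
It is not taken from the Solr code base: the class ElectionGuard, the method
joinElection and the state names are all hypothetical, and the election
itself is assumed to be wrapped in a Runnable. A second caller does not
block; it records that a re-run is wanted and returns, and the thread that
already owns the election runs its own task once more before releasing.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not a Solr class): at most one election run per core at a time. */
final class ElectionGuard {
    private static final int IDLE = 0;     // nobody is running an election for this core
    private static final int RUNNING = 1;  // one thread is running it
    private static final int RERUN = 2;    // running, and another join was requested meanwhile

    private final ConcurrentHashMap<String, AtomicInteger> states = new ConcurrentHashMap<>();

    /**
     * Runs electionTask unless another thread already runs an election for coreName.
     * Returns true if this thread ran the task, false if the request was handed off
     * to the thread that is already running (which will then run its task again).
     * The caller never blocks on another thread's remote calls.
     */
    boolean joinElection(String coreName, Runnable electionTask) {
        AtomicInteger state = states.computeIfAbsent(coreName, k -> new AtomicInteger(IDLE));
        while (true) {
            int s = state.get();
            if (s == IDLE) {
                if (state.compareAndSet(IDLE, RUNNING)) {
                    break; // this thread now owns the election for the core
                }
            } else if (s == RERUN || state.compareAndSet(RUNNING, RERUN)) {
                return false; // a full run is guaranteed to start after this request
            }
            // a CAS lost a race: re-read the state and retry
        }
        try {
            do {
                state.set(RUNNING); // clear any pending re-run request before starting
                electionTask.run();
            } while (!state.compareAndSet(RUNNING, IDLE)); // fails only if a re-run was requested
        } catch (RuntimeException | Error e) {
            state.set(IDLE); // do not leave the core stuck in RUNNING after a failure
            throw e;
        }
        return true;
    }
}
```

In this sketch the task of the second caller is dropped and the owner repeats
its own task, which only makes sense if every join for a core does the same
work. Whether that holds for both scenarios above (session re-creation and
REJOINLEADERELECTION) is an assumption that would need checking against the
real code.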
