You may be surprised at what can be accomplished without "consensus" :-). Vetoes are the blocker. If you (or anyone) are convinced enough, put forth a proposal of what you're going to do, get feedback, and say you're going to do it (in spite of concerns, but obviously try to address them!), then go for it.
On Tue, Dec 19, 2023 at 10:45 AM Ilan Ginzburg <ilans...@gmail.com> wrote:

> The message by Pierre is about fixing the existing code.
>
> The leader-on-demand approach doesn't seem to be a short-term solution
> in any case, and there wasn't really a consensus around the proposal.
>
> Ilan
>
> On Tue, Dec 19, 2023 at 4:16 PM David Smiley <dsmi...@apache.org> wrote:
>
> > I would be more in favor of going back to the drawing board on leader
> > election than incremental improvements. Go back to first principles.
> > The clarity just isn't there to be maintained. I don't trust it.
> >
> > Coincidentally, I sent a message to the Apache Curator users list
> > yesterday to inquire about leader prioritization:
> > https://lists.apache.org/thread/lmm30qpm17cjf4b93jxv0rt3bq99c0sb
> > I suspect the "users" list is too low-activity to be useful for the
> > Curator project; I'm going to try elsewhere.
> >
> > For shards, there doesn't even need to be a "leader election" recipe,
> > because there are no shard leader threads that always need to be
> > thinking/doing stuff, unlike the Overseer. It could be more
> > demand-driven (assign the leader on demand if it needs to be
> > re-assigned), and thus more scalable as well for many shards.
> > Some of my ideas on this:
> > https://lists.apache.org/thread/kowcp2ftc132pq0y38g9736m0slchjg7
> >
> > On Mon, Dec 18, 2023 at 11:33 AM Pierre Salagnac <
> > pierre.salag...@gmail.com> wrote:
> >
> > > We recently had a couple of issues with production clusters because
> > > of race conditions in shard leader election. By race condition here,
> > > I mean within a single node. I'm not discussing how leader election
> > > is distributed across multiple Solr nodes, but how multiple threads
> > > in a single Solr node conflict with each other.
> > >
> > > Overall, when two threads (on the same server) concurrently join the
> > > leader election for the same replica, the outcome is unpredictable:
> > > it may end with two nodes thinking they are the leader, or with no
> > > leader at all.
> > > I identified two scenarios, but maybe there are more:
> > >
> > > 1. The ZooKeeper session expires while an election is already in
> > > progress. When we re-create the ZooKeeper session, we re-register
> > > all the cores and join the elections for all of them. If an election
> > > is already in progress or is triggered for any reason, we can end up
> > > with two threads on the same Solr server node running leader
> > > election for the same core.
> > >
> > > 2. The REJOINLEADERELECTION command is received twice concurrently
> > > for the same core. This scenario is much easier to reproduce with an
> > > external client. It occurs for us because we have customizations
> > > that use this command.
> > >
> > > The code for leader election hasn't changed much in a while, and I
> > > don't understand the full history behind it. I wonder whether
> > > multithreading was already discussed and/or taken into account. The
> > > code has a "TODO: can we even get into this state?" that makes me
> > > think this issue was already reproduced but not fully
> > > solved/understood.
> > > Since this code makes many calls to ZooKeeper, I don't think we can
> > > just "synchronize" it with mutual exclusion: these calls go over the
> > > network and can be incredibly slow when something bad happens. We
> > > don't want any thread to be blocked by another that is waiting for a
> > > remote call to complete.
> > >
> > > I would like to get some opinions about making this code more robust
> > > to concurrency. Unless the prevailing opinion is "no, this code
> > > should actually be single-threaded!", I can give it a try.
> > >
> > > Thanks
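For the short-term hardening Pierre is asking about, one shape that avoids
blocking threads on each other is a per-core "election epoch": every new
attempt to (re)join the election bumps a counter, and an in-flight attempt
re-checks the counter after each ZooKeeper round-trip, abandoning its work
if it has been superseded. A minimal sketch, not existing Solr code; the
class and method names here are made up:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: a per-core election epoch. Starting a new attempt bumps
// the epoch; an older in-flight attempt notices it is stale the next time
// it checks and abandons its work instead of racing the newer thread.
public class ElectionGuard {

    private final ConcurrentHashMap<String, AtomicLong> epochs =
            new ConcurrentHashMap<>();

    /** Called by any thread that wants to (re)join election for a core. */
    public long beginAttempt(String coreName) {
        return epochs.computeIfAbsent(coreName, c -> new AtomicLong())
                     .incrementAndGet();
    }

    /** Cheap staleness check; call after every ZooKeeper round-trip. */
    public boolean isStale(String coreName, long myEpoch) {
        AtomicLong current = epochs.get(coreName);
        return current == null || current.get() != myEpoch;
    }

    // Caller pattern (hypothetical):
    //   long epoch = guard.beginAttempt(coreName);
    //   ... slow ZooKeeper call ...
    //   if (guard.isStale(coreName, epoch)) return;  // superseded, bail out
}

Threads never wait on one another: a superseded attempt simply stops after
its next remote call, so a slow ZooKeeper operation can only delay the
attempt that is already obsolete.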
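And if the demand-driven direction David describes were pursued, its core
could be a conditional claim in ZooKeeper whenever a request finds the
shard leaderless: whoever creates the ephemeral leader node wins, everyone
else just reads it. A toy sketch against the plain ZooKeeper client API;
the /leader path layout and the helper name are hypothetical:

import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Sketch only: claim shard leadership on demand instead of keeping a
// standing election queue per shard. The znode layout is made up.
public class OnDemandLeader {

    /** Try to become leader; returns the winner's name either way. */
    public static String claimOrReadLeader(ZooKeeper zk, String shardPath,
                                           String myNodeName)
            throws KeeperException, InterruptedException {
        String leaderPath = shardPath + "/leader";
        try {
            // Ephemeral: the claim vanishes if our session dies.
            zk.create(leaderPath,
                      myNodeName.getBytes(StandardCharsets.UTF_8),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE,
                      CreateMode.EPHEMERAL);
            return myNodeName; // we won the claim
        } catch (KeeperException.NodeExistsException e) {
            // Someone else claimed it first; read who won.
            byte[] data = zk.getData(leaderPath, false, null);
            return new String(data, StandardCharsets.UTF_8);
        }
    }
}

No per-shard thread sits in an election loop; the claim happens only when a
request actually needs a leader, which is what would let this scale to many
shards. (A real version would also handle the leader node disappearing
between the failed create and the read.)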