My reply might be a little surprising; maybe I hit "send" too quickly. Of course one should invest in building more consensus; maybe the idea isn't fully understood; maybe the concerns aren't fully understood. But consensus isn't so much a state that is achieved or not; it comes in shades of gray. Many people may stay silent or never follow up with a response of any kind. In the end, no technical change is voted on; there is just the potential for a veto. Announcing concluding intentions ("I'm about to go do XYZ") is an opportunity for a veto to be expressed.
On Tue, Dec 19, 2023 at 11:19 AM David Smiley <dsmi...@apache.org> wrote:

> You may be surprised at what can be accomplished without "consensus" :-).
> Vetoes are the blocker. If you/anyone are convinced enough: put forth a
> proposal of what you are going to do, get feedback, say you are going to
> do it (in spite of concerns, but obviously try to address them!), and go
> for it.
>
> On Tue, Dec 19, 2023 at 10:45 AM Ilan Ginzburg <ilans...@gmail.com> wrote:
>
>> The message by Pierre is about fixing existing code.
>>
>> The leader-on-demand approach doesn't seem to be a short-term solution
>> in any case, and there wasn't really a consensus around the proposal.
>>
>> Ilan
>>
>> On Tue, Dec 19, 2023 at 4:16 PM David Smiley <dsmi...@apache.org> wrote:
>>
>> > I would be more in favor of going back to the drawing board on leader
>> > election than making incremental improvements. Go back to first
>> > principles. The clarity just isn't there to be maintained; I don't
>> > trust it.
>> >
>> > Coincidentally, I sent a message to the Apache Curator users list
>> > yesterday to inquire about leader prioritization:
>> > https://lists.apache.org/thread/lmm30qpm17cjf4b93jxv0rt3bq99c0sb
>> > I suspect the "users" list is too low-activity to be useful for the
>> > Curator project; I'm going to try elsewhere.
>> >
>> > For shards, there doesn't even need to be a "leader election" recipe,
>> > because there are no shard leader threads that always need to be
>> > thinking/doing stuff, unlike the Overseer. It could be more
>> > demand-driven (assign a leader on demand when it needs to be
>> > re-assigned), and thus also scale better to many shards.
>> > Some of my ideas on this:
>> > https://lists.apache.org/thread/kowcp2ftc132pq0y38g9736m0slchjg7
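As a rough illustration of the demand-driven idea above (this is not Solr's actual code; the class, the leader path, and getOrClaimLeader are all hypothetical), leadership can be claimed lazily with a single atomic ephemeral-znode create through a Curator client. Exactly one contender's create succeeds, the znode disappears if the winner's session dies, and nothing runs at all until some operation actually needs a leader:

import org.apache.curator.framework.CuratorFramework;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;

import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Solr code: leadership is claimed on demand,
// the first time an operation notices the shard has no leader.
public class OnDemandLeaderClaim {
  private final CuratorFramework client;
  private final String leaderPath;      // a per-shard leader znode (illustrative)
  private final String myCoreNodeName;

  public OnDemandLeaderClaim(CuratorFramework client, String leaderPath,
                             String myCoreNodeName) {
    this.client = client;
    this.leaderPath = leaderPath;
    this.myCoreNodeName = myCoreNodeName;
  }

  // Returns the current leader, claiming leadership if the slot is empty.
  // The ephemeral znode is the single source of truth: exactly one create
  // succeeds, and it vanishes when the leader's session dies, so the next
  // caller that needs a leader simply re-runs the claim.
  public String getOrClaimLeader() throws Exception {
    while (true) {
      try {
        // Atomic claim: only one contender can create the ephemeral node.
        client.create()
            .withMode(CreateMode.EPHEMERAL)
            .forPath(leaderPath, myCoreNodeName.getBytes(StandardCharsets.UTF_8));
        return myCoreNodeName;          // we won
      } catch (KeeperException.NodeExistsException e) {
        try {
          return new String(client.getData().forPath(leaderPath),
              StandardCharsets.UTF_8);  // someone else won; read who
        } catch (KeeperException.NoNodeException gone) {
          // Leader died between our create and our read; retry the claim.
        }
      }
    }
  }
}

A real implementation would of course also have to check replica eligibility (live, sufficiently up to date) before claiming; this sketch skips that entirely.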
>> > On Mon, Dec 18, 2023 at 11:33 AM Pierre Salagnac
>> > <pierre.salag...@gmail.com> wrote:
>> >
>> > > We recently had a couple of issues with production clusters because
>> > > of race conditions in shard leader election. By race condition here,
>> > > I mean within a single node. I'm not discussing how leader election
>> > > is distributed across multiple Solr nodes, but how multiple threads
>> > > in a single Solr node conflict with each other.
>> > >
>> > > Overall, when two threads (on the same server) concurrently join
>> > > leader election for the same replica, the outcome is unpredictable.
>> > > It may end with two nodes thinking they are the leader, or with no
>> > > leader at all.
>> > > I identified two scenarios, but maybe there are more:
>> > >
>> > > 1. The Zookeeper session expires while an election is already in
>> > > progress. When we re-create the Zookeeper session, we re-register
>> > > all the cores and join elections for all of them. If an election is
>> > > already in progress or is triggered for any reason, we can have two
>> > > threads on the same Solr server node running leader election for
>> > > the same core.
>> > >
>> > > 2. The REJOINLEADERELECTION command is received twice concurrently
>> > > for the same core. This scenario is much easier to reproduce with an
>> > > external client. It occurs for us because we have customizations
>> > > that use this command.
>> > >
>> > > The code for leader election hasn't changed much in a while, and I
>> > > don't understand the full history behind it. I wonder whether
>> > > multithreading was already discussed and/or taken into account. The
>> > > code has a "TODO: can we even get into this state?" that makes me
>> > > think this issue was already reproduced but not fully
>> > > solved/understood.
>> > > Since this code makes many calls to Zookeeper, I don't think we can
>> > > just "synchronize" it with mutual exclusions, as these calls involve
>> > > the network and can be incredibly slow when something bad happens.
>> > > We don't want any thread to be blocked by another waiting for a
>> > > remote call to complete.
>> > >
>> > > I would like to get some opinions about making this code more robust
>> > > to concurrency. Unless the main opinion is "no, this code should
>> > > actually be single-threaded!", I can give it a try.
>> > >
>> > > Thanks
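On the robustness question Pierre raises: one possible direction, sketched under the constraint he states (no thread may block on another's slow Zookeeper call), is a per-core non-blocking guard, so a concurrent duplicate election attempt bails out immediately instead of racing. ElectionGuard and tryRunElection below are hypothetical names, not existing Solr classes:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, not an existing Solr class: at most one thread runs
// the election sequence per core, and contenders never block waiting on it.
public class ElectionGuard {
  // One flag per core name; true while an election attempt is in flight.
  private final Map<String, AtomicBoolean> inFlight = new ConcurrentHashMap<>();

  // Runs the election body for the core unless one is already in flight.
  // A concurrent duplicate (a second REJOINLEADERELECTION, or a
  // session-expiry re-registration racing a normal election) returns
  // false immediately instead of racing.
  public boolean tryRunElection(String coreName, Runnable electionBody) {
    AtomicBoolean flag =
        inFlight.computeIfAbsent(coreName, c -> new AtomicBoolean(false));
    if (!flag.compareAndSet(false, true)) {
      return false;               // another thread is electing this core
    }
    try {
      electionBody.run();         // slow Zookeeper calls happen here, but
                                  // no other thread is blocked waiting
    } finally {
      flag.set(false);            // allow future election attempts
    }
    return true;
  }
}

A caller that gets false may still need the election's outcome, so a fuller version would coalesce the duplicate request (for example, set a "rerun needed" flag that the in-flight thread checks before clearing the guard) rather than simply dropping it.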