Thanks for your answers.

> The message by Pierre is regarding fixing existing code.

Definitely. Here I want to fix some gaps in the current mechanism for leader
election, which is in my opinion much less work than a full rework with a
different approach.

I will file a Jira ticket for this and will try later to give more technical
details on the possible solution (I don't have them yet! :-) )

On Tue, Dec 19, 2023 at 18:19, Gus Heck <gus.h...@gmail.com> wrote:

> Well we're always operating on consensus, just sometimes it's lazy
> consensus. If the sentiment in the community is unclear, we (should)
> clarify with a vote before committing... Ideally it wouldn't get to the
> point of a veto. At least that's my understanding.
>
> If Pierre comes up with a patch to fix a threading issue we should consider
> it. If there's a competing patch that should be considered too. If there's
> no alternate proposal developed enough to create a patch and it looks
> technically sound, it should go in.
>
> May the best patch win.
>
> On Tue, Dec 19, 2023 at 11:46 AM David Smiley <dsmi...@apache.org> wrote:
>
> > My reply might be a little surprising; maybe I hit "send" too quickly. Of
> > course one should work to invest in getting more consensus; maybe the idea
> > isn't fully understood; maybe the concerns aren't fully understood. But
> > consensus isn't so much a state that is achieved or not; it's shades of
> > gray. Many people can be silent or not follow-up with a response of any
> > kind. In the end, no technical change is voted on, there is just the
> > potential for a veto. Announcing concluding intentions (I'm about to go do
> > XYZ) is an opportunity for a veto to be expressed.
> >
> > On Tue, Dec 19, 2023 at 11:19 AM David Smiley <dsmi...@apache.org> wrote:
> >
> > > You may be surprised at what can be accomplished without "consensus" :-).
> > > Vetoes are the blocker.
> > > If you/anyone are convinced enough and put forth a
> > > proposal of what you are going to do, get feedback, and say you are going
> > > to do it (in spite of concerns but obviously try to address them!), go for
> > > it.
> > >
> > > On Tue, Dec 19, 2023 at 10:45 AM Ilan Ginzburg <ilans...@gmail.com> wrote:
> > >
> > >> The message by Pierre is regarding fixing existing code.
> > >>
> > >> The leader on demand doesn't seem to be a short term solution in any case,
> > >> and there wasn't really a consensus around the proposal.
> > >>
> > >> Ilan
> > >>
> > >> On Tue, Dec 19, 2023 at 4:16 PM David Smiley <dsmi...@apache.org> wrote:
> > >>
> > >> > I would be more in favor of going back to the drawing board on leader
> > >> > election than incremental improvements. Go back to first principles. The
> > >> > clarity just isn't there to be maintained. I don't trust it.
> > >> >
> > >> > Coincidentally I sent a message to the Apache Curator users list yesterday
> > >> > to inquire about leader prioritization:
> > >> > https://lists.apache.org/thread/lmm30qpm17cjf4b93jxv0rt3bq99c0sb
> > >> > I suspect the "users" list is too low activity to be useful for the Curator
> > >> > project; I'm going to try elsewhere.
> > >> >
> > >> > For shards, there doesn't even need to be a "leader election" recipe
> > >> > because there are no shard leader threads that always need to be
> > >> > thinking/doing stuff, unlike the Overseer. It could be more demand-driven
> > >> > (assign leader on-demand if needs to be re-assigned), and thus be more
> > >> > scalable as well for many shards.
> > >> > Some of my ideas on this:
> > >> > https://lists.apache.org/thread/kowcp2ftc132pq0y38g9736m0slchjg7
> > >> >
> > >> > On Mon, Dec 18, 2023 at 11:33 AM Pierre Salagnac
> > >> > <pierre.salag...@gmail.com> wrote:
> > >> >
> > >> > > We recently had a couple of issues with production clusters because of
> > >> > > race conditions in shard leader election. By race condition here, I mean
> > >> > > for a single node. I'm not discussing how leader election is distributed
> > >> > > across multiple Solr nodes, but how multiple threads in a single Solr
> > >> > > node conflict with each other.
> > >> > >
> > >> > > Overall, when two threads (on the same server) concurrently join
> > >> > > leader election for the same replica, the outcome is unpredictable. It
> > >> > > may end in two nodes thinking they are the leader or not having any
> > >> > > leader at all.
> > >> > > I identified two scenarios, but maybe there are more:
> > >> > >
> > >> > > 1. Zookeeper session expires while an election is already in progress.
> > >> > > When we re-create the Zookeeper session, we re-register all the cores,
> > >> > > and join elections for all of them. If an election is already in progress
> > >> > > or is triggered for any reason, we can have two threads on the same Solr
> > >> > > server node running leader election for the same core.
> > >> > >
> > >> > > 2. Command REJOINLEADERELECTION is received twice concurrently for the
> > >> > > same core.
> > >> > > This scenario is much easier to reproduce with an external client. It
> > >> > > occurs for us since we have customizations using this command.
> > >> > >
> > >> > >
> > >> > > The code for leader election hasn't changed much for a while, and I
> > >> > > don't understand the full history behind it.
> > >> > > I wonder whether multithreading was
> > >> > > already discussed and/or taken into account. The code has a "TODO: can we
> > >> > > even get into this state?" that makes me think this issue was already
> > >> > > reproduced but not fully solved/understood.
> > >> > > Since this code has many calls to Zookeeper, I don't think we can just
> > >> > > "synchronize" it with mutual exclusions, as these calls that involve the
> > >> > > network can be incredibly slow when something bad happens. We don't want
> > >> > > any thread to be blocked by another waiting for a remote call to
> > >> > > complete.
> > >> > >
> > >> > > I would like to get some opinions about making this code more robust to
> > >> > > concurrency. Unless the main opinion is "no, this code should actually be
> > >> > > single-threaded!", I can give it a try.
> > >> > >
> > >> > > Thanks
>
> --
> http://www.needhamsoftware.com (work)
> https://a.co/d/b2sZLD9 (my fantasy fiction book)
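
To make the constraint from the original message concrete (two threads on the same node must not run the election for the same core, and no thread should block while another waits on a Zookeeper call), here is a minimal sketch of a non-blocking per-core guard. It is only an illustration, not Solr code: ElectionGuard, tryBegin, finish and runExclusive are hypothetical names, and coalescing a concurrent request into one re-run is an assumption about the desired behavior, not something decided in this thread.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical per-core guard (none of these names exist in Solr).
 * At most one thread runs the election for a given core. A concurrent
 * request never blocks: it only flags that one more run is needed.
 */
final class ElectionGuard {
    private static final int IDLE = 0, RUNNING = 1, RERUN_REQUESTED = 2;

    // One small state machine per core name. Entries are never removed in this sketch.
    private final ConcurrentHashMap<String, AtomicInteger> states = new ConcurrentHashMap<>();

    /** Returns true if the caller now owns the election for this core. Never waits. */
    boolean tryBegin(String coreName) {
        AtomicInteger s = states.computeIfAbsent(coreName, k -> new AtomicInteger(IDLE));
        while (true) {
            int cur = s.get();
            if (cur == IDLE) {
                if (s.compareAndSet(IDLE, RUNNING)) return true;
            } else if (cur == RERUN_REQUESTED || s.compareAndSet(RUNNING, RERUN_REQUESTED)) {
                // Another thread owns the election; our request is recorded for it.
                return false;
            }
            // A compare-and-set lost a race: re-read the state and retry.
        }
    }

    /** Called by the owner after a run. Returns true if it must run again. */
    boolean finish(String coreName) {
        AtomicInteger s = states.get(coreName);
        if (s == null) throw new IllegalStateException("finish() without tryBegin(): " + coreName);
        while (true) {
            int cur = s.get();
            if (cur == RUNNING) {
                if (s.compareAndSet(RUNNING, IDLE)) return false;
            } else if (cur == RERUN_REQUESTED) {
                // Keep ownership and consume the pending request.
                if (s.compareAndSet(RERUN_REQUESTED, RUNNING)) return true;
            } else {
                throw new IllegalStateException("finish() without tryBegin(): " + coreName);
            }
        }
    }

    /** Runs the (possibly slow) election body outside of any lock. */
    void runExclusive(String coreName, Runnable election) {
        if (!tryBegin(coreName)) return; // the current owner will re-run for us
        do {
            election.run();
        } while (finish(coreName));
    }
}
```

With this shape, both scenarios (session re-creation and a duplicate REJOINLEADERELECTION) would go through runExclusive for the affected core: the second caller returns immediately and the owning thread performs one extra run. Whether a re-run is the right reaction, as opposed to cancelling the election in progress, is left open here.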