I'm trying to better understand the needs of the "typical case" with regard to this proposed design and how it would be negatively impacted. Maybe not at all for NRT, since any up-to-date replica can cheaply be made the leader, so the timing doesn't matter. A TLOG non-leader, however, has to replay its update log (which uses a bunch of threads on the node). In the proposal, that work would be avoided entirely if the node is unavailable for a period short enough that no indexing occurs. In the so-called "typical case", I suppose eager promotion could be seen as doing the work up front so that we can index docs right away if one arrives during that period, i.e. optimizing for indexing availability / performance instead. I think this could easily be a configurable option, such that a TLOG replica observes the unavailability of its leader and eagerly takes charge as leader.
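To make that concrete, here is a rough, purely illustrative sketch of the decision such an option would control. None of these class, method, or option names exist in Solr; they are invented for illustration only.

/**
 * Hypothetical policy for a TLOG non-leader that has noticed its leader is
 * unavailable: should it start replaying its update log now (so it can take
 * over as leader and index right away), or wait and skip the replay entirely
 * if the leader comes back before any update arrives?
 */
public class EagerTlogLeadershipPolicy {

  private final boolean eagerLeadership;  // the proposed configurable option

  public EagerTlogLeadershipPolicy(boolean eagerLeadership) {
    this.eagerLeadership = eagerLeadership;
  }

  /** True if this TLOG replica should replay its tlog and volunteer for leadership now. */
  public boolean shouldTakeOverNow(boolean updatePending) {
    // Default behavior sketched above: avoid the replay work entirely unless an
    // update actually arrives while the leader is unavailable. With the eager
    // option on, pay the replay cost up front to optimize indexing availability.
    return eagerLeadership || updatePending;
  }

  public static void main(String[] args) {
    System.out.println(new EagerTlogLeadershipPolicy(false).shouldTakeOverNow(false)); // false: no indexing, skip the replay
    System.out.println(new EagerTlogLeadershipPolicy(true).shouldTakeOverNow(false));  // true: take charge eagerly
  }
}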
> Maybe basic improvements like that

There are already basic node limits for replaying the update log, from what I see: replayUpdateThreads, mainly. It defaults to the number of CPU threads. Perhaps in the systems you see, it's configured to 500? Based on my recollection of some replay challenges with document versions & locks that Dat & I worked on, I could see how increasing it would be helpful. There is, however, no cap on the number of replays happening at once, which I could see us wanting to add in order to speed up how soon a replica that is already replaying could become ready (a rough sketch of such a gate follows the quoted message below).

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Oct 17, 2022 at 3:46 AM Mark Miller <markrmil...@gmail.com> wrote:

> Determining the leader is extremely cheap in the general case. It’s when
> you have to exchange data (generally when that exchange involves
> replication) that’s expensive. Or when you spin up 500 threads for 500
> cheap operations. For the common use case, a very basic and long-needed
> feature in that regard is simple management. Rather than flood the system
> at once with 500 replications, there needs to be a gate on how many
> expensive operations like that can occur at once. Same with spinning up
> 500 threads. Maybe basic improvements like that won’t be the ideal end
> game for a system that wants 100,000 lazy cores where most of them are
> rarely active, but there is always going to be lots of tension trying to
> solve for the typical use and a system like that.
>
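Re the gating idea in the quoted message: here is a rough, purely illustrative sketch of a concurrency gate for expensive recovery operations (full replications or tlog replays). It uses only standard JDK classes; RecoveryGate and the names in it are invented, not existing Solr code.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical gate that caps how many expensive recovery operations run at
 * once. A fixed-size pool both bounds concurrency and avoids creating one
 * thread per pending recovery; extra tasks simply wait in the queue.
 */
public class RecoveryGate {

  private final ExecutorService executor;

  public RecoveryGate(int maxConcurrentRecoveries) {
    this.executor = Executors.newFixedThreadPool(maxConcurrentRecoveries);
  }

  /** Queue an expensive recovery; it runs when one of the N slots frees up. */
  public void submit(Runnable recoveryTask) {
    executor.submit(recoveryTask);
  }

  public void shutdownAndWait() throws InterruptedException {
    executor.shutdown();
    executor.awaitTermination(1, TimeUnit.HOURS);
  }

  public static void main(String[] args) throws InterruptedException {
    // 500 replicas need recovery, but only 4 expensive operations run at a time.
    RecoveryGate gate = new RecoveryGate(4);
    for (int i = 0; i < 500; i++) {
      final int replica = i;
      gate.submit(() -> System.out.println("recovering replica " + replica));
    }
    gate.shutdownAndWait();
  }
}

The same shape could, in principle, cap concurrent tlog replays node-wide, separately from the per-replay replayUpdateThreads pool mentioned above.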