I'm not saying we shouldn't track leadership in one spot, but there are downsides. Internally, loading a DocCollection (the cached, consolidated representation of a collection's total state) would require visiting ZK once per shard to get the leader. But at least it's cached, so I suppose it's fine. I suspect the PRS stuff (something I haven't looked at closely) is similar.
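
To make the cost concrete, here is a rough sketch of what a cached per-shard leader lookup could look like. It is not Solr code: `ShardLeaderCache`, `getLeader`, `invalidate` and the `leaderFetcher` function are all hypothetical names, and the fetcher merely stands in for a ZK read of the leader registration node. Only the node path layout comes from the quoted message below.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch, not Solr's implementation: caches the content of each
// shard's leader registration node so that building the collection view costs
// at most one ZK read per shard, and later lookups cost none.
class ShardLeaderCache {
    // Assumption: stands in for a ZK read; maps a node path to the leader's name.
    private final Function<String, String> leaderFetcher;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    ShardLeaderCache(Function<String, String> leaderFetcher) {
        this.leaderFetcher = leaderFetcher;
    }

    // Path layout as given in the quoted message.
    static String leaderPath(String collection, String shard) {
        return "/collections/" + collection + "/leaders/" + shard + "/leader";
    }

    // Returns the cached leader, reading through the fetcher only on a miss.
    String getLeader(String collection, String shard) {
        return cache.computeIfAbsent(leaderPath(collection, shard), leaderFetcher);
    }

    // Drops the cached entry, e.g. after a request to a stale leader fails.
    void invalidate(String collection, String shard) {
        cache.remove(leaderPath(collection, shard));
    }
}
```

With N shards, the first load does N reads and everything after that is served from memory until an entry is invalidated. When to invalidate (a watch firing, a failed request, a timeout) is the real design question, and the sketch leaves it open.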

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

On Thu, Feb 16, 2023 at 11:52 AM Ilan Ginzburg <ilans...@gmail.com> wrote:

> Hi,
>
> We're testing SolrCloud under high scale and high load (many replicas per
> node, multiple collection creations, nodes up and down, backed up Overseer
> queues) and are *running into shard leader election issues* when state.json
> and the Zookeeper leader registration node for the shard disagree (leader
> registration node in Zookeeper is /collections/*<collectionName>*/leaders/
> *<shardName>*/leader).
> The inconsistency is the consequence of a delayed update of state.json due
> to an overloaded Overseer cluster state change update queue (which sadly is
> consumed by a single Overseer thread).
>
> I want your opinion on *no longer tracking shard leaders in state.json but
> only relying on the ephemeral shard ZK leader registration node*.
>
> Minor side benefit would be less updates to state.json and less watches and
> nodes fetching it.
>
> Note that relying on state.json to determine who the leader is as currently
> done implies dealing with stale/incorrect data since state.json is updated
> async via watches. I mention that because it means caching of the
> Zookeeper leader registration node content is likely ok, no need to fetch
> it anew on every access.
>
> Thanks,
> Ilan
>