I don't have an answer to this. It's been that way for as long as I can remember. Maybe Mark Miller knows (cc'd on this reply).
TBH even the leader registration node might be one too many, and the election directory itself could be sufficient, but we haven't seen inconsistencies between those two.

Ilan

On Thu, Feb 16, 2023 at 7:48 PM Jason Gerlowski <gerlowsk...@gmail.com> wrote:

> Hi Ilan,
>
> You mention a few good reasons to avoid tracking leadership in
> state.json (temporarily inconsistent ZK state, additional load on ZK,
> etc.). Do you know why the leader information was put in state.json in
> the first place? Would we have to give something up if we switch to
> only looking up leadership under
> /collections/<collectionName>/leaders/<shardName>/leader?
>
> Apologies for my lack of context.
>
> Best,
>
> Jason
>
> On Thu, Feb 16, 2023 at 12:31 PM Hitesh Khamesra
> <hit...@fullstory.com.invalid> wrote:
> >
> > Hi,
> >
> > We have run into Overseer issues and many state.json update issues in
> > the past.
> >
> > We have now enabled the "perReplicaState" protocol on the collection.
> > That means updates to replica status don't go through the Overseer:
> > each replica/Solr core writes its status directly to ZooKeeper
> > (bypassing the Overseer). This status is maintained as children of
> > state.json in ZooKeeper.
> >
> > ----------
> > (CONNECTED [localhost) /collections/test1> ls state.json
> > core_node2:2:A:L
> > ---------------
> >
> > A new Solr collection can be created with the "perReplicaState"
> > property set to true:
> > https://solr.apache.org/guide/solr/latest/deployment-guide/collection-management.html#create
> > -------------
> > perReplicaState
> > Optional
> > Default: false
> > If true the states of individual replicas will be maintained as
> > individual child of the state.json.
> > -------------------
> > There is also an option to modify this property using the
> > modifycollection API:
> > https://solr.apache.org/guide/solr/latest/deployment-guide/collection-management.html#modifycollection-parameters
> >
> > I hope this helps.
> >
> > Thanks.
> > Hitesh.
> >
> > On Thu, Feb 16, 2023 at 8:52 AM Ilan Ginzburg <ilans...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > We're testing SolrCloud under high scale and high load (many replicas
> > > per node, multiple collection creations, nodes going up and down,
> > > backed-up Overseer queues) and are *running into shard leader election
> > > issues* when state.json and the ZooKeeper leader registration node for
> > > the shard disagree (the leader registration node in ZooKeeper is
> > > /collections/<collectionName>/leaders/<shardName>/leader).
> > > The inconsistency is the consequence of a delayed update of state.json
> > > due to an overloaded Overseer cluster state change update queue (which
> > > sadly is consumed by a single Overseer thread).
> > >
> > > I'd like your opinion on *no longer tracking shard leaders in
> > > state.json and relying only on the ephemeral shard ZK leader
> > > registration node*.
> > >
> > > A minor side benefit would be fewer updates to state.json, and fewer
> > > watches and nodes fetching it.
> > >
> > > Note that relying on state.json to determine who the leader is, as
> > > currently done, already implies dealing with stale/incorrect data,
> > > since state.json is updated asynchronously via watches. I mention this
> > > because it means caching the content of the ZooKeeper leader
> > > registration node is likely OK; there is no need to fetch it anew on
> > > every access.
> > >
> > > Thanks,
> > > Ilan
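
For anyone reading the `ls state.json` output quoted above: the child name `core_node2:2:A:L` appears to pack the replica name, a version number, a one-letter state, and an optional leader flag into one string. The thread does not spell out this layout, so the field order, the state-letter mapping, and the helper names below (`parse_replica_state`, `find_leader`) are assumptions made for illustration, not Solr's API. A minimal Python sketch of decoding such entries might look like this:

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumed mapping from one-letter codes to replica states (not taken from the thread).
STATE_CODES = {
    "A": "active",
    "D": "down",
    "R": "recovering",
    "F": "recovery_failed",
}


@dataclass(frozen=True)
class ReplicaState:
    replica: str
    version: int
    state: str
    is_leader: bool


def parse_replica_state(entry: str) -> ReplicaState:
    """Parse an entry assumed to look like '<replica>:<version>:<state>[:L]'."""
    parts = entry.split(":")
    if len(parts) not in (3, 4):
        raise ValueError(f"unexpected per-replica state entry: {entry!r}")
    replica, version, code = parts[:3]
    if code not in STATE_CODES:
        raise ValueError(f"unknown state code {code!r} in {entry!r}")
    if len(parts) == 4 and parts[3] != "L":
        raise ValueError(f"unexpected trailing field in {entry!r}")
    return ReplicaState(
        replica=replica,
        version=int(version),
        state=STATE_CODES[code],
        is_leader=len(parts) == 4,
    )


def find_leader(children: Iterable[str]) -> Optional[ReplicaState]:
    """Return the first entry carrying the leader flag, or None if there is none."""
    for child in children:
        parsed = parse_replica_state(child)
        if parsed.is_leader:
            return parsed
    return None
```

With the sample entry from the thread, `parse_replica_state("core_node2:2:A:L")` yields a record for `core_node2` at version 2 that is active and flagged as leader. Note that this only decodes names already listed from ZooKeeper; it says nothing about whether that flag and the leader registration node agree at any given moment.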