Hi,

We have run into many Overseer and state.json update issues in the past.
Now we have enabled the "perReplicaState" protocol on the collection. This means an update to a replica's status no longer goes through the Overseer: the replica (Solr core) writes its status directly to ZooKeeper, bypassing the Overseer. The status is maintained as child nodes of state.json in ZooKeeper.

----------
(CONNECTED [localhost) /collections/test1> ls state.json
core_node2:2:A:L
---------------

A new Solr collection can be created with the "perReplicaState" property set to true:
https://solr.apache.org/guide/solr/latest/deployment-guide/collection-management.html#create

-------------
perReplicaState
Optional    Default: false
If true the states of individual replicas will be maintained as individual child of the state.json.
-------------------

There is also an option to modify this property on an existing collection using the modifycollection API:
https://solr.apache.org/guide/solr/latest/deployment-guide/collection-management.html#modifycollection-parameters

I hope this helps.

Thanks.
Hitesh.

On Thu, Feb 16, 2023 at 8:52 AM Ilan Ginzburg <ilans...@gmail.com> wrote:

> Hi,
>
> We're testing SolrCloud under high scale and high load (many replicas per
> node, multiple collection creations, nodes up and down, backed up Overseer
> queues) and are *running into shard leader election issues* when state.json
> and the Zookeeper leader registration node for the shard disagree (leader
> registration node in Zookeeper is
> /collections/*<collectionName>*/leaders/*<shardName>*/leader).
> The inconsistency is the consequence of a delayed update of state.json due
> to an overloaded Overseer cluster state change update queue (which sadly is
> consumed by a single Overseer thread).
>
> I want your opinion on *no longer tracking shard leaders in state.json but
> only relying on the ephemeral shard ZK leader registration node*.
>
> Minor side benefit would be less updates to state.json and less watches and
> nodes fetching it.
>
> Note that relying on state.json to determine who the leader is as currently
> done implies dealing with stale/incorrect data since state.json is updated
> async via watches. I mention that because it means caching of the
> Zookeeper leader registration node content is likely ok, no need to fetch
> it anew on every access.
>
> Thanks,
> Ilan
>
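
P.S. For anyone who wants to try the "perReplicaState" setting mentioned above, here is a minimal sketch that only builds the two Collections API request URLs (CREATE and MODIFYCOLLECTION) and sends nothing. The base URL http://localhost:8983/solr, the shard and replica counts, and the helper function names are assumptions for illustration. The collection name "test1" is the one from the listing above.

```python
from urllib.parse import urlencode

# Assumption: a local Solr node on the default port; adjust for your cluster.
SOLR_BASE = "http://localhost:8983/solr"


def create_prs_collection_url(name, num_shards=1, replication_factor=1, base=SOLR_BASE):
    """Build a Collections API CREATE URL with perReplicaState=true."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "perReplicaState": "true",
    }
    return f"{base}/admin/collections?{urlencode(params)}"


def enable_prs_url(collection, base=SOLR_BASE):
    """Build a MODIFYCOLLECTION URL that turns perReplicaState on for an existing collection."""
    params = {
        "action": "MODIFYCOLLECTION",
        "collection": collection,
        "perReplicaState": "true",
    }
    return f"{base}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Print the URLs; pass them to curl or an HTTP client of your choice.
    print(create_prs_collection_url("test1", num_shards=2, replication_factor=2))
    print(enable_prs_url("test1"))
```

The URLs can be pasted into curl or a browser. Whether switching an existing, busy collection over is safe in your Solr version is something to check in the reference guide first.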