[ https://issues.apache.org/jira/browse/SOLR-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250308#comment-17250308 ]
Ishan Chattopadhyaya commented on SOLR-14928:
---------------------------------------------
bq. Change cluster state updates so that each (Collection API) command
execution does the update directly in Zookeeper using optimistic locking
(Compare and Swap on the state.json Zookeeper files).
IIUC, the idea is for every node to do a compare-and-set (CAS) on
state.json to update the state of the replicas it hosts. This approach
degenerates into a spinlock when many nodes hosting the same collection
recover at the same time. Imagine a collection with 2000+ replicas scattered
across many nodes. Restarting all those nodes at once will cause heavy
contention on state.json: most CAS attempts will fail and have to be retried,
which is extremely inefficient.
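To make the contention concrete, here is a minimal in-memory sketch of the read-modify-CAS retry loop. It is not the Solr code and not the linked experiment: the class {{CasRetrySketch}}, its {{Snapshot}} and {{Result}} types, and the "down"/"active" state strings are all hypothetical, and an {{AtomicReference}} stands in for the versioned state.json znode. Against a real ZooKeeper, the conditional write would be {{setData(path, data, expectedVersion)}}, which fails with {{KeeperException.BadVersionException}} when another writer got there first.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// In-memory stand-in for a versioned state.json; every name here is hypothetical.
final class CasRetrySketch {

    /** Immutable snapshot: a version number plus one state string per replica. */
    static final class Snapshot {
        final int version;
        final String[] states;
        Snapshot(int version, String[] states) {
            this.version = version;
            this.states = states;
        }
    }

    /** Outcome of a run: final version, replicas marked active, and lost CAS races. */
    static final class Result {
        final int version;
        final long activeCount;
        final long failedAttempts;
        Result(int version, long activeCount, long failedAttempts) {
            this.version = version;
            this.activeCount = activeCount;
            this.failedAttempts = failedAttempts;
        }
    }

    static Result run(int replicas, int threads) {
        String[] initial = new String[replicas];
        Arrays.fill(initial, "down");
        AtomicReference<Snapshot> doc = new AtomicReference<>(new Snapshot(0, initial));
        AtomicLong failed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < replicas; i++) {
            final int replica = i;
            pool.execute(() -> {
                while (true) {
                    Snapshot current = doc.get();            // read state and version
                    String[] copy = current.states.clone();  // modify a private copy
                    copy[replica] = "active";
                    Snapshot next = new Snapshot(current.version + 1, copy);
                    if (doc.compareAndSet(current, next)) {  // write only if unchanged
                        return;
                    }
                    failed.incrementAndGet();                // lost the race: re-read, retry
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("updates did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        Snapshot last = doc.get();
        long active = Arrays.stream(last.states).filter("active"::equals).count();
        return new Result(last.version, active, failed.get());
    }

    public static void main(String[] args) {
        Result r = run(2048, 16);
        System.out.println("version=" + r.version + " active=" + r.activeCount
                + " failedAttempts=" + r.failedAttempts);
    }
}
```

Every update eventually lands, so the final version equals the number of replicas, but each lost race costs a full re-read and rewrite of the whole document. The failed-attempt count varies from run to run and grows with the number of concurrent writers; against ZooKeeper each retry would also be a network round trip.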
Here's a quick comparison that I performed for this approach vs. SOLR-15052:
https://github.com/chatman/experiments/blob/main/src/main/java/StateListVsCASSpinlock.java
{code}
Time to update (CAS): 94584.337722ms
Time to update (States List): 203.532139ms
{code}
^ These timings are for 2048 shards updated all at once, using multiple
threads to simulate the behaviour of multiple nodes recovering at the same
time.
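For contrast, a sketch of why independent entries avoid the retry loop. This assumes the "States List" approach lets each replica write its own entry instead of rewriting one shared document; that reading is an assumption, not something spelled out in this comment. The class {{PerReplicaEntrySketch}}, the replica names and the "active" state string are hypothetical, and a {{ConcurrentHashMap}} stands in for the per-replica entries.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Assumption: one independent entry per replica; every name here is hypothetical.
final class PerReplicaEntrySketch {

    static ConcurrentHashMap<String, String> run(int replicas, int threads) {
        ConcurrentHashMap<String, String> entries = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < replicas; i++) {
            final String name = "replica" + i;
            // Each writer touches only its own key, so no write invalidates another.
            pool.execute(() -> entries.put(name, "active"));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("updates did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        System.out.println("entries written: " + run(2048, 16).size());
    }
}
```

Here each update is a single write with no version check against the other replicas' updates, so the work stays linear in the number of replicas regardless of how many writers run concurrently.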
Please let me know if I'm missing something.
> Remove Overseer ClusterStateUpdater
> -----------------------------------
>
> Key: SOLR-14928
> URL: https://issues.apache.org/jira/browse/SOLR-14928
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Reporter: Ilan Ginzburg
> Assignee: Ilan Ginzburg
> Priority: Major
> Labels: cluster, collection-api, overseer
>
> Remove the Overseer {{ClusterStateUpdater}} thread and associated Zookeeper
> queue at {{<_chroot_>/overseer/queue}}.
> Change cluster state updates so that each (Collection API) command execution
> does the update directly in Zookeeper using optimistic locking (Compare and
> Swap on the {{state.json}} Zookeeper files).
> Following this change cluster state updates would still be happening only
> from the Overseer node (that's where Collection API commands are executing),
> but the code will be ready for distribution once such commands can be
> executed by any node (other work done in the context of parent task
> SOLR-14927).
> See the [Cluster State
> Updater|https://docs.google.com/document/d/1u4QHsIHuIxlglIW6hekYlXGNOP0HjLGVX5N6inkj6Ok/edit#heading=h.ymtfm3p518c]
> section in the Removing Overseer doc.