[
https://issues.apache.org/jira/browse/SOLR-15052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ishan Chattopadhyaya updated SOLR-15052:
----------------------------------------
Attachment: per-replica-states-gcp.pdf
Status: Open (was: Open)
Attaching GCP performance numbers against branch_8_7.
The "schema optimizations" (or "optimized") labels refer to the SOLR-14827
changes, which were tested alongside this change.
The steps to reproduce them:
* On a coordinator node in GCP, clone
https://github.com/SearchScale/solr-bench/tree/stress-gcp (this branch will
later be merged into master).
* Follow the instructions in that repository to run the stress test.
* The config file is here:
https://github.com/SearchScale/solr-bench/blob/stress-gcp/cluster-test-gcp.json#L60-L80
(these are the relevant lines to consider if you're just taking a cursory glance).
* This requires a clusterstatus.json file that I have locally. I can provide
it upon request, after anonymizing it and subject to some approvals. The
cluster state contains a very large number of collections, each with about 5
shards on average and 1 replica per shard. In the test, only 2500 of them are
used (as specified in the config).
> Reducing overseer bottlenecks using per-replica states
> ------------------------------------------------------
>
> Key: SOLR-15052
> URL: https://issues.apache.org/jira/browse/SOLR-15052
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Ishan Chattopadhyaya
> Priority: Major
> Attachments: per-replica-states-gcp.pdf
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> This work has the same goal as SOLR-13951, that is, to reduce overseer
> bottlenecks by keeping replica state updates from going to the state.json
> via the overseer. However, the approach taken here is different from
> SOLR-13951 and hence this work supersedes that work.
> The design proposed is here:
> https://docs.google.com/document/d/1xdxpzUNmTZbk0vTMZqfen9R3ArdHokLITdiISBxCFUg/edit
> Briefly:
> # Every replica's state will be in a separate znode nested under
> state.json. The znode's name encodes the replica name, state, and leadership
> status.
> # An additional children watcher will be set on state.json for state changes.
> # Upon a state change, a ZK multi-op deletes the previous znode and adds a
> new znode with the new state.
> Differences between this and SOLR-13951:
> # In SOLR-13951, we planned to leverage shard terms for per shard states.
> # As a consequence, the code changes required for SOLR-13951 were massive (we
> needed a shard state provider abstraction and had to introduce it everywhere
> in the codebase).
> # This approach is a drastically simpler change and design.
> Credit for this design is due to [~noble.paul]. [[email protected]],
> [~noble.paul] and I have collaborated on this effort. The reference branch
> takes a conceptually similar (but not identical) approach.
> I shall attach a PR and performance benchmarks shortly.
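
To make the three design points in the quoted description concrete, here is a minimal in-memory sketch in Python. It does not talk to ZooKeeper and is not taken from the patch. The `name:state[:L]` encoding, the `ReplicaState` class, and the `state_change_ops` and `apply_multi` helpers are all assumptions made for illustration; the actual znode naming scheme is defined in the design document and the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaState:
    """One replica's state, stored entirely in a child znode's name."""
    name: str
    state: str
    is_leader: bool = False

    def znode_name(self) -> str:
        # Hypothetical encoding: "<replica>:<state>" plus ":L" for the leader.
        parts = [self.name, self.state]
        if self.is_leader:
            parts.append("L")
        return ":".join(parts)

    @classmethod
    def parse(cls, znode_name: str) -> "ReplicaState":
        parts = znode_name.split(":")
        return cls(parts[0], parts[1], len(parts) > 2 and parts[2] == "L")


def state_change_ops(old: ReplicaState, new: ReplicaState) -> list:
    """Build the two operations of a state change: delete old, create new."""
    return [("delete", old.znode_name()), ("create", new.znode_name())]


def apply_multi(children: set, ops: list) -> set:
    """Simulate an all-or-nothing multi-op on the children of state.json.

    Works on a copy, so a failing op leaves the original set untouched.
    """
    result = set(children)
    for op, name in ops:
        if op == "delete":
            if name not in result:
                raise KeyError(f"no such znode: {name}")
            result.remove(name)
        elif op == "create":
            if name in result:
                raise ValueError(f"znode already exists: {name}")
            result.add(name)
        else:
            raise ValueError(f"unknown op: {op}")
    return result


if __name__ == "__main__":
    old = ReplicaState("core_node1", "down")
    new = ReplicaState("core_node1", "active", is_leader=True)
    children = {old.znode_name(), "core_node2:active"}
    children = apply_multi(children, state_change_ops(old, new))
    print(sorted(children))
```

Because the state lives in the child znode names, a client holding a children watch on state.json would see the new child list after each such change and could rebuild every replica's state with `ReplicaState.parse`, without the overseer rewriting state.json.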