[jira] [Updated] (SOLR-15052) Reducing overseer bottlenecks using per-replica states

Ishan Chattopadhyaya (Jira) Wed, 16 Dec 2020 02:47:07 -0800


     [ https://issues.apache.org/jira/browse/SOLR-15052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ishan Chattopadhyaya updated SOLR-15052:
----------------------------------------
    Attachment: per-replica-states-gcp.pdf
        Status: Open  (was: Open)

Attaching GCP performance numbers against branch_8_7.
The "schema optimizations" or "optimized" runs mentioned in the attached numbers 
refer to SOLR-14827, which was tested alongside this change.

The steps to reproduce them:
* On a coordinator node in GCP, clone 
https://github.com/SearchScale/solr-bench/tree/stress-gcp (this branch will 
later be merged into master).
* Follow the instructions there to run the stress test.
* The config file is here: 
https://github.com/SearchScale/solr-bench/blob/stress-gcp/cluster-test-gcp.json#L60-L80
 (these are the relevant lines to consider if you're just taking a cursory glance).
* This requires a clusterstatus.json file that I have locally. I can provide it 
upon request after anonymizing it, subject to some approvals. The cluster state 
contains a very large number of collections, each with about 5 shards on average 
and 1 replica per shard. Only 2500 of these collections are used in the test 
(as specified in the config).

> Reducing overseer bottlenecks using per-replica states
> ------------------------------------------------------
>
>                 Key: SOLR-15052
>                 URL: https://issues.apache.org/jira/browse/SOLR-15052
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Ishan Chattopadhyaya
>            Priority: Major
>         Attachments: per-replica-states-gcp.pdf
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This work has the same goal as SOLR-13951: to reduce overseer bottlenecks by 
> keeping replica state updates from going to state.json via the overseer. 
> However, the approach taken here differs from SOLR-13951, and hence this 
> work supersedes that work.
> The proposed design is here: 
> https://docs.google.com/document/d/1xdxpzUNmTZbk0vTMZqfen9R3ArdHokLITdiISBxCFUg/edit
> Briefly,
> # Every replica's state will be in a separate znode nested under 
> state.json. The znode's name encodes the replica name, state, and leadership 
> status.
> # An additional children watcher is set on state.json to detect state changes.
> # Upon a state change, a ZK multi-op deletes the previous znode and adds a 
> new znode with the new state.
> Differences between this and SOLR-13951:
> # In SOLR-13951, we planned to leverage shard terms for per-shard states.
> # As a consequence, the code changes required for SOLR-13951 were massive (we 
> needed a shard state provider abstraction and had to introduce it everywhere 
> in the codebase).
> # This approach is a drastically simpler change and design.
> Credit for this design is due to [~noble.paul]. [~markrmil...@gmail.com], 
> [~noble.paul] and I have collaborated on this effort. The reference branch 
> takes a conceptually similar (but not identical) approach.
> I shall attach a PR and performance benchmarks shortly.
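
The three design points in the quoted description (one child znode per replica whose name encodes its state and leadership, a children watcher on state.json, and a delete-plus-create multi-op on each change) can be illustrated with a small in-memory Python simulation. This is a minimal sketch under stated assumptions, not Solr's implementation: the `replica:state[:L]` name format, the single-letter state codes, and all names here (`ReplicaState`, `FakeStateJson`, `state_change_ops`, `changed_replicas`) are hypothetical, and no real ZooKeeper client is used.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical single-letter codes for replica states (illustrative assumption).
STATE_CODES = {"active": "A", "down": "D", "recovering": "R", "recovery_failed": "F"}
CODE_STATES = {code: state for state, code in STATE_CODES.items()}


@dataclass(frozen=True)
class ReplicaState:
    """One replica's state, carried in the name of a child znode of state.json."""

    replica: str  # assumed not to contain ':'
    state: str
    leader: bool = False

    def znode_name(self) -> str:
        # Hypothetical format: "<replica>:<state code>", plus ":L" for the leader.
        name = f"{self.replica}:{STATE_CODES[self.state]}"
        return name + ":L" if self.leader else name

    @staticmethod
    def parse(name: str) -> ReplicaState:
        parts = name.split(":")
        leader = len(parts) == 3 and parts[2] == "L"
        return ReplicaState(parts[0], CODE_STATES[parts[1]], leader)


class FakeStateJson:
    """In-memory stand-in for the child znodes under one collection's state.json."""

    def __init__(self) -> None:
        self.children: set[str] = set()

    def multi(self, ops: list[tuple[str, str]]) -> None:
        """Apply delete/create ops all-or-nothing, mimicking a ZooKeeper multi-op."""
        staged = set(self.children)
        for op, child in ops:
            if op == "delete":
                if child not in staged:
                    raise KeyError(f"no such znode: {child}")
                staged.remove(child)
            elif op == "create":
                if child in staged:
                    raise KeyError(f"znode already exists: {child}")
                staged.add(child)
            else:
                raise ValueError(f"unknown op: {op}")
        self.children = staged  # commit only after every op succeeded


def state_change_ops(old: ReplicaState | None, new: ReplicaState) -> list[tuple[str, str]]:
    """Delete the previous state znode (if any) and create one for the new state."""
    ops = [] if old is None else [("delete", old.znode_name())]
    return ops + [("create", new.znode_name())]


def changed_replicas(before: set[str], after: set[str]) -> dict[str, ReplicaState]:
    """What a children watcher could derive: replicas with a newly created state znode."""
    return {s.replica: s for s in map(ReplicaState.parse, after - before)}


# Usage: a replica registers as "down", then becomes the active leader.
zk = FakeStateJson()
down = ReplicaState("core_node1", "down")
zk.multi(state_change_ops(None, down))
before = set(zk.children)
up = ReplicaState("core_node1", "active", leader=True)
zk.multi(state_change_ops(down, up))
print(sorted(zk.children))  # ['core_node1:A:L']
print(changed_replicas(before, zk.children)["core_node1"].state)  # active
```

In this sketch a watcher recovers each replica's full state from the child names alone, and a state change touches only one small child znode instead of rewriting state.json; the all-or-nothing `multi` stands in for the atomicity that ZooKeeper's multi-op provides.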



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
