[jira] [Comment Edited] (SOLR-15052) Reducing overseer bottlenecks using per-replica states

Noble Paul (Jira) Mon, 21 Dec 2020 20:31:34 -0800


    [ https://issues.apache.org/jira/browse/SOLR-15052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253268#comment-17253268 ]


Noble Paul edited comment on SOLR-15052 at 12/22/20, 4:30 AM:
--------------------------------------------------------------

{quote}This does mean that all replica state updates are still serialized for a 
collection.
{quote}
How? It does not. Please read the code once again. The CAS does not expect the 
cversion to remain unchanged.

You have misunderstood how it works.

Example:

At time t1, the cversion is 100 and the child znodes are:

{{R1:1:D}}
{{R2:3:D}}
{{R3:2:A}}

Say R1 and R2 try to change their state in parallel:

{{DELETE : R1:1:D, CREATE: R1:2:A}}
{{DELETE : R2:3:D, CREATE: R2:4:A}}

Both read cversion 100 and issue separate multi operations. Both succeed in 
parallel because each one deletes a different node. Neither update depends on 
any other replica's node.

Each multi-op makes two child changes (one delete, one create), so after both 
complete the cversion is 104, and that is totally fine.
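
The example above can be modelled with a small in-memory sketch. It is not ZooKeeper or Solr code: {{ParentNode}} and its {{multi}} method are hypothetical stand-ins. The sketch assumes that a multi-op is atomic, that it fails only if the child to delete is missing or the child to create already exists, and that every child create or delete bumps the parent's cversion by one.

```python
class MultiOpFailed(Exception):
    """Raised when any operation in a multi-op cannot be applied."""


class ParentNode:
    """Hypothetical in-memory stand-in for state.json and its child znodes."""

    def __init__(self, children, cversion):
        self.children = set(children)
        self.cversion = cversion

    def multi(self, delete, create):
        """Atomically delete one child and create another.

        Only the named children are checked. The parent's cversion is
        never compared, so updates to other replicas cannot cause a failure.
        """
        if delete not in self.children:
            raise MultiOpFailed(f"{delete} does not exist")
        if create in self.children:
            raise MultiOpFailed(f"{create} already exists")
        self.children.remove(delete)
        self.children.add(create)
        self.cversion += 2  # one delete plus one create


parent = ParentNode({"R1:1:D", "R2:3:D", "R3:2:A"}, cversion=100)

# Both writers observed cversion 100 before issuing their updates.
parent.multi(delete="R1:1:D", create="R1:2:A")
parent.multi(delete="R2:3:D", create="R2:4:A")

print(sorted(parent.children))
print(parent.cversion)

# A stale writer that still believes R1 is at version 1 is rejected,
# because the node it wants to delete is gone.
try:
    parent.multi(delete="R1:1:D", create="R1:2:D")
    stale_rejected = False
except MultiOpFailed:
    stale_rejected = True
```

In this model the only conflict is two writers touching the same replica's node, which is the per-replica compare-and-set the comment describes.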


was (Author: noble.paul):
{quote}This does mean that all replica state updates are still serialized for a 
collection.
{quote}
How? It does not. Please read the code once again. The CAS is not expecting the 
cversion to be the same old. 

You have totally misunderstood how it works.

> Reducing overseer bottlenecks using per-replica states
> ------------------------------------------------------
>
>                 Key: SOLR-15052
>                 URL: https://issues.apache.org/jira/browse/SOLR-15052
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Ishan Chattopadhyaya
>            Priority: Major
>         Attachments: per-replica-states-gcp.pdf
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> This work has the same goal as SOLR-13951: to reduce overseer 
> bottlenecks by keeping replica state updates from going to state.json 
> via the overseer. However, the approach taken here is different from 
> SOLR-13951, and hence this work supersedes that work.
> The design proposed is here: 
> https://docs.google.com/document/d/1xdxpzUNmTZbk0vTMZqfen9R3ArdHokLITdiISBxCFUg/edit
> Briefly,
> # Every replica's state will be in a separate znode nested under 
> state.json. The znode's name encodes the replica name, state, and leadership 
> status.
> # An additional children watcher will be set on state.json for state changes.
> # Upon a state change, a ZK multi-op deletes the previous znode and adds a 
> new znode with the new state.
> Differences between this and SOLR-13951:
> # In SOLR-13951, we planned to leverage shard terms for per shard states.
> # As a consequence, the code changes required for SOLR-13951 were massive (we 
> needed a shard state provider abstraction and had to introduce it everywhere 
> in the codebase).
> # This approach is a drastically simpler change and design.
> Credit for this design and the PR is due to [~noble.paul]. 
> [~markrmil...@gmail.com], [~noble.paul] and I have collaborated on this 
> effort. The reference branch takes a conceptually similar (but not identical) 
> approach.
> I shall attach a PR and performance benchmarks shortly.
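
As a rough illustration of point 1 in the description, the sketch below decodes child znode names of the form used in the comment's example ({{R1:1:D}}). The field order (replica name, version, state) is read off that example. The optional trailing {{L}} for leadership, the {{ReplicaState}} type, and both function names are assumptions made for illustration; this message does not confirm them.

```python
from typing import Dict, Iterable, NamedTuple


class ReplicaState(NamedTuple):
    name: str
    version: int
    state: str        # single-letter code as in the example, e.g. "A" or "D"
    is_leader: bool   # assumption: marked by a trailing ":L"


def parse_child(znode_name: str) -> ReplicaState:
    """Decode 'name:version:state' with an optional, assumed ':L' suffix."""
    parts = znode_name.split(":")
    if len(parts) not in (3, 4):
        raise ValueError(f"unexpected znode name: {znode_name!r}")
    name, version, state = parts[:3]
    is_leader = len(parts) == 4 and parts[3] == "L"
    return ReplicaState(name, int(version), state, is_leader)


def states_by_replica(children: Iterable[str]) -> Dict[str, ReplicaState]:
    """Build a replica-name -> state map from a children listing."""
    return {s.name: s for s in map(parse_child, children)}


states = states_by_replica(["R1:2:A", "R2:4:A", "R3:2:A"])
print(states["R2"].version, states["R2"].state)
```

A children watcher on state.json would hand a reader exactly this kind of name listing, so the full per-replica state can be rebuilt without reading any znode data.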



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
