[ https://issues.apache.org/jira/browse/SOLR-15052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ishan Chattopadhyaya updated SOLR-15052:
----------------------------------------
    Description:
This work has the same goal as SOLR-13951, namely to reduce overseer bottlenecks by keeping replica state updates from going to state.json via the overseer. However, the approach taken here is different from SOLR-13951, and hence this work supersedes that work.

The design proposed is here: https://docs.google.com/document/d/1xdxpzUNmTZbk0vTMZqfen9R3ArdHokLITdiISBxCFUg/edit

Briefly:
# Every replica's state will be in a separate znode nested under state.json. The znode's name encodes the replica name, state, and leadership status.
# An additional children watcher will be set on state.json for state changes.
# Upon a state change, a ZK multi-op deletes the previous znode and adds a new znode with the new state.

Differences between this and SOLR-13951:
# In SOLR-13951, we planned to leverage shard terms for per-shard states.
# As a consequence, the code changes required for SOLR-13951 were massive (we needed a shard state provider abstraction and had to introduce it everywhere in the codebase).
# This approach is a drastically simpler change and design.

Credit for this design is due to [~noble.paul].

[~markrmil...@gmail.com], [~noble.paul] and I have collaborated on this effort. The reference branch takes a conceptually similar (but not identical) approach.

PR and benchmarks are attached, as referred to in the comments.

  was:
This work has the same goal as SOLR-13951, namely to reduce overseer bottlenecks by keeping replica state updates from going to state.json via the overseer. However, the approach taken here is different from SOLR-13951, and hence this work supersedes that work.

The design proposed is here: https://docs.google.com/document/d/1xdxpzUNmTZbk0vTMZqfen9R3ArdHokLITdiISBxCFUg/edit

Briefly:
# Every replica's state will be in a separate znode nested under state.json. The znode's name encodes the replica name, state, and leadership status.
# An additional children watcher will be set on state.json for state changes.
# Upon a state change, a ZK multi-op deletes the previous znode and adds a new znode with the new state.

Differences between this and SOLR-13951:
# In SOLR-13951, we planned to leverage shard terms for per-shard states.
# As a consequence, the code changes required for SOLR-13951 were massive (we needed a shard state provider abstraction and had to introduce it everywhere in the codebase).
# This approach is a drastically simpler change and design.

Credit for this design is due to [~noble.paul].

[~markrmil...@gmail.com], [~noble.paul] and I have collaborated on this effort. The reference branch takes a conceptually similar (but not identical) approach.

I shall attach a PR and performance benchmarks shortly.

> Reducing overseer bottlenecks using per-replica states
> ------------------------------------------------------
>
>                 Key: SOLR-15052
>                 URL: https://issues.apache.org/jira/browse/SOLR-15052
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Ishan Chattopadhyaya
>            Priority: Major
>         Attachments: per-replica-states-gcp.pdf
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This work has the same goal as SOLR-13951, namely to reduce overseer bottlenecks by keeping replica state updates from going to state.json via the overseer. However, the approach taken here is different from SOLR-13951, and hence this work supersedes that work.
>
> The design proposed is here: https://docs.google.com/document/d/1xdxpzUNmTZbk0vTMZqfen9R3ArdHokLITdiISBxCFUg/edit
>
> Briefly:
> # Every replica's state will be in a separate znode nested under state.json. The znode's name encodes the replica name, state, and leadership status.
> # An additional children watcher will be set on state.json for state changes.
> # Upon a state change, a ZK multi-op deletes the previous znode and adds a new znode with the new state.
>
> Differences between this and SOLR-13951:
> # In SOLR-13951, we planned to leverage shard terms for per-shard states.
> # As a consequence, the code changes required for SOLR-13951 were massive (we needed a shard state provider abstraction and had to introduce it everywhere in the codebase).
> # This approach is a drastically simpler change and design.
>
> Credit for this design is due to [~noble.paul].
>
> [~markrmil...@gmail.com], [~noble.paul] and I have collaborated on this effort. The reference branch takes a conceptually similar (but not identical) approach.
>
> PR and benchmarks are attached, as referred to in the comments.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
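
To make the three steps in the description concrete (a per-replica znode whose name encodes the state, and a multi-op that swaps the old znode for the new one), here is a minimal Python sketch. It is only an illustration: the colon-separated name format, the `ReplicaState` class, the `state_change_ops` helper, and the example collection path are all assumptions made for this sketch, not the format or code used in the design doc or the PR. It also does not talk to ZooKeeper; it only builds the list of operations that a real multi-op would carry.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ReplicaState:
    """Hypothetical model of one replica's state (not Solr's actual class)."""
    name: str
    state: str
    leader: bool = False

    def znode_name(self) -> str:
        # Assumed encoding: "<replica>:<state>" plus ":L" for a leader.
        parts = [self.name, self.state]
        if self.leader:
            parts.append("L")
        return ":".join(parts)

    @staticmethod
    def parse(znode_name: str) -> "ReplicaState":
        # Inverse of znode_name(); rejects names that do not fit the assumed format.
        parts = znode_name.split(":")
        if len(parts) == 2:
            return ReplicaState(parts[0], parts[1], False)
        if len(parts) == 3 and parts[2] == "L":
            return ReplicaState(parts[0], parts[1], True)
        raise ValueError(f"unrecognised per-replica znode name: {znode_name!r}")


def state_change_ops(state_json_path: str,
                     old: ReplicaState,
                     new: ReplicaState) -> List[Tuple[str, str]]:
    """Describe the delete-then-create pair for one state change.

    A real client would submit both operations in a single ZooKeeper
    multi-op so that they succeed or fail together.
    """
    return [
        ("delete", f"{state_json_path}/{old.znode_name()}"),
        ("create", f"{state_json_path}/{new.znode_name()}"),
    ]


# Example: a replica goes from down to active and becomes leader.
before = ReplicaState("core_node1", "down")
after = ReplicaState("core_node1", "active", leader=True)
for op in state_change_ops("/collections/c1/state.json", before, after):
    print(op)
```

Because the state lives in the child znode's name rather than in its data, a client holding a children watch on state.json can rebuild every replica's state from the child list alone, for example by calling `ReplicaState.parse` on each name. Note that this toy encoding would break for replica names that themselves contain a colon.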