murblanc commented on pull request #2318: URL: https://github.com/apache/lucene-solr/pull/2318#issuecomment-776840944
> Thanks for your review, @murblanc. I learned a lot from your detailed review comment. Thank you! > I've added a commit to let the overseer thread refresh the view of the cluster state immediately after the collection is created. Please review. CC @noblepaul. I have very serious doubts about the proposed change. Forcing a cluster state updater refresh in that way is going to force it to re-read all relevant ZK data through `forciblyRefreshAllClusterStateSlow()` as it does during Overseer start up, stalling processing of the state update queue in the meantime. And this will happen each time a new PRS collection is created. And it is not even sufficient... If you look at `ClusterStateUpdater.run()`, that flag was really meant to be used once at the beginning. There are two processing loops, and the inner loop happens after the refresh and will not look at the flag, so some processing can take place before the flag is even considered. So it may well be that the PRS collection is created, the flag is set on the Overseer, the creation call returns, client sends another Collection API related to the collection but `ClusterStateUpdater` hasn't updated yet from ZK (hasn't even started to update). This type of race would in theory be possible even without the inner loop given the batching done by the cluster state updater (+ the waiting time to see if new messages arrive in order to batch them together). I really feel that a proper refactoring of what gets executed and where it gets executed is a better option (esp. for the master branch). Why not submit the `state.json` through the Overseer as happens for non PRS collections then create the PRS znodes in the Collection API call to not stall the cluster state updater for too long? If you really have to force an Overseer `ClusterStateUpdater` refresh for that collection anyway, introducing a new type of cluster state updater message (in `OverseerAction`) called `LOADCOLLECTION` for example that triggers the loading of that single collection from ZK into the state used by the cluster state updater seems to me a better option. I don't like it either, but I hate it less. Note the client will see correct behavior only if it then submits the collection creation call and wait for it to return before submitting another Collection API call for that collection, to make sure that the `LOADCOLLECTION` gets executed first (with current Solr client can submit collection creation immediately followed by another Collection API call for that collection and they will be executed in order). ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org