vigyasharma commented on issue #14219: URL: https://github.com/apache/lucene/issues/14219#issuecomment-2742476154
> The searchers can then carefully pick and choose which commit points they want to switch too, in a bite sized / stepping stone manner

The key here is making searchers refresh on a point-in-time commit instead of the latest one.

Searcher refresh today, for classes like `SearcherManager`, `SearcherTaxonomyManager`, or `ReaderManager`, is implemented in `refreshIfNeeded()`, which in turn calls `DirectoryReader.openIfChanged(oldReader)`. This opens a new reader on the latest commit available in the directory. For bite-sized commits, we would want to use the `DirectoryReader.openIfChanged(oldReader, commit)` method instead, and open readers on the specific commit we select.

`refreshIfNeeded` is invoked via the base class method `ReferenceManager#maybeRefresh()`, which takes care of reference management tasks like acquiring exclusive locks, invoking `refreshIfNeeded`, doing an atomic swap if needed, releasing the old reference, and notifying listeners. This API, however, does not accept an index commit. It also doesn't feel right to modify this API or add new ones, since that would break the reference-agnostic nature of `ReferenceManager`.

One way to achieve a commit-specific refresh is for applications to extend `SearcherManager` (or `SearcherTaxonomyManager`) and override `refreshIfNeeded`. However, this can be a bit involved, as applications may have to re-implement some logic (like the searcher and taxonomy coupling in `SearcherTaxonomyManager`).

To support this out of the box within Lucene, we could modify the existing `SearcherManager` / `SearcherTaxonomyManager` to use the commit-specific `DirectoryReader` refresh, and provide hooks within these classes to select the ideal commit, for example a `RefreshCommitSelector` policy that users can implement to supply the refresh commit. The policy could default to returning the latest directory commit, to stay in line with current behavior.
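As a rough sketch of what such a selector hook could look like, here is a plain-Java model that does not depend on Lucene. Everything in it is hypothetical and only for illustration: `CommitPoint` is a stand-in for an index commit (a generation plus a file-name-to-size map), and the `select` signature and the two factory methods are assumptions, not an existing or proposed Lucene API. In a real integration the candidates would presumably come from `DirectoryReader.listCommits(Directory)`, and the chosen `IndexCommit` would be passed to `DirectoryReader.openIfChanged(oldReader, commit)`.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class CommitSelectorSketch {

    /** Stand-in for an index commit: generation plus referenced files (name -> size in bytes). */
    record CommitPoint(long generation, Map<String, Long> fileSizes) {}

    /** Hypothetical hook: pick the commit to refresh to, given the one currently searched. */
    interface RefreshCommitSelector {
        CommitPoint select(CommitPoint current, List<CommitPoint> available);
    }

    /** Default policy: always jump to the newest commit, mirroring current behavior. */
    static RefreshCommitSelector latest() {
        return (current, available) -> available.stream()
            .max(Comparator.comparingLong(CommitPoint::generation))
            .orElse(current);
    }

    /** Newest commit whose not-yet-held files total fewer than maxNewBytes; else stay put. */
    static RefreshCommitSelector newestWithinBudget(long maxNewBytes) {
        return (current, available) -> available.stream()
            .filter(c -> c.generation() > current.generation())
            .filter(c -> newBytes(current, c) < maxNewBytes)
            .max(Comparator.comparingLong(CommitPoint::generation))
            .orElse(current);
    }

    /** Bytes in the candidate's files that the current commit does not already reference. */
    static long newBytes(CommitPoint current, CommitPoint candidate) {
        return candidate.fileSizes().entrySet().stream()
            .filter(e -> !current.fileSizes().containsKey(e.getKey()))
            .mapToLong(Map.Entry::getValue)
            .sum();
    }

    public static void main(String[] args) {
        CommitPoint c1 = new CommitPoint(1, Map.of("_0.cfs", 100L));
        CommitPoint c2 = new CommitPoint(2, Map.of("_0.cfs", 100L, "_1.cfs", 40L));
        CommitPoint c3 = new CommitPoint(3,
            Map.of("_0.cfs", 100L, "_1.cfs", 40L, "_2.cfs", 500L));
        List<CommitPoint> all = List.of(c1, c2, c3);

        // Default policy goes straight to the newest commit.
        System.out.println(latest().select(c1, all).generation());                // 3
        // With a 200-byte budget, c3 needs 540 new bytes, so c2 is the step taken.
        System.out.println(newestWithinBudget(200).select(c1, all).generation()); // 2
    }
}
```

One design caveat in this sketch: when no newer commit fits the budget, the selector returns the current commit, so a searcher could stall behind a single large commit. A real policy would likely need a fallback, such as stepping to the oldest newer commit.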
Expert users could supply more nuanced logic, such as picking the newest commit that has fewer than N bytes of new segment files. Alternatively, we could add a new `NRTSearcherManager` with the support described above.

I think these hooks can be useful for segment replication. I would like to hear from the community on whether this seems like a useful thing to support from within Lucene.