vigyasharma commented on issue #14219:
URL: https://github.com/apache/lucene/issues/14219#issuecomment-2742476154

   > The searchers can then carefully pick and choose which commit points they 
want to switch to, in a bite-sized / stepping-stone manner
   
   The key here is making searchers refresh to a specific point-in-time commit 
instead of the latest one. 
   
   Today, searcher refresh for classes like `SearcherManager`, 
`SearcherTaxonomyManager`, and `ReaderManager` is implemented in 
`refreshIfNeeded()`, which in turn calls 
`DirectoryReader.openIfChanged(oldReader)`. This opens a new reader on the 
latest commit available in the directory. For bite-sized commits, we would 
instead use `DirectoryReader.openIfChanged(oldReader, commit)` to open a 
reader on the specific commit we select.
   
   `refreshIfNeeded` is invoked via the base class method 
`ReferenceManager#maybeRefresh()`, which takes care of reference management 
tasks like acquiring exclusive locks, invoking `refreshIfNeeded`, doing an 
atomic swap if needed, releasing the old reference, and notifying listeners. 
This API, however, does not accept an index commit. It also doesn't feel right 
to modify this API or add new ones, since that would break the 
reference-agnostic nature of `ReferenceManager`.
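   
   To make that flow concrete, here is a minimal, self-contained model of it. 
This is only a sketch: `MiniReferenceManager` and `GenerationManager` are 
hypothetical names invented for illustration, not Lucene classes, and the real 
`ReferenceManager` also handles ref counting and closing, which are omitted 
here. The point it shows is that `maybeRefresh()` takes no arguments, so any 
choice of target commit has to be made inside `refreshIfNeeded`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Simplified, hypothetical model of the refresh flow; not Lucene's ReferenceManager. */
abstract class MiniReferenceManager<G> {
  private final ReentrantLock refreshLock = new ReentrantLock();
  private final List<Runnable> listeners = new ArrayList<>();
  private volatile G current;

  protected MiniReferenceManager(G initial) {
    this.current = initial;
  }

  /** Returns a new reference, or null if nothing changed. */
  protected abstract G refreshIfNeeded(G old);

  /** Releases a reference that has been swapped out. */
  protected abstract void release(G old);

  void addListener(Runnable afterRefresh) {
    listeners.add(afterRefresh);
  }

  G get() {
    return current;
  }

  /** Note the empty parameter list: a caller cannot pass a target commit. */
  final boolean maybeRefresh() {
    if (!refreshLock.tryLock()) {
      return false; // another thread is already refreshing
    }
    try {
      G old = current;
      G fresh = refreshIfNeeded(old);
      if (fresh != null) {
        current = fresh;                  // swap in the new reference
        release(old);                     // release the old one
        listeners.forEach(Runnable::run); // notify listeners
      }
      return true;
    } finally {
      refreshLock.unlock();
    }
  }
}

/** Toy subclass: the "reference" is just a commit generation number. */
class GenerationManager extends MiniReferenceManager<Long> {
  volatile long latestGeneration; // stands in for the newest commit in the directory
  final List<Long> released = new ArrayList<>();

  GenerationManager(long initial) {
    super(initial);
    this.latestGeneration = initial;
  }

  @Override
  protected Long refreshIfNeeded(Long old) {
    long latest = latestGeneration;
    return latest == old.longValue() ? null : Long.valueOf(latest);
  }

  @Override
  protected void release(Long old) {
    released.add(old);
  }
}
```

   In this model, the subclass always jumps to `latestGeneration`; a 
commit-specific refresh would have to change what `refreshIfNeeded` returns, 
since the base class offers no other entry point.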
   
   One way to achieve a commit-specific refresh is for applications to 
extend `SearcherManager` (or `SearcherTaxonomyManager`) and override 
`refreshIfNeeded`. However, this can be a bit involved, as applications may 
have to re-implement some logic (like the searcher and taxonomy coupling in 
`SearcherTaxonomyManager`).
   
   To support this out of the box within Lucene, we could modify the existing 
`SearcherManager` / `SearcherTaxonomyManager` to use the commit-specific 
`DirectoryReader` refresh, and provide hooks within these classes to select the 
ideal commit, for example a `RefreshCommitSelector` policy that users can 
implement to supply the refresh commit. The policy could default to returning 
the latest directory commit, to stay in line with current behavior. Expert 
users could supply more nuanced logic, like picking the newest commit that has 
< N bytes worth of new segment files.
   
   Alternatively, we could add a new `NRTSearcherManager` with the 
above-mentioned support. 
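   
   As a rough illustration of what such a selector policy might look like, 
here is a self-contained sketch. Everything in it is an assumption: only the 
name `RefreshCommitSelector` comes from the proposal above, while its 
signature, the `CommitInfo` stand-in (a generation plus a file-name-to-size 
map), and both implementations are hypothetical and do not use Lucene types. 
It treats a file as "new" when its name is absent from the current commit, and 
it falls back to the oldest newer commit when nothing fits the budget so that 
a searcher cannot get stuck; both are design choices made for this sketch.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

/** Hypothetical stand-in for a commit: a generation plus file name -> size in bytes. */
final class CommitInfo {
  final long generation;
  final Map<String, Long> fileSizes;

  CommitInfo(long generation, Map<String, Long> fileSizes) {
    this.generation = generation;
    this.fileSizes = fileSizes;
  }
}

/** Hypothetical policy hook; the signature is assumed for illustration. */
interface RefreshCommitSelector {
  /** Returns the commit to refresh to, or empty to stay on the current one. */
  Optional<CommitInfo> select(CommitInfo current, List<CommitInfo> available);
}

/** Default policy: always pick the newest commit, matching current behavior. */
final class LatestCommitSelector implements RefreshCommitSelector {
  @Override
  public Optional<CommitInfo> select(CommitInfo current, List<CommitInfo> available) {
    return available.stream()
        .filter((CommitInfo c) -> c.generation > current.generation)
        .max(Comparator.comparingLong((CommitInfo c) -> c.generation));
  }
}

/** Expert policy: newest commit that adds fewer than maxNewBytes of new files. */
final class ByteBudgetSelector implements RefreshCommitSelector {
  private final long maxNewBytes;

  ByteBudgetSelector(long maxNewBytes) {
    this.maxNewBytes = maxNewBytes;
  }

  /** Total size of files in candidate whose names are not in current. */
  static long newBytes(CommitInfo current, CommitInfo candidate) {
    long total = 0;
    for (Map.Entry<String, Long> e : candidate.fileSizes.entrySet()) {
      if (!current.fileSizes.containsKey(e.getKey())) {
        total += e.getValue();
      }
    }
    return total;
  }

  @Override
  public Optional<CommitInfo> select(CommitInfo current, List<CommitInfo> available) {
    // Newer commits only, oldest first.
    List<CommitInfo> newer = available.stream()
        .filter((CommitInfo c) -> c.generation > current.generation)
        .sorted(Comparator.comparingLong((CommitInfo c) -> c.generation))
        .collect(Collectors.toList());
    if (newer.isEmpty()) {
      return Optional.empty();
    }
    // Fallback (sketch assumption): the oldest newer commit, so refresh still progresses.
    CommitInfo pick = newer.get(0);
    for (CommitInfo c : newer) {
      if (newBytes(current, c) < maxNewBytes) {
        pick = c; // later iterations overwrite, so the newest qualifying commit wins
      }
    }
    return Optional.of(pick);
  }
}
```

   A manager built along these lines would call `select(...)` from inside its 
`refreshIfNeeded` and then open the reader on whichever commit comes back.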
   
   I think these hooks can be useful for segment replication. I would like to 
hear from the community on whether this seems like a useful thing to support 
from within Lucene.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

