RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2492296015

   > Does the OpenSearch client directly work with 'n' different log-group 
specific IndexWriters?
   
   While writing logs, OpenSearch will interact with 'n' different log-group 
specific IndexWriters. For example, if logs are grouped by status code, a 5xx 
log entry will be written using a 5xx-specific IndexWriter.
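   
   As a rough illustration of that write-path routing, here is a minimal 
sketch in plain Java. It does not use Lucene: a `List<String>` stands in for 
each group-specific IndexWriter, and the class name `GroupRoutingSketch`, the 
`groupFor` helper, and the "5xx"-style group keys are all hypothetical names 
chosen for this example. It assumes the set of groups is fixed up front.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of routing log entries to group-specific writers. */
final class GroupRoutingSketch {
    // One buffer per group; each buffer stands in for a group-level IndexWriter.
    private final Map<String, List<String>> writersByGroup = new LinkedHashMap<>();

    /** The group set is fixed at construction (assumed to mirror an index setting). */
    GroupRoutingSketch(List<String> groups) {
        for (String group : groups) {
            writersByGroup.put(group, new ArrayList<>());
        }
    }

    /** Maps an HTTP status code to its group key, e.g. 503 -> "5xx". */
    static String groupFor(int statusCode) {
        if (statusCode < 100 || statusCode > 599) {
            throw new IllegalArgumentException("Not an HTTP status code: " + statusCode);
        }
        return (statusCode / 100) + "xx";
    }

    /** Appends the entry to the buffer of the group that owns its status code. */
    void add(int statusCode, String logLine) {
        String group = groupFor(statusCode);
        List<String> writer = writersByGroup.get(group);
        if (writer == null) {
            throw new IllegalStateException("No writer configured for group " + group);
        }
        writer.add(logLine);
    }

    /** Returns an immutable copy of the entries written for one group. */
    List<String> entries(String group) {
        return List.copyOf(writersByGroup.getOrDefault(group, List.of()));
    }
}
```

   In a real integration each buffer would be replaced by an IndexWriter 
opened on its own directory. The sketch only shows the routing decision.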
   
   Conversely, for read flows such as creating a reader or retrieving the 
latest commit (or SegmentInfos state) associated with a directory (or 
IndexWriter), for example when uploading to a snapshot or syncing replica 
state from the primary during a SegRep checkpoint, OpenSearch will interact 
with Lucene via the combined view (parent IndexWriter). This parent 
IndexWriter internally references the segments of the group-level 
IndexWriters (200_0, 300_0, etc.).
   
   Having separate IndexWriters for different groups ensures that logs from 
different groups are kept in different segments. Meanwhile, the parent 
IndexWriter provides a combined view of the group-level segments of a Lucene 
index, giving a common entry point for operations such as opening readers, 
syncing replicas, and uploading the SegmentInfos of an index to a remote 
snapshot.
   
   > When a new log group is discovered, does the client create a new 
IndexWriter and add it to parent?
   
   The number of groups (IndexWriters) will be fixed and will be determined 
via a setting at index creation.
   
   > Do we really need a parent "IndexWriter" with this approach? Would a 
Multi-Reader on all the child log-group directories work?
   
   Having a Multi-Reader on all the child log-group directories still won't 
provide a unified view of all group-level segments associated with a Lucene 
index. Even today, OpenSearch interacts with a Lucene index not only to index 
documents or to open a reader over those documents, but also to retrieve the 
SegmentInfos associated with the latest commit of an IndexWriter directory 
(e.g. for storing snapshots of an index on a remote store) or to obtain the 
file list associated with a past commit (for deleting unreferenced files 
inside the commit deletion policy). Exposing the group-level segments through 
a single parent IndexWriter associated with the Lucene index ensures that the 
index still behaves as a single entity (the parent IndexWriter can be used to 
get a common commit for the group-level IndexWriters).
   
   Another approach is to use a SegmentInfos instance instead of an 
IndexWriter to maintain the common view over the group-level IndexWriters. 
Since, in the approach above, the parent IndexWriter only periodically syncs 
and combines the SegmentInfos of the group-level IndexWriters, we can replace 
the parent IndexWriter with a SegmentInfos as the combined view. This parent 
SegmentInfos will reference the segments of the group-level IndexWriters, 
just as a parent IndexWriter would.
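   
   To make the "sync and combine" step concrete, here is a minimal sketch in 
plain Java. It is not Lucene code: a map from group name to segment count 
stands in for the per-group SegmentInfos, and the `CombinedViewSketch` class, 
its `sync` method, and the `<group>_<n>` naming (modelled on the 200_0 and 
300_0 examples above) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of a parent view that references group-level segments. */
final class CombinedViewSketch {
    /**
     * Rebuilds the combined list of segment references from the current
     * per-group state. Each group contributes names of the form group_n.
     */
    static List<String> sync(Map<String, Integer> segmentCountByGroup) {
        List<String> combined = new ArrayList<>();
        // Copy into a TreeMap so groups are visited in a deterministic order.
        for (Map.Entry<String, Integer> entry : new TreeMap<>(segmentCountByGroup).entrySet()) {
            for (int i = 0; i < entry.getValue(); i++) {
                combined.add(entry.getKey() + "_" + i);
            }
        }
        return combined;
    }
}
```

   Calling `sync` again after a group flushes a new segment yields a fresh 
combined list, which is the role the parent view plays for readers, 
snapshots, and replica syncs in the description above.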
   
   Let me know if this makes sense.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

