Mohit Godwani created LUCENE-9993:
-------------------------------------

             Summary: IndexWriter Initialisation dependent on .fnm files
                 Key: LUCENE-9993
                 URL: https://issues.apache.org/jira/browse/LUCENE-9993
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/index
            Reporter: Mohit Godwani


I am building an abstraction over Lucene in which data is stored in two places: 
local disk and remote cloud storage. If the host holding the index is 
terminated for any reason, I want to be able to replicate the index on another 
host.

To recreate the index on another host, I start by downloading the metadata 
files associated with the index (segments_N and .si files). Once that is done, 
I try to initialise an IndexWriter on top of the local directory into which 
these files were downloaded from remote storage. This lets me begin indexing 
new data without downloading the data for older segments (I only add 
documents; there are no update calls).
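
As a rough illustration of the first step, the sketch below picks the commit 
metadata out of a listing of remote file names. The class and method names 
(CommitMetadataFilter, isCommitMetadata, selectMetadata) and the sample 
listing are hypothetical and not part of Lucene; the only assumption taken 
from the index layout is that commit points are named segments_N and 
per-segment info files end in .si.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CommitMetadataFilter {

    // Hypothetical helper: true for commit-point files (segments_N)
    // and per-segment info files (.si), false for everything else.
    static boolean isCommitMetadata(String fileName) {
        return fileName.startsWith("segments_") || fileName.endsWith(".si");
    }

    // Keeps only the metadata files from a remote listing, preserving order.
    static List<String> selectMetadata(List<String> remoteFiles) {
        return remoteFiles.stream()
                .filter(CommitMetadataFilter::isCommitMetadata)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up listing of one commit with two compound-file segments.
        List<String> remote = List.of(
                "segments_5", "_0.si", "_0.cfs", "_0.cfe",
                "_1.si", "_1.cfs", "_1.cfe");
        System.out.println(selectMetadata(remote));
    }
}
```

Only the files selected this way are present locally when the IndexWriter is 
opened in the scenario described here.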

While doing so, I see an error during the initialisation of the IndexWriter 
itself, because it tries to load the [field number 
mappings|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L1116]
 for the previous segments before the IndexWriter object can be created.

With compound files enabled, this requires downloading the .cfs files from 
remote storage, which increases the time needed to initialise the IndexWriter. 
The new host therefore takes longer before it can accept incoming requests, 
and the application rejects a large number of customer requests in the 
meantime.
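
To make the extra download concrete, here is a minimal sketch of which files 
would have to be local before the field infos of a single segment can be 
read. The class and method names are hypothetical. It assumes the default 
codec layout, where a compound segment keeps its field infos inside the .cfs 
file and locates them through the accompanying .cfe entry table, and where a 
non-compound segment has a standalone .fnm file. It also assumes there are no 
later field-infos generations, which fits an add-only workload.

```java
import java.util.List;

public class FieldInfosFiles {

    // Hypothetical helper: files needed to read one segment's field infos.
    // segmentName is the segment prefix, e.g. "_0"; compound says whether
    // the segment was written as a compound file (recorded in its .si file).
    static List<String> requiredFiles(String segmentName, boolean compound) {
        if (compound) {
            // Assumption: field infos sit inside the compound data file,
            // and the .cfe entry table is needed to find them.
            return List.of(segmentName + ".cfs", segmentName + ".cfe");
        }
        // Non-compound segment: field infos are a separate file.
        return List.of(segmentName + ".fnm");
    }

    public static void main(String[] args) {
        System.out.println(requiredFiles("_0", true));
        System.out.println(requiredFiles("_1", false));
    }
}
```

In the compound case the whole segment payload has to be fetched just to 
reach the field infos, which is the cost described above.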

* Why do we need the .fnm files from previous segments when creating the 
IndexWriter?

* Could you help with a workaround that avoids downloading any files other 
than the commit metadata?

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

