Mohit Godwani created LUCENE-9993:
-------------------------------------
Summary: IndexWriter Initialisation dependent on .fnm files
Key: LUCENE-9993
URL: https://issues.apache.org/jira/browse/LUCENE-9993
Project: Lucene - Core
Issue Type: Improvement
Components: core/index
Reporter: Mohit Godwani
I am building an abstraction over Lucene in which data is stored in two
places: local disk and remote cloud storage. If the host that holds the index
is terminated for any reason, I want to be able to replicate the index on
another host.
To recreate the index on another host, I start by downloading the metadata
files associated with the index (segments_N and the .si files). Once that is
done, I try to initialise an IndexWriter on top of the local directory into
which these files were downloaded from remote storage. The goal is to begin
indexing new data without downloading the data of older segments (I make no
updateDocs calls; only the addDocs operation is used).
However, I see an error during initialisation of the IndexWriter
itself, because it tries to read the [field number
mappings|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L1116]
for the previous segments before the IndexWriter object can be created.
With compound files enabled, the field infos (.fnm) are packed inside the .cfs
files, so this step requires downloading the .cfs files from remote storage.
That increases the time required to initialise the IndexWriter. The new host
therefore takes longer before it can accept incoming requests, and the
application rejects a large number of customer requests in the meantime.
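To make the download step concrete, here is a minimal sketch of the file selection described above. It uses only the JDK; the class and method names (BootstrapFileFilter, filesToDownload) are hypothetical and not part of Lucene. It assumes that the commit metadata is segments_N plus the .si files, and that field infos live either in a standalone .fnm file or inside a compound .cfs/.cfe pair. It does not claim that this set is sufficient for IndexWriter to open in every case.

```java
import java.util.ArrayList;
import java.util.List;

public class BootstrapFileFilter {

    /** Commit metadata as described in this issue: segments_N and per-segment .si files. */
    static boolean isCommitMetadata(String name) {
        return name.startsWith("segments_") || name.endsWith(".si");
    }

    /**
     * Files that may hold field infos (assumption): a standalone .fnm for
     * non-compound segments, or the .cfs/.cfe pair for compound segments.
     */
    static boolean mayHoldFieldInfos(String name) {
        return name.endsWith(".fnm") || name.endsWith(".cfs") || name.endsWith(".cfe");
    }

    /** Returns the subset of remote files to fetch, preserving the input order. */
    static List<String> filesToDownload(List<String> remoteFiles) {
        List<String> selected = new ArrayList<>();
        for (String name : remoteFiles) {
            if (isCommitMetadata(name) || mayHoldFieldInfos(name)) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical remote listing: segment _0 is compound, segment _1 is not.
        List<String> remote = List.of(
            "segments_2", "_0.si", "_0.cfs", "_0.cfe",
            "_1.si", "_1.fnm", "_1.fdt", "_1_Lucene90_0.doc");
        // prints [segments_2, _0.si, _0.cfs, _0.cfe, _1.si, _1.fnm]
        System.out.println(filesToDownload(remote));
    }
}
```

For the non-compound segment only the small .fnm file is selected, while for the compound segment the whole .cfs has to be fetched, which is the cost described above.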
* Why are the .fnm files of previous segments needed to create the
IndexWriter?
* Is there a workaround that avoids downloading any files other than the
commit metadata?