[I] NRT failure due to SegmentInfo & File mismatch [lucene]

via GitHub Wed, 08 May 2024 13:36:45 -0700


benwtrent opened a new issue, #13353:
URL: https://github.com/apache/lucene/issues/13353


   ### Description
   
   There has been a nasty test failure in Elasticsearch (ES) for a while: https://github.com/elastic/elasticsearch/issues/105122
   
   The test simulates a document indexing failure. It turns out that this test failure is caused by a specific sequence of conditions in Lucene. If indexing fails on one field, and the document also has a point-values field that comes AFTER the failing field, opening a reader blows up when the writer has soft-deletes enabled.
   
   The failure description is as follows:
   
    - First, we have an IndexWriter configured with soft-deletes and with commit-on-close disabled
    - Index a document with fields in this order: [<field that will throw>, <nice point field>]
    - We update the FieldInfos eagerly here: https://github.com/apache/lucene/blob/0aa88910ca9a1032d288996d14203eac4953f2de/lucene/core/src/java/org/apache/lucene/index/IndexingChain.java#L592-L603
    - FieldInfos now indicate that we have a point field
    - The field that will throw is processed, and document indexing fails. Since this is a regular text field, the failure does not automatically close the indexer
    - An NRT reader is opened on the writer and attempts to flush, but the FieldInfos do not match the fields that were actually written. Because soft-deletes are enabled, the segment is kept rather than deleted, so the reader tries to open point files that do not exist
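
To make the sequence above concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: `ToyField`, `ToyWriter`, `flush()`, `readerCanOpen()` and `simulate()` are hypothetical names invented for illustration, and the real `IndexingChain` and reader code are far more involved. The only things it assumes from this report are that field metadata is recorded before field data is indexed, and that the reader looks for a points file (`.kdi` in the exception in this report) whenever the metadata declares a point field.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these types are Lucene classes.
class FieldInfoMismatchSketch {

  /** A field in the toy document; "crashes" stands in for the CrashingFilter. */
  static final class ToyField {
    final String name;
    final boolean isPoint;
    final boolean crashes;

    ToyField(String name, boolean isPoint, boolean crashes) {
      this.name = name;
      this.isPoint = isPoint;
      this.crashes = crashes;
    }
  }

  /** Tracks declared metadata separately from data that was actually buffered. */
  static final class ToyWriter {
    final Set<String> declaredPointFields = new LinkedHashSet<>();
    final List<String> bufferedPointFields = new ArrayList<>();

    void addDocument(List<ToyField> doc) throws IOException {
      // Step 1: metadata is updated eagerly for every field in the document.
      for (ToyField f : doc) {
        if (f.isPoint) declaredPointFields.add(f.name);
      }
      // Step 2: fields are indexed in order. A failure aborts the document
      // but leaves the metadata from step 1 in place.
      for (ToyField f : doc) {
        if (f.crashes) throw new IOException("simulated failure on field " + f.name);
        if (f.isPoint) bufferedPointFields.add(f.name);
      }
    }

    /** Writes a points file only if point data was actually buffered. */
    Set<String> flush() {
      Set<String> files = new LinkedHashSet<>();
      files.add(".fnm");
      if (!bufferedPointFields.isEmpty()) files.add(".kdi");
      return files;
    }
  }

  /** The toy reader trusts the metadata: declared points require a .kdi file. */
  static boolean readerCanOpen(ToyWriter writer, Set<String> files) {
    boolean metadataHasPoints = !writer.declaredPointFields.isEmpty();
    return !metadataHasPoints || files.contains(".kdi");
  }

  static boolean simulate(boolean crashFieldFirst) {
    ToyField crash = new ToyField("crash", false, true);
    ToyField point = new ToyField("int", true, false);
    List<ToyField> doc = crashFieldFirst ? List.of(crash, point) : List.of(point, crash);
    ToyWriter writer = new ToyWriter();
    try {
      writer.addDocument(doc);
    } catch (IOException expected) {
      // The document failed, but the toy writer stays open.
    }
    return readerCanOpen(writer, writer.flush());
  }

  public static void main(String[] args) {
    System.out.println("crash field first: reader opens = " + simulate(true));
    System.out.println("point field first: reader opens = " + simulate(false));
  }
}
```

In this toy model, `simulate(true)` reports that the reader cannot open the segment, because the metadata declares a point field but no points file was written. `simulate(false)` only shows that ordering matters within the model; it makes no claim about how real Lucene behaves when the point field comes first.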
   
   
   <details>
   
   <summary> Test that replicates the failure </summary>
   
   ```java
   
     public void testExceptionJustBeforeFlushWithPointValues() throws Exception {
       Directory dir = newDirectory();
       Analyzer analyzer =
           new Analyzer(Analyzer.PER_FIELD_REUSE_STRATEGY) {
             @Override
             public TokenStreamComponents createComponents(String fieldName) {
               MockTokenizer tokenizer = new MockTokenizer(MockTokenizer.WHITESPACE, false);
               // disable workflow checking as we forcefully close() in exceptional cases.
               tokenizer.setEnableChecks(false);
               TokenStream stream = new CrashingFilter(fieldName, tokenizer);
               return new TokenStreamComponents(tokenizer, stream);
             }
           };
       DirectoryReader r = null;
       IndexWriterConfig iwc =
           newIndexWriterConfig(analyzer).setCommitOnClose(false).setMaxBufferedDocs(3);
       MergePolicy mp = iwc.getMergePolicy();
       iwc.setMergePolicy(
           new SoftDeletesRetentionMergePolicy("soft_delete", MatchAllDocsQuery::new, mp));
       IndexWriter w = RandomIndexWriter.mockIndexWriter(dir, iwc, random());
       Document newdoc = new Document();
       newdoc.add(newTextField("crash", "do it on token 4", Field.Store.NO));
       newdoc.add(new IntPoint("int", 17));
       expectThrows(IOException.class, () -> w.addDocument(newdoc));
       try {
         r = w.getReader(false, false);
       } catch (AlreadyClosedException ace) {
         // expected
       }
       dir.close();
     }
   ```
   
   </details>
   
   The exception thrown is:
   
   ```
           Caused by:
           java.io.FileNotFoundException: No sub-file with id .kdi found in compound file "_0.cfs" (fileName=_0.kdi files: [_Lucene99_0.tip, .nvm, .fnm, .tvd, _Lucene99_0.doc, _Lucene99_0.tim, _Lucene99_0.pos, .tvm, _Lucene99_0.tmd, .fdm, .nvd, .fdx, .tvx, .fdt])
               at org.apache.lucene.codecs.lucene90.Lucene90CompoundReader.openInput(Lucene90CompoundReader.java:170)
               at org.apache.lucene.codecs.lucene90.Lucene90PointsReader.<init>(Lucene90PointsReader.java:63)
               at org.apache.lucene.codecs.lucene90.Lucene90PointsFormat.fieldsReader(Lucene90PointsFormat.java:74)
               at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:152)
               ... 55 more
   ```
   
   ### Version and environment details
   
   _No response_


