benwtrent opened a new issue, #13353: URL: https://github.com/apache/lucene/issues/13353
### Description

There has been a nasty test failure in ES for a while: https://github.com/elastic/elasticsearch/issues/105122

The test simulates a document indexing failure. It turns out that this failure is caused by a series of strange conditions in Lucene. If indexing a field fails, but the document also has a points field that comes AFTER the failing field, things blow up when opening a reader on the writer if the writer has soft-deletes enabled.

The failure plays out as follows (a self-contained sketch of the scenario, using only the public API, is at the bottom of this issue):

- First, we have an IndexWriter configured with soft-deletes and no commit on close.
- Index a document whose fields are ordered `[<field that will throw>, <nice point field>]`.
- We update the FieldInfos eagerly here: https://github.com/apache/lucene/blob/0aa88910ca9a1032d288996d14203eac4953f2de/lucene/core/src/java/org/apache/lucene/index/IndexingChain.java#L592-L603
- FieldInfos now indicate we have a point field.
- The field that will throw is handled, and document indexing fails. Since this is a regular text field, the failure is non-aborting and does not automatically close the writer.
- An NRT reader is opened on the writer and triggers a flush. Because of soft-deletes the fully-deleted segment is retained (i.e. not dropped), so the segment is written with FieldInfos that claim point values which were never actually written.

<details>
<summary>Test that replicates the failure</summary>

(Helpers such as `newDirectory`, `newTextField`, `expectThrows`, and `random` come from `LuceneTestCase`; `CrashingFilter` is the one defined in `TestIndexWriterExceptions`.)

```java
public void testExceptionJustBeforeFlushWithPointValues() throws Exception {
  Directory dir = newDirectory();
  Analyzer analyzer =
      new Analyzer(Analyzer.PER_FIELD_REUSE_STRATEGY) {
        @Override
        public TokenStreamComponents createComponents(String fieldName) {
          MockTokenizer tokenizer = new MockTokenizer(MockTokenizer.WHITESPACE, false);
          // disable workflow checking as we forcefully close() in exceptional cases.
          tokenizer.setEnableChecks(false);
          TokenStream stream = new CrashingFilter(fieldName, tokenizer);
          return new TokenStreamComponents(tokenizer, stream);
        }
      };
  IndexWriterConfig iwc =
      newIndexWriterConfig(analyzer).setCommitOnClose(false).setMaxBufferedDocs(3);
  MergePolicy mp = iwc.getMergePolicy();
  iwc.setMergePolicy(
      new SoftDeletesRetentionMergePolicy("soft_delete", MatchAllDocsQuery::new, mp));
  IndexWriter w = RandomIndexWriter.mockIndexWriter(dir, iwc, random());
  Document newdoc = new Document();
  // the crashing text field comes before the point field
  newdoc.add(newTextField("crash", "do it on token 4", Field.Store.NO));
  newdoc.add(new IntPoint("int", 17));
  expectThrows(IOException.class, () -> w.addDocument(newdoc));
  try {
    // this is where the unexpected FileNotFoundException surfaces
    DirectoryReader r = w.getReader(false, false);
    r.close();
  } catch (AlreadyClosedException ace) {
    // expected
  }
  dir.close();
}
```

</details>

The exception thrown is:

```
Caused by: java.io.FileNotFoundException: No sub-file with id .kdi found in compound file "_0.cfs" (fileName=_0.kdi files: [_Lucene99_0.tip, .nvm, .fnm, .tvd, _Lucene99_0.doc, _Lucene99_0.tim, _Lucene99_0.pos, .tvm, _Lucene99_0.tmd, .fdm, .nvd, .fdx, .tvx, .fdt])
	at org.apache.lucene.codecs.lucene90.Lucene90CompoundReader.openInput(Lucene90CompoundReader.java:170)
	at org.apache.lucene.codecs.lucene90.Lucene90PointsReader.<init>(Lucene90PointsReader.java:63)
	at org.apache.lucene.codecs.lucene90.Lucene90PointsFormat.fieldsReader(Lucene90PointsFormat.java:74)
	at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:152)
	... 55 more
```

### Version and environment details

_No response_
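For anyone who wants to poke at this outside the test framework, below is my attempt at a self-contained reproduction using only the public API. Treat it as an untested sketch: the class name `PointFieldFlushRepro` and the `FailingTokenStream` stand-in for `CrashingFilter` are mine, and I am assuming that configuring `setSoftDeletesField` together with `SoftDeletesRetentionMergePolicy` is enough to keep the fully-deleted segment alive at flush time, as the description above requires.

```java
import java.io.IOException;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.SoftDeletesRetentionMergePolicy;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class PointFieldFlushRepro {

  /** Stand-in for the test's CrashingFilter: always fails during analysis. */
  static final class FailingTokenStream extends TokenStream {
    @Override
    public boolean incrementToken() throws IOException {
      throw new IOException("simulated indexing failure");
    }
  }

  public static void main(String[] args) throws Exception {
    try (Directory dir = new ByteBuffersDirectory()) {
      IndexWriterConfig iwc =
          new IndexWriterConfig(new StandardAnalyzer())
              .setCommitOnClose(false)
              .setSoftDeletesField("soft_delete");
      iwc.setMergePolicy(
          new SoftDeletesRetentionMergePolicy(
              "soft_delete", MatchAllDocsQuery::new, iwc.getMergePolicy()));
      try (IndexWriter w = new IndexWriter(dir, iwc)) {
        Document doc = new Document();
        // failing field first, point field second -- field order matters here
        doc.add(new Field("crash", new FailingTokenStream(), TextField.TYPE_NOT_STORED));
        doc.add(new IntPoint("int", 17));
        try {
          w.addDocument(doc);
        } catch (IOException expected) {
          // non-aborting analysis failure: the doc is marked deleted, the writer
          // stays open, but FieldInfos already registered the "int" point field
        }
        // opening an NRT reader forces a flush of the buffered (fully deleted)
        // segment; this is where the FileNotFoundException should surface
        DirectoryReader r = DirectoryReader.open(w, false, false);
        r.close();
      }
    }
  }
}
```

If this hits the same path as the test, the `DirectoryReader.open` call should fail with the `No sub-file with id .kdi found in compound file` exception shown above.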