tflobbe opened a new pull request, #12326: URL: https://github.com/apache/lucene/pull/12326
This was reported in LUCENE-10314/#11350, and I'm surprised it didn't get more attention, since it can leave indices in a very bad state. This is my understanding; feel free to correct me if I'm wrong:

In Lucene 8, when a field was changed from `IndexOptions.NONE` to `IndexOptions.DOCS`, the merging logic would give the merged segment `IndexOptions.DOCS`, and things would continue to work. Of course, documents added before the change would not be searchable on this field. Still, the change could be made in place on an existing index, and users would then probably re-submit all their documents to make the field fully searchable.

Lucene 9 no longer supports this change for various reasons, but it still accepts older indices that contain it, for backwards compatibility. However, the merging logic doesn't seem to handle this case well. Instead of turning `IndexOptions.NONE` into `IndexOptions.DOCS` for the merged segment, it just takes the `IndexOptions` of whichever segment comes first. This produces an invalid merge. In tests the merge fails with an assertion error; without assertions it fails with an exception like this one (from Lucene 9.4.2):

```
java.lang.IllegalArgumentException: cannot write negative vLong (got: -368)
    at org.apache.lucene.store.DataOutput.writeVLong(DataOutput.java:238)
    at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter$StatsWriter.add(Lucene90BlockTreeTermsWriter.java:571)
    at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter$TermsWriter.writeBlock(Lucene90BlockTreeTermsWriter.java:832)
    at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter$TermsWriter.writeBlocks(Lucene90BlockTreeTermsWriter.java:709)
    at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter$TermsWriter.finish(Lucene90BlockTreeTermsWriter.java:1105)
    at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter.write(Lucene90BlockTreeTermsWriter.java:370)
    at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:95)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:204)
    at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:208)
    at org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:293)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:136)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5141)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4681)
    at org.apache.solr.update.SolrIndexWriter.merge(SolrIndexWriter.java:274)
    at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6430)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:638)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:699)
```

This makes upgrading pretty risky. Because the failure happens on merge, Lucene 9.x can run fine until the merge policy decides to merge segments with different `IndexOptions`, which could be weeks or more after the upgrade. At that point the index can't be rolled back, because it already contains 9.x segments, and any backups would be very old. Even worse, whether the merge succeeds or fails depends on the order in which the segments are selected.

As of now, this PR only includes a failing test. The test passes in Lucene 8 but fails in 9.
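The two merge behaviors described in this PR can be contrasted with a toy model. This is a sketch, not Lucene's `FieldInfos`/`FieldInfo` merge code. The `IndexOptions` enum below is a hypothetical stand-in that keeps only the two values discussed here. Both method names are made up for illustration.

```java
import java.util.List;

public class IndexOptionsMergeSketch {

    // Hypothetical stand-in for Lucene's IndexOptions enum; only the two
    // values discussed in this PR are modeled.
    enum IndexOptions { NONE, DOCS }

    // Lucene 9 behavior as described in the PR: the merged segment simply
    // inherits the options of the first segment in the merge.
    static IndexOptions firstSegmentWins(List<IndexOptions> segments) {
        return segments.get(0);
    }

    // Lucene 8 behavior as described in the PR: a segment with NONE yields
    // to a segment that indexes the field, so the merged segment gets DOCS.
    static IndexOptions noneYieldsToIndexed(List<IndexOptions> segments) {
        IndexOptions merged = IndexOptions.NONE;
        for (IndexOptions options : segments) {
            if (merged == IndexOptions.NONE) {
                merged = options;
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<IndexOptions> oldFirst = List.of(IndexOptions.NONE, IndexOptions.DOCS);
        List<IndexOptions> newFirst = List.of(IndexOptions.DOCS, IndexOptions.NONE);

        // First-segment-wins depends on the order in which segments are selected.
        System.out.println(firstSegmentWins(oldFirst)); // NONE
        System.out.println(firstSegmentWins(newFirst)); // DOCS

        // NONE-yields-to-indexed gives DOCS regardless of order.
        System.out.println(noneYieldsToIndexed(oldFirst)); // DOCS
        System.out.println(noneYieldsToIndexed(newFirst)); // DOCS
    }
}
```

With only `NONE` and `DOCS` in play, the Lucene 8-style rule is order-independent. The sketch deliberately does not model how two different non-`NONE` options would be combined.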
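The top frame of the stack trace is the vLong writer rejecting a negative number. Judging from the `StatsWriter.add` frame, that number is presumably a term statistic computed during the invalid merge. A vLong stores 7 payload bits per byte and is only defined for non-negative values. The sketch below assumes the same layout as Lucene's `DataOutput.writeVLong`: low-order groups first, with the high bit marking "more bytes follow". It is a standalone re-implementation for illustration, not Lucene's code.

```java
import java.io.ByteArrayOutputStream;

public class VLongSketch {

    // Variable-length long encoding: 7 payload bits per byte, low-order
    // bits first, high bit set on every byte except the last one.
    // Negative values are rejected before encoding, with the same message
    // text that appears at the top of the stack trace.
    static byte[] writeVLong(long i) {
        if (i < 0) {
            throw new IllegalArgumentException("cannot write negative vLong (got: " + i + ")");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((i & ~0x7FL) != 0L) {
            out.write((int) ((i & 0x7FL) | 0x80L)); // continuation byte
            i >>>= 7;
        }
        out.write((int) i); // final byte, high bit clear
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(writeVLong(300).length); // 2
        try {
            writeVLong(-368);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // cannot write negative vLong (got: -368)
        }
    }
}
```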