tflobbe opened a new pull request, #12326:
URL: https://github.com/apache/lucene/pull/12326

   This was reported in LUCENE-10314/#11350, and I'm surprised it didn't get 
more attention, since it can leave indices in a very bad state. 
   This is my understanding; feel free to correct me if I'm wrong.
   In Lucene 8, when a field was changed from `IndexOptions.NONE` to 
`IndexOptions.DOCS`, the merging logic would give the merged segment 
`IndexOptions.DOCS` and things would continue to work. Of course, docs that 
had been added before the change would not be searchable on this field, but 
the change could happen in place on an existing index; users would probably 
want to re-submit all their docs to make the field fully searchable. Lucene 9 
no longer supports this kind of change for various reasons, but it still 
accepts older indices that contain it, for backwards compatibility. However, 
the merging logic doesn't seem to handle this well. Instead of turning 
`IndexOptions.NONE` into `IndexOptions.DOCS` for the merged segment, it just 
takes the `IndexOptions` of whichever segment comes first, which produces an 
invalid merge. The merge fails with an assertion error in tests, and with an 
exception like this when running without assertions (from Lucene 9.4.2):
   
   ```
   java.lang.IllegalArgumentException: cannot write negative vLong (got: -368)
        at org.apache.lucene.store.DataOutput.writeVLong(DataOutput.java:238)
        at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter$StatsWriter.add(Lucene90BlockTreeTermsWriter.java:571)
        at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter$TermsWriter.writeBlock(Lucene90BlockTreeTermsWriter.java:832)
        at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter$TermsWriter.writeBlocks(Lucene90BlockTreeTermsWriter.java:709)
        at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter$TermsWriter.finish(Lucene90BlockTreeTermsWriter.java:1105)
        at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter.write(Lucene90BlockTreeTermsWriter.java:370)
        at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:95)
        at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:204)
        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:208)
        at org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:293)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:136)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5141)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4681)
        at org.apache.solr.update.SolrIndexWriter.merge(SolrIndexWriter.java:274)
        at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6430)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:638)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:699)
   ```
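
To make the difference between the two behaviours concrete, here is a minimal, self-contained sketch. It is not Lucene's merge code: the `Opt` enum is a hypothetical two-value stand-in for `IndexOptions`, and `firstWins` / `promote` are illustrative names for the Lucene 9 and Lucene 8 behaviours as I understand them from the description above. Both methods assume a non-empty list of per-segment values.

```java
import java.util.List;

// Illustrative stand-in only: this is NOT Lucene's merge code.
class IndexOptionsMergeSketch {

    // Hypothetical two-value mirror of the IndexOptions values discussed here.
    enum Opt { NONE, DOCS }

    // "First segment wins": the merged field copies whatever the first
    // segment declares, regardless of the remaining segments.
    static Opt firstWins(List<Opt> perSegment) {
        return perSegment.get(0);
    }

    // "Promote": the merged field is DOCS if any input segment has DOCS.
    static Opt promote(List<Opt> perSegment) {
        return perSegment.contains(Opt.DOCS) ? Opt.DOCS : Opt.NONE;
    }

    public static void main(String[] args) {
        // Same two segments, listed in the two possible orders.
        List<Opt> oldFirst = List.of(Opt.NONE, Opt.DOCS);
        List<Opt> newFirst = List.of(Opt.DOCS, Opt.NONE);

        System.out.println("firstWins, old segment first: " + firstWins(oldFirst));
        System.out.println("firstWins, new segment first: " + firstWins(newFirst));
        System.out.println("promote,   old segment first: " + promote(oldFirst));
        System.out.println("promote,   new segment first: " + promote(newFirst));
    }
}
```

With `firstWins`, the result changes with segment order (`NONE` in one order, `DOCS` in the other), while `promote` gives `DOCS` in both orders.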
   
   This makes upgrading pretty risky. Because the failure happens on merge, 
Lucene 9.x would run fine until the merge policy decides to merge segments 
with different `IndexOptions`, which could be weeks or more after the upgrade. 
At that point the index can't be rolled back (because it already has 9.x 
segments) and any backups would be very old. Worse, the outcome of the merge 
(successful or not) depends on the order in which the segments are selected.
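
One way to gauge exposure before upgrading is to look for fields whose options differ between segments. The sketch below only shows the comparison step on plain maps; the class name, the method name, and the `_0` / `_1` segment names are hypothetical. In a real index the per-segment values would have to be read from each leaf reader's `FieldInfos` (via `FieldInfo.getIndexOptions()`), which this sketch does not do. A field that is simply absent from a segment is not counted as a mismatch here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Illustrative helper, not part of Lucene.
class MixedIndexOptionsCheck {

    // segments: segment name -> (field name -> index options name, e.g. "NONE" or "DOCS").
    // Returns the fields that appear with more than one distinct value.
    static SortedSet<String> fieldsWithMixedOptions(Map<String, Map<String, String>> segments) {
        Map<String, Set<String>> seen = new HashMap<>();
        for (Map<String, String> fields : segments.values()) {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                seen.computeIfAbsent(e.getKey(), k -> new HashSet<>()).add(e.getValue());
            }
        }
        SortedSet<String> mixed = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : seen.entrySet()) {
            if (e.getValue().size() > 1) {
                mixed.add(e.getKey());
            }
        }
        return mixed;
    }

    public static void main(String[] args) {
        // Made-up example data: "title" was NONE in the old segment, DOCS in the new one.
        Map<String, Map<String, String>> segments = Map.of(
            "_0", Map.of("id", "DOCS", "title", "NONE"),
            "_1", Map.of("id", "DOCS", "title", "DOCS"));
        System.out.println(fieldsWithMixedOptions(segments)); // prints [title]
    }
}
```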
   
   As of now, this PR only includes a failing test, which passes in Lucene 8 
but fails in 9.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

