DaveCTurner opened a new issue, #13354:
URL: https://github.com/apache/lucene/issues/13354

   ### Description
   
   During rollback, we see some Lucene indices taking many seconds (occasionally 
minutes) to abort their merges. The merge thread spends all of that time doing 
now-pointless IO inside a call to one of the various `checkIntegrity` methods, 
each of which reads a file from beginning to end. For instance:
   
   ```
       ⋮
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.BufferedChecksumIndexInput.readBytes(BufferedChecksumIndexInput.java:46)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.DataInput.readBytes(DataInput.java:73)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.ChecksumIndexInput.skipByReading(ChecksumIndexInput.java:79)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.ChecksumIndexInput.seek(ChecksumIndexInput.java:64)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.CodecUtil.checksumEntireFile(CodecUtil.java:619)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader.checkIntegrity(Lucene90CompressingStoredFieldsReader.java:725)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter.merge(Lucene90CompressingStoredFieldsWriter.java:609)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:234)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger$$Lambda/0x00000080029f0c00.merge(Unknown Source)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:273)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:110)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5252)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4740)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6541)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:639)
       app/org.elasticsearch.server@8.15.0/org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:118)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:700)
   ```
   
   The same happens here. This trace goes through an 
`org.elasticsearch.index.codec.postings.ES812PostingsReader`, but that class 
doesn't look to be doing anything different from 
`org.elasticsearch.xpack.lucene.bwc.codecs.lucene50.Lucene50PostingsReader#checkIntegrity`:
   
   ```
       ⋮
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.BufferedChecksumIndexInput.readBytes(BufferedChecksumIndexInput.java:46)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.DataInput.readBytes(DataInput.java:73)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.ChecksumIndexInput.skipByReading(ChecksumIndexInput.java:79)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.ChecksumIndexInput.seek(ChecksumIndexInput.java:64)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.CodecUtil.checksumEntireFile(CodecUtil.java:619)
       app/org.elasticsearch.server@8.15.0/org.elasticsearch.index.codec.postings.ES812PostingsReader.checkIntegrity(ES812PostingsReader.java:1975)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsReader.checkIntegrity(Lucene90BlockTreeTermsReader.java:338)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.checkIntegrity(PerFieldPostingsFormat.java:370)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.perfield.PerFieldMergeState$FilterFieldsProducer.checkIntegrity(PerFieldMergeState.java:296)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:83)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:205)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:209)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger$$Lambda/0x0000007802b71ab8.merge(Unknown Source)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:298)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:137)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5252)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4740)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6541)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:639)
       app/org.elasticsearch.server@8.15.0/org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:118)
       app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:700)
   ```
   
   The data in these cases is in rather cold storage, so we would expect this 
end-to-end read to take quite some time (possibly minutes) to complete. That's 
OK: we don't need such merges to complete especially quickly. What is 
troublesome is that the merge takes just as long to react to the abort signal 
in these situations. Is there something we can do to abort this read more 
promptly? For instance, could we add an abort-sensitive wrapper to the 
`DataInput` that's reading the data?
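
   To make the wrapper idea concrete, here is a minimal sketch using only 
`java.io` rather than Lucene's own classes. Everything in it is hypothetical 
and for illustration only: `AbortCheckingInputStream`, `ReadAbortedException` 
and `drain` are invented names, a plain `InputStream` stands in for the 
`DataInput`, and a `BooleanSupplier` stands in for whatever abort signal the 
merge exposes. The point is only that a check before each buffered read would 
bound the reaction time to roughly one read call.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical exception type for this sketch; not a Lucene class. */
class ReadAbortedException extends IOException {
    ReadAbortedException(String message) {
        super(message);
    }
}

/** Hypothetical wrapper: consults an abort flag before every delegated read. */
class AbortCheckingInputStream extends FilterInputStream {
    private final BooleanSupplier aborted;

    AbortCheckingInputStream(InputStream in, BooleanSupplier aborted) {
        super(in);
        this.aborted = aborted;
    }

    private void checkAborted() throws IOException {
        if (aborted.getAsBoolean()) {
            throw new ReadAbortedException("read aborted");
        }
    }

    @Override
    public int read() throws IOException {
        checkAborted();
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        checkAborted();
        return super.read(b, off, len);
    }
}

class AbortableReadSketch {
    /** Reads the stream to the end in fixed-size chunks, like a whole-file checksum pass. */
    static long drain(InputStream in, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        AtomicBoolean aborted = new AtomicBoolean(false);
        InputStream in = new AbortCheckingInputStream(
            new ByteArrayInputStream(new byte[1 << 20]), aborted::get);
        in.read(new byte[8192], 0, 8192); // the first chunk is read normally
        aborted.set(true);                // abort signal arrives mid-read
        try {
            drain(in, 8192);
            System.out.println("completed");
        } catch (ReadAbortedException e) {
            System.out.println("aborted before the next chunk was read");
        }
    }
}
```

   In this sketch the cost is one flag check per chunk rather than per byte, 
so the overhead should be small as long as the reader pulls data in 
buffer-sized calls. Whether the real checksum path can be wrapped in this way 
is exactly the open question.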
   
   ### Version and environment details
   
   Lucene 9.10 embedded in Elasticsearch (here the `main` branch, currently 
targeting `8.15.0-SNAPSHOT`), but this behaviour does not seem to be at all new.


