luyuncheng opened a new pull request, #987:
URL: https://github.com/apache/lucene/pull/987

   JIRA: https://issues.apache.org/jira/browse/LUCENE-10627
   I see that when Lucene flushes and merges stored fields, it needs many memory copies:
   ```
Lucene Merge Thread #25940]" #906546 daemon prio=5 os_prio=0 cpu=20503.95ms elapsed=68.76s tid=0x00007ee990002c50 nid=0x3aac54 runnable [0x00007f17718db000]
      java.lang.Thread.State: RUNNABLE
        at org.apache.lucene.store.ByteBuffersDataOutput.toArrayCopy(ByteBuffersDataOutput.java:271)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:239)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:169)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:654)
        at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:228)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4760)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4364)
        at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5923)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
        at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:100)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:682)
   ```
   
   When CompressingStoredFieldsWriter flushes documents, it needs many memory copies (a rough sketch of the copy pattern follows the two lists below):
   
   - With Lucene90 using LZ4WithPresetDictCompressionMode:
   
   1. bufferedDocs.toArrayCopy copies the buffered blocks into one contiguous array for chunk compression
   2. the compressor copies the preset dict and the data into one block buffer
   3. the block is compressed
   4. the compressed data is copied out
   
   - With Lucene90 using DeflateWithPresetDictCompressionMode:
   
   1. bufferedDocs.toArrayCopy copies the buffered blocks into one contiguous array for chunk compression
   2. the data is compressed
   3. the compressed data is copied out
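   To make those copies concrete, here is a rough, pure-JDK sketch of the baseline pattern. This is not Lucene's actual code: `java.util.zip.Deflater` stands in for the preset-dict compressors, and the LZ4 mode's extra dict+data copy is only noted in a comment.
   
   ```java
   import java.io.ByteArrayOutputStream;
   import java.util.List;
   import java.util.zip.Deflater;
   
   class BaselineCopySketch {
     // Sketch of the baseline chunk flush, not Lucene's actual code.
     static byte[] compressChunk(List<byte[]> bufferedBlocks, byte[] presetDict) {
       // copy 1: flatten the buffered blocks into one contiguous array
       // (this is the role bufferedDocs.toArrayCopy() plays)
       int total = bufferedBlocks.stream().mapToInt(b -> b.length).sum();
       byte[] content = new byte[total];
       int offset = 0;
       for (byte[] block : bufferedBlocks) {
         System.arraycopy(block, 0, content, offset, block.length);
         offset += block.length;
       }
   
       // (the LZ4 mode adds another copy here: dict + data into one block buffer)
   
       // compress with the preset dictionary
       Deflater deflater = new Deflater(Deflater.BEST_SPEED, true);
       deflater.setDictionary(presetDict);
       deflater.setInput(content);
       deflater.finish();
   
       // last copy: drain the compressed bytes out to the destination
       ByteArrayOutputStream out = new ByteArrayOutputStream();
       byte[] scratch = new byte[8192];
       while (!deflater.finished()) {
         int n = deflater.deflate(scratch);
         out.write(scratch, 0, n);
       }
       deflater.end();
       return out.toByteArray();
     }
   }
   ```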
   
   I think we can use a `CompositeByteBuf` to **reduce temporary memory copies**:
   - we do not have to call bufferedDocs.toArrayCopy just to get contiguous content for chunk compression; the compressor can consume the buffered blocks directly (see the sketch below)
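   A minimal sketch of that idea, again with `Deflater` standing in for the real compressors, and assuming `ByteBuffersDataOutput.toBufferList()` exposes the filled blocks as readable `ByteBuffer`s: the compressor consumes the blocks one by one, so the flattening copy disappears.
   
   ```java
   import java.io.IOException;
   import java.io.OutputStream;
   import java.nio.ByteBuffer;
   import java.util.zip.Deflater;
   import org.apache.lucene.store.ByteBuffersDataOutput;
   
   class CompositeCompressSketch {
     // Sketch only: feed the buffered blocks to the compressor one by one
     // instead of flattening them with toArrayCopy() first.
     static void compressChunk(ByteBuffersDataOutput bufferedDocs, byte[] presetDict, OutputStream out)
         throws IOException {
       Deflater deflater = new Deflater(Deflater.BEST_SPEED, true);
       deflater.setDictionary(presetDict);
       byte[] scratch = new byte[8192];
   
       // no toArrayCopy(): iterate over the underlying blocks directly
       for (ByteBuffer block : bufferedDocs.toBufferList()) {
         deflater.setInput(block); // Deflater.setInput(ByteBuffer) requires Java 11+
         while (!deflater.needsInput()) {
           int n = deflater.deflate(scratch);
           out.write(scratch, 0, n);
         }
       }
       deflater.finish();
       while (!deflater.finished()) {
         int n = deflater.deflate(scratch);
         out.write(scratch, 0, n);
       }
       deflater.end();
     }
   }
   ```
   
   The sketch only shows where the flattening copy drops out; the real change would live in the Compressor implementations.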
   
   
   I wrote a simple mini benchmark in the test code:
   LZ4WithPresetDict, capacity 41943040 bytes, 10 iterations:
   `Origin elapse: 5391ms, New elapse: 5297ms`
   DeflateWithPresetDict, capacity 41943040 bytes, 10 iterations:
   `Origin elapse: 115ms, New elapse: 12ms`
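   For reference, roughly how such a mini benchmark can be wired up (my own sketch, not the test code in this change): fill about 40 MB of small blocks and time each compression path over 10 iterations.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Random;
   
   class MiniBenchSketch {
     // Harness sketch only: the two Runnables would wrap the "origin" path
     // (flatten + compress) and the "new" path (compress the blocks directly).
     static long elapsedMs(Runnable compressPath, int iters) {
       long start = System.nanoTime();
       for (int i = 0; i < iters; i++) {
         compressPath.run();
       }
       return (System.nanoTime() - start) / 1_000_000;
     }
   
     public static void main(String[] args) {
       int capacity = 41_943_040; // 40 MB, matching the numbers above
       int blockSize = 8 * 1024;
       List<byte[]> blocks = new ArrayList<>();
       Random random = new Random(42);
       for (int filled = 0; filled < capacity; filled += blockSize) {
         byte[] block = new byte[blockSize];
         random.nextBytes(block);
         blocks.add(block);
       }
       Runnable origin = () -> { /* flatten blocks, then compress (baseline sketch above) */ };
       Runnable candidate = () -> { /* compress blocks without flattening (second sketch above) */ };
       System.out.println("Origin elapse:" + elapsedMs(origin, 10) + "ms , New elapse:" + elapsedMs(candidate, 10) + "ms");
     }
   }
   ```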
    
   I also ran runStoredFieldsBenchmark with doc_limit=-1, which shows:
   
   
   Msec to index | BEST_SPEED | BEST_COMPRESSION
   -- | -- | --
   Baseline | 318877.00 | 606288.00
   Candidate | 314442.00 | 604719.00
   
   

