luyuncheng opened a new pull request, #987: URL: https://github.com/apache/lucene/pull/987
JIRA: https://issues.apache.org/jira/browse/LUCENE-10627

When Lucene flushes and merges stored fields, it performs many memory copies:

```
Lucene Merge Thread #25940]" #906546 daemon prio=5 os_prio=0 cpu=20503.95ms elapsed=68.76s tid=0x00007ee990002c50 nid=0x3aac54 runnable [0x00007f17718db000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.lucene.store.ByteBuffersDataOutput.toArrayCopy(ByteBuffersDataOutput.java:271)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:239)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:169)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:654)
        at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:228)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4760)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4364)
        at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5923)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
        at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:100)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:682)
```

When `CompressingStoredFieldsWriter` flushes documents, it needs several memory copies:

- With Lucene90 using `LZ4WithPresetDictCompressionMode`:
  1. `bufferedDocs.toArrayCopy` copies the buffered blocks into one contiguous array for chunk compression
  2. the compressor copies the dictionary and the data into one block buffer
  3. compress
  4. copy the compressed data out

- With Lucene90 using `DeflateWithPresetDictCompressionMode`:
  1. `bufferedDocs.toArrayCopy` copies the buffered blocks into one contiguous array for chunk compression
  2. compress
  3. copy the compressed data out

I think we can use a `CompositeByteBuf` to **reduce temporary memory copies**: we do not need `bufferedDocs.toArrayCopy` when all the chunk compression needs is a contiguous view of the content (see the sketch below).

I wrote a simple mini benchmark in test code:

- `LZ4WithPresetDict`, capacity 41943040 bytes, 10 iterations: `Origin elapse: 5391ms, New elapse: 5297ms`
- `DeflateWithPresetDict`, capacity 41943040 bytes, 10 iterations: `Origin elapse: 115ms, New elapse: 12ms`

I also ran `runStoredFieldsBenchmark` with `doc_limit=-1`:

Msec to index | BEST_SPEED | BEST_COMPRESSION
-- | -- | --
Baseline | 318877.00 | 606288.00
Candidate | 314442.00 | 604719.00
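To illustrate the idea (this is a minimal sketch, not the actual `CompositeByteBuf` implementation in this PR), here is how a compressor can consume a sequence of `ByteBuffer` blocks directly, skipping the copy into one contiguous `byte[]` that `ByteBuffersDataOutput.toArrayCopy` performs today. It relies on `java.util.zip.Deflater.setInput(ByteBuffer)` (Java 11+); the class and method names are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.Deflater;

public class MultiBufferDeflate {

  // Hypothetical helper: compresses a list of ByteBuffer blocks without first
  // concatenating them into a single contiguous byte[].
  static byte[] compress(List<ByteBuffer> blocks) {
    Deflater deflater = new Deflater(Deflater.BEST_SPEED, true);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] scratch = new byte[8192];
    try {
      for (ByteBuffer block : blocks) {
        deflater.setInput(block); // feed each block directly, no intermediate copy
        while (!deflater.needsInput()) { // drain whatever output is available
          int n = deflater.deflate(scratch, 0, scratch.length, Deflater.NO_FLUSH);
          out.write(scratch, 0, n);
        }
      }
      deflater.finish();
      while (!deflater.finished()) { // flush the remaining compressed data
        int n = deflater.deflate(scratch);
        out.write(scratch, 0, n);
      }
      return out.toByteArray();
    } finally {
      deflater.end();
    }
  }

  public static void main(String[] args) {
    List<ByteBuffer> blocks =
        List.of(
            ByteBuffer.wrap("hello ".getBytes(StandardCharsets.UTF_8)),
            ByteBuffer.wrap("world".getBytes(StandardCharsets.UTF_8)));
    System.out.println("compressed length = " + compress(blocks).length);
  }
}
```

The same idea applies to the LZ4 path: if the compressor can iterate over the underlying blocks (or a composite view of them), the contiguous copy in step 1 above can be avoided.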