[ https://issues.apache.org/jira/browse/LUCENE-9447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186556#comment-17186556 ]
Adrien Grand commented on LUCENE-9447: -------------------------------------- Thanks [~simonw]! > Make BEST_COMPRESSION compress more aggressively? > ------------------------------------------------- > > Key: LUCENE-9447 > URL: https://issues.apache.org/jira/browse/LUCENE-9447 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Adrien Grand > Priority: Minor > Fix For: 8.7 > > Time Spent: 50m > Remaining Estimate: 0h > > The Lucene86 codec supports setting a "Mode" for stored fields compression, > that is either "BEST_SPEED", which translates to blocks of 16kB or 128 > documents (whichever is hit first) compressed with LZ4, or > "BEST_COMPRESSION", which translates to blocks of 60kB or 512 documents > compressed with DEFLATE with default compression level (6). > After looking at indices that spent most disk space on stored fields > recently, I noticed that there was quite some room for improvement by > increasing the block size even further: > ||Block size||Stored fields size|| > |60kB|168412338| > |128kB|130813639| > |256kB|113587009| > |512kB|104776378| > |1MB|100367095| > |2MB|98152464| > |4MB|97034425| > |8MB|96478746| > For this specific dataset, I had 1M documents that each had about 2kB of > stored fields each and quite some redundancy. > This makes me want to look into bumping this block size to maybe 256kB. It > would be interesting to re-do the experiments we did on LUCENE-6100 to see > how this affects the merging speed. That said I don't think it would be > terrible if the merging time increased a bit given that we already offer the > BEST_SPEED option for CPU-savvy users. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org