[ https://issues.apache.org/jira/browse/LUCENE-9447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand resolved LUCENE-9447.
----------------------------------
Fix Version/s: 8.7
Resolution: Fixed
> Make BEST_COMPRESSION compress more aggressively?
> -------------------------------------------------
>
> Key: LUCENE-9447
> URL: https://issues.apache.org/jira/browse/LUCENE-9447
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Fix For: 8.7
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> The Lucene86 codec supports setting a "Mode" for stored fields compression.
> The mode is either "BEST_SPEED", which translates to blocks of 16kB or 128
> documents (whichever is hit first) compressed with LZ4, or
> "BEST_COMPRESSION", which translates to blocks of 60kB or 512 documents
> compressed with DEFLATE at the default compression level (6).
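To make the "block of N kB or M documents, whichever is hit first" rule concrete, here is a minimal Python sketch. It is not Lucene's implementation: the helper names (make_docs, split_into_blocks, compressed_size) and the synthetic documents are invented for illustration, and the flush rule is a simplified assumption. It only shows why compressing larger blocks with DEFLATE at level 6 tends to help when documents share content.

```python
import json
import random
import zlib


def make_docs(n, seed=0):
    """Generate n small JSON documents that share field names and values."""
    rng = random.Random(seed)
    cities = ["Paris", "Berlin", "Madrid", "Rome"]
    return [
        json.dumps({"id": i, "city": rng.choice(cities), "status": "active",
                    "score": rng.randint(0, 99)}).encode("utf-8")
        for i in range(n)
    ]


def split_into_blocks(docs, max_bytes, max_docs):
    """Close a block once it reaches max_bytes or max_docs, whichever is hit first."""
    blocks, current, size = [], [], 0
    for doc in docs:
        current.append(doc)
        size += len(doc)
        if size >= max_bytes or len(current) >= max_docs:
            blocks.append(b"".join(current))
            current, size = [], 0
    if current:
        # Flush the trailing, partially filled block.
        blocks.append(b"".join(current))
    return blocks


def compressed_size(docs, max_bytes, max_docs, level=6):
    """Total bytes after compressing every block independently with DEFLATE."""
    return sum(len(zlib.compress(block, level))
               for block in split_into_blocks(docs, max_bytes, max_docs))


docs = make_docs(2000)
small = compressed_size(docs, max_bytes=1024, max_docs=512)
large = compressed_size(docs, max_bytes=16 * 1024, max_docs=512)
print("1kB blocks: ", small, "bytes")
print("16kB blocks:", large, "bytes")
```

Each block is compressed on its own, so redundancy between documents is only exploited within a block. Note that DEFLATE's match window is at most 32kB, so with plain zlib the benefit of growing a block levels off once blocks are well past that size.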
> After recently looking at indices that spent most of their disk space on
> stored fields, I noticed that there was considerable room for improvement
> by increasing the block size even further:
> ||Block size||Stored fields size||
> |60kB|168412338|
> |128kB|130813639|
> |256kB|113587009|
> |512kB|104776378|
> |1MB|100367095|
> |2MB|98152464|
> |4MB|97034425|
> |8MB|96478746|
> For this specific dataset, I had 1M documents that each had about 2kB of
> stored fields and quite some redundancy.
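The figures in the table are easier to compare as relative savings. The short Python sketch below only does arithmetic on the numbers quoted above; the names sizes, reduction_pct, vs_baseline and vs_previous are made up for this illustration.

```python
# Stored fields size per block size, copied from the table above.
sizes = {
    "60kB": 168412338,
    "128kB": 130813639,
    "256kB": 113587009,
    "512kB": 104776378,
    "1MB": 100367095,
    "2MB": 98152464,
    "4MB": 97034425,
    "8MB": 96478746,
}


def reduction_pct(size, reference):
    """Percentage by which size is smaller than reference."""
    return 100.0 * (reference - size) / reference


labels = list(sizes)
baseline = sizes["60kB"]

# Savings of each block size relative to the 60kB baseline.
vs_baseline = {label: reduction_pct(sizes[label], baseline) for label in labels}

# Savings of each block size relative to the previous (half as large) one.
vs_previous = {
    cur: reduction_pct(sizes[cur], sizes[prev])
    for prev, cur in zip(labels, labels[1:])
}

for label in labels[1:]:
    print(f"{label:>5}: {vs_baseline[label]:5.1f}% vs 60kB, "
          f"{vs_previous[label]:5.1f}% vs previous size")
```

On these numbers, 256kB is roughly a third smaller than the 60kB baseline, and each later doubling saves less than the one before it, which fits the idea of stopping around 256kB.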
> This makes me want to look into bumping this block size to maybe 256kB. It
> would be interesting to re-do the experiments we did on LUCENE-6100 to see
> how this affects the merging speed. That said I don't think it would be
> terrible if the merging time increased a bit given that we already offer the
> BEST_SPEED option for CPU-savvy users.