[jira] [Commented] (LUCENE-9919) ZSTD Compressor/Decompressor support in Lucene

Michael McCandless (Jira) Mon, 12 Apr 2021 10:48:07 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-9919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319623#comment-17319623
 ]


Michael McCandless commented on LUCENE-9919:
--------------------------------------------

How about just building a custom {{Codec}} using a pure Java ZSTD implementation 
instead of JDK DEFLATE, and then running our standard {{luceneutil}} benchmarks?  
This is a great example use case for why we built the {{Codec}} API in the first 
place – to enable fun experimentation exactly like this!
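
Before wiring anything into a {{Codec}}, it can help to have a standalone baseline for the JDK DEFLATE side of the comparison. The sketch below is only an illustration, not part of Lucene or {{luceneutil}}: the class name, the synthetic sample data and the chosen levels are assumptions, and it uses nothing beyond java.util.zip. A ZSTD implementation could be measured against the same harness.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical harness: measures JDK DEFLATE at a few levels on synthetic data.
class DeflateBaseline {

    /** Compresses input with JDK DEFLATE at the given level (1 = fastest, 9 = smallest). */
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    /** Decompresses data whose original length is known, as a stored-fields block would record it. */
    static byte[] decompress(byte[] compressed, int originalLength) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] out = new byte[originalLength];
            int off = 0;
            while (off < originalLength) {
                int n = inflater.inflate(out, off, originalLength - off);
                if (n == 0) {
                    throw new DataFormatException("truncated or unexpected input");
                }
                off += n;
            }
            return out;
        } finally {
            inflater.end();
        }
    }

    /** Synthetic, repetitive text standing in for stored fields (an assumption, not real benchmark data). */
    static byte[] sampleData() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append("doc ").append(i).append(": stored field text for a Lucene-like benchmark\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        byte[] data = sampleData();
        for (int level : new int[] {1, 6, 9}) {
            long start = System.nanoTime();
            byte[] compressed = compress(data, level);
            long micros = (System.nanoTime() - start) / 1_000;
            boolean roundTrip = Arrays.equals(data, decompress(compressed, data.length));
            System.out.println("level=" + level + " in=" + data.length
                + " out=" + compressed.length + " micros=" + micros + " roundTrip=" + roundTrip);
        }
    }
}
```

A single untimed pass like this is only a sanity check; real numbers would need warm-up and representative documents, which is what the standard benchmarks provide.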

> ZSTD Compressor/Decompressor support in Lucene
> ----------------------------------------------
>
>                 Key: LUCENE-9919
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9919
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Praveen Nishchal
>            Priority: Major
>              Labels: compression, lucene, zstandard
>
> Lucene currently supports LZ4 and Zlib compression/decompression for 
> StoredFieldsFormat, DocValuesFormat, TermVectorsFormat and PostingsFormat 
> codecs. We propose Zstandard ([https://facebook.github.io/zstd/]) 
> compression/decompression for all of the codecs mentioned above, for the 
> following reasons:
>  * Zstandard is used in some of the most popular open source projects, 
> such as Apache Cassandra, Hadoop and Kafka.
>  * Zstandard, at the default level of 3, is expected to show substantial 
> improvements in both compression and decompression speed while compressing 
> at the same ratio as zlib, according to the study described by Yann Collet 
> at Facebook.
>  * Zstandard currently offers 22 different Compression levels, which enable 
> flexible, granular trade-offs between compression speed and ratios for future 
> data. For example, we can use level 1 if speed is most important and level 22 
> if size is most important.
>  * Zstandard is designed to scale with modern hardware.
>  * Small data
>           - It has APIs for dictionary compression as well. Small data 
> compression with a dictionary can be anywhere from 2x to 5x better than 
> compression without one.
>  * Zstandard is continuously improved by Facebook and the community.
>  
> Please see the link below for more details:
> [https://engineering.fb.com/2016/08/31/core-data/smaller-and-faster-data-compression-with-zstandard/]
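
The small-data point in the description rests on the idea of a preset dictionary: short inputs have little internal redundancy, so seeding the compressor with typical content gives it something to match against. The sketch below illustrates that idea only. It uses the preset-dictionary support in JDK DEFLATE (Deflater.setDictionary / Inflater.setDictionary), not Zstandard's dictionary API, and the class name, sample document and dictionary contents are all made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo: preset-dictionary compression of a small document with JDK DEFLATE.
class DictionaryDemo {

    /** Compresses input, optionally seeding the compressor with a preset dictionary (may be null). */
    static byte[] compress(byte[] input, byte[] dictionary) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            if (dictionary != null) {
                deflater.setDictionary(dictionary);
            }
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    /** Decompresses; the same dictionary must be supplied if one was used when compressing. */
    static byte[] decompress(byte[] compressed, byte[] dictionary, int originalLength)
            throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] out = new byte[originalLength];
            int off = 0;
            while (off < originalLength) {
                int n = inflater.inflate(out, off, originalLength - off);
                if (n == 0) {
                    if (inflater.needsDictionary() && dictionary != null) {
                        inflater.setDictionary(dictionary); // stream was written with a preset dictionary
                    } else {
                        throw new DataFormatException("truncated input or missing dictionary");
                    }
                }
                off += n;
            }
            return out;
        } finally {
            inflater.end();
        }
    }

    // Made-up sample: the dictionary is a "typical" document, the input is a similar one.
    static final byte[] DICTIONARY =
        "{\"id\":1000,\"title\":\"DEFLATE support in Lucene\",\"component\":\"core/codecs\",\"priority\":\"Major\"}"
            .getBytes(StandardCharsets.UTF_8);
    static final byte[] DOCUMENT =
        "{\"id\":1001,\"title\":\"ZSTD support in Lucene\",\"component\":\"core/codecs\",\"priority\":\"Major\"}"
            .getBytes(StandardCharsets.UTF_8);

    public static void main(String[] args) throws Exception {
        byte[] plain = compress(DOCUMENT, null);
        byte[] withDict = compress(DOCUMENT, DICTIONARY);
        byte[] restored = decompress(withDict, DICTIONARY, DOCUMENT.length);
        System.out.println("original=" + DOCUMENT.length
            + " withoutDictionary=" + plain.length
            + " withDictionary=" + withDict.length
            + " restored=" + new String(restored, StandardCharsets.UTF_8));
    }
}
```

The trade-off is that the reader must hold exactly the same dictionary as the writer, so in an index format the dictionary would have to be stored or derivable alongside the compressed blocks.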



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
