[jira] [Updated] (LUCENE-9919) ZSTD Compressor/Decompressor support in Lucene

Praveen Nishchal (Jira) Fri, 09 Apr 2021 03:37:04 -0700


[ https://issues.apache.org/jira/browse/LUCENE-9919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Praveen Nishchal updated LUCENE-9919:
-------------------------------------
    Description: 
Lucene currently supports LZ4 and zlib compression/decompression for
StoredFieldsFormat, DocValuesFormat, TermVectorsFormat and PostingsFormat
codecs. We propose adding Zstandard ([https://facebook.github.io/zstd/])
compression/decompression for all of these codecs, for the following
reasons:
 * Zstandard is used in some of the most popular open source projects, such
as Apache Cassandra, Hadoop and Kafka.
 * At its default level of 3, Zstandard is expected to compress and
decompress substantially faster than zlib while achieving the same
compression ratio, according to the study by Yann Collet at Facebook.
 * Zstandard currently offers 22 compression levels, which allow flexible,
fine-grained trade-offs between compression speed and compression ratio. For
example, level 1 can be used when speed matters most and level 22 when size
matters most.
 * Zstandard is designed to scale with modern hardware.
 * Small data
 ** Zstandard also provides APIs for dictionary compression. With a
dictionary, small data can compress anywhere from 2x to 5x better than
without one.
 * Zstandard is continuously improved by Facebook and the community.
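The dictionary bullet above refers to Zstandard's dictionary compression. The JDK ships no Zstandard binding, so the sketch below illustrates the same idea with a plainly swapped-in technique: zlib's preset dictionary via java.util.zip.Deflater.setDictionary and Inflater.setDictionary, run at Deflater.BEST_SPEED (1) and Deflater.BEST_COMPRESSION (9) to also show the level trade-off. The class name DictionaryCompressionDemo, the sample records and the dictionary contents are hypothetical; a real Zstandard codec would need a third-party binding, which is not shown. Printed sizes depend on the JDK's zlib, so no expected output is given.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo: zlib preset dictionaries stand in for Zstandard dictionaries.
public class DictionaryCompressionDemo {

    // Compresses `input` as a zlib stream at the given level. If `dictionary`
    // is non-null it is installed as a preset dictionary, so back-references
    // may point into it instead of only into earlier bytes of `input`.
    static byte[] compress(byte[] input, int level, byte[] dictionary) {
        Deflater deflater = new Deflater(level);
        try {
            if (dictionary != null) {
                deflater.setDictionary(dictionary); // must precede the first deflate()
            }
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    // Decompresses a stream produced by compress(). The caller must supply the
    // same dictionary the compressor used; zlib verifies it via its Adler-32 id.
    static byte[] decompress(byte[] compressed, int originalLength, byte[] dictionary) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] result = new byte[originalLength];
            int off = 0;
            while (off < originalLength) {
                int n = inflater.inflate(result, off, originalLength - off);
                if (n == 0) {
                    if (inflater.needsDictionary() && dictionary != null) {
                        // The stream header announced a preset dictionary.
                        inflater.setDictionary(dictionary);
                    } else {
                        throw new IllegalArgumentException("missing dictionary or truncated input");
                    }
                }
                off += n;
            }
            return result;
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt zlib stream", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        // Hypothetical dictionary: boilerplate shared by many small records.
        byte[] dictionary = "{\"category\":\"books\",\"language\":\"en\",\"title\":\""
                .getBytes(StandardCharsets.UTF_8);
        // Hypothetical small stored-field values, one short JSON document each.
        String[] records = {
            "{\"category\":\"books\",\"language\":\"en\",\"title\":\"Record 1\"}",
            "{\"category\":\"books\",\"language\":\"en\",\"title\":\"Record 2\"}",
            "{\"category\":\"books\",\"language\":\"en\",\"title\":\"Record 3\"}"
        };
        for (int level : new int[] {Deflater.BEST_SPEED, Deflater.BEST_COMPRESSION}) {
            int raw = 0, plain = 0, withDict = 0;
            for (String r : records) {
                byte[] bytes = r.getBytes(StandardCharsets.UTF_8);
                byte[] a = compress(bytes, level, null);
                byte[] b = compress(bytes, level, dictionary);
                // Round-trip check: each stream must decode back to the original.
                if (!Arrays.equals(bytes, decompress(a, bytes.length, null))
                        || !Arrays.equals(bytes, decompress(b, bytes.length, dictionary))) {
                    throw new IllegalStateException("round trip failed");
                }
                raw += bytes.length;
                plain += a.length;
                withDict += b.length;
            }
            System.out.printf("level %d: raw=%d, no dictionary=%d, with dictionary=%d bytes%n",
                    level, raw, plain, withDict);
        }
    }
}
```

Each record is compressed independently, as a per-document codec would do, which is exactly the case where a shared dictionary helps: a lone short record gives the compressor almost no history of its own to match against.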

For more details, see:

[https://engineering.fb.com/2016/08/31/core-data/smaller-and-faster-data-compression-with-zstandard/]



> ZSTD Compressor/Decompressor support in Lucene
> ----------------------------------------------
>
>                 Key: LUCENE-9919
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9919
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Praveen Nishchal
>            Priority: Major
>              Labels: compression, lucene, zstandard
>         Attachments: RE_ JIRA Draft.msg
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
