GSharayu opened a new pull request #7035:
URL: https://github.com/apache/incubator-pinot/pull/7035


   This PR extends the compression codec work for issue #6804 by adding LZ4 
support, following the earlier addition of ZSTD (#6876).
   
   When the forward index is not dictionary encoded, we have 2 choices:
   1. store the data as is (RAW)
   2. store the data Snappy compressed, using the Snappy compression codec 
library as the default
   
   This PR adds support for LZ4 compression using the lz4-java library: 
https://github.com/lz4/lz4-java. 
   Maven Artifact: https://mvnrepository.com/artifact/org.lz4/lz4-java
   
   The LZ4 library offers 2 choices:
   1. Fast Compression
   2. High Compression
   With Fast Compression we get good compression and decompression speed, 
comparable to Snappy. High Compression performs poorly compared to the already 
supported ZSTD. Benchmarks for both choices offered by the library are part of 
this PR, and the perf document is updated with the numbers.
   Based on their requirements, users can configure the codec via table config 
on a per column basis. **The default behavior remains the same: Snappy for 
dimension columns and no compression for metric columns.** 
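
   The defaulting rule above can be sketched as a small resolver. This is only 
an illustration: the class, enum and method names are hypothetical and are not 
Pinot's actual API, and the per column setting is modelled as a plain map 
rather than the real table config.

```java
import java.util.Map;

// Hypothetical sketch; none of these names come from the Pinot code base.
final class DefaultCodecSketch {
  enum FieldType { DIMENSION, METRIC }

  enum Codec { PASS_THROUGH, SNAPPY, ZSTANDARD, LZ4 }

  // An explicit per column setting wins; otherwise fall back to the
  // defaults described above: Snappy for dimensions, no compression
  // (pass-through) for metrics.
  static Codec resolve(String column, FieldType type, Map<String, Codec> perColumnConfig) {
    Codec configured = perColumnConfig.get(column);
    if (configured != null) {
      return configured;
    }
    return type == FieldType.DIMENSION ? Codec.SNAPPY : Codec.PASS_THROUGH;
  }

  public static void main(String[] args) {
    // "userAgent" is an assumed column name used only for this example.
    Map<String, Codec> config = Map.of("userAgent", Codec.LZ4);
    System.out.println(resolve("userAgent", FieldType.DIMENSION, config)); // LZ4
    System.out.println(resolve("country", FieldType.DIMENSION, config));   // SNAPPY
    System.out.println(resolve("clicks", FieldType.METRIC, config));       // PASS_THROUGH
  }
}
```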
   
   As with other table level changes (column renaming, type changing, column 
dropping, index dropping) which are currently not allowed, changing the 
compression codec on an existing noDictionary column from snappy to lz4 or 
vice versa will not happen for existing segments, since we currently don't have 
a mechanism for doing this in-place in the segment file. Newly pushed segments 
will pick up the new codec, and since the codec type is written into the index 
buffer header, we will be able to read both old and new segments.
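
   To illustrate why a codec type in the header keeps old and new segments 
readable, here is a minimal sketch. It assumes a hypothetical layout (a 4 byte 
codec id followed by the payload) and hypothetical id values; the real index 
buffer layout is not described here. The two "codecs" are toy reversible 
transforms standing in for the real Snappy and LZ4 libraries, so the example 
runs with the JDK alone.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of header-based codec dispatch; not Pinot's format.
final class CodecHeaderSketch {
  enum Codec {
    SNAPPY(1), LZ4(2); // assumed ids, for illustration only

    final int id;

    Codec(int id) {
      this.id = id;
    }

    static Codec fromId(int id) {
      for (Codec c : values()) {
        if (c.id == id) {
          return c;
        }
      }
      throw new IllegalArgumentException("Unknown codec id: " + id);
    }
  }

  // Toy stand-ins for real compressors: "SNAPPY" flips every bit and
  // "LZ4" reverses the byte order. Both are their own inverse.
  static byte[] transform(Codec codec, byte[] data) {
    byte[] out = new byte[data.length];
    for (int i = 0; i < data.length; i++) {
      out[i] = codec == Codec.SNAPPY ? (byte) ~data[i] : data[data.length - 1 - i];
    }
    return out;
  }

  // Writer: record the codec id in the header, then the encoded payload.
  static ByteBuffer write(Codec codec, byte[] raw) {
    byte[] payload = transform(codec, raw);
    ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + payload.length);
    buf.putInt(codec.id).put(payload);
    buf.flip();
    return buf;
  }

  // Reader: pick the decoder from the header, not from the table config,
  // so segments written with either codec stay readable.
  static byte[] read(ByteBuffer buf) {
    ByteBuffer view = buf.duplicate();
    Codec codec = Codec.fromId(view.getInt());
    byte[] payload = new byte[view.remaining()];
    view.get(payload);
    return transform(codec, payload);
  }

  public static void main(String[] args) {
    byte[] raw = "forward index chunk".getBytes(StandardCharsets.UTF_8);
    ByteBuffer oldSegment = write(Codec.SNAPPY, raw);
    ByteBuffer newSegment = write(Codec.LZ4, raw);
    System.out.println(new String(read(oldSegment), StandardCharsets.UTF_8));
    System.out.println(new String(read(newSegment), StandardCharsets.UTF_8));
  }
}
```

   Because the reader dispatches on the stored id, switching the configured 
codec only affects how new segments are written.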
   
   Underneath, the library also uses a JNI bridge to the original LZ4 library 
https://github.com/lz4/lz4, as the Apache commons compress library, similar to 
Snappy and ZSTD. Also, the lz4-java library has been used in popular projects 
like Apache Spark, Apache Kafka and many more. **This library also ensures that 
if native support is not available, it uses a pure Java implementation.**
   
   Please have a look at the LZ4Factory class in the lz4-java library for more 
information.
   
   Compatibility note from the library docs:
   Compressors and decompressors are interchangeable: it is perfectly correct 
to compress with the JNI bindings and to decompress with a Java port, or the 
other way around.
   
   The perf comparison document, with benchmarks for Snappy, ZSTD and LZ4, has 
been updated:
   
https://docs.google.com/document/d/1JKLhDm0-gnrRhyBUDge5u4MeGjotRSgjiexJxI_abfk/edit?usp=sharing




