GSharayu opened a new pull request #7035: URL: https://github.com/apache/incubator-pinot/pull/7035
This ticket is an extension of the work that added the ZSTD compression codec (#6876): it adds LZ4 as a new compression codec for issue #6804. When the forward index is not dictionary-encoded, we currently have two choices:

1. Store the data as is (RAW).
2. Store the data Snappy-compressed, using the Snappy compression codec library (the default).

This PR adds support for LZ4 compression using the library https://github.com/lz4/lz4-java (Maven artifact: https://mvnrepository.com/artifact/org.lz4/lz4-java).

The LZ4 library offers two choices:

1. Fast Compression
2. High Compression

Fast Compression gives good compression and decompression speed, comparable to Snappy. High Compression performs poorly compared to the already supported ZSTD codec. Benchmarks for both choices are part of this PR, and the perf document has been updated with the numbers. Based on their requirements, users can configure the codec via the table config on a per-column basis.

**The default behavior remains the same: Snappy for dimension columns and no compression for metric columns.**

As with other table-level changes that are currently not allowed (column renaming, type changing, column dropping, index dropping), changing the compression codec of an existing noDictionary column from Snappy to LZ4 or vice versa will not take effect on existing segments, since we currently don't have a mechanism for doing this in place in the segment file. Newly pushed segments will pick up the new codec, and since the codec type is written into the index buffer header, we will be able to read both old and new segments.

Underneath, the library uses a JNI bridge to the original LZ4 library (https://github.com/lz4/lz4), similar to the Apache Commons Compress, Snappy, and ZSTD libraries. lz4-java is also used in popular projects such as Apache Spark and Apache Kafka, among many others.

**This library also ensures that if native support is not available, it falls back to a pure Java implementation.** See the LZ4Factory class in the lz4-java library for more information.
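Old and new segments stay readable after a codec change because the codec type is recorded in the index buffer header and consulted at read time, not taken from the current table config. The sketch below is minimal and rests on several assumptions: the header layout (a 4-byte codec id followed by the payload), the `Codec` ids, and the `CodecHeaderSketch` class are all hypothetical and are not Pinot's actual on-disk format or classes. The JDK's DEFLATE streams stand in for Snappy/LZ4, so the example runs without extra dependencies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class CodecHeaderSketch {
  // Hypothetical codec ids; NOT Pinot's real enum or on-disk values.
  enum Codec {
    PASS_THROUGH(0),
    DEFLATE_STAND_IN(1); // JDK-only stand-in for SNAPPY / ZSTD / LZ4

    final int id;

    Codec(int id) {
      this.id = id;
    }

    static Codec fromId(int id) {
      for (Codec codec : values()) {
        if (codec.id == id) {
          return codec;
        }
      }
      throw new IllegalArgumentException("Unknown codec id: " + id);
    }
  }

  // Writer: a 4-byte codec id header followed by the encoded payload.
  static ByteBuffer write(byte[] data, Codec codec) {
    byte[] payload = codec == Codec.PASS_THROUGH ? data : deflate(data);
    ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES + payload.length);
    buffer.putInt(codec.id).put(payload).flip();
    return buffer;
  }

  // Reader: the codec comes from the header, so buffers written under an
  // older table config still decode correctly.
  static byte[] read(ByteBuffer buffer) {
    Codec codec = Codec.fromId(buffer.getInt());
    byte[] payload = new byte[buffer.remaining()];
    buffer.get(payload);
    return codec == Codec.PASS_THROUGH ? payload : inflate(payload);
  }

  static byte[] deflate(byte[] data) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DeflaterOutputStream out = new DeflaterOutputStream(bytes)) {
      out.write(data);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  static byte[] inflate(byte[] payload) {
    try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(payload))) {
      return in.readAllBytes();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] data = "dim-value,".repeat(100).getBytes(StandardCharsets.UTF_8);
    ByteBuffer oldSegment = write(data, Codec.PASS_THROUGH);     // written before the config change
    ByteBuffer newSegment = write(data, Codec.DEFLATE_STAND_IN); // written after it
    System.out.println(Arrays.equals(read(oldSegment), data)); // true
    System.out.println(Arrays.equals(read(newSegment), data)); // true
  }
}
```

The reader never asks which codec the table is configured with today. That is why switching a column's codec only affects newly pushed segments while existing ones remain readable.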
Compatibility note from the lz4-java documentation: "Compressors and decompressors are interchangeable: it is perfectly correct to compress with the JNI bindings and to decompress with a Java port, or the other way around."

The perf comparison document and benchmarks for Snappy, ZSTD, and LZ4 have been updated in https://docs.google.com/document/d/1JKLhDm0-gnrRhyBUDge5u4MeGjotRSgjiexJxI_abfk/edit?usp=sharing
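The native-or-pure-Java behavior attributed to LZ4Factory is an ordered fallback, and the interchangeability note is what makes it safe: bytes compressed through JNI on one host can be decompressed by the Java port on another. The following is a minimal, generic sketch of that selection pattern, not lz4-java's actual code or API. It assumes a hypothetical `CodecImpl` interface and `firstAvailable` helper, and it simulates a missing native library with an `UnsatisfiedLinkError`.

```java
import java.util.List;
import java.util.function.Supplier;

public class CodecFallbackSketch {
  // Hypothetical stand-in for a codec implementation (not lz4-java's API).
  interface CodecImpl {
    String name();
  }

  // Try each candidate in order of preference; skip any that fail to load.
  static CodecImpl firstAvailable(List<Supplier<CodecImpl>> candidates) {
    Throwable last = null;
    for (Supplier<CodecImpl> candidate : candidates) {
      try {
        return candidate.get();
      } catch (LinkageError | RuntimeException e) {
        // e.g. UnsatisfiedLinkError when the native library cannot be loaded
        last = e;
      }
    }
    throw new IllegalStateException("No codec implementation available", last);
  }

  public static void main(String[] args) {
    // Simulates a platform for which no native library is bundled.
    Supplier<CodecImpl> nativeImpl = () -> {
      throw new UnsatisfiedLinkError("no native library for this platform");
    };
    Supplier<CodecImpl> pureJavaImpl = () -> () -> "pure-java";
    CodecImpl chosen = firstAvailable(List.of(nativeImpl, pureJavaImpl));
    System.out.println(chosen.name()); // pure-java
  }
}
```

Ordering the candidates fastest-first keeps the native speed wherever it is available. Because the compressed format is shared, the fallback changes only performance, not what a segment can be read by.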