jackluo923 opened a new pull request, #14008: URL: https://github.com/apache/pinot/pull/14008
# Summary

This PR builds on #13782 by adding support for compressing segments with Zstandard or LZ4, selected via configuration. Segments are compressed with GZIP by default; users can choose a different codec through the `pinot.tar.compression.codec.name` property. Zstandard and LZ4 are the additional codecs supported so far, but adding others (e.g., LZMA or Snappy) is a single-line change.

# Core Concepts

The key concept introduced in this PR is the `.tar.compressed` file extension, which replaces hard-coded extensions such as `.tar.gz`, `.tar.zst`, and `.tar.lz4`. When this extension is used, compression applies the default compressor, which is configurable at runtime and is GZIP unless overridden.

For decompression, widely used compressors (those supported by the Apache Commons Compress library) embed magic numbers at the start of the file. This allows Apache Commons Compress and many other compression libraries to automatically detect the correct decompression method from the compressed content itself. The file extension therefore does not matter during decompression, and `.tar.compressed` serves as a generic placeholder for tar archives compressed with any codec.

The rest of the PR revolves around this concept and makes the following general changes:

1. Change hard-coded `.tar.gz` strings, where appropriate, and references to the `TarCompressionUtils.TAR_GZ_FILE_EXTENSION` static variable to `TarCompressionUtils.TAR_COMPRESSED_FILE_EXTENSION`.
2. Change the hard-coded `.metadata.tar.gz` file extension in batch ingestion to `.metadata` + `TarCompressionUtils.TAR_COMPRESSED_FILE_EXTENSION`.
3. Check whether `pinot.tar.compression.codec.name` is specified in `Base[Server|Controller|Minion|Broker]Starter.java` and, if so, set the default tar compression codec accordingly.

# Compatibility

This PR maintains backward compatibility with existing Pinot code: segments and configs generated by previous versions work with the updated code.
However, there is no forward compatibility: older Pinot versions cannot handle the new `.tar.compressed` file extension.

# Important Files

This PR touches a lot of files. The most important source files to start the code review with are:

1. TarCompressionUtils.java
2. CommonConstants.java
3. Constants.java
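
The magic-number detection described under Core Concepts can be illustrated with a minimal, JDK-only sketch. It is not the code in this PR, nor Commons Compress's implementation: the class `MagicSniffer` and its methods are hypothetical names used only for illustration, and the returned codec labels are arbitrary. The byte signatures themselves come from the gzip, Zstandard, and LZ4 frame format specifications. In practice a library such as Commons Compress would perform this detection rather than hand-written code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

/** Illustrative only: a hypothetical helper, not part of Pinot or Commons Compress. */
final class MagicSniffer {
  // Leading bytes of each format as they appear on disk.
  private static final byte[] GZIP = {(byte) 0x1F, (byte) 0x8B};
  private static final byte[] ZSTD = {(byte) 0x28, (byte) 0xB5, (byte) 0x2F, (byte) 0xFD};
  private static final byte[] LZ4_FRAME = {(byte) 0x04, (byte) 0x22, (byte) 0x4D, (byte) 0x18};

  private MagicSniffer() {}

  /** Returns an arbitrary label for the codec whose signature starts the given bytes. */
  static String detect(byte[] header) {
    if (startsWith(header, ZSTD)) return "zstd";
    if (startsWith(header, LZ4_FRAME)) return "lz4";
    if (startsWith(header, GZIP)) return "gzip";
    return "unknown";
  }

  private static boolean startsWith(byte[] data, byte[] prefix) {
    if (data == null || data.length < prefix.length) return false;
    for (int i = 0; i < prefix.length; i++) {
      if (data[i] != prefix[i]) return false;
    }
    return true;
  }

  /** Compresses a payload with the JDK's GZIP implementation, for the demo below. */
  static byte[] gzip(byte[] payload) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
      gz.write(payload);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    // The file name plays no role: only the leading bytes are inspected.
    byte[] compressed = gzip("segment bytes".getBytes(StandardCharsets.UTF_8));
    System.out.println(detect(compressed)); // prints "gzip"
  }
}
```

Because detection looks only at the content, the same check gives the same answer whether the archive is named `.tar.gz` or `.tar.compressed`, which is what lets a single generic extension cover every codec on the read path.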