jackluo923 opened a new pull request, #14008:
URL: https://github.com/apache/pinot/pull/14008

   # Summary
   This PR builds on #13782 by adding support for compressing segments with 
Zstandard or LZ4, selected via configuration. Segments are compressed with GZIP 
by default; users can choose a different codec with the 
`pinot.tar.compression.codec.name` property. Zstandard and LZ4 are currently 
supported, and adding other codecs (e.g., LZMA or Snappy) takes a single-line 
change. 
   
   # Core Concepts
   The key concept introduced in this PR is the `.tar.compressed` file 
extension, which replaces hard-coded extensions like `.tar.gz`, `.tar.zst`, 
`.tar.lz4`, etc. When this extension is used, the default compressor 
(configurable at runtime, with GZIP as the default) is applied during 
compression. For decompression, widely used compressors (supported by the 
Apache Commons library) embed magic numbers at the start of the file, allowing 
Apache Commons and many other compression libraries to automatically detect the 
correct decompression method from the compressed content itself. Therefore, the 
file extension doesn’t matter during decompression, and `.tar.compressed` 
serves as a generic placeholder for tar archives compressed with any codec.
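
   To illustrate why the extension is irrelevant for decompression, here is a 
minimal sketch of magic-number detection using only the JDK. It is not the code 
in this PR (which relies on the compression library's own detection); the class 
and method names are hypothetical, and only the magic bytes themselves come 
from the GZIP, Zstandard, and LZ4 frame format specifications.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, for illustration only: identifies a codec from the
// first bytes of a compressed stream instead of from the file extension.
final class MagicNumberSketch {
  // Leading bytes as they appear on disk for each format.
  private static final byte[] GZIP_MAGIC = {(byte) 0x1F, (byte) 0x8B};
  private static final byte[] ZSTD_MAGIC =
      {(byte) 0x28, (byte) 0xB5, (byte) 0x2F, (byte) 0xFD};
  private static final byte[] LZ4_FRAME_MAGIC =
      {(byte) 0x04, (byte) 0x22, (byte) 0x4D, (byte) 0x18};

  private MagicNumberSketch() {
  }

  // Returns "gzip", "zstd", "lz4", or "unknown" based on the header bytes.
  static String detectCodec(byte[] header) {
    if (startsWith(header, GZIP_MAGIC)) {
      return "gzip";
    }
    if (startsWith(header, ZSTD_MAGIC)) {
      return "zstd";
    }
    if (startsWith(header, LZ4_FRAME_MAGIC)) {
      return "lz4";
    }
    return "unknown";
  }

  private static boolean startsWith(byte[] data, byte[] prefix) {
    if (data.length < prefix.length) {
      return false;
    }
    for (int i = 0; i < prefix.length; i++) {
      if (data[i] != prefix[i]) {
        return false;
      }
    }
    return true;
  }

  // Produces real GZIP bytes with the JDK so the detector can be exercised.
  static byte[] gzip(byte[] payload) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
      gz.write(payload);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toByteArray();
  }
}
```

   For example, `MagicNumberSketch.detectCodec(MagicNumberSketch.gzip(data))` 
identifies the stream as GZIP no matter what the containing file is called.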
   
   The rest of the PR revolves around this concept and makes the following 
general changes:
   1. Replace hard-coded `.tar.gz` strings (where appropriate) and references 
to the `TarCompressionUtils.TAR_GZ_FILE_EXTENSION` static variable with 
`TarCompressionUtils.TAR_COMPRESSED_FILE_EXTENSION`.
   2. Change the hard-coded `.metadata.tar.gz` file extension in batch 
ingestion to `".metadata" + TarCompressionUtils.TAR_COMPRESSED_FILE_EXTENSION`.
   3. Check whether `pinot.tar.compression.codec.name` is specified in 
`Base[Server|Controller|Minion|Broker]Starter.java` and, if so, set the default 
Tar compression codec accordingly.
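
   The changes above can be sketched roughly as follows. This is an 
illustration under stated assumptions, not the code in the PR: the class, the 
helper methods, and the accepted codec names (`gzip`, `zstd`, `lz4`) are 
hypothetical; only the property key and the `.tar.compressed` and `.metadata` 
extensions come from the description.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of resolving the default tar codec at startup and of
// building file names from the generic extension.
final class TarCodecConfigSketch {
  // Property key named in the PR description.
  static final String CODEC_NAME_KEY = "pinot.tar.compression.codec.name";
  // Generic extension that replaces .tar.gz, .tar.zst, .tar.lz4, etc.
  static final String TAR_COMPRESSED_FILE_EXTENSION = ".tar.compressed";
  // GZIP stays the default when nothing is configured.
  static final String DEFAULT_CODEC = "gzip";
  // Assumed spellings; the values the real property accepts may differ.
  private static final Set<String> SUPPORTED = Set.of("gzip", "zstd", "lz4");

  private TarCodecConfigSketch() {
  }

  // Returns the configured codec, or the default if the key is absent.
  static String resolveCodec(Map<String, String> config) {
    String configured = config.get(CODEC_NAME_KEY);
    if (configured == null || configured.trim().isEmpty()) {
      return DEFAULT_CODEC;
    }
    String normalized = configured.trim().toLowerCase(Locale.ROOT);
    if (!SUPPORTED.contains(normalized)) {
      throw new IllegalArgumentException("Unsupported tar codec: " + configured);
    }
    return normalized;
  }

  // Mirrors change 2: ".metadata" + the generic compressed-tar extension.
  static String metadataFileName(String segmentName) {
    return segmentName + ".metadata" + TAR_COMPRESSED_FILE_EXTENSION;
  }
}
```

   Rejecting unknown names at startup, rather than falling back to GZIP 
silently, is a design choice of this sketch and not something the PR 
description specifies.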
   
   # Compatibility
   Note that this PR maintains backward compatibility with existing Pinot 
code—segments and configs generated by previous versions will work with the 
updated code. However, there is no forward compatibility, as older Pinot 
versions cannot handle the new `.tar.compressed` file extension.
   
   # Important Files
   This PR touches a lot of files. The most important source files to start 
the code review with are the following: 
   1. TarCompressionUtils.java
   2. CommonConstants.java
   3. Constants.java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

