wombatu-kun opened a new pull request, #16348:
URL: https://github.com/apache/iceberg/pull/16348
## Summary
Implements the LZ4 codec for Puffin, replacing the long-standing TODOs in
PuffinFormat.compress / PuffinFormat.decompress that pointed at
airlift/aircompressor#142.
## Motivation
Puffin declared `lz4` as a valid codec (used unconditionally for footer
compression via Puffin.write(...).compressFooter()), but the implementation
threw UnsupportedOperationException("Unsupported codec: LZ4"). The referenced
aircompressor PR #142 was never merged into the version Iceberg ships
(io.airlift:aircompressor:2.0.3), which provides only raw LZ4 and Hadoop
streams, not the standard LZ4 *frame* format the Puffin spec requires. As a
result, footer compression was unusable and lz4 blob compression was unreachable.
## Implementation
LZ4 frame support is provided by net.jpountz.lz4 (shipped as
at.yawk.lz4:lz4-java, already pinned in this repo via a CVE resolutionStrategy
substitution). It is promoted from a transitive-only dependency to a direct
implementation dependency of iceberg-core.
- compress: LZ4FrameOutputStream with BLOCKSIZE.SIZE_4MB, the known content
length, and FLG.Bits.CONTENT_SIZE + FLG.Bits.BLOCK_INDEPENDENCE.
- decompress: LZ4FrameInputStream drained via Guava ByteStreams.
This conforms to the Puffin spec: "Single LZ4 compression frame, with
content size present". Content size is encoded in the frame descriptor.
BLOCK_INDEPENDENCE is required by lz4-java (it only supports independent
blocks) and is orthogonal to the spec; it is also the default of the reference
lz4 CLI. aircompressor is retained for ZSTD.
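For reviewers who want to check the "content size present" claim against a raw LZ4 frame (for example, a compressed footer payload extracted from one of the fixtures), here is a minimal stdlib-only sketch. It is not part of this PR: the class and method names are hypothetical, and the bit positions are taken from the LZ4 frame format description (magic number 0x184D2204 stored little-endian, FLG bit 3 for content size, FLG bit 5 for block independence). It does not verify the header checksum or decode any blocks.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of this PR: inspects an LZ4 frame descriptor.
final class Lz4FrameDescriptor {
  private static final int MAGIC = 0x184D2204; // stored little-endian in the frame
  private static final int FLG_VERSION_SHIFT = 6; // top two bits of FLG hold the version
  private static final int FLG_BLOCK_INDEPENDENCE = 1 << 5;
  private static final int FLG_CONTENT_SIZE = 1 << 3;
  private static final int MIN_HEADER_BYTES = 7; // magic (4) + FLG (1) + BD (1) + HC (1)
  private static final int CONTENT_SIZE_OFFSET = 6; // right after magic, FLG and BD

  private Lz4FrameDescriptor() {}

  // Validates the magic number and version, then returns the FLG byte.
  private static int readFlg(byte[] frame) {
    if (frame.length < MIN_HEADER_BYTES) {
      throw new IllegalArgumentException("Too short for an LZ4 frame header");
    }
    ByteBuffer buf = ByteBuffer.wrap(frame).order(ByteOrder.LITTLE_ENDIAN);
    if (buf.getInt() != MAGIC) {
      throw new IllegalArgumentException("Not an LZ4 frame: bad magic number");
    }
    int flg = buf.get() & 0xFF;
    if ((flg >>> FLG_VERSION_SHIFT) != 1) {
      throw new IllegalArgumentException("Unsupported LZ4 frame version");
    }
    return flg;
  }

  // True if the frame declares independent blocks.
  static boolean hasIndependentBlocks(byte[] frame) {
    return (readFlg(frame) & FLG_BLOCK_INDEPENDENCE) != 0;
  }

  // Returns the declared uncompressed size, or -1 if the frame does not carry one.
  static long declaredContentSize(byte[] frame) {
    int flg = readFlg(frame);
    if ((flg & FLG_CONTENT_SIZE) == 0) {
      return -1L;
    }
    if (frame.length < CONTENT_SIZE_OFFSET + Long.BYTES + 1) {
      throw new IllegalArgumentException("Frame header truncated before content size");
    }
    return ByteBuffer.wrap(frame, CONTENT_SIZE_OFFSET, Long.BYTES)
        .order(ByteOrder.LITTLE_ENDIAN)
        .getLong();
  }
}
```

With the flags described above, `declaredContentSize` should return the uncompressed length of the payload and `hasIndependentBlocks` should return true; a return value of -1 would mean the frame omits the content size and so would not match the spec wording quoted above.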
## Tests
- TestPuffinWriter.testEmptyFooterCompressed converted from a negative test
(asserting the UnsupportedOperationException) to a positive round-trip +
byte-fixture test.
- Added testWriteMetricDataCompressedLz4 / testReadMetricDataCompressedLz4
and testValidateLz4FooterSizeValue, mirroring the existing ZSTD coverage,
against two new committed fixtures (empty-puffin-compressed-footer.bin,
sample-metric-data-compressed-lz4.bin).
- Added codec-level round-trip + empty-input tests in TestPuffinFormat,
parameterized over NONE / LZ4 / ZSTD.
Verified locally: :iceberg-core:build -x integrationTest green;
checkRuntimeDeps green for the spark-4.1 / flink-2.1 / kafka-connect bundles.
## Runtime deps & LICENSE
Making lz4-java a direct dependency of iceberg-core propagates it onto the
runtime classpath of every shaded runtime bundle that ships iceberg-core.
Accordingly:
- runtime-deps.txt baselines updated for the affected bundles (spark
v3.4/v3.5/v4.0/v4.1, flink v1.20/v2.0/v2.1, kafka-connect-runtime). Only the
single new at.yawk.lz4:lz4-java line was added; unrelated patch-level baseline
drift was intentionally left out.
- Bundle LICENSE files updated with a "This product bundles lz4-java"
stanza, mirroring the existing Airlift Aircompressor precedent. lz4-java ships
no NOTICE file, so NOTICE was not modified.
Open item for maintainers: please sanity-check the LICENSE attribution
wording / project URL for the at.yawk.lz4 fork against ASF policy — this is the
documented manual step in runtime-deps.gradle.