JacobSMoller opened a new pull request, #2252: URL: https://github.com/apache/iceberg-python/pull/2252
<!-- Thanks for opening a pull request! -->

# Rationale for this change

The `SnappyCodec.decompress()` method has a bug where the CRC32 checksum is extracted from the compressed data **after** the data has already been truncated to remove the checksum. This results in reading the wrong 4 bytes for checksum validation, causing the CRC32 check to fail incorrectly.

**Root Cause:** In the current implementation:

1. `data = data[0:-4]` removes the last 4 bytes (checksum) from the data
2. `checksum = data[-4:]` then tries to get the checksum from the already-truncated data
3. This means `checksum` contains the wrong bytes (last 4 bytes of compressed data instead of the actual checksum)

**Solution:** Extract the checksum **before** truncating the data:

```python
checksum = data[-4:]  # store checksum before truncating data
data = data[0:-4]  # remove checksum from the data
```

This ensures data integrity checks work correctly for snappy-compressed Avro data.

# Are these changes tested?

The fix resolves the logical error in the checksum extraction order. Existing tests should pass, and any snappy-compressed data with valid checksums will now decompress successfully instead of failing with "Checksum failure" errors.

The change is minimal and only reorders two existing lines of code, making it low-risk.

# Are there any user-facing changes?

**Yes** - This is a bug fix that improves functionality:

- **Before:** Snappy-compressed Avro data would fail to decompress with "Checksum failure" errors even when the data and checksum were valid
- **After:** Snappy-compressed Avro data with valid checksums will decompress correctly

This fix resolves data integrity validation issues for users working with snappy-compressed Avro files. No API changes are introduced.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
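For reviewers who want to see the ordering problem in isolation, here is a minimal sketch. It is not the PyIceberg implementation: `frame`, `decompress_buggy`, and `decompress_fixed` are hypothetical names, and `zlib` stands in for snappy so the example runs with only the standard library. It assumes the Avro block layout of compressed bytes followed by a 4-byte big-endian CRC32 of the uncompressed data.

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Build a block: compressed bytes + 4-byte big-endian CRC32 of the payload.

    Assumption: zlib is only a stand-in for snappy here.
    """
    compressed = zlib.compress(payload)
    return compressed + zlib.crc32(payload).to_bytes(4, "big")


def _verify(uncompressed: bytes, checksum: bytes) -> bytes:
    # Compare the trailing checksum against a CRC32 of the decompressed data.
    if zlib.crc32(uncompressed).to_bytes(4, "big") != checksum:
        raise ValueError("Checksum failure")
    return uncompressed


def decompress_buggy(data: bytes) -> bytes:
    """Old order: truncate first, then read the 'checksum' from the wrong place."""
    data = data[0:-4]      # checksum bytes are discarded here
    checksum = data[-4:]   # actually the last 4 bytes of the compressed payload
    return _verify(zlib.decompress(data), checksum)


def decompress_fixed(data: bytes) -> bytes:
    """New order: read the checksum first, then truncate."""
    checksum = data[-4:]   # store checksum before truncating data
    data = data[0:-4]      # remove checksum from the data
    return _verify(zlib.decompress(data), checksum)


if __name__ == "__main__":
    block = frame(b"example Avro block contents")
    print(decompress_fixed(block))
    try:
        decompress_buggy(block)
    except ValueError as exc:
        print(exc)
```

In both versions the bytes handed to the decompressor are identical; only the bytes used for the comparison differ, which is why the old order rejects valid blocks rather than corrupting output.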
