JacobSMoller opened a new pull request, #2252:
URL: https://github.com/apache/iceberg-python/pull/2252

   # Rationale for this change
   
   The `SnappyCodec.decompress()` method has a bug where the CRC32 checksum is 
extracted from the compressed data **after** the data has already been 
truncated to remove the checksum. This results in reading the wrong 4 bytes for 
checksum validation, causing the CRC32 check to fail incorrectly.
   
   **Root Cause:**
   In the current implementation:
   1. `data = data[0:-4]` removes the last 4 bytes (the checksum) from the data
   2. `checksum = data[-4:]` then reads the "checksum" from the 
already-truncated data
   3. As a result, `checksum` holds the last 4 bytes of the compressed payload 
instead of the actual CRC32 checksum
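
The ordering problem can be reproduced in isolation. The sketch below is not the pyiceberg source. `zlib` stands in for snappy so that it runs on the standard library alone, all function names are hypothetical, and the framing (compressed payload followed by a 4-byte big-endian CRC32 of the uncompressed bytes) is assumed from the Avro snappy codec convention.

```python
import binascii
import struct
import zlib

CRC32 = struct.Struct(">I")  # 4-byte big-endian unsigned int


def frame(raw: bytes) -> bytes:
    """Compress `raw` and append the CRC32 of the uncompressed bytes."""
    return zlib.compress(raw) + CRC32.pack(binascii.crc32(raw) & 0xFFFFFFFF)


def check(uncompressed: bytes, checksum: bytes) -> None:
    """Raise if `checksum` does not match the CRC32 of `uncompressed`."""
    if binascii.crc32(uncompressed) & 0xFFFFFFFF != CRC32.unpack(checksum)[0]:
        raise ValueError("Checksum failure")


def decompress_buggy(data: bytes) -> bytes:
    data = data[0:-4]      # the checksum is dropped here...
    uncompressed = zlib.decompress(data)
    checksum = data[-4:]   # ...so this reads the tail of the payload
    check(uncompressed, checksum)
    return uncompressed


def decompress_fixed(data: bytes) -> bytes:
    checksum = data[-4:]   # read the checksum first
    data = data[0:-4]      # then strip it from the payload
    uncompressed = zlib.decompress(data)
    check(uncompressed, checksum)
    return uncompressed
```

With this stand-in, `decompress_fixed(frame(b"abc"))` returns the original bytes, while `decompress_buggy` on the same input raises `ValueError("Checksum failure")` because it compares the CRC32 against unrelated payload bytes.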
   
   **Solution:**
   Extract the checksum **before** truncating the data:
   ```python
   checksum = data[-4:]  # store checksum before truncating data
   data = data[0:-4]     # remove checksum from the data
   ```
   
   This ensures data integrity checks work correctly for snappy-compressed Avro 
data.
   
   # Are these changes tested?
   
   The fix resolves the logical error in the checksum extraction order. 
Existing tests should pass, and any snappy-compressed data with valid checksums 
will now decompress successfully instead of failing with "Checksum failure" 
errors.
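
A round-trip check of roughly the following shape would pin the behaviour down. This is only a sketch under stated assumptions: `compress` and `decompress` here are hypothetical stand-ins that use `zlib` instead of snappy, not the actual `SnappyCodec` methods, and the test names are illustrative.

```python
import binascii
import struct
import zlib

CRC32 = struct.Struct(">I")  # 4-byte big-endian unsigned int


def compress(raw: bytes) -> bytes:
    # Stand-in for the codec's compress(): payload + CRC32 of the raw bytes.
    return zlib.compress(raw) + CRC32.pack(binascii.crc32(raw) & 0xFFFFFFFF)


def decompress(data: bytes) -> bytes:
    # Stand-in for the fixed decompress(): read the checksum, then truncate.
    checksum, payload = data[-4:], data[:-4]
    raw = zlib.decompress(payload)
    if binascii.crc32(raw) & 0xFFFFFFFF != CRC32.unpack(checksum)[0]:
        raise ValueError("Checksum failure")
    return raw


def test_round_trip() -> None:
    # Valid frames must decompress back to the original bytes.
    for raw in (b"", b"a", b"iceberg" * 1000):
        assert decompress(compress(raw)) == raw


def test_corrupt_checksum_is_rejected() -> None:
    framed = bytearray(compress(b"iceberg"))
    framed[-1] ^= 0xFF  # flip bits in the trailing checksum only
    try:
        decompress(bytes(framed))
    except ValueError:
        return
    raise AssertionError("corrupt checksum was accepted")
```

Covering both directions matters here: the first test fails under the original line ordering, and the second guards against a fix that accidentally stops validating the checksum at all.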
   
   The change is minimal and only reorders two existing lines of code, making 
it low-risk.
   
   # Are there any user-facing changes?
   
   **Yes** - This is a bug fix that improves functionality:
   
   - **Before:** Snappy-compressed Avro data would fail to decompress with 
"Checksum failure" errors even when the data and checksum were valid
   - **After:** Snappy-compressed Avro data with valid checksums will 
decompress correctly
   
   This fix resolves data integrity validation issues for users working with 
snappy-compressed Avro files. No API changes are introduced.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

