chenzl25 opened a new pull request, #2513:
URL: https://github.com/apache/iceberg-rust/pull/2513

   ## What
   
   Fix deletion vector reads from Puffin files by using the manifest-provided blob range for direct access.
   
   For Puffin position delete files, Iceberg manifest entries carry `content_offset`, `content_size_in_bytes`, and `referenced_data_file`. The offset and size locate the deletion-vector blob inside the Puffin file, and `referenced_data_file` names the data file the vector applies to. The reader now reads that byte range directly instead of first parsing the Puffin footer.
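
A minimal sketch of the direct range read described above, using an in-memory byte slice as a stand-in for the Puffin file. The `DvLocation` struct and the `blob_range` and `read_blob` helpers are hypothetical names for illustration only and are not the iceberg-rust API; only the two field names come from the manifest entry.

```rust
use std::convert::TryFrom;
use std::ops::Range;

/// Hypothetical subset of a delete manifest entry (illustration only).
struct DvLocation {
    content_offset: u64,
    content_size_in_bytes: u64,
}

/// Half-open byte range `content_offset..content_offset + content_size_in_bytes`.
/// Returns `None` if the addition would overflow.
fn blob_range(loc: &DvLocation) -> Option<Range<u64>> {
    let end = loc.content_offset.checked_add(loc.content_size_in_bytes)?;
    Some(loc.content_offset..end)
}

/// Slice the blob out of `file`, a buffer standing in for the Puffin file.
/// Returns `None` if the range does not fit inside the buffer.
fn read_blob<'a>(file: &'a [u8], loc: &DvLocation) -> Option<&'a [u8]> {
    let range = blob_range(loc)?;
    let start = usize::try_from(range.start).ok()?;
    let end = usize::try_from(range.end).ok()?;
    file.get(start..end)
}
```

In a real reader the slice would presumably be replaced by a ranged read against object storage; the bounds arithmetic stays the same.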
   
   ## Why
   
   The previous path tried to parse Puffin file metadata before reading the deletion-vector blob. That fails when the read path is already scoped to the DV blob range, because the blob payload does not start with the Puffin file magic `PFA1`.
   
   This could produce errors like:
   
   ```text
   Bad magic value: [1, 0, 0, 0] should be [80, 70, 65, 49]
   ```
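
To illustrate where such a message comes from, here is a small sketch of a magic check. `check_magic` is a hypothetical helper, not the actual iceberg-rust function, and its message format simply mirrors the error quoted above. The expected bytes `[80, 70, 65, 49]` are the ASCII codes of `PFA1`.

```rust
/// Puffin file magic: the ASCII bytes "PFA1", i.e. [80, 70, 65, 49].
const PUFFIN_MAGIC: [u8; 4] = *b"PFA1";

/// Hypothetical magic check: compares the first four bytes of `bytes`
/// with the Puffin magic and reports a mismatch as a string error.
fn check_magic(bytes: &[u8]) -> Result<(), String> {
    match bytes.get(..4) {
        Some(head) if head == &PUFFIN_MAGIC[..] => Ok(()),
        Some(head) => Err(format!(
            "Bad magic value: {:?} should be {:?}",
            head, PUFFIN_MAGIC
        )),
        None => Err("input shorter than 4 bytes".to_string()),
    }
}
```

Running a check like this against bytes that begin at the blob offset, rather than at the start of the file, rejects the input even though the Puffin file itself is valid.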
   
   Spark handles deletion vectors through the manifest-provided blob offset/size, so this aligns iceberg-rust with the Iceberg direct-access model for deletion vectors.
   
   ## Changes

   - Add `referenced_data_file` to `FileScanTask`.
   - Propagate `referenced_data_file` from delete manifest entries into scan tasks.
   - For Puffin deletion vectors, read `content_offset..content_offset + content_size_in_bytes` directly.
   - Construct a `deletion-vector-v1` blob from the direct range and parse it with `DeleteVector::from_puffin_blob`.
   - Keep the existing Puffin footer parsing path as a fallback when `referenced_data_file` is unavailable.
   - Use `path#offset:length` as the positional delete load key for Puffin files, so multiple DV blobs in one Puffin file are handled independently.
   - Add a focused test covering direct blob-range reads from a real Puffin file.
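
The load key and the fallback rule from the list above can be sketched as follows. `dv_load_key`, `ReadPath`, and `choose_read_path` are hypothetical names, not taken from the PR. Requiring the offset and size to be present before taking the direct path is an added assumption here; the description only names `referenced_data_file` as the condition.

```rust
/// Which read path a Puffin delete file takes (illustrative only).
#[derive(Debug, PartialEq)]
enum ReadPath {
    /// Read `offset..offset + length` directly from the Puffin file.
    DirectRange { offset: u64, length: u64 },
    /// Parse the Puffin footer first to find the blob.
    FooterFallback,
}

/// Build a `path#offset:length` key so that each DV blob in a shared
/// Puffin file gets its own entry in the positional delete loader.
fn dv_load_key(path: &str, offset: u64, length: u64) -> String {
    format!("{}#{}:{}", path, offset, length)
}

/// Take the direct path only when the manifest entry names a referenced
/// data file and (assumption) also carries both the offset and the size.
fn choose_read_path(
    referenced_data_file: Option<&str>,
    content_offset: Option<u64>,
    content_size_in_bytes: Option<u64>,
) -> ReadPath {
    match (referenced_data_file, content_offset, content_size_in_bytes) {
        (Some(_), Some(offset), Some(length)) => ReadPath::DirectRange { offset, length },
        _ => ReadPath::FooterFallback,
    }
}
```

Keying on the path alone would make two deletion vectors stored in the same Puffin file collide; including the offset and length keeps them distinct.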


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

