[PR] Core: HadoopFileIO to support bulk delete through the Hadoop Filesystem APIs [iceberg]

via GitHub Sat, 18 Oct 2025 15:35:18 -0700


steveloughran opened a new pull request, #10233:
URL: https://github.com/apache/iceberg/pull/10233


   Code for #12055 
   
   Using bulk delete for files eliminates the per-file sequence of a probe for 
the status of the destination object, a single delete request, and then a probe 
to see whether an empty directory marker needs to be recreated above it. 
Instead, a few hundred objects are deleted in a single request.
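
   To make the request-count difference concrete, here is a minimal, 
self-contained sketch of paged deletion. It is not the code in this PR: 
`PagedDeleter`, `InMemoryStore`, `deleteInPages`, the page size of 250 and the 
`s3a://example-bucket/...` paths are all hypothetical names and values chosen 
for illustration, and the in-memory store only stands in for an object store 
that accepts multi-object delete requests.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BulkDeleteSketch {

    /** Hypothetical stand-in for a store that accepts multi-object delete requests. */
    interface PagedDeleter {
        /** Maximum number of paths accepted in one request. */
        int pageSize();

        /** Deletes one page of paths in a single request; returns the paths that failed. */
        List<String> deletePage(List<String> page);
    }

    /** In-memory fake, present only so that the sketch is runnable. */
    static class InMemoryStore implements PagedDeleter {
        final Set<String> objects = new HashSet<>();
        private final int pageSize;

        InMemoryStore(int pageSize) {
            this.pageSize = pageSize;
        }

        @Override
        public int pageSize() {
            return pageSize;
        }

        @Override
        public List<String> deletePage(List<String> page) {
            if (page.size() > pageSize) {
                throw new IllegalArgumentException("page exceeds page size");
            }
            objects.removeAll(page);
            return new ArrayList<>(); // this fake never reports a failure
        }
    }

    /**
     * Splits paths into pages of at most pageSize entries and issues one
     * request per page. Failed paths are appended to failures.
     * Returns the number of requests issued.
     */
    static int deleteInPages(PagedDeleter deleter, List<String> paths, List<String> failures) {
        int pageSize = deleter.pageSize();
        if (pageSize < 1) {
            throw new IllegalArgumentException("page size must be positive");
        }
        int requests = 0;
        for (int start = 0; start < paths.size(); start += pageSize) {
            int end = Math.min(start + pageSize, paths.size());
            failures.addAll(deleter.deletePage(paths.subList(start, end)));
            requests++;
        }
        return requests;
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore(250); // assumed page size
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 600; i++) {
            String path = "s3a://example-bucket/table/data/file-" + i + ".parquet";
            paths.add(path);
            store.objects.add(path);
        }
        List<String> failures = new ArrayList<>();
        int requests = deleteInPages(store, paths, failures);
        // prints "3 requests, 0 objects left"
        System.out.println(requests + " requests, " + store.objects.size() + " objects left");
    }
}
```

   With 600 paths and a page size of 250, the sketch issues 3 requests. The 
per-file sequence described above (probe, delete, probe) would instead issue 
several calls for each of the 600 files.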
   
   Tested: S3 London
   
   In https://github.com/apache/hadoop/pull/7316 there is an (uncommitted) 
Hadoop PR which takes an Iceberg library and verifies the correct operation of 
the Iceberg delete operations against a live AWS S3 store, both with and 
without bulk delete enabled.
   
   
https://github.com/apache/hadoop/blob/d37310cf355f3eb137f925bde9a2a299823b8230/hadoop-tools/hadoop-aws/src/test/java17/org/apache/hadoop/fs/contract/s3a/ITestIcebergBulkDelete.java
   
   This is something we can merge into Hadoop as a local regression test once 
Iceberg has a release with this change.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

