nastra commented on code in PR #10233:
URL: https://github.com/apache/iceberg/pull/10233#discussion_r2001502807
##########
core/src/main/java/org/apache/iceberg/hadoop/HadoopFileIO.java:
##########
@@ -173,26 +203,223 @@ public void deletePrefix(String prefix) {
     }
   }
 
+  /**
+   * Initialize the wrapped IO class if configured to do so.
+   *
+   * @return true if bulk delete should be used.
+   */
+  private synchronized boolean maybeUseBulkDeleteApi() {
+    if (!bulkDeleteConfigured.compareAndSet(false, true)) {
+      // configured already, so return.
+      return useBulkDelete;
+    }
+    boolean enableBulkDelete = conf().getBoolean(BULK_DELETE_ENABLED, BULK_DELETE_ENABLED_DEFAULT);
+    if (!enableBulkDelete) {
+      LOG.debug("Bulk delete is disabled");
+      useBulkDelete = false;
+    } else {
+      // library is configured to use bulk delete, so try to load it
+      // and probe for the bulk delete methods being found.
+      // this is only satisfied on Hadoop releases with the WrappedIO class.
+      wrappedIO = new DynamicWrappedIO(getClass().getClassLoader());

Review Comment:
   I guess I don't fully follow why we need to load all of this stuff dynamically. Iceberg is on Hadoop 3.4.1, so we should be able to use the Bulk Delete API of Hadoop directly?

   Additionally, I'm not convinced that we need a `BULK_DELETE_ENABLED` property. We would use bulk deletion by default and fall back in case it's not available.
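   For context, here is a rough sketch of what the direct, non-dynamic path could look like against the bulk delete API that ships in Hadoop 3.4.1 (`FileSystem#createBulkDelete` returning `org.apache.hadoop.fs.BulkDelete`). The class and method names below are illustrative only and not part of this PR:

   ```java
   // Illustrative sketch only, not the PR's code: direct use of the bulk delete
   // API in Hadoop 3.4.1 (FileSystem#createBulkDelete / org.apache.hadoop.fs.BulkDelete),
   // with no dynamic loading and no enable flag.
   import java.io.IOException;
   import java.util.Collection;
   import java.util.List;
   import java.util.Map;

   import org.apache.hadoop.fs.BulkDelete;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;

   final class BulkDeleteSketch {

     private BulkDeleteSketch() {}

     /** Delete the given paths under {@code base}, respecting the store's page size. */
     static void deleteAll(FileSystem fs, Path base, List<Path> paths) throws IOException {
       try (BulkDelete bulkDelete = fs.createBulkDelete(base)) {
         // Page size is 1 for the default FileSystem implementation and larger
         // for stores such as S3A that support true bulk deletion.
         int pageSize = bulkDelete.pageSize();
         for (int start = 0; start < paths.size(); start += pageSize) {
           Collection<Path> page =
               paths.subList(start, Math.min(start + pageSize, paths.size()));
           // Each returned entry is a (path, error message) pair for a failed delete.
           List<Map.Entry<Path, String>> failures = bulkDelete.bulkDelete(page);
           if (!failures.isEmpty()) {
             throw new IOException(
                 "Bulk delete failed for " + failures.size() + " paths, first: " + failures.get(0));
           }
         }
       }
     }
   }
   ```

   As far as I know, the base `FileSystem` ships a default page-size-1 implementation, so the call works on every file system; a runtime fallback on older Hadoop versions (e.g. catching a linkage error on the first call) could then replace the `BULK_DELETE_ENABLED` flag.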