rahil-c commented on code in PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#discussion_r1261647676


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##########
@@ -303,6 +304,19 @@ private Dataset<String> listedFileDS() {
     List<String> subDirs = Lists.newArrayList();
     List<String> matchingFiles = Lists.newArrayList();
 
+    if (table.io() instanceof SupportsPrefixOperations) {
+      Iterator<org.apache.iceberg.io.FileInfo> iterator =
+          ((SupportsPrefixOperations) table.io()).listPrefix(location).iterator();

Review Comment:
   Thanks @RussellSpitzer for taking a look. I can make the change to 3.4 only.
   However, I think an early exit would be ideal: after this `if` condition we would be calling
   
   ```
       listDirRecursively(
           location,
           predicate,
           hadoopConf.value(),
           MAX_DRIVER_LISTING_DEPTH,
           MAX_DRIVER_LISTING_DIRECT_SUB_DIRS,
           subDirs,
           pathFilter,
           matchingFiles);
   
   .......
    ListDirsRecursively listDirs = new ListDirsRecursively(conf, olderThanTimestamp, pathFilter);
    JavaRDD<String> matchingLeafFileRDD = subDirRDD.mapPartitions(listDirs);
   ```
   both of which make direct FileSystem `listStatus` calls, which we ideally want to avoid in favor of just using `S3FileIO`. Let me know if there are concerns with the early exit / current logic.
   
   cc @jackye1995  
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
