rahil-c commented on code in PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#discussion_r1261647676


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##########
@@ -303,6 +304,19 @@ private Dataset<String> listedFileDS() {
     List<String> subDirs = Lists.newArrayList();
     List<String> matchingFiles = Lists.newArrayList();
 
+    if (table.io() instanceof SupportsPrefixOperations) {
+      Iterator<org.apache.iceberg.io.FileInfo> iterator =
+          ((SupportsPrefixOperations) table.io()).listPrefix(location).iterator();

Review Comment:
   Thanks @RussellSpitzer for taking a look. I can make the change to 3.4 only.
   However, I think an early exit would be ideal: after this `if` condition we would be calling
   
   ```
       listDirRecursively(
           location,
           predicate,
           hadoopConf.value(),
           MAX_DRIVER_LISTING_DEPTH,
           MAX_DRIVER_LISTING_DIRECT_SUB_DIRS,
           subDirs,
           pathFilter,
           matchingFiles);
   
   .......
    ListDirsRecursively listDirs = new ListDirsRecursively(conf, olderThanTimestamp, pathFilter);
    JavaRDD<String> matchingLeafFileRDD = subDirRDD.mapPartitions(listDirs);
   ```
   both of which make direct FileSystem `listStatus` calls, which we ideally want to avoid in favor of just using `S3FileIO`. Let me know if there are concerns with the early exit / current logic.
   
   cc @jackye1995  
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
