ajantha-bhat commented on code in PR #12278:
URL: https://github.com/apache/iceberg/pull/12278#discussion_r1958839422


##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##########
@@ -301,24 +303,34 @@ private Dataset<FileURI> actualFileIdentDS() {
 
   private Dataset<String> listedFileDS() {
     List<String> subDirs = Lists.newArrayList();
-    List<String> matchingFiles = Lists.newArrayList();
+    Set<String> matchingFiles = Sets.newHashSet();
 
    Predicate<FileStatus> predicate = file -> file.getModificationTime() < olderThanTimestamp;
    PathFilter pathFilter = PartitionAwareHiddenPathFilter.forSpecs(table.specs());
 
+    List<String> locationsToList = Lists.newArrayList();
+    if (location.equals(table.location())) {
+      locationsToList.add(dataLocation());
+      locationsToList.add(metadataFileLocation());
+    } else {
+      locationsToList.add(location);
+    }

Review Comment:
   What if users have modified the data and metadata location table properties multiple times? In that case, orphan files under the old locations would never be cleaned up, right?
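
To illustrate the concern: a minimal, self-contained sketch of the branch added in `listedFileDS()`, using hypothetical stand-in values (the `tableLocation`/`dataLocation`/`metadataFileLocation` strings below are assumptions for illustration, not the Iceberg API). When the scan location is the table root, only the *current* data and metadata locations are listed, so a previously configured data location drops out of the scan entirely.

```java
import java.util.ArrayList;
import java.util.List;

public class ListedLocationsSketch {
  // Hypothetical stand-ins for the action's table state (illustration only).
  static final String TABLE_LOCATION = "s3://bucket/warehouse/db/tbl";
  static final String DATA_LOCATION = "s3://bucket/warehouse/db/tbl/data";
  static final String METADATA_LOCATION = "s3://bucket/warehouse/db/tbl/metadata";

  // Mirrors the new branch in listedFileDS(): scanning the table root
  // lists only the current data and metadata locations.
  static List<String> locationsToList(String location) {
    List<String> locations = new ArrayList<>();
    if (location.equals(TABLE_LOCATION)) {
      locations.add(DATA_LOCATION);
      locations.add(METADATA_LOCATION);
    } else {
      locations.add(location);
    }
    return locations;
  }

  public static void main(String[] args) {
    // Suppose the data location property previously pointed elsewhere:
    String oldDataLocation = "s3://bucket/old-data-dir";
    List<String> listed = locationsToList(TABLE_LOCATION);
    // The old location is never listed, so orphan files under it
    // are never even considered by the scan.
    System.out.println(listed.contains(oldDataLocation)); // prints "false"
  }
}
```

Under these assumptions, any orphan files left under a location that the table properties no longer reference are silently skipped, which is exactly the scenario the question raises.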



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org