Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

via GitHub Fri, 17 Jan 2025 16:36:34 -0800


danielcweeks commented on code in PR #11906:
URL: https://github.com/apache/iceberg/pull/11906#discussion_r1920903258



##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##########
@@ -292,19 +294,49 @@ private Dataset<FileURI> validFileIdentDS() {
 
   private Dataset<FileURI> actualFileIdentDS() {
     StringToFileURI toFileURI = new StringToFileURI(equalSchemes, equalAuthorities);
+    Dataset<String> dataList;
     if (compareToFileList == null) {
-      return toFileURI.apply(listedFileDS());
+      dataList =
+          table.io() instanceof SupportsPrefixOperations ? listWithPrefix() : listWithoutPrefix();
     } else {
-      return toFileURI.apply(filteredCompareToFileList());
+      dataList = filteredCompareToFileList();
     }
+
+    return toFileURI.apply(dataList);
+  }
+
+  @VisibleForTesting
+  Dataset<String> listWithPrefix() {

Review Comment:
   We don't want to add a delimiter if at all possible. I think the right approach is to enumerate the key space of the first character (or the first few characters) and then distribute that key space to executors to process as tasks.
   
   Depending on the layout strategy, the key space could differ, but it is generally predictable.
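A minimal sketch of the enumerate-and-distribute idea from the comment, assuming a FileIO that can list every key under an arbitrary string prefix. `PrefixLister`, `inMemory`, `enumeratePrefixes`, and `listInParallel` are hypothetical names for illustration only, not Iceberg APIs. A plain thread pool stands in for Spark executors. The `alphabet` and `depth` parameters are assumptions that depend on the table's layout strategy: the alphabet must cover every character that can follow the root, and keys with fewer than `depth` characters after the root would be missed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

class PrefixPartitionedListing {

  /** Hypothetical stand-in for a FileIO that can list every key under a prefix (no delimiter). */
  interface PrefixLister {
    List<String> listPrefix(String prefix);
  }

  /** In-memory lister, used only to demonstrate the sketch. */
  static PrefixLister inMemory(List<String> keys) {
    return prefix -> keys.stream().filter(k -> k.startsWith(prefix)).collect(Collectors.toList());
  }

  /**
   * Splits the key space under root into root + every string of length depth over alphabet.
   * Prefixes of equal length are disjoint, so each sufficiently long key matches exactly one.
   */
  static List<String> enumeratePrefixes(String root, String alphabet, int depth) {
    List<String> prefixes = new ArrayList<>();
    prefixes.add(root);
    for (int level = 0; level < depth; level++) {
      List<String> next = new ArrayList<>();
      for (String p : prefixes) {
        for (char c : alphabet.toCharArray()) {
          next.add(p + c);
        }
      }
      prefixes = next;
    }
    return prefixes;
  }

  /** Lists each prefix as an independent task; the thread pool stands in for Spark executors. */
  static List<String> listInParallel(PrefixLister io, List<String> prefixes, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<List<String>>> futures = new ArrayList<>();
      for (String prefix : prefixes) {
        futures.add(pool.submit(() -> io.listPrefix(prefix)));
      }
      List<String> all = new ArrayList<>();
      for (Future<List<String>> f : futures) {
        all.addAll(f.get()); // results are gathered in prefix order
      }
      return all;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException("prefix listing failed", e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    List<String> keys = List.of(
        "s3://bucket/tbl/data/0a7f-00001.parquet",
        "s3://bucket/tbl/data/3c11-00002.parquet",
        "s3://bucket/tbl/data/f902-00003.parquet");
    // Assumes file names under data/ start with a hex character (a hypothetical hashed layout).
    List<String> prefixes = enumeratePrefixes("s3://bucket/tbl/data/", "0123456789abcdef", 1);
    List<String> found = listInParallel(inMemory(keys), prefixes, 4);
    System.out.println(prefixes.size() + " tasks, " + found.size() + " files"); // 16 tasks, 3 files
  }
}
```

Because all prefixes have the same length, the tasks never overlap and no de-duplication is needed. Raising `depth` multiplies the task count by the alphabet size, which trades more list calls for finer-grained parallelism.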


