ismailsimsek commented on code in PR #11906: URL: https://github.com/apache/iceberg/pull/11906#discussion_r1907274886
##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##########
@@ -589,21 +620,42 @@ private FileURI toFileURI(I input) {
   static class PartitionAwareHiddenPathFilter implements PathFilter, Serializable {

     private final Set<String> hiddenPathPartitionNames;
+    private final boolean checkParents;

-    PartitionAwareHiddenPathFilter(Set<String> hiddenPathPartitionNames) {
+    PartitionAwareHiddenPathFilter(Set<String> hiddenPathPartitionNames, boolean checkParents) {
       this.hiddenPathPartitionNames = hiddenPathPartitionNames;
+      this.checkParents = checkParents;
     }

     @Override
     public boolean accept(Path path) {
+      if (!checkParents) {
+        return doAccept(path);
+      }
+
+      // if any of the parent folders is not accepted then return false
+      return doAccept(path) && !hasHiddenPttParentFolder(path);
+    }
+
+    private boolean doAccept(Path path) {
       return isHiddenPartitionPath(path) || HiddenPathFilter.get().accept(path);
     }

+    /**
+     * Iterates through the parent folders if any of the parent folders of the given path is a
+     * hidden partition folder.
+     */
+    public boolean hasHiddenPttParentFolder(Path path) {
+      return Stream.iterate(path, Path::getParent)
+          .takeWhile(Objects::nonNull)
+          .anyMatch(parentPath -> !doAccept(parentPath));
+    }

Review Comment:
   The filter now checks the parent folders of every file to ensure that none of them is rejected by `doAccept` (that is, hidden and not a hidden-partition folder). This may be slower for large file listings, if performance is a concern.
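
If the per-file parent walk noted in the review comment does turn out to be costly, one possible mitigation is to remember the result per directory, so that files sharing a folder only pay for the walk once. The following is a minimal sketch and not code from the PR: `CachedAncestorCheck` and its method names are hypothetical, `java.nio.file.Path` stands in for Hadoop's `Path`, and the `accepts` predicate stands in for `doAccept`.

```java
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical helper: caches, per directory, whether it or any ancestor is rejected. */
class CachedAncestorCheck {
  // Stand-in for doAccept(path) in the PR (assumption for this sketch).
  private final Predicate<Path> accepts;
  // Directory -> true if that directory or one of its ancestors is rejected.
  private final Map<Path, Boolean> rejectedByDir = new HashMap<>();

  CachedAncestorCheck(Predicate<Path> accepts) {
    this.accepts = accepts;
  }

  /** Accepts a file only if the file itself and all of its parent folders are accepted. */
  boolean accept(Path file) {
    return accepts.test(file) && !isDirRejected(file.getParent());
  }

  /** Walks up from dir, reusing cached answers for directories seen before. */
  private boolean isDirRejected(Path dir) {
    if (dir == null) {
      return false; // walked past the root without finding a rejected folder
    }
    Boolean cached = rejectedByDir.get(dir);
    if (cached != null) {
      return cached;
    }
    boolean rejected = !accepts.test(dir) || isDirRejected(dir.getParent());
    rejectedByDir.put(dir, rejected);
    return rejected;
  }
}
```

With this shape, a second file in an already-seen folder costs one predicate call plus one map lookup instead of a full walk to the root. The sketch is not thread-safe and not `Serializable`, and the cache grows with the number of distinct directories, so it would need adapting before being used inside a filter like the one in this PR.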