amogh-jahagirdar commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1770581266
########## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ########## @@ -81,6 +81,7 @@ public String partition() { // cache filtered manifests to avoid extra work when commits fail. private final Map<ManifestFile, ManifestFile> filteredManifests = Maps.newConcurrentMap(); + private final Set<String> manifestsWithDeletedFiles = Sets.newConcurrentHashSet(); Review Comment: Thanks for the explanation! >However, we still read all other manifests that don't have matches. If we know that deleteFileManifests can be trusted, we can completely avoid reading manifests to determine what to rewrite. Hm I'll double check if I'm missing something but I think this should already be happening in this PR. Before opening any manifest we check canContainDeletedFiles(ManifestFile manifest). If this is false, we skip the opening up of the manifest. Looking at the internal logic of this method: https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L329 Let me know if there's a flaw in my reasoning: 1. The case where `deleteFileManifests` can be trusted by itself is when there's no expression for deletion, when `hasPathOnlyDeletes` is false, when there are no dropped partitions and when there are no deletes that have aged out. 2. If the file to be deleted is referenced in a manifest the current set will contain that and we return true for that manifest. Let's go to the next case where we know the manifest does not match. 4. Given the statement in 1 where there's no row expression, no path only deletes, no dropped partitions, aged out deletes and there's no referenced manifest we would end up returning false here https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L362. 5. Then we would skip going through the manifest and add it to the filtered set/return https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L304 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org