amogh-jahagirdar commented on code in PR #11131:
URL: https://github.com/apache/iceberg/pull/11131#discussion_r1770581266
##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -81,6 +81,7 @@ public String partition() {
// cache filtered manifests to avoid extra work when commits fail.
private final Map<ManifestFile, ManifestFile> filteredManifests =
Maps.newConcurrentMap();
+ private final Set<String> manifestsWithDeletedFiles =
Sets.newConcurrentHashSet();
Review Comment:
Thanks for the explanation!
>However, we still read all other manifests that don't have matches. If we
know that deleteFileManifests can be trusted, we can completely avoid reading
manifests to determine what to rewrite.
Hm I'll double check if I'm missing something but I think this should
already be happening in this PR. Before opening any manifest we check
canContainDeletedFiles(ManifestFile manifest). If this is false, we skip the
opening up of the manifest.
Looking at the internal logic of this method:
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L329
Let me know if there's a flaw in my reasoning:
1. The case where `deleteFileManifests` can be trusted by itself is when
there's no expression for deletion, when `hasPathOnlyDeletes` is false, when
there are no dropped partitions and when there are no deletes that have aged
out. I think defining when it can be trusted is the key part driving the
current logic. I think the previous conditions would determine the state of the
flag you mentioned, so let me know if you were thinking something else here.
2. If the file to be deleted is referenced in a manifest the current set
will contain that and we return true for that manifest. Let's go to the next
case where we know the manifest does not match.
4. Given the statement in 1 where there's no row expression, no path only
deletes, no dropped partitions, aged out deletes and there's no referenced
manifest we would end up returning false here
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L362.
5. Then we would skip going through the manifest and add it to the filtered
set/return
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L304
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]