amogh-jahagirdar commented on code in PR #11131:
URL: https://github.com/apache/iceberg/pull/11131#discussion_r1770581266


##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -81,6 +81,7 @@ public String partition() {
 
   // cache filtered manifests to avoid extra work when commits fail.
   private final Map<ManifestFile, ManifestFile> filteredManifests = 
Maps.newConcurrentMap();
+  private final Set<String> manifestsWithDeletedFiles = 
Sets.newConcurrentHashSet();

Review Comment:
   Thanks for the explanation! 
   
   >However, we still read all other manifests that don't have matches. If we 
know that deleteFileManifests can be trusted, we can completely avoid reading 
manifests to determine what to rewrite.
   
   Hm I'll double check if I'm missing something but I think this should 
already be happening in this PR. Before opening any manifest we check 
canContainDeletedFiles(ManifestFile manifest). If this is false, we skip the 
opening up of the manifest.
   
   Looking at the internal logic of this method:
   
   
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L329
   
   Let me know if there's a flaw in my reasoning:
   
   1. The case where `deleteFileManifests` can be trusted by itself is when 
there's no expression for deletion, when `hasPathOnlyDeletes` is false, when 
there are no dropped partitions and when there are no deletes that have aged 
out. I think defining when it can be trusted is the key part driving the 
current logic, so let me know if you were thinking something else here.
   
   2. If the file to be deleted is referenced in a manifest the current set 
will contain that and we return true for that manifest. Let's go to the next 
case where we know the manifest does not match.
   
   4. Given the statement in 1 where there's no row expression, no path only 
deletes, no dropped partitions, aged out deletes and there's no referenced 
manifest we would end up returning false here 
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L362.
   
   5. Then we would skip going through the manifest and add it to the filtered 
set/return 
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L304
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to