amogh-jahagirdar commented on code in PR #11131:
URL: https://github.com/apache/iceberg/pull/11131#discussion_r1776062427


##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -325,7 +341,15 @@ private ManifestFile filterManifest(Schema tableSchema, 
ManifestFile manifest) {
     }
   }
 
+  @SuppressWarnings("checkstyle:CyclomaticComplexity")
   private boolean canContainDeletedFiles(ManifestFile manifest) {
+    boolean manifestReferencedInDelete = 
manifestsWithDeletedFiles.contains(manifest.path());
+    if (manifest.content() == ManifestContent.DELETES && 
allDeletesReferenceManifests) {

Review Comment:
   The reason for this condition on the manifest being a delete manifest is 
that there is still a remaining case that I'm seeing where we cannot use the 
`manifestsWithDeletedFiles` set as an absolute source of truth. It showed up in 
[this 
test](https://github.com/apache/iceberg/blob/main/core/src/test/java/org/apache/iceberg/TestSnapshotManager.java#L58).
   
   When performing the cherry pick operation with a dynamic overwrite, all the 
deleted operations (the deletion of FILE_A) do reference manifests. 
   
   However, the manifest that actually gets passed for filtering is the 
manifest produced for the appending of file A. Even though it's not in the 
manifestsWtihDeletedFiles set, we actually do need to rewrite in this case 
because the cherry pick is replacing the file with FILE_A_REPLACEMENT. 
   
   In essence, we relied on the deleted file partitions for this check 
previously and it seems like we still need to. If we limit for delete file 
cases and all the files to delete reference manifests I don't think there would 
be a case where if manifest was not in the set we'd still need to rewrite it 
since it's not possible to just delete during a cherry pick.
   
   This implementation does avoid the false positive reads for delete manifests 
which will be important for sync maintenance as well as also efficiently 
identifies for both data/delete manifests which are referenced, and that seems 
to bring a benefit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to