amogh-jahagirdar commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1776062427
########## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ########## @@ -325,7 +341,15 @@ private ManifestFile filterManifest(Schema tableSchema, ManifestFile manifest) { } } + @SuppressWarnings("checkstyle:CyclomaticComplexity") private boolean canContainDeletedFiles(ManifestFile manifest) { + boolean manifestReferencedInDelete = manifestsWithDeletedFiles.contains(manifest.path()); + if (manifest.content() == ManifestContent.DELETES && allDeletesReferenceManifests) { Review Comment: The reason for this condition on the manifest being a delete manifest is that there is still a remaining case that I'm seeing where we cannot use the `manifestsWithDeletedFiles` set as an absolute source of truth. It showed up in [this test](https://github.com/apache/iceberg/blob/main/core/src/test/java/org/apache/iceberg/TestSnapshotManager.java#L58). When performing the cherry pick operation with a dynamic overwrite, all the deleted operations (the deletion of FILE_A) do reference manifests. However, the manifest that actually gets passed for filtering is the manifest produced for the appending of file A. Even though it's not in the manifestsWtihDeletedFiles set, we actually do need to rewrite in this case because the cherry pick is replacing the file with FILE_A_REPLACEMENT. We relied on the deleted file partitions for this check previously to identify that the manifest with FILE_A needs to be rewritten and it seems like we still need to. If we limit for delete file cases and all the files to delete reference manifests I don't think there would be a case where if manifest was not in the set we'd still need to rewrite it since it's not possible to just delete during a cherry pick. This implementation does avoid the false positive reads for delete manifests which will be important for sync maintenance as well as also efficiently identifies for both data/delete manifests which are referenced, and that seems to bring a benefit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org