aokolnychyi commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1792459912
########## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ########## @@ -325,7 +341,15 @@ private ManifestFile filterManifest(Schema tableSchema, ManifestFile manifest) { } } + @SuppressWarnings("checkstyle:CyclomaticComplexity") private boolean canContainDeletedFiles(ManifestFile manifest) { + boolean manifestReferencedInDelete = manifestsWithDeletedFiles.contains(manifest.path()); + if (manifest.content() == ManifestContent.DELETES && allDeletesReferenceManifests) { Review Comment: Ha, I think you discovered a pretty tricky edge case. This means we can't blindly trust reported manifest locations. It is also not safe for delete files. Suppose we are doing a MoR operation with sync maintenance. It starts with S1 but someone concurrently compacts manifests and commits S2. The compaction may invalidate all referenced manifests in the MoR operation, so we won't be able to apply this optimization at all. I wonder whether we should let `MergingSnapshotProducer` implementations tell us when it is safe to use the accumulated set of manifests as the source of truth. We can keep the current logic for accumulating all referenced manifests in `ManifestFilterManager` and add an abstract method to `MergingSnapshotProducer` to indicate whether it is safe to filter manifests using referenced manifest locations. If the child reports yes, we will filter manifests prior to passing them to `ManifestFilterManager`. The new method in `MergingSnapshotProducer` can probably receive base `TableMetadata` and `Snapshot` and return a boolean. To begin with, row delta and overwrite operations can simply check the current snapshot is the start snapshot. If so, it should be safe. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org