aokolnychyi commented on code in PR #11131:
URL: https://github.com/apache/iceberg/pull/11131#discussion_r1792459912
##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -325,7 +341,15 @@ private ManifestFile filterManifest(Schema tableSchema,
ManifestFile manifest) {
}
}
+ @SuppressWarnings("checkstyle:CyclomaticComplexity")
private boolean canContainDeletedFiles(ManifestFile manifest) {
+ boolean manifestReferencedInDelete =
manifestsWithDeletedFiles.contains(manifest.path());
+ if (manifest.content() == ManifestContent.DELETES &&
allDeletesReferenceManifests) {
Review Comment:
Ha, I think you discovered a pretty tricky edge case. This means we can't
blindly trust reported manifest locations. It is also not safe for delete
files. Suppose we are doing a MoR operation with sync maintenance. It starts
with S1 but someone concurrently compacts manifests and commits S2. The
compaction may invalidate all referenced manifests in the MoR operation, so we
won't be able to apply this optimization at all.
I wonder whether we should let `MergingSnapshotProducer` implementations
tell us when it is safe to use the accumulated set of manifests as the source
of truth. We can keep the current logic for accumulating all referenced
manifests in `ManifestFilterManager` and add an abstract method to
`MergingSnapshotProducer` to indicate whether it is safe to filter manifests
using referenced manifest locations. If the child reports yes, we will filter
manifests prior to passing them to `ManifestFilterManager`. The new method in
`MergingSnapshotProducer` can probably receive base `TableMetadata` and
`Snapshot` and return a boolean. To begin with, row delta and overwrite
operations can simply check the current snapshot is the start snapshot. If so,
it should be safe.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]