aokolnychyi commented on code in PR #11131:
URL: https://github.com/apache/iceberg/pull/11131#discussion_r1792459912


##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -325,7 +341,15 @@ private ManifestFile filterManifest(Schema tableSchema, 
ManifestFile manifest) {
     }
   }
 
+  @SuppressWarnings("checkstyle:CyclomaticComplexity")
   private boolean canContainDeletedFiles(ManifestFile manifest) {
+    boolean manifestReferencedInDelete = 
manifestsWithDeletedFiles.contains(manifest.path());
+    if (manifest.content() == ManifestContent.DELETES && 
allDeletesReferenceManifests) {

Review Comment:
   Ha, I think you discovered a pretty tricky edge case. This means we can't 
blindly trust reported manifest locations. It is also not safe for delete 
files. Suppose we are doing a MoR operation with sync maintenance. It starts 
with S1 but someone concurrently compacts manifests and commits S2. The 
compaction may invalidate all referenced manifests in the MoR operation, so we 
won't be able to apply this optimization at all.
   
   I wonder whether we should let `MergingSnapshotProducer` implementations 
tell us when it is safe to use the accumulated set of manifests as the source 
of truth. We can keep the current logic for accumulating all referenced 
manifests in `ManifestFilterManager` and add an abstract method to 
`MergingSnapshotProducer` to indicate whether it is safe to filter manifests 
using referenced manifest locations. If the child reports yes, we will filter 
manifests prior to passing them to `ManifestFilterManager`. The new method in 
`MergingSnapshotProducer` can probably receive base `TableMetadata` and 
`Snapshot` and return a boolean. To begin with,  row delta and overwrite 
operations can simply check the current snapshot is the start snapshot. If so, 
it should be safe.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to