amogh-jahagirdar commented on code in PR #11131:
URL: https://github.com/apache/iceberg/pull/11131#discussion_r1813980941


##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -323,11 +345,15 @@ private ManifestFile filterManifest(Schema tableSchema, ManifestFile manifest) {
       PartitionSpec spec = reader.spec();
       PartitionAndMetricsEvaluator evaluator =
           new PartitionAndMetricsEvaluator(tableSchema, spec, deleteExpression);
+      boolean hasDeletedFiles = manifestsReferencedForDeletes.contains(manifest.path());
+      if (hasDeletedFiles) {

Review Comment:
   Yes, the canContainDeletedFiles check should be skipped entirely if we can 
trust the referenced set.
   
   >I think part of the problem is that this is trying to use instance state 
rather than passing data through methods. I think the logic would be more clear 
if state were passed through args rather than relying on instance fields 
several calls down the chain.
   
   Sounds good. I think what we could do is have the filter manager surface the 
referenced manifests when all the removals are performed with a referenced 
manifest location. Then, in the merging snapshot producer, we use that set to 
narrow down the manifests that get passed to the filter manager to begin with. 
I still think this particular case (where a file is known to be in a manifest 
referenced in a delete) is a good optimization: we avoid opening the manifest 
and evaluating stats, and skip straight to writing the new manifest.
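
   A rough sketch of that flow, to make the idea concrete. Every name here 
(ReferencedManifestSketch, Removal, referencedManifests, manifestsToFilter) is 
hypothetical and not an existing Iceberg API. It also assumes removals are 
tracked only by file path plus an optional manifest location, and it ignores 
delete expressions and partition drops.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch only: none of these types or methods exist in Iceberg.
final class ReferencedManifestSketch {

  /** A file removal, optionally carrying the manifest it is known to live in. */
  static final class Removal {
    final String filePath;
    final String manifestLocation; // null when the caller did not supply one

    Removal(String filePath, String manifestLocation) {
      this.filePath = filePath;
      this.manifestLocation = manifestLocation;
    }
  }

  /**
   * What the filter manager could surface: the referenced manifests, but only
   * when every removal carries a manifest location. Otherwise the set cannot
   * be trusted and an empty Optional is returned.
   */
  static Optional<Set<String>> referencedManifests(List<Removal> removals) {
    Set<String> referenced = new HashSet<>();
    for (Removal removal : removals) {
      if (removal.manifestLocation == null) {
        return Optional.empty();
      }
      referenced.add(removal.manifestLocation);
    }
    return Optional.of(referenced);
  }

  /**
   * What the snapshot producer could do with that set: pass only the
   * referenced manifests on for filtering, or all of them when the set is
   * not trustworthy. State is passed through arguments, not instance fields.
   */
  static List<String> manifestsToFilter(
      List<String> allManifests, Optional<Set<String>> referenced) {
    if (!referenced.isPresent()) {
      return new ArrayList<>(allManifests);
    }
    List<String> toFilter = new ArrayList<>();
    for (String manifest : allManifests) {
      if (referenced.get().contains(manifest)) {
        toFilter.add(manifest);
      }
    }
    return toFilter;
  }
}
```

   In this sketch the manifests left out of manifestsToFilter would be carried 
over unchanged, and the ones kept could go straight to being rewritten without 
the partition and metrics evaluation.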



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

