amogh-jahagirdar commented on code in PR #10962:
URL: https://github.com/apache/iceberg/pull/10962#discussion_r1877273362


##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -363,6 +363,10 @@ private ManifestFile filterManifest(
   }
 
   private boolean canContainDeletedFiles(ManifestFile manifest, boolean 
trustManifestReferences) {
+    if (manifest.minSequenceNumber() > 0 && manifest.minSequenceNumber() < 
minSequenceNumber) {
+      return true;
+    }

Review Comment:
   Sorry for the delay, yes I believe you are right that the last condition 
specifically at 
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L427
 is redundant, we can remove that! There is no longer any need to rely on that 
condition because the only time we'd hit that point is the manifest needs to be 
rewritten for correctness based on the other criteria. Then when the manifest 
does get rewritten if there happen to be aged out deletes, we will rewrite them 
here 
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L475
   
   OK so breaking down the pros/cons for eagerly removing aged out deletes 
during commit where a manifest *only* has aged out deletes: 
   
   The pro of eager removal of aged out deletes is that space consumed by 
metadata is reduced, and also planning times are at least slightly reduced. 
   
   The downside of eager removal is that even if a manifest does not need to be 
rewritten, the commit process is incurring additional latency and risk of 
failure (the IO involved) for writing out a new manifest.
   
   IMO I feel like the upside for reclaiming some metadata storage/reducing 
plan times, doesn't justify the risk of failure/additional latency being 
incurred by the process on the commit path. Maybe in the most extreme case 
where some large percentage of manifests _only_ have aged out deletes, is it 
worth it? 
   
   I think you brought up good point though on rewrite manifests, the Java API 
itself will just retain all the delete manifests but I do think the spark 
action does support rewriting delete manifests 
https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteManifestsSparkAction.java#L179
   
   Perhaps it would be better for rewrite manifests to remove the aged out 
deletes? At that point the work for removing the aged out deletes is more 
intentional in the context of table maintenance and not on the general commit 
path for `MergingSnapshotProducer` implementations? I'll need to think more on 
this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to