amogh-jahagirdar commented on code in PR #10962: URL: https://github.com/apache/iceberg/pull/10962#discussion_r1877273362
########## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ########## @@ -363,6 +363,10 @@ private ManifestFile filterManifest( } private boolean canContainDeletedFiles(ManifestFile manifest, boolean trustManifestReferences) { + if (manifest.minSequenceNumber() > 0 && manifest.minSequenceNumber() < minSequenceNumber) { + return true; + } Review Comment: Sorry for the delay, yes I believe you are right that the last condition specifically at https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L427 is redundant, we can remove that! There is no longer any need to rely on that condition because the only time we'd hit that point is the manifest needs to be rewritten for correctness based on the other criteria. Then when the manifest does get rewritten if there happen to be aged out deletes, we will rewrite them here https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L475 OK so breaking down the pros/cons for eagerly removing aged out deletes during commit where a manifest *only* has aged out deletes: The pro of eager removal of aged out deletes is that space consumed by metadata is reduced, and also planning times are at least slightly reduced. The downside of eager removal is that even if a manifest does not need to be rewritten, the commit process is incurring additional latency and risk of failure (the IO involved) for writing out a new manifest. IMO I feel like the upside for reclaiming some metadata storage/reducing plan times, doesn't justify the risk of failure/additional latency being incurred by the process on the commit path. Maybe in the most extreme case where some large percentage of manifests _only_ have aged out deletes, is it worth it? I think you brought up good point though on rewrite manifests, the Java API itself will just retain all the delete manifests but I do think the spark action does support rewriting delete manifests https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteManifestsSparkAction.java#L179 Perhaps it would be better for rewrite manifests to remove the aged out deletes? At that point the work for removing the aged out deletes is more intentional in the context of table maintenance and not on the general commit path for `MergingSnapshotProducer` implementations? I'll need to think more on this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org