jasonf20 commented on code in PR #10962:
URL: https://github.com/apache/iceberg/pull/10962#discussion_r1862492365

##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -363,6 +363,10 @@ private ManifestFile filterManifest(
   }
   private boolean canContainDeletedFiles(ManifestFile manifest, boolean trustManifestReferences) {
+    if (manifest.minSequenceNumber() > 0 && manifest.minSequenceNumber() < minSequenceNumber) {
+      return true;
+    }

Review Comment:
   If that's the case, then doesn't the last condition here do nothing, since execution can only reach this check if one of the earlier checks was true anyway?

   ```java
   deletePaths.contains(file.location())
       || deleteFiles.contains(file)
       || dropPartitions.contains(file.specId(), file.partition())
       || (isDelete
           && entry.isLive()
           && entry.dataSequenceNumber() > 0
           && entry.dataSequenceNumber() < minSequenceNumber);
   ```

   It seems like perhaps `dropDeleteFilesOlderThan` has no effect anymore (unless `allDeletesReferenceManifests` gets set to false or something).

   I think not removing by `minSequenceNumber` leaves behind delete files that never get applied to any files at query time, so it's not the end of the world, but it does lead to some wasted storage and slightly longer scan planning times.

   Assuming we want to keep this behaviour, perhaps we should just stop using `dropDeleteFilesOlderThan` in `MergingSnapshotProducer`?
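
To make the concern above concrete, here is a minimal toy sketch of the gating, as I read it. It is not Iceberg's code: `Entry`, `Manifest`, `countDropped`, and the `checkSequenceNumber` flag are hypothetical stand-ins, and the path scan inside the gate is a simplification of whatever the real `canContainDeletedFiles` inspects. The point is only that an entry-level sequence-number condition cannot fire for a manifest that the manifest-level gate never opens.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class ManifestGateSketch {
  /** Toy stand-in for a manifest entry (hypothetical, not Iceberg's ManifestEntry). */
  static final class Entry {
    final String path;
    final boolean isDelete;
    final boolean isLive;
    final long dataSequenceNumber;

    Entry(String path, boolean isDelete, boolean isLive, long dataSequenceNumber) {
      this.path = path;
      this.isDelete = isDelete;
      this.isLive = isLive;
      this.dataSequenceNumber = dataSequenceNumber;
    }
  }

  /** Toy stand-in for a manifest (hypothetical, not Iceberg's ManifestFile). */
  static final class Manifest {
    final long minSequenceNumber;
    final List<Entry> entries;

    Manifest(long minSequenceNumber, List<Entry> entries) {
      this.minSequenceNumber = minSequenceNumber;
      this.entries = entries;
    }
  }

  /** Manifest-level gate: decides whether the manifest's entries are examined at all. */
  static boolean canContainDeletedFiles(
      Manifest manifest, Set<String> deletePaths, long minSequenceNumber,
      boolean checkSequenceNumber) {
    // Mirrors the condition added in the diff; the flag is an assumption for comparison.
    if (checkSequenceNumber
        && manifest.minSequenceNumber > 0
        && manifest.minSequenceNumber < minSequenceNumber) {
      return true;
    }
    // Simplified stand-in for the path/file/partition checks.
    for (Entry entry : manifest.entries) {
      if (deletePaths.contains(entry.path)) {
        return true;
      }
    }
    return false;
  }

  /** Entry-level predicate, reduced to a path check plus the sequence-number condition. */
  static boolean shouldDrop(Entry entry, Set<String> deletePaths, long minSequenceNumber) {
    return deletePaths.contains(entry.path)
        || (entry.isDelete
            && entry.isLive
            && entry.dataSequenceNumber > 0
            && entry.dataSequenceNumber < minSequenceNumber);
  }

  /** Counts dropped entries; entries are only examined when the gate lets the manifest through. */
  static long countDropped(
      Manifest manifest, Set<String> deletePaths, long minSequenceNumber,
      boolean checkSequenceNumber) {
    if (!canContainDeletedFiles(manifest, deletePaths, minSequenceNumber, checkSequenceNumber)) {
      return 0;
    }
    long dropped = 0;
    for (Entry entry : manifest.entries) {
      if (shouldDrop(entry, deletePaths, minSequenceNumber)) {
        dropped++;
      }
    }
    return dropped;
  }

  public static void main(String[] args) {
    // A manifest holding a single old delete file, with nothing explicitly deleted.
    Manifest onlyOldDeletes =
        new Manifest(3, Arrays.asList(new Entry("delete-1.parquet", true, true, 3)));
    Set<String> deletePaths = Collections.emptySet();

    System.out.println(countDropped(onlyOldDeletes, deletePaths, 10, false)); // prints 0
    System.out.println(countDropped(onlyOldDeletes, deletePaths, 10, true)); // prints 1
  }
}
```

In this toy model, without the manifest-level sequence-number check the old delete file survives even though the entry-level predicate would have dropped it, which is the "wasted storage" case described above.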