ajantha-bhat commented on code in PR #13163:
URL: https://github.com/apache/iceberg/pull/13163#discussion_r2111647311

##########
core/src/main/java/org/apache/iceberg/PartitionStatsHandler.java:
##########
@@ -336,16 +336,29 @@ private static PartitionMap<PartitionStats> computeStatsDiff(
         Sets.newHashSet(
             SnapshotUtil.ancestorIdsBetween(
                 toSnapshot.snapshotId(), fromSnapshot.snapshotId(), table::snapshot));
-    Predicate<ManifestFile> manifestFilePredicate =
-        manifestFile -> snapshotIdsRange.contains(manifestFile.snapshotId());
-    return computeStats(table, toSnapshot, manifestFilePredicate, true /* incremental */);
+    return computeStats(table, toSnapshot, snapshotIdsRange);
   }
 
   private static PartitionMap<PartitionStats> computeStats(
-      Table table, Snapshot snapshot, Predicate<ManifestFile> predicate, boolean incremental) {
+      Table table, Snapshot snapshot, Set<Long> snapshotIdsRange) {
     StructType partitionType = Partitioning.partitionType(table);
-    List<ManifestFile> manifests =
-        snapshot.allManifests(table.io()).stream().filter(predicate).collect(Collectors.toList());
+    boolean incremental = !snapshotIdsRange.isEmpty();
+
+    List<ManifestFile> manifests;
+    if (incremental) {
+      // DELETED manifest entries are not carried over to subsequent snapshots.
+      // So, for incremental computation, gather the manifests added by each snapshot
+      // instead of relying solely on those from the latest snapshot.
+      manifests =
+          snapshotIdsRange.stream()
+              .flatMap(
+                  id ->
+                      table.snapshot(id).allManifests(table.io()).stream()
+                          .filter(file -> file.snapshotId().equals(id)))
+              .collect(Collectors.toList());

Review Comment:
   I also checked the case where snapshots are expired: the caller then cannot find previous stats for the table, so it falls back to a full compute.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.