ajantha-bhat commented on code in PR #13163:
URL: https://github.com/apache/iceberg/pull/13163#discussion_r2111647311


##########
core/src/main/java/org/apache/iceberg/PartitionStatsHandler.java:
##########
@@ -336,16 +336,29 @@ private static PartitionMap<PartitionStats> computeStatsDiff(
         Sets.newHashSet(
             SnapshotUtil.ancestorIdsBetween(
                 toSnapshot.snapshotId(), fromSnapshot.snapshotId(), table::snapshot));
-    Predicate<ManifestFile> manifestFilePredicate =
-        manifestFile -> snapshotIdsRange.contains(manifestFile.snapshotId());
-    return computeStats(table, toSnapshot, manifestFilePredicate, true /* incremental */);
+    return computeStats(table, toSnapshot, snapshotIdsRange);
   }
 
   private static PartitionMap<PartitionStats> computeStats(
-      Table table, Snapshot snapshot, Predicate<ManifestFile> predicate, boolean incremental) {
+      Table table, Snapshot snapshot, Set<Long> snapshotIdsRange) {
     StructType partitionType = Partitioning.partitionType(table);
-    List<ManifestFile> manifests =
-        snapshot.allManifests(table.io()).stream().filter(predicate).collect(Collectors.toList());
+    boolean incremental = !snapshotIdsRange.isEmpty();
+
+    List<ManifestFile> manifests;
+    if (incremental) {
+      // DELETED manifest entries are not carried over to subsequent snapshots.
+      // So, for incremental computation, gather the manifests added by each snapshot
+      // instead of relying solely on those from the latest snapshot.
+      manifests =
+          snapshotIdsRange.stream()
+              .flatMap(
+                  id ->
+                      table.snapshot(id).allManifests(table.io()).stream()
+                          .filter(file -> file.snapshotId().equals(id)))
+              .collect(Collectors.toList());

Review Comment:
   I also checked the case where snapshots are expired: the previous stats 
for the table can no longer be found, 
   so the computation falls back to a full compute. 
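
   To make the fallback concrete, here is a minimal sketch of the decision, 
under stated assumptions. It does not use the Iceberg API: `StatsFallbackSketch`, 
`latestAncestorWithStats`, `plan`, and the `parentOf` map are hypothetical names 
for illustration. The model assumes that an expired snapshot simply disappears 
from the set of live snapshots, so a stats file that was registered against it 
can no longer be reached by walking the ancestry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified model of the fallback; not the PartitionStatsHandler code.
class StatsFallbackSketch {
  enum Mode { FULL, INCREMENTAL }

  // Walks parent pointers starting at currentId and returns the nearest live
  // snapshot that has a stats file. parentOf holds only live (non-expired)
  // snapshots as keys; the root snapshot maps to null.
  static Optional<Long> latestAncestorWithStats(
      long currentId, Map<Long, Long> parentOf, Set<Long> snapshotsWithStats) {
    Long id = currentId;
    while (id != null && parentOf.containsKey(id)) {
      if (snapshotsWithStats.contains(id)) {
        return Optional.of(id);
      }
      id = parentOf.get(id);
    }
    // Either the root was reached or the chain hit an expired snapshot.
    return Optional.empty();
  }

  // Incremental compute needs a reachable previous stats file; otherwise full compute.
  static Mode plan(long currentId, Map<Long, Long> parentOf, Set<Long> snapshotsWithStats) {
    return latestAncestorWithStats(currentId, parentOf, snapshotsWithStats).isPresent()
        ? Mode.INCREMENTAL
        : Mode.FULL;
  }

  public static void main(String[] args) {
    Map<Long, Long> parentOf = new HashMap<>();
    parentOf.put(1L, null); // root snapshot
    parentOf.put(2L, 1L);
    parentOf.put(3L, 2L);
    Set<Long> withStats = Set.of(1L); // stats were last computed at snapshot 1

    System.out.println(plan(3L, parentOf, withStats));

    parentOf.remove(1L); // snapshot 1 expires
    System.out.println(plan(3L, parentOf, withStats));
  }
}
```

   In this model, expiring the only snapshot that carried stats changes the plan 
for snapshot 3 from incremental to full, which matches the behaviour described 
above.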



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org