ajantha-bhat commented on code in PR #13163:
URL: https://github.com/apache/iceberg/pull/13163#discussion_r2111663422


##########
core/src/main/java/org/apache/iceberg/PartitionStatsHandler.java:
##########
@@ -336,16 +336,29 @@ private static PartitionMap<PartitionStats> computeStatsDiff(
         Sets.newHashSet(
             SnapshotUtil.ancestorIdsBetween(
                 toSnapshot.snapshotId(), fromSnapshot.snapshotId(), table::snapshot));
-    Predicate<ManifestFile> manifestFilePredicate =
-        manifestFile -> snapshotIdsRange.contains(manifestFile.snapshotId());
-    return computeStats(table, toSnapshot, manifestFilePredicate, true /* incremental */);
+    return computeStats(table, toSnapshot, snapshotIdsRange);
   }
 
   private static PartitionMap<PartitionStats> computeStats(
-      Table table, Snapshot snapshot, Predicate<ManifestFile> predicate, boolean incremental) {
+      Table table, Snapshot snapshot, Set<Long> snapshotIdsRange) {
     StructType partitionType = Partitioning.partitionType(table);
-    List<ManifestFile> manifests =
-        snapshot.allManifests(table.io()).stream().filter(predicate).collect(Collectors.toList());
+    boolean incremental = !snapshotIdsRange.isEmpty();
+
+    List<ManifestFile> manifests;
+    if (incremental) {
+      // DELETED manifest entries are not carried over to subsequent snapshots.
+      // So, for incremental computation, gather the manifests added by each snapshot
+      // instead of relying solely on those from the latest snapshot.
+      manifests =
+          snapshotIdsRange.stream()
+              .flatMap(
+                  id ->
+                      table.snapshot(id).allManifests(table.io()).stream()
+                          .filter(file -> file.snapshotId().equals(id)))
+              .collect(Collectors.toList());

Review Comment:
   Also note that, because of the snapshot ID filter, each snapshot's added manifest files are considered only once for the computation, so reused manifests won't be counted again. If manifests are rewritten, their entries are marked as EXISTING, and the existing logic in `collectStatsForManifest` skips them for incremental compute.
   
   So, IMO, it works for all the scenarios now, and we have test cases covering them.
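
To illustrate the "considered only once" point, here is a minimal, self-contained sketch of the snapshot ID filter. It does not use the Iceberg API: `IncrementalManifestSketch`, `Manifest`, and `manifestsToScan` are hypothetical names invented for this example, and the map stands in for what `allManifests` would return for each snapshot.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-ins for illustration only; these are not Iceberg classes.
class IncrementalManifestSketch {

  // A manifest tagged with the ID of the snapshot that added it.
  static final class Manifest {
    final String path;
    final long addedSnapshotId;

    Manifest(String path, long addedSnapshotId) {
      this.path = path;
      this.addedSnapshotId = addedSnapshotId;
    }
  }

  // For every snapshot in the range, keep only the manifests that this
  // snapshot itself added. A manifest reused by a later snapshot still
  // carries the ID of the snapshot that added it, so it is picked up once.
  static List<String> manifestsToScan(
      Map<Long, List<Manifest>> manifestsBySnapshot, Set<Long> snapshotIdsRange) {
    return snapshotIdsRange.stream()
        .flatMap(
            id ->
                manifestsBySnapshot.get(id).stream()
                    .filter(manifest -> manifest.addedSnapshotId == id))
        .map(manifest -> manifest.path)
        .sorted()
        .collect(Collectors.toList());
  }
}
```

In this sketch, if snapshot 1 adds `m1.avro` and snapshot 2 reuses `m1.avro` while adding `m2.avro`, scanning the range {1, 2} yields each path exactly once, even though `m1.avro` appears in both snapshots' manifest lists.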



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

