ajantha-bhat commented on code in PR #10246:
URL: https://github.com/apache/iceberg/pull/10246#discussion_r1594265875


##########
core/src/main/java/org/apache/iceberg/FastAppend.java:
##########
@@ -156,6 +156,8 @@ public List<ManifestFile> apply(TableMetadata base, Snapshot snapshot) {
       manifests.addAll(snapshot.allManifests(ops.io()));
     }
 
+    manifests.forEach(summaryBuilder::addedManifestStats);

Review Comment:
   I had a discussion with @nk1506 to understand this better.
   
   Dremio (or any query engine that uses a cost-based optimizer, CBO) needs to 
estimate the cost of the query plan, and parallelism is one of the factors in 
that estimate.
   
   A query has a planning phase and an execution phase. 
   During planning, we would like to know how many manifests exist so that we 
can estimate the parallelism required for reading them, without actually 
reading the manifest list (we want the planning phase to be as fast as 
possible). 
   We currently estimate the parallelism of data file IO from the stats in the 
snapshot summary. However, the manifest count is missing from the snapshot 
summary, so we cannot make the same estimate for manifests. Hence this PR.
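
To make the use case concrete, here is a minimal sketch of how a planner might turn such a summary stat into a parallelism estimate without any manifest-list IO. It works on a plain `Map<String, String>`, which is the shape `Snapshot.summary()` returns. The key `total-manifests`, the class and method names, and the sizing parameters are all hypothetical and chosen for illustration; the property name this PR actually adds may differ, and this is not how Dremio implements it.

```java
import java.util.Map;

// Sketch only: estimates manifest-read parallelism from snapshot summary
// properties, without opening the manifest list.
final class ManifestParallelismSketch {

  // Hypothetical summary key; the real property name may differ.
  static final String MANIFEST_COUNT_KEY = "total-manifests";

  private ManifestParallelismSketch() {}

  // Returns the non-negative count stored under `key`,
  // or -1 if the stat is missing or malformed.
  static long readCount(Map<String, String> summary, String key) {
    String value = summary.get(key);
    if (value == null) {
      return -1L;
    }
    try {
      long parsed = Long.parseLong(value.trim());
      return parsed < 0 ? -1L : parsed;
    } catch (NumberFormatException e) {
      return -1L;
    }
  }

  // Estimates the number of parallel manifest readers: ceil(count / manifestsPerTask),
  // clamped to [1, maxParallelism]. Falls back to `fallback` when the stat is
  // absent, which is the situation described above.
  static int estimateParallelism(
      Map<String, String> summary, int manifestsPerTask, int maxParallelism, int fallback) {
    long count = readCount(summary, MANIFEST_COUNT_KEY);
    if (count < 0) {
      return fallback;
    }
    long tasks = (count + manifestsPerTask - 1) / manifestsPerTask; // ceiling division
    return (int) Math.max(1L, Math.min((long) maxParallelism, tasks));
  }
}
```

With a summary containing `total-manifests=10`, four manifests per task, and a cap of eight, the estimate is three readers. When the key is absent, the planner has to fall back to a guess or read the manifest list, which is the cost this change is meant to avoid.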



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

