ajantha-bhat commented on code in PR #10246:
URL: https://github.com/apache/iceberg/pull/10246#discussion_r1594265875
##########
core/src/main/java/org/apache/iceberg/FastAppend.java:
##########
@@ -156,6 +156,8 @@ public List<ManifestFile> apply(TableMetadata base, Snapshot snapshot) {
       manifests.addAll(snapshot.allManifests(ops.io()));
     }
+    manifests.forEach(summaryBuilder::addedManifestStats);

Review Comment:
   I had a discussion with @nk1506 to understand this better.

   Dremio (or any query engine that uses a cost-based optimizer, CBO) needs to estimate the cost of a query plan, and parallelism is one of the factors in that estimate. A query has a planning phase and an execution phase. During planning, we would like to know how many manifests exist so that we can estimate the parallelism required for reading them, without doing any actual IO on the manifest list, because the planning phase should be as fast as possible.

   We currently estimate the parallelism of data file IO from the stats in the snapshot summary. The manifest count, however, is missing from the snapshot summary, so we cannot make the same estimate for manifests. Hence this PR.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
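
For illustration, the planning-time estimate described in the review comment could look roughly like the sketch below. It is not Dremio's or Iceberg's actual code. The summary key `total-manifests`, the class name, and the tuning parameters (`manifestsPerTask`, `maxParallelism`, `fallback`) are all assumptions made for this example. The only point it shows is that the estimate is derived from the snapshot summary map alone, with no read of the manifest list.

```java
import java.util.Map;

public class ManifestParallelismEstimator {
  // Hypothetical summary key: the real property name is not given in the comment.
  static final String MANIFEST_COUNT_KEY = "total-manifests";

  /**
   * Estimates how many parallel tasks to use for reading manifests,
   * using only the snapshot summary (a plain string-to-string map).
   * Returns the fallback when the count is missing or not a number.
   */
  static int estimate(
      Map<String, String> summary, int manifestsPerTask, int maxParallelism, int fallback) {
    if (manifestsPerTask <= 0 || maxParallelism <= 0) {
      throw new IllegalArgumentException("manifestsPerTask and maxParallelism must be positive");
    }

    String raw = summary.get(MANIFEST_COUNT_KEY);
    if (raw == null) {
      return fallback; // older snapshots would not carry the stat
    }

    long manifestCount;
    try {
      manifestCount = Long.parseLong(raw);
    } catch (NumberFormatException e) {
      return fallback;
    }

    if (manifestCount <= 0) {
      return 1; // nothing to read, a single task is enough
    }

    // Ceiling division, then cap at the engine's maximum parallelism.
    long tasks = (manifestCount + manifestsPerTask - 1) / manifestsPerTask;
    return (int) Math.min(tasks, maxParallelism);
  }

  public static void main(String[] args) {
    Map<String, String> summary = Map.of(MANIFEST_COUNT_KEY, "37");
    // 37 manifests at 10 per task -> 4 tasks, below the cap of 16
    System.out.println(estimate(summary, 10, 16, 4));
  }
}
```

The fallback path matters here because snapshots written before such a stat existed would have no manifest count in their summary, so a planner would still need a default.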