nk1506 commented on code in PR #10246: URL: https://github.com/apache/iceberg/pull/10246#discussion_r1592861026
##########
core/src/main/java/org/apache/iceberg/FastAppend.java:
##########
@@ -156,6 +156,8 @@ public List<ManifestFile> apply(TableMetadata base, Snapshot snapshot) {
       manifests.addAll(snapshot.allManifests(ops.io()));
     }
 
+    manifests.forEach(summaryBuilder::addedManifestStats);

Review Comment:
   Thanks for the feedback. Regarding the use of manifest counts for planning, here are my thoughts:

   1. Knowing the manifest count in advance helps plan the parallelism, as [Spark](https://github.com/apache/iceberg/blob/ed0959257cba02f378f7097d81cecaaaef9fa43f/core/src/main/java/org/apache/iceberg/BaseDistributedDataScan.java#L149) does after reading the manifest list.
   2. How will it help to have this in the SnapshotSummary?
   > An engine like Spark doesn't benefit from these stats, since its parallelism is determined dynamically at runtime.
   > But other engines, like Dremio, decide their parallelism in advance (at compile time). Providing these stats in the summary will help them choose a better parallelism.
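
To make the second point concrete, here is a minimal sketch of how an engine that fixes its parallelism before execution might use a manifest count taken from the snapshot summary. It is only an illustration under assumptions: the summary key `total-manifests`, the class `ManifestParallelismSketch`, and the parameters `manifestsPerTask` and `maxParallelism` are all hypothetical and are not Iceberg or Dremio API.

```java
import java.util.Map;

// Illustrative only: not part of Iceberg or of any engine.
final class ManifestParallelismSketch {
  // Hypothetical summary key; the real property name would be whatever the PR settles on.
  static final String MANIFEST_COUNT_KEY = "total-manifests";

  // Picks a planning parallelism from the manifest count in the summary.
  // Falls back to maxParallelism when the summary does not carry the count.
  static int chooseParallelism(
      Map<String, String> summary, int manifestsPerTask, int maxParallelism) {
    String raw = summary.get(MANIFEST_COUNT_KEY);
    if (raw == null) {
      return maxParallelism;
    }
    long manifestCount = Long.parseLong(raw);
    // Ceiling division: number of tasks needed at manifestsPerTask manifests each.
    long tasks = (manifestCount + manifestsPerTask - 1) / manifestsPerTask;
    // Clamp to the range [1, maxParallelism].
    return (int) Math.max(1, Math.min(maxParallelism, tasks));
  }

  public static void main(String[] args) {
    Map<String, String> summary = Map.of(MANIFEST_COUNT_KEY, "10");
    System.out.println(chooseParallelism(summary, 4, 8));
  }
}
```

With a summary reporting 10 manifests, 4 manifests per task, and a cap of 8, the sketch picks 3 tasks without opening the manifest list. An engine with runtime-adaptive parallelism would have no need for this step.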