amogh-jahagirdar commented on code in PR #10246: URL: https://github.com/apache/iceberg/pull/10246#discussion_r1588511039
##########
core/src/main/java/org/apache/iceberg/FastAppend.java:
##########
@@ -156,6 +156,8 @@ public List<ManifestFile> apply(TableMetadata base, Snapshot snapshot) {
       manifests.addAll(snapshot.allManifests(ops.io()));
     }

+    manifests.forEach(summaryBuilder::addedManifestStats);

Review Comment:
   @nk1506 I think the main question (at least from me) is: is knowing just the total number of data or delete manifests actually useful for whatever planning estimation you're trying to do? As @Fokko said, you'll probably want to know the sizes (in bytes) of the manifests involved in planning, since they can vary depending on the strategy used for the append. If you estimate based only on the number of files (and not including sizes, which means you have to read the manifest anyway), I think there would be a lot of overprovisioning and underprovisioning (if you're trying to do distributed planning).

   But I'm definitely open to hearing more, especially if you are able to share any data points on how this metric helped; that would give us more confidence that this is useful and can generalize.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org