Fokko commented on code in PR #10246: URL: https://github.com/apache/iceberg/pull/10246#discussion_r1713644175
########## core/src/main/java/org/apache/iceberg/FastAppend.java: ########## @@ -156,6 +156,8 @@ public List<ManifestFile> apply(TableMetadata base, Snapshot snapshot) { manifests.addAll(snapshot.allManifests(ops.io())); } + manifests.forEach(summaryBuilder::addedManifestStats); Review Comment: @ajantha-bhat I'm still thinking the argument of having this information helping the query planning is quite thin. I don't think you can get away with reading the manifest list for doing some meaningful query planning as the size of the manifests varies wildly. Thinking of it, another issue can be that the manifest is not live, meaning it only contains deleted manifest-entries in a certain manifest file. You'll get all this information when you read the manifest list. > Agree that Having size based cost estimation will be more accurate. But count based estimation is still better than no stats. Everything comes at a price. The snapshots are already a substantial portion of the table metadata, and users are already running into issues when the number of snapshots becomes too large. Looping in @aokolnychyi in here as well to get his opinion since he did a lot of work on performance optimization -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org