Fokko commented on code in PR #10246:
URL: https://github.com/apache/iceberg/pull/10246#discussion_r1713644175
##########
core/src/main/java/org/apache/iceberg/FastAppend.java:
##########
@@ -156,6 +156,8 @@ public List<ManifestFile> apply(TableMetadata base,
Snapshot snapshot) {
manifests.addAll(snapshot.allManifests(ops.io()));
}
+ manifests.forEach(summaryBuilder::addedManifestStats);
Review Comment:
@ajantha-bhat I still think the argument that this information helps query
planning is quite thin. I don't think you can avoid reading the manifest list
if you want to do any meaningful query planning, since the size of the
manifests varies wildly. Another issue is that a manifest may not be live,
meaning that it contains only deleted manifest entries. You get all of this
information when you read the manifest list.
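To make the concern concrete, here is a rough, self-contained sketch. It is
not Iceberg code: `ManifestInfo` and its fields are hypothetical stand-ins for
the kind of per-manifest data a manifest list entry records, and the numbers
are made up. It only shows that two snapshots with the same manifest count can
differ hugely in the bytes a planner would actually have to read, and that a
manifest containing only deleted entries adds to the count but not to the work.

```java
import java.util.Arrays;
import java.util.List;

public class ManifestCostSketch {
  // Hypothetical stand-in for one manifest list entry; NOT the Iceberg ManifestFile API.
  public static final class ManifestInfo {
    final long lengthBytes;
    final int addedFiles;
    final int existingFiles;
    final int deletedFiles;

    public ManifestInfo(long lengthBytes, int addedFiles, int existingFiles, int deletedFiles) {
      this.lengthBytes = lengthBytes;
      this.addedFiles = addedFiles;
      this.existingFiles = existingFiles;
      this.deletedFiles = deletedFiles;
    }

    // Assumption for this sketch: "live" means at least one added or existing entry.
    boolean isLive() {
      return addedFiles > 0 || existingFiles > 0;
    }
  }

  // Count-based estimate: every manifest is treated as equally expensive.
  public static int countEstimate(List<ManifestInfo> manifests) {
    return manifests.size();
  }

  // Size-based estimate: total bytes of the manifests that are still live.
  public static long liveBytes(List<ManifestInfo> manifests) {
    long total = 0L;
    for (ManifestInfo m : manifests) {
      if (m.isLive()) {
        total += m.lengthBytes;
      }
    }
    return total;
  }

  public static void main(String[] args) {
    // Same manifest count, very different planning cost.
    List<ManifestInfo> skewed = Arrays.asList(
        new ManifestInfo(8_000_000L, 5_000, 0, 0), // one very large manifest
        new ManifestInfo(4_000L, 0, 0, 3),         // only deleted entries, so not live
        new ManifestInfo(6_000L, 2, 1, 0));
    List<ManifestInfo> uniform = Arrays.asList(
        new ManifestInfo(5_000L, 2, 0, 0),
        new ManifestInfo(5_000L, 1, 1, 0),
        new ManifestInfo(5_000L, 3, 0, 0));

    System.out.println(countEstimate(skewed) + " manifests, " + liveBytes(skewed) + " live bytes");
    System.out.println(countEstimate(uniform) + " manifests, " + liveBytes(uniform) + " live bytes");
  }
}
```

Both lists report three manifests, yet the live bytes differ by orders of
magnitude, which is the kind of detail only the per-manifest entries reveal.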
> Agree that Having size based cost estimation will be more accurate. But
> count based estimation is still better than no stats.
Everything comes at a price. The snapshots are already a substantial portion
of the table metadata, and users are already running into issues when the
number of snapshots becomes too large.
Looping in @aokolnychyi here as well to get his opinion, since he has done a
lot of work on performance optimization.