jeesou commented on code in PR #11040: URL: https://github.com/apache/iceberg/pull/11040#discussion_r1861546270
##########
api/src/main/java/org/apache/iceberg/Table.java:
##########
@@ -373,4 +374,14 @@ default Snapshot snapshot(String name) {
     return null;
   }
+
+  /**
+   * Returns the statistics file for the given snapshot id, if available.
+   *
+   * @return the {@link StatisticsFile} for the given snapshot id, if available.
+   */
+  default Optional<StatisticsFile> statistics(long snapshotId) {

Review Comment:
   Yes @amogh-jahagirdar, your suggestion fits well if we want a generic solution that supports multiple blob types. The current implementation assumes that we will only support "apache-datasketches-theta-v1". We ran into this recently with Presto: both engines were using a common catalog, and the Puffin file created by Presto was not usable because it had a different blob type, "presto-sum-data-size-bytes-v1". Supporting multiple blob types would be a forward-looking change that we may take up later.

   Regarding the best-effort search for stats, @amogh-jahagirdar, I think we need to reconsider whether we always want to return some statistics. Whether older statistics are still useful depends on how much data was added or deleted since the last time we ran an ANALYZE, and stale statistics could lead to wrong query plans. What if we let the user configure how much deviation or change they are willing to accept while continuing to use the older statistics? I have made some changes along those lines so that the user can decide the acceptable amount of change:
   https://github.com/karuppayya/iceberg/compare/fix_snapshot...jeesou:fix_snapshot_modifications?expand=1

   Kindly have a look, @amogh-jahagirdar and @karuppayya, and share your suggestions. I have not handled the delete scenario: if I find any deletion, I do not use the old stats. That is open to discussion, as deletes are a tricky subject in this case.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
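To make the configurable-staleness idea in the comment concrete, here is a minimal sketch of one way it could work. It is not the code in the linked branch and it does not use the Iceberg API: `Snap`, `findUsableStats`, and `maxChangeRatio` are hypothetical names, and snapshots are modeled as a plain map from id to a simplified record. The sketch walks back from the current snapshot to the nearest ancestor that has statistics, gives up as soon as it sees a delete, and accepts the older statistics only if the records added since then stay within a user-chosen fraction of the table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class StaleStatsLookup {
  // Hypothetical, simplified stand-in for a snapshot; not the Iceberg Snapshot interface.
  static final class Snap {
    final long id;
    final Long parentId; // null for the first snapshot
    final long addedRecords;
    final long deletedRecords;
    final boolean hasStats;

    Snap(long id, Long parentId, long addedRecords, long deletedRecords, boolean hasStats) {
      this.id = id;
      this.parentId = parentId;
      this.addedRecords = addedRecords;
      this.deletedRecords = deletedRecords;
      this.hasStats = hasStats;
    }
  }

  // Returns the id of the nearest ancestor (or the current snapshot itself) whose
  // statistics are still acceptable, or empty if they are too stale or a delete occurred.
  static Optional<Long> findUsableStats(
      Map<Long, Snap> snapshots, long currentId, long totalRecords, double maxChangeRatio) {
    long addedSinceStats = 0;
    Long cursor = currentId;
    while (cursor != null) {
      Snap s = snapshots.get(cursor);
      if (s == null) {
        return Optional.empty(); // broken ancestry, nothing to reuse
      }
      if (s.hasStats) {
        double ratio = totalRecords == 0 ? 0.0 : (double) addedSinceStats / totalRecords;
        return ratio <= maxChangeRatio ? Optional.of(s.id) : Optional.empty();
      }
      if (s.deletedRecords > 0) {
        return Optional.empty(); // conservative: any delete invalidates older stats
      }
      addedSinceStats += s.addedRecords;
      cursor = s.parentId;
    }
    return Optional.empty(); // no ancestor has statistics
  }

  public static void main(String[] args) {
    Map<Long, Snap> snapshots = new HashMap<>();
    snapshots.put(1L, new Snap(1L, null, 1000L, 0L, true)); // stats computed here
    snapshots.put(2L, new Snap(2L, 1L, 50L, 0L, false));
    snapshots.put(3L, new Snap(3L, 2L, 30L, 0L, false));

    // 80 of 1080 records were added after the stats snapshot (about 7.4%).
    System.out.println(findUsableStats(snapshots, 3L, 1080L, 0.10)); // within a 10% tolerance
    System.out.println(findUsableStats(snapshots, 3L, 1080L, 0.05)); // exceeds a 5% tolerance
  }
}
```

Measuring change as added records over total records is only one possible choice; file counts or bytes would work the same way. Returning empty on any delete mirrors the conservative behavior described in the comment and is the part most open to discussion.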
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org