amogh-jahagirdar commented on code in PR #11040: URL: https://github.com/apache/iceberg/pull/11040#discussion_r1769162105
##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java:
##########
@@ -194,9 +195,9 @@ protected Statistics estimateStatistics(Snapshot snapshot) {
     Map<NamedReference, ColumnStatistics> colStatsMap = Collections.emptyMap();
     if (readConf.reportColumnStats() && cboEnabled) {
       colStatsMap = Maps.newHashMap();
-      List<StatisticsFile> files = table.statisticsFiles();
-      if (!files.isEmpty()) {
-        List<BlobMetadata> metadataList = (files.get(0)).blobMetadata();
+      Optional<StatisticsFile> statisticsFile = statisticsFile(snapshot);
+      if (statisticsFile.isPresent()) {
+        List<BlobMetadata> metadataList = statisticsFile.get().blobMetadata();

Review Comment:
@karuppayya Sorry for missing this earlier. I think we may want to consider a table API for resolving a statistics file based on a snapshot, `statisticsFileFor`. The implementation of that API could do a best-effort search for the statistics file of the given snapshot and, if one cannot be found, return the most recent one. If an engine integration needs the exact statistics and the API response isn't an exact match, that's OK: the engine can simply ignore the returned statistics file. But I think that in the most common cases an out-of-date statistics file is probably acceptable, so the API should probably default to the best-effort lookup.

This is analogous to what happens in the view.dialectFor API, where a best-effort search is done for a given dialect, but if no match is found the first representation is returned. Engines like Trino, which require the strict dialect, can compare the API response against the desired dialect and fail accordingly. Other engines like Spark don't do the strict lookup and just take the response as is.
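A minimal sketch of the best-effort lookup described above, to make the proposed semantics concrete. This is not Iceberg's actual API: the `StatFile` class, its `committedAtMillis` field (used here as an assumed ordering key for "most recent"), and both helper methods are hypothetical stand-ins.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class StatisticsLookupSketch {

  // Hypothetical stand-in for a statistics file entry; not Iceberg's StatisticsFile.
  public static final class StatFile {
    public final long snapshotId;
    public final long committedAtMillis; // assumption: defines which file is "most recent"
    public final String path;

    public StatFile(long snapshotId, long committedAtMillis, String path) {
      this.snapshotId = snapshotId;
      this.committedAtMillis = committedAtMillis;
      this.path = path;
    }
  }

  // Best-effort lookup: prefer the file for the requested snapshot,
  // otherwise fall back to the most recent file (empty only if there are no files).
  public static Optional<StatFile> statisticsFileFor(List<StatFile> files, long snapshotId) {
    Optional<StatFile> exact =
        files.stream().filter(f -> f.snapshotId == snapshotId).findFirst();
    if (exact.isPresent()) {
      return exact;
    }
    return files.stream().max(Comparator.comparingLong((StatFile f) -> f.committedAtMillis));
  }

  // What a strict engine could do on top of the best-effort response:
  // keep the result only if it matches the requested snapshot, else ignore it.
  public static Optional<StatFile> exactStatisticsFileFor(List<StatFile> files, long snapshotId) {
    return statisticsFileFor(files, snapshotId).filter(f -> f.snapshotId == snapshotId);
  }

  public static void main(String[] args) {
    List<StatFile> files =
        List.of(new StatFile(10L, 1_000L, "stats-10.puffin"),
                new StatFile(20L, 2_000L, "stats-20.puffin"));

    // Snapshot 30 has no statistics file, so the lenient lookup falls back.
    System.out.println(statisticsFileFor(files, 30L).map(f -> f.path).orElse("none"));
    // The strict variant discards the stale fallback.
    System.out.println(exactStatisticsFileFor(files, 30L).map(f -> f.path).orElse("none"));
  }
}
```

With this shape, the lenient default lives in one place and strictness is a one-line filter on the caller's side, which mirrors the Trino versus Spark split mentioned for dialects.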
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org