stevenzwu commented on PR #8803: URL: https://github.com/apache/iceberg/pull/8803#issuecomment-1768776022
> > @pvary I think we probably want to push the `copyStatsForColumns` down to ManifestReader. https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestReader.java#L299
>
> That is for reading the data from the manifest file. If we want statistics for at least one column, then the manifest file reading schema should contain the stat fields, like:
>
> ```
> private static final Set<String> STATS_COLUMNS =
>     ImmutableSet.of(
>         "value_counts",
>         "null_value_counts",
>         "nan_value_counts",
>         "lower_bounds",
>         "upper_bounds",
>         "record_count");
> ```
>
> So we cannot do filtering here. We need to read the stat fields from the manifest file, and then filter later for the columns where we do not need them.

If we look at this line https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestReader.java#L299 it calls this method from `ContentFile`:

```
default F copy(boolean withStats) {
  return withStats ? copy() : copyWithoutStats();
}
```

If we push the selection down to the ManifestReader, it can call the new `copyWithSpecificStats` method that you added in this PR.

I understand that the current code is for metadata column selection/projection, not for the columns selected to include stats.
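
To make the suggestion concrete, here is a rough, self-contained sketch of what pushing the selection down could look like. It is a toy model, not Iceberg code: `ToyFile`, `read`, and the `Set<Integer>` parameter of `copyWithSpecificStats` are assumptions made up for illustration, and only the `copy(boolean withStats)` shape is taken from the `ContentFile` snippet quoted above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-ins. These are NOT the real Iceberg classes.
final class StatsPushdownSketch {

  /** Toy content file: a path plus value counts keyed by column id. */
  static final class ToyFile {
    final String path;
    final Map<Integer, Long> valueCounts; // null once stats have been dropped

    ToyFile(String path, Map<Integer, Long> valueCounts) {
      this.path = path;
      this.valueCounts = valueCounts;
    }

    /** Full copy, keeping stats for every column. */
    ToyFile copy() {
      return new ToyFile(path, valueCounts == null ? null : new HashMap<>(valueCounts));
    }

    /** Copy that drops all stats. */
    ToyFile copyWithoutStats() {
      return new ToyFile(path, null);
    }

    /** Same shape as the default method quoted from ContentFile. */
    ToyFile copy(boolean withStats) {
      return withStats ? copy() : copyWithoutStats();
    }

    /** Assumed analogue of copyWithSpecificStats: keep stats only for the given column ids. */
    ToyFile copyWithSpecificStats(Set<Integer> columnIds) {
      if (valueCounts == null) {
        return copyWithoutStats();
      }
      Map<Integer, Long> filtered = new HashMap<>();
      for (Map.Entry<Integer, Long> entry : valueCounts.entrySet()) {
        if (columnIds.contains(entry.getKey())) {
          filtered.put(entry.getKey(), entry.getValue());
        }
      }
      return new ToyFile(path, filtered);
    }
  }

  /**
   * Toy reader loop. The stat fields are always read in full first; the
   * per-column filtering happens when each entry is copied.
   * A null statsColumnIds means "no column selection was requested".
   */
  static List<ToyFile> read(List<ToyFile> entries, boolean withStats, Set<Integer> statsColumnIds) {
    List<ToyFile> result = new ArrayList<>();
    for (ToyFile file : entries) {
      if (withStats && statsColumnIds != null) {
        result.add(file.copyWithSpecificStats(statsColumnIds));
      } else {
        result.add(file.copy(withStats));
      }
    }
    return result;
  }
}
```

In this sketch the reading schema still has to include all the stat fields, as noted in the quoted reply; the only thing that moves into the reader is the choice of which copy method to call for each entry.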