stevenzwu commented on PR #8803:
URL: https://github.com/apache/iceberg/pull/8803#issuecomment-1768776022

   > > @pvary I think we probably want to push the `copyStatsForColumns` down 
to ManifestReader. 
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestReader.java#L299
   > 
   > That is for reading the data from the manifest file. If we want 
statistics for at least one column, then the manifest file reading schema 
should contain the stat fields, like:
   > 
   > ```
   >   private static final Set<String> STATS_COLUMNS =
   >       ImmutableSet.of(
   >           "value_counts",
   >           "null_value_counts",
   >           "nan_value_counts",
   >           "lower_bounds",
   >           "upper_bounds",
   >           "record_count");
   > ```
   > 
   > So we cannot do the filtering here. We need to read the stat fields from the 
manifest file, and then filter later for the columns where we do not need them.
   
   If we look at this line
   
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestReader.java#L299
   
   it calls this method from `ContentFile`
   ```
     default F copy(boolean withStats) {
       return withStats ? copy() : copyWithoutStats();
     }
   ```
   
   If we push the column selection down to the ManifestReader, it can call the new 
`copyWithSpecificStats` method that you added in this PR instead.
   
   I understand that the current code handles metadata column selection/projection, 
not the selection of columns whose stats should be kept.
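
To make the idea concrete, here is a rough, self-contained sketch of what the push-down could look like. It is a toy model, not the real Iceberg code: `ContentFileSketch`, `DataFileSketch`, `ManifestReaderSketch`, and `copyEntry` are hypothetical names, and the signature `copyWithSpecificStats(Set<Integer>)` is an assumption (only the method name comes from this PR). The point is that the stat fields are still read in full, and the per-column filtering happens at copy time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy stand-in for ContentFile; NOT the real Iceberg interface.
interface ContentFileSketch<F> {
  F copy();

  F copyWithoutStats();

  // Assumed signature: keep stats only for the given column ids.
  F copyWithSpecificStats(Set<Integer> columnIds);

  // Same shape as the existing default method quoted above.
  default F copy(boolean withStats) {
    return withStats ? copy() : copyWithoutStats();
  }
}

// Hypothetical file entry carrying a single per-column stat map.
final class DataFileSketch implements ContentFileSketch<DataFileSketch> {
  final long recordCount;
  final Map<Integer, Long> valueCounts; // null once stats are dropped

  DataFileSketch(long recordCount, Map<Integer, Long> valueCounts) {
    this.recordCount = recordCount;
    this.valueCounts = valueCounts;
  }

  @Override
  public DataFileSketch copy() {
    Map<Integer, Long> copied = valueCounts == null ? null : new HashMap<>(valueCounts);
    return new DataFileSketch(recordCount, copied);
  }

  @Override
  public DataFileSketch copyWithoutStats() {
    return new DataFileSketch(recordCount, null);
  }

  @Override
  public DataFileSketch copyWithSpecificStats(Set<Integer> columnIds) {
    if (valueCounts == null) {
      return new DataFileSketch(recordCount, null);
    }
    Map<Integer, Long> kept = new HashMap<>();
    for (Map.Entry<Integer, Long> entry : valueCounts.entrySet()) {
      if (columnIds.contains(entry.getKey())) {
        kept.put(entry.getKey(), entry.getValue());
      }
    }
    return new DataFileSketch(recordCount, kept);
  }
}

// Hypothetical reader-side helper showing where the selection would be applied.
final class ManifestReaderSketch {
  private ManifestReaderSketch() {}

  // Assumption: a null column set means "keep stats for all columns".
  static DataFileSketch copyEntry(
      DataFileSketch file, boolean withStats, Set<Integer> statsColumnIds) {
    if (!withStats) {
      return file.copyWithoutStats();
    }
    return statsColumnIds == null
        ? file.copy()
        : file.copyWithSpecificStats(statsColumnIds);
  }
}
```

With something like this, a caller that only needs stats for column id 1 would call `ManifestReaderSketch.copyEntry(file, true, Set.of(1))`, and the copied entry would never hold the stats of the other columns.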


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

