aokolnychyi commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1382325753
##########
api/src/main/java/org/apache/iceberg/Scan.java:
##########
@@ -77,6 +78,21 @@ public interface Scan<ThisT, T extends ScanTask, G extends ScanTaskGroup<T>> {
    */
   ThisT includeColumnStats();

+  /**
+   * Create a new scan from this that loads the column stats for the specific columns with each data
+   * file. If the columns set is empty or <code>null</code> then all column stats will be kept, if
+   * {@link #includeColumnStats()} is set.
+   *
+   * <p>Column stats include: value count, null value count, lower bounds, and upper bounds.
+   *
+   * @param columnsToKeepStats column ids from the table's schema
+   * @return a new scan based on this that loads column stats for specific columns.
+   */
+  default ThisT columnsToKeepStats(Set<Integer> columnsToKeepStats) {

Review Comment:
   Why not simply overload `includeColumnStats()` with a version that accepts a collection of columns? I feel that will be easier to interpret as folks are already familiar with `includeColumnStats()` that currently loads stats for all columns.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
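
For illustration, the overload suggested in the review comment might look roughly like the sketch below. It is not the PR's code and not the real `org.apache.iceberg.Scan` interface: `SketchScan`, `InMemoryScan`, `keepsStatsFor`, and the `Collection<Integer>` parameter type are all hypothetical stand-ins, and treating a `null` or empty collection as "all columns" is an assumption carried over from the Javadoc in the quoted diff.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in types for illustration; this is NOT org.apache.iceberg.Scan.
final class OverloadSketch {

  interface SketchScan<ThisT> {
    /** Loads column stats for all columns (the behaviour callers already know). */
    ThisT includeColumnStats();

    /**
     * Assumed overload: loads column stats only for the given column ids.
     * The default body keeps existing implementations source-compatible.
     */
    default ThisT includeColumnStats(Collection<Integer> columnIds) {
      throw new UnsupportedOperationException(
          getClass().getName() + " does not support selecting columns for stats");
    }
  }

  /** Toy immutable scan that only records which stats it would keep. */
  static final class InMemoryScan implements SketchScan<InMemoryScan> {
    private final boolean loadStats;
    private final Set<Integer> statsColumnIds; // empty set means "all columns"

    InMemoryScan() {
      this(false, Collections.<Integer>emptySet());
    }

    private InMemoryScan(boolean loadStats, Set<Integer> statsColumnIds) {
      this.loadStats = loadStats;
      this.statsColumnIds = statsColumnIds;
    }

    @Override
    public InMemoryScan includeColumnStats() {
      return new InMemoryScan(true, Collections.<Integer>emptySet());
    }

    @Override
    public InMemoryScan includeColumnStats(Collection<Integer> columnIds) {
      // Assumption: null or empty falls back to stats for all columns.
      if (columnIds == null || columnIds.isEmpty()) {
        return includeColumnStats();
      }
      Set<Integer> copy = new LinkedHashSet<>(columnIds);
      return new InMemoryScan(true, Collections.unmodifiableSet(copy));
    }

    boolean keepsStatsFor(int columnId) {
      return loadStats && (statsColumnIds.isEmpty() || statsColumnIds.contains(columnId));
    }
  }

  public static void main(String[] args) {
    InMemoryScan base = new InMemoryScan();
    InMemoryScan all = base.includeColumnStats();
    InMemoryScan some = base.includeColumnStats(Arrays.asList(1, 3));
    System.out.println("all keeps column 2:  " + all.keepsStatsFor(2));
    System.out.println("some keeps column 1: " + some.keepsStatsFor(1));
    System.out.println("some keeps column 2: " + some.keepsStatsFor(2));
  }
}
```

With an overload, a single call both turns stats on and narrows them, so callers do not have to remember that a separate column filter only takes effect once `includeColumnStats()` has also been called.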