Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

via GitHub Fri, 03 Nov 2023 20:22:18 -0700


aokolnychyi commented on code in PR #8803:
URL: https://github.com/apache/iceberg/pull/8803#discussion_r1382325753



##########
api/src/main/java/org/apache/iceberg/Scan.java:
##########
@@ -77,6 +78,21 @@ public interface Scan<ThisT, T extends ScanTask, G extends ScanTaskGroup<T>> {
    */
   ThisT includeColumnStats();
 
+  /**
+   * Create a new scan from this that loads the column stats for the specific columns with each data
+   * file. If the columns set is empty or <code>null</code> then all column stats will be kept, if
+   * {@link #includeColumnStats()} is set.
+   *
+   * <p>Column stats include: value count, null value count, lower bounds, and upper bounds.
+   *
+   * @param columnsToKeepStats column ids from the table's schema
+   * @return a new scan based on this that loads column stats for specific columns.
+   */
+  default ThisT columnsToKeepStats(Set<Integer> columnsToKeepStats) {

Review Comment:
   Why not simply overload `includeColumnStats()` with a version that accepts a
collection of columns? I think that would be easier to interpret, since folks
are already familiar with `includeColumnStats()`, which currently loads stats
for all columns.
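
To make the suggestion concrete, here is a rough sketch of what such an overload could look like. It is not the actual Iceberg code: `SketchScan`, `SketchTableScan`, and `keepsStatsFor` are hypothetical stand-ins invented for illustration, and treating an empty or `null` collection as "all columns" is an assumption carried over from the Javadoc in the diff.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for a scan interface; NOT the real org.apache.iceberg.Scan.
interface SketchScan<ThisT> {
  // Existing-style method: load stats for all columns.
  ThisT includeColumnStats();

  // Suggested overload (hypothetical signature): load stats only for the given column ids.
  ThisT includeColumnStats(Collection<Integer> columnIds);
}

// Minimal immutable implementation, used only to show how the two methods could relate.
final class SketchTableScan implements SketchScan<SketchTableScan> {
  private final boolean statsEnabled;
  private final Set<Integer> statsColumnIds; // empty means "all columns" (assumption)

  SketchTableScan() {
    this(false, Collections.<Integer>emptySet());
  }

  private SketchTableScan(boolean statsEnabled, Set<Integer> statsColumnIds) {
    this.statsEnabled = statsEnabled;
    this.statsColumnIds = statsColumnIds;
  }

  @Override
  public SketchTableScan includeColumnStats() {
    // No filter: keep stats for every column.
    return new SketchTableScan(true, Collections.<Integer>emptySet());
  }

  @Override
  public SketchTableScan includeColumnStats(Collection<Integer> columnIds) {
    // Assumption: null or empty falls back to "all columns", as in the Javadoc under review.
    Set<Integer> ids =
        columnIds == null
            ? Collections.<Integer>emptySet()
            : Collections.unmodifiableSet(new LinkedHashSet<Integer>(columnIds));
    return new SketchTableScan(true, ids);
  }

  // Hypothetical helper: would stats for this column id be kept by this scan?
  boolean keepsStatsFor(int columnId) {
    return statsEnabled && (statsColumnIds.isEmpty() || statsColumnIds.contains(columnId));
  }
}
```

With this shape, a single call such as `scan.includeColumnStats(Arrays.asList(1, 3))` both enables stats and restricts them, so callers would not need to combine `includeColumnStats()` with a separate `columnsToKeepStats(...)` call.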



