[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #2276: Core: Add option to combine tasks by partition

GitBox Wed, 19 Oct 2022 17:07:19 -0700


aokolnychyi commented on code in PR #2276:
URL: https://github.com/apache/iceberg/pull/2276#discussion_r1000028349



##########
api/src/main/java/org/apache/iceberg/Scan.java:
##########
@@ -129,6 +129,34 @@ default ThisT select(String... columns) {
    */
   ThisT planWith(ExecutorService executorService);
 
+  /**
+   * Create a new {@link TableScan} which dictates that when plan tasks via the {@link
+   * #planTasks()}, the scan should preserve partition boundary specified by the provided partition
+   * column names. In other words, the scan will not attempt to combine tasks whose input files have
+   * different partition data w.r.t `columns`.
+   *
+   * @param columns the partition column names to preserve boundary when planning tasks
+   * @return a table scan preserving partition boundary when planning tasks
+   * @throws IllegalArgumentException if any of the input columns is not a partition column, or if
+   *     the table is un-partitioned.
+   */
+  ThisT preservePartitions(Collection<String> columns);
+
+  /**
+   * Create a new {@link TableScan} which dictates that when plan tasks via the {@link
+   * #planTasks()}, the scan should preserve partition boundary specified by the provided partition

Review Comment:
   > I think the partition expressions will be first processed via SparkScanBuilder (via a new interface to be introduced by Spark) and the actual partition columns will be passed to this method.
   
   Don't we need an already built `Scan` to obtain `KeyGroupedPartitioning`?
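
   For context, the behavior the Javadoc above describes (never combining tasks whose files carry different partition data) could be sketched roughly as follows. This is an illustration only and not the PR's implementation: `FileTask`, `PartitionAwareCombiner`, and `combine` are hypothetical stand-ins that use no Iceberg or Spark classes, and the partition value is simplified to a single string.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionAwareCombiner {

  /** Hypothetical stand-in for a file scan task: a partition value and a size in bytes. */
  public static final class FileTask {
    public final String partition;
    public final long sizeBytes;

    public FileTask(String partition, long sizeBytes) {
      this.partition = partition;
      this.sizeBytes = sizeBytes;
    }
  }

  /**
   * Groups tasks by partition first, then packs each group into combined tasks of at most
   * targetSize bytes. A combined task never spans two partitions. A single file larger than
   * targetSize ends up in a combined task of its own.
   */
  public static List<List<FileTask>> combine(List<FileTask> tasks, long targetSize) {
    // LinkedHashMap keeps partitions in first-seen order so the output is deterministic.
    Map<String, List<FileTask>> byPartition = new LinkedHashMap<>();
    for (FileTask task : tasks) {
      byPartition.computeIfAbsent(task.partition, k -> new ArrayList<>()).add(task);
    }

    List<List<FileTask>> combined = new ArrayList<>();
    for (List<FileTask> group : byPartition.values()) {
      List<FileTask> current = new ArrayList<>();
      long currentSize = 0;
      for (FileTask task : group) {
        // Close the current combined task when adding this file would exceed the target.
        if (!current.isEmpty() && currentSize + task.sizeBytes > targetSize) {
          combined.add(current);
          current = new ArrayList<>();
          currentSize = 0;
        }
        current.add(task);
        currentSize += task.sizeBytes;
      }
      if (!current.isEmpty()) {
        combined.add(current);
      }
    }
    return combined;
  }
}
```

   The sketch assumes the partition values are already known when tasks are combined. Whether that holds before the `Scan` is built is the open question raised above.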



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


