singhpk234 commented on code in PR #14480:
URL: https://github.com/apache/iceberg/pull/14480#discussion_r2510351259
##########
core/src/test/java/org/apache/iceberg/rest/RESTCatalogAdapter.java:
##########
@@ -429,6 +562,68 @@ public <T extends RESTResponse> T handleRequest(
return null;
}
+  /**
+   * Do all the planning upfront but batch the file scan tasks across plan tasks. Plan tasks have a
+   * key like <plan ID - table UUID - plan task sequence>. The current implementation simply uses
+   * plan tasks as a pagination mechanism to control response sizes.
+   *
+   * @param tableScan the table scan whose files to plan
+   * @param planId identifier for this plan
+   */
+  private void planFilesFor(TableScan tableScan, String planId) {
+    Iterable<List<FileScanTask>> taskGroupings =
+        Iterables.partition(
+            tableScan.planFiles(), planningBehavior.numberFileScanTasksPerPlanTask());
Review Comment:
> planFiles already does have 1 data file per file scan task
I see, yes, that makes sense; thanks for the code pointer. The splitting would
only be required for planTasks(), which does split tasks. Since we are relying
on planFiles, we should be good!
https://github.com/apache/iceberg/blob/c22bac8b170aa5e5c931c11c4c393c27245b20b9/core/src/main/java/org/apache/iceberg/BaseTableScan.java#L44-L49
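To make the pagination idea above concrete, here is a minimal, self-contained sketch of chunking pre-planned tasks into fixed-size plan tasks, in the spirit of the `Iterables.partition` call in the diff. The class name `PlanTaskBatcher` is hypothetical, and plain integers stand in for Iceberg's `FileScanTask`:

```java
import java.util.ArrayList;
import java.util.List;

public class PlanTaskBatcher {
    // Mirrors Guava's Iterables.partition: split `tasks` into consecutive
    // groups of at most `tasksPerPlanTask` elements; the last group may be
    // smaller. Each group would back one plan task in the response.
    public static <T> List<List<T>> partition(List<T> tasks, int tasksPerPlanTask) {
        List<List<T>> groups = new ArrayList<>();
        for (int start = 0; start < tasks.size(); start += tasksPerPlanTask) {
            int end = Math.min(start + tasksPerPlanTask, tasks.size());
            groups.add(new ArrayList<>(tasks.subList(start, end)));
        }
        return groups;
    }

    public static void main(String[] args) {
        // Five "file scan tasks" batched two per plan task.
        List<List<Integer>> planTasks = partition(List.of(1, 2, 3, 4, 5), 2);
        System.out.println(planTasks); // [[1, 2], [3, 4], [5]]
    }
}
```

Since planning happens upfront, response size is controlled purely by the group size, which matches the comment that plan tasks here are a pagination mechanism rather than a unit of planning work.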
> there could be a footer/metadata cache of some sort that allows the server
to more quickly prune out row group ranges to skip compared to the client
We do capture them in the manifest, and the server would have a handle on the
manifest to do this splitting: given the FileScanTask's `List<Long> splitOffsets();`,
the offsets can aid the splitting even on the client side. All in all, this
would be good!
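As a rough illustration of how split offsets could drive that splitting, here is a self-contained sketch that turns a list of offsets (as `splitOffsets()` would return, e.g. Parquet row-group start positions) into byte ranges. The class and method names are hypothetical, not Iceberg APIs:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitOffsetRanges {
    // A (start, length) byte range within the file.
    public record Range(long start, long length) {}

    // Each split spans [offset_i, offset_{i+1}); the last split runs from the
    // final offset to the end of the file.
    public static List<Range> toRanges(List<Long> splitOffsets, long fileLength) {
        List<Range> ranges = new ArrayList<>();
        for (int i = 0; i < splitOffsets.size(); i++) {
            long start = splitOffsets.get(i);
            long end = (i + 1 < splitOffsets.size()) ? splitOffsets.get(i + 1) : fileLength;
            ranges.add(new Range(start, end - start));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // e.g. three row groups starting at bytes 4, 1000, 2000 in a 3000-byte file
        System.out.println(toRanges(List.of(4L, 1000L, 2000L), 3000L));
    }
}
```

Splitting on these boundaries keeps each split aligned to a row group, so either side (server or client) can produce splits without reading file footers.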
Resolving this thread since this is already addressed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]