singhpk234 commented on code in PR #14480:
URL: https://github.com/apache/iceberg/pull/14480#discussion_r2510351259
##########
core/src/test/java/org/apache/iceberg/rest/RESTCatalogAdapter.java:
##########
@@ -429,6 +562,68 @@ public <T extends RESTResponse> T handleRequest(
return null;
}
+  /**
+   * Do all the planning upfront but batch the file scan tasks across plan tasks. Plan tasks have a
+   * key like <plan ID - table UUID - plan task sequence>. The current implementation simply uses
+   * plan tasks as a pagination mechanism to control response sizes.
+   *
+   * @param tableScan the table scan whose files to plan
+   * @param planId identifier for this plan
+   */
+  private void planFilesFor(TableScan tableScan, String planId) {
+    Iterable<List<FileScanTask>> taskGroupings =
+        Iterables.partition(
+            tableScan.planFiles(), planningBehavior.numberFileScanTasksPerPlanTask());
Review Comment:
> planFiles already does have 1 data file per file scan task
I see, yes, that makes sense; thanks for the code pointer. The splitting would
only be required for planTasks(), which does split tasks. Since we are relying
on planFiles, we should be good!
https://github.com/apache/iceberg/blob/c22bac8b170aa5e5c931c11c4c393c27245b20b9/core/src/main/java/org/apache/iceberg/BaseTableScan.java#L44-L49
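To make the pagination idea above concrete, here is a minimal, self-contained sketch of chunking pre-planned tasks into fixed-size plan tasks, in the spirit of the `Iterables.partition` call in the diff. The class name `PlanTaskBatcher` is hypothetical, and plain integers stand in for Iceberg's `FileScanTask`:

```java
import java.util.ArrayList;
import java.util.List;

public class PlanTaskBatcher {
    // Mirrors Guava's Iterables.partition: split `tasks` into consecutive
    // groups of at most `tasksPerPlanTask` elements; the last group may be
    // smaller. Each group would back one plan task in the response.
    public static <T> List<List<T>> partition(List<T> tasks, int tasksPerPlanTask) {
        List<List<T>> groups = new ArrayList<>();
        for (int start = 0; start < tasks.size(); start += tasksPerPlanTask) {
            int end = Math.min(start + tasksPerPlanTask, tasks.size());
            groups.add(new ArrayList<>(tasks.subList(start, end)));
        }
        return groups;
    }

    public static void main(String[] args) {
        // Five "file scan tasks" batched two per plan task.
        List<List<Integer>> planTasks = partition(List.of(1, 2, 3, 4, 5), 2);
        System.out.println(planTasks); // [[1, 2], [3, 4], [5]]
    }
}
```

Since planning happens upfront, response size is controlled purely by the group size, which matches the comment that plan tasks here are a pagination mechanism rather than a unit of planning work.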
> there could be a footer/metadata cache of some sort that allows the server
to more quickly prune out row group ranges to skip compared to the client
We do capture them in the manifest, and the server would have a handle on the
manifest to do this splitting: given the FileScanTask's `List<Long> splitOffsets();`,
the offsets can aid the splitting even on the client side. All in all, this
would be good!
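As a rough illustration of how split offsets could drive that splitting, here is a self-contained sketch that turns a list of offsets (as `splitOffsets()` would return, e.g. Parquet row-group start positions) into byte ranges. The class and method names are hypothetical, not Iceberg APIs:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitOffsetRanges {
    // A (start, length) byte range within the file.
    public record Range(long start, long length) {}

    // Each split spans [offset_i, offset_{i+1}); the last split runs from the
    // final offset to the end of the file.
    public static List<Range> toRanges(List<Long> splitOffsets, long fileLength) {
        List<Range> ranges = new ArrayList<>();
        for (int i = 0; i < splitOffsets.size(); i++) {
            long start = splitOffsets.get(i);
            long end = (i + 1 < splitOffsets.size()) ? splitOffsets.get(i + 1) : fileLength;
            ranges.add(new Range(start, end - start));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // e.g. three row groups starting at bytes 4, 1000, 2000 in a 3000-byte file
        System.out.println(toRanges(List.of(4L, 1000L, 2000L), 3000L));
    }
}
```

Splitting on these boundaries keeps each split aligned to a row group, so either side (server or client) can produce splits without reading file footers.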
Resolving this thread since this is already addressed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]