Re: [PR] Core: Simplify RESTTableScan by removing catalog internals [iceberg]

via GitHub Fri, 13 Mar 2026 11:33:20 -0700


rdblue commented on code in PR #15595:
URL: https://github.com/apache/iceberg/pull/15595#discussion_r2933086863



##########
core/src/main/java/org/apache/iceberg/rest/RESTTableScan.java:
##########
@@ -74,16 +71,15 @@ class RESTTableScan extends DataTableScan {
           .build();
 
   private final RESTClient client;
-  private final Map<String, String> headers;
-  private final TableOperations operations;
-  private final Table table;
-  private final ResourcePaths resourcePaths;
-  private final TableIdentifier tableIdentifier;
-  private final Set<Endpoint> supportedEndpoints;
+  private final Supplier<Map<String, String>> headers;
+  private final String planTableScanPath;
+  private final Function<String, String> planPath;
+  private final String fetchScanTasksPath;
+  private final boolean supportsAsync;
+  private final boolean supportsCancel;
+  private final boolean supportsFetchTasks;

Review Comment:
   @singhpk234, wouldn't you cache the results of each request? The motivation 
for breaking scan planning into multiple requests isn't just the cost of 
reading manifests. It is also the cost of sending results (you could have 
thousands of matching files per plan task) and caching them (if you have 
thousands per request). I doubt that anyone would want to do the work of 
aggregating plan task results in a cache only to end up with a coarser cache that 
invalidates an entire planning result rather than just parts of it.
   
   My concern here is also motivated by keeping things simple. We 
don't want new REST catalog features that require gracefully falling back for 
every single endpoint. That's needless complexity. In this case, as we add 
planning we need to document which parts are required and which ones are 
optional. I think that supporting both plan and fetch is a minimum for getting 
this working and we can simplify by asserting that.
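   
   A minimal sketch of what asserting that could look like, as opposed to carrying a per-endpoint fallback flag. The class, the enum, and the endpoint names are hypothetical and are not taken from this PR or from Iceberg's API; the split between required and optional endpoints is the assumption being illustrated.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: these names are illustrative, not Iceberg's actual API.
class PlanningEndpoints {
  enum Endpoint { PLAN, FETCH_TASKS, FETCH_RESULT, CANCEL }

  // Assumed split: plan and fetch are required together; the others stay optional.
  static final Set<Endpoint> REQUIRED = EnumSet.of(Endpoint.PLAN, Endpoint.FETCH_TASKS);

  // Fail once, up front, instead of checking for fetch support at every call site.
  static void checkRequired(Set<Endpoint> advertised) {
    if (!advertised.containsAll(REQUIRED)) {
      throw new UnsupportedOperationException(
          "Server-side planning requires " + REQUIRED + " but the server advertises " + advertised);
    }
  }

  // Optional endpoints still need a capability check.
  static boolean supportsCancel(Set<Endpoint> advertised) {
    return advertised.contains(Endpoint.CANCEL);
  }
}
```

   With a check like this, the scan would not need a separate flag for fetch support; only the optional capabilities would remain as flags.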
   
   @nastra, this doesn't mean the fetch endpoint actually has to do anything. 
You could always return all results from plan and avoid hitting it. But we 
don't need to increase the complexity of all clients.
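   
   To illustrate, here is a rough sketch of a client loop that always knows how to call fetch but only does so when the server hands back plan tasks. The `Response` shape, the field names, and the `fetch` function are assumptions made for this example, not the actual request or response classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: the response shape and names are assumptions, not Iceberg's API.
class PlanThenFetch {
  static final class Response {
    final List<String> fileTasks; // file scan tasks returned inline
    final List<String> planTasks; // opaque tokens to pass to the fetch endpoint

    Response(List<String> fileTasks, List<String> planTasks) {
      this.fileTasks = fileTasks;
      this.planTasks = planTasks;
    }
  }

  // Collects inline file tasks first, then drains any plan tasks through fetch.
  // If the server returns everything from plan, fetch is never called.
  static List<String> collectFileTasks(Response plan, Function<String, Response> fetch) {
    List<String> files = new ArrayList<>(plan.fileTasks);
    Deque<String> pending = new ArrayDeque<>(plan.planTasks);
    while (!pending.isEmpty()) {
      Response page = fetch.apply(pending.poll());
      files.addAll(page.fileTasks);
      pending.addAll(page.planTasks);
    }
    return files;
  }
}
```

   A server that returns every file task from plan and no plan tasks never triggers a fetch call, so the client needs a single code path rather than a fallback branch.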



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


