rdblue commented on code in PR #15595:
URL: https://github.com/apache/iceberg/pull/15595#discussion_r2933086863
##########
core/src/main/java/org/apache/iceberg/rest/RESTTableScan.java:
##########
@@ -74,16 +71,15 @@ class RESTTableScan extends DataTableScan {
.build();
private final RESTClient client;
- private final Map<String, String> headers;
- private final TableOperations operations;
- private final Table table;
- private final ResourcePaths resourcePaths;
- private final TableIdentifier tableIdentifier;
- private final Set<Endpoint> supportedEndpoints;
+ private final Supplier<Map<String, String>> headers;
+ private final String planTableScanPath;
+ private final Function<String, String> planPath;
+ private final String fetchScanTasksPath;
+ private final boolean supportsAsync;
+ private final boolean supportsCancel;
+ private final boolean supportsFetchTasks;
Review Comment:
@singhpk234, wouldn't you cache the results of each request? The motivation
for breaking scan planning into multiple requests isn't just the cost of
reading manifests. It is also the cost of sending results (you could have
thousands of matching files per plan task) and caching them (if you have
thousands per request). I doubt that anyone would want to do the work of
aggregating plan task results in a cache only to end up with a coarser cache
that invalidates an entire planning result rather than just parts of it.
My concern here is also motivated by being able to keep things simple. We
don't want new REST catalog features that require gracefully falling back for
every single endpoint. That's needless complexity. In this case, as we add
planning we need to document which parts are required and which ones are
optional. I think that supporting both plan and fetch is a minimum for getting
this working and we can simplify by asserting that.
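As a rough sketch of what asserting that could look like, the check below fails once up front instead of branching on support at every call site. The enum, its constants, and `checkSupported` are hypothetical names for illustration, not the existing `Endpoint` API, and treating the async and cancel endpoints as the optional ones is an assumption:

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

public class RequiredEndpoints {
  /** Hypothetical endpoint names for illustration; not Iceberg's Endpoint class. */
  public enum PlanningEndpoint { PLAN, FETCH_TASKS, ASYNC_POLL, CANCEL }

  /** Assumption: plan and fetch are the required minimum; the others are optional. */
  public static final Set<PlanningEndpoint> REQUIRED =
      Collections.unmodifiableSet(
          EnumSet.of(PlanningEndpoint.PLAN, PlanningEndpoint.FETCH_TASKS));

  /** Fails fast once, rather than falling back separately for each endpoint. */
  public static void checkSupported(Set<PlanningEndpoint> advertised) {
    if (!advertised.containsAll(REQUIRED)) {
      throw new UnsupportedOperationException(
          "Server-side planning requires " + REQUIRED + " but server advertises " + advertised);
    }
  }
}
```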
@nastra, this doesn't mean the fetch endpoint actually has to do anything.
You could always return all results from plan and avoid hitting it. But we
don't need to increase the complexity of all clients.
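To illustrate, here is a minimal sketch of a client loop that always supports fetch but only calls it when the plan response hands back plan tasks. `PlanResult`, `collectFiles`, and the string-based file and task values are hypothetical stand-ins, not the real request and response classes:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

public class PlanThenFetch {
  /** Hypothetical response shape: files returned inline plus opaque plan-task tokens. */
  public static final class PlanResult {
    final List<String> files;
    final List<String> planTasks;

    public PlanResult(List<String> files, List<String> planTasks) {
      this.files = files;
      this.planTasks = planTasks;
    }
  }

  /** Collects all files; fetch is invoked only for plan tasks the server returned. */
  public static List<String> collectFiles(PlanResult plan, Function<String, PlanResult> fetch) {
    List<String> files = new ArrayList<>(plan.files);
    Deque<String> pending = new ArrayDeque<>(plan.planTasks);
    while (!pending.isEmpty()) {
      PlanResult next = fetch.apply(pending.poll());
      files.addAll(next.files);
      pending.addAll(next.planTasks); // a fetch may hand back further plan tasks
    }
    return files;
  }
}
```

A server that returns every file from plan sends an empty plan-task list, so the loop body never runs and the fetch endpoint is never hit. The client code is the same either way.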
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]