rdblue commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1645215603
########## open-api/rest-catalog-open-api.yaml: ########## @@ -2838,6 +2988,69 @@ components: additionalProperties: type: string + PreplanTableRequest: + type: object + required: + - select + - filter + properties: + select: + description: A list of the selected column names. If referencing a column with a nested field, ensure full name is joined by `.` such as `a.b` + type: array + items: + type: string + filter: + $ref: '#/components/schemas/Expression' + case-sensitive: + description: Indicates whether column selection and filtering should be case sensitive + type: boolean + default: true + snapshot-id: + description: Defaults to the table's current snapshot; This option is not allowed when `snapshot-range` is present in the request. + type: integer + format: int64 + snapshot-range: + description: A JSON list containing exactly 2 snapshot IDs representing the start (exclusive) and end (inclusive) snapshots. This option is not allowed when `snapshot-id` is present in the request. + type: array + items: + type: integer + format: int64 + + PlanTableRequest: + type: object + required: + - select + properties: + select: + description: A list of the selected column names + type: array + items: + type: string + filter: + $ref: '#/components/schemas/Expression' + case-sensitive: + description: Indicates whether column selection and filtering should be case sensitive + type: boolean + default: true + stats-fields: + description: A list of field names for which stats should be included + type: array + items: + type: string + plan-task: + $ref: '#/components/schemas/PlanTask' + snapshot-id: Review Comment: @danielcweeks, this was my suggestion because I think it makes the API simpler. Rather than having several different options and having a more complicated spec, we can collapse all of those into the snapshot the client actually resolves. That also avoids cases where the client ends up reading a different snapshot than it thinks is current. ########## open-api/rest-catalog-open-api.yaml: ########## @@ -3789,6 +4003,21 @@ components: EmptyResponse: $ref: '#/components/examples/ListNamespacesEmptyExample' + PayloadTooLargeWithPlanResponse: + description: + Payload is too large. Required to use /preplan before /plan + content: + application/json: + schema: + $ref: '#/components/schemas/IcebergErrorResponse' + example: { + "error": { + "message": " Cannot plan scan: too many task results; use /preplan to split planning across multiple requests", + "type": "PayloadTooLargeWithPlanException", + "code": 413 Review Comment: I suggested either 413 or 422, but you're right. 422 is the better option. ########## open-api/rest-catalog-open-api.yaml: ########## @@ -532,6 +532,100 @@ paths: 5XX: $ref: '#/components/responses/ServerErrorResponse' + /v1/{prefix}/namespaces/{namespace}/tables/{table}/preplan: + parameters: + - $ref: '#/components/parameters/prefix' + - $ref: '#/components/parameters/namespace' + - $ref: '#/components/parameters/table' + post: + tags: + - Catalog API + summary: Find plan-tasks based on a plan context. + description: + When a user submits a query, this operation will find the relevant plan-tasks + based on the user's selected columns, and filters. The plan-tasks can be later used during PlanTable + to distribute this work for performance gain. + operationId: PreplanTable + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/PreplanTableRequest' + responses: + 200: + $ref: '#/components/responses/PreplanTableResponse' + 400: + $ref: '#/components/responses/BadRequestErrorResponse' + 401: + $ref: '#/components/responses/UnauthorizedResponse' + 403: + $ref: '#/components/responses/ForbiddenResponse' + 404: + description: + Not Found + - NoSuchTableException, the table does not exist + - NoSuchNamespaceException, the namespace does not exist + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorModel' + examples: + TableDoesNotExist: + $ref: '#/components/examples/NoSuchTableError' + NamespaceDoesNotExist: + $ref: '#/components/examples/NoSuchNamespaceError' + 419: + $ref: '#/components/responses/AuthenticationTimeoutResponse' + 503: + $ref: '#/components/responses/ServiceUnavailableResponse' + 5XX: + $ref: '#/components/responses/ServerErrorResponse' + + /v1/{prefix}/namespaces/{namespace}/tables/{table}/plan/: + parameters: + - $ref: '#/components/parameters/prefix' + - $ref: '#/components/parameters/namespace' + - $ref: '#/components/parameters/table' + post: + tags: + - Catalog API + summary: Initiate plan on table to find relevant file scan tasks for a specific query. + operationId: PlanTable + description: + When a user submits a query, we can do a plan to find the relevant file scan tasks + based on the user's selected columns, and filters. For enhanced query performance, + users can provide a plan-task which will distribute the planning work. + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/PlanTableRequest' + responses: Review Comment: This is relevant for a [comment](https://github.com/apache/iceberg/pull/9695#discussion_r1641334986) from @danielcweeks below. Looks like 413 isn't the right response because it refers to the size of the request. ########## open-api/rest-catalog-open-api.yaml: ########## @@ -2838,6 +2978,63 @@ components: additionalProperties: type: string + PreplanTableRequest: + type: object + required: + - select + - filter + - case-sensitive + properties: + select: + description: A list of the selected column names + type: array + items: + type: string + filter: + $ref: '#/components/schemas/Expression' + case-sensitive: + description: Indicates whether column selection and filtering should be case sensitive + type: boolean + snapshot-id: + description: an int64 snapshot ID (if snapshot-range is not present); optional and defaults to the table's current snapshot + type: integer + format: int64 + snapshot-range: + description: a JSON list containing exactly 2 int64 snapshot IDs (if snapshot-id is not present) representing the start (defaults to exclusive, see `start-exclusive`) and end (only inclusive) snapshots + type: array + items: + type: integer + format: int64 + start-exclusive: + description: Indicates whether the user wants the start of `snapshot-range` to be inclusive or exclusive + type: boolean + default: true + + PlanTableRequest: + type: object + required: + - select + - filter + - case-sensitive + properties: + select: + description: A list of the selected column names Review Comment: Look good. Thank you! ########## open-api/rest-catalog-open-api.yaml: ########## @@ -2838,6 +2988,69 @@ components: additionalProperties: type: string + PreplanTableRequest: + type: object + required: + - select + - filter + properties: + select: + description: A list of the selected column names. If referencing a column with a nested field, ensure full name is joined by `.` such as `a.b` + type: array + items: + type: string + filter: + $ref: '#/components/schemas/Expression' + case-sensitive: + description: Indicates whether column selection and filtering should be case sensitive + type: boolean + default: true + snapshot-id: + description: Defaults to the table's current snapshot; This option is not allowed when `snapshot-range` is present in the request. + type: integer + format: int64 + snapshot-range: + description: A JSON list containing exactly 2 snapshot IDs representing the start (exclusive) and end (inclusive) snapshots. This option is not allowed when `snapshot-id` is present in the request. + type: array + items: + type: integer + format: int64 + + PlanTableRequest: + type: object + required: + - select + properties: + select: + description: A list of the selected column names + type: array + items: + type: string + filter: + $ref: '#/components/schemas/Expression' + case-sensitive: + description: Indicates whether column selection and filtering should be case sensitive + type: boolean + default: true + stats-fields: + description: A list of field names for which stats should be included + type: array + items: + type: string + plan-task: + $ref: '#/components/schemas/PlanTask' + snapshot-id: + description: defaults to the table's current snapshot. This option is not allowed when `snapshot-range` or 'plan-task` is present in the request. + type: integer + format: int64 + + snapshot-range: + description: A JSON list containing exactly 2 snapshot IDs representing the start (exclusive) and end (inclusive) snapshots. This option is not allowed when `snapshot-id` or `plan-task` is present in the request. + type: array Review Comment: This is something we could do. I suggested the range array but I called out that we may just want to have start and end: https://github.com/apache/iceberg/pull/9695#discussion_r1494914340 If you prefer the other option then let's go with that. ########## open-api/rest-catalog-open-api.yaml: ########## @@ -2838,6 +2988,69 @@ components: additionalProperties: type: string + PreplanTableRequest: + type: object + required: + - select + - filter + properties: + select: + description: A list of the selected column names. If referencing a column with a nested field, ensure full name is joined by `.` such as `a.b` + type: array + items: + type: string + filter: + $ref: '#/components/schemas/Expression' + case-sensitive: + description: Indicates whether column selection and filtering should be case sensitive + type: boolean + default: true + snapshot-id: + description: Defaults to the table's current snapshot; This option is not allowed when `snapshot-range` is present in the request. + type: integer + format: int64 + snapshot-range: + description: A JSON list containing exactly 2 snapshot IDs representing the start (exclusive) and end (inclusive) snapshots. This option is not allowed when `snapshot-id` is present in the request. + type: array + items: + type: integer + format: int64 + + PlanTableRequest: + type: object + required: + - select + properties: + select: + description: A list of the selected column names + type: array + items: + type: string + filter: + $ref: '#/components/schemas/Expression' + case-sensitive: + description: Indicates whether column selection and filtering should be case sensitive + type: boolean + default: true + stats-fields: + description: A list of field names for which stats should be included + type: array + items: + type: string + plan-task: + $ref: '#/components/schemas/PlanTask' + snapshot-id: Review Comment: Here's the context: https://github.com/apache/iceberg/pull/9695#discussion_r1494914340 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org