rdblue commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1687128848
########## open-api/rest-catalog-open-api.yaml: ########## @@ -3647,6 +3818,176 @@ components: type: integer description: "List of equality field IDs" + PreplanTableRequest: + type: object + required: + - table-scan-context + properties: + table-scan-context: + $ref: '#/components/schemas/TableScanContext' + + PlanTableRequest: + type: object + required: + - table-scan-context + properties: + table-scan-context: + $ref: '#/components/schemas/TableScanContext' + plan-task: + $ref: '#/components/schemas/PlanTask' + stats-fields: + description: + A list of fields that the client requests the server to send statistics + in each `FileScanTask` returned in the response + type: array + items: + $ref: '#/components/schemas/FieldName' + + TableScanContext: + anyOf: + - $ref: '#/components/schemas/SnapshotScanContext' + - $ref: '#/components/schemas/IncrementalSnapshotScanContext' + + BaseTableScanContext: + discriminator: + propertyName: type + mapping: + snapshot-scan: '#/components/schemas/SnapshotScanContext' + incremental-snapshot-scan: '#/components/schemas/IncrementalSnapshotScanContext' + type: object + required: + - type + properties: + type: + type: string + + SnapshotScanContext: + description: context for scanning data in a specific snapshot + type: object + allOf: + - $ref: '#/components/schemas/BaseTableScanContext' + required: + - type Review Comment: Right now, we don't require `snapshot-id`. The reason I suggested that originally was to avoid requiring an extra call to the service to load the table to get the snapshot. However, I think that not requiring it can lead to some odd behavior when the client and server are out of sync with what the latest snapshot is. The client might think (and could possibly log) that it is reading the latest snapshot it knows about, `s1`, and the service may know about a newer snapshot, `s2`, and plan with that instead. That could lead to a confusing situation where a user may not know what version of a table was actually read. That kind of problem is avoided by requiring a specific snapshot ID in the request, but because snapshot IDs are unique it would require that the client has loaded the table metadata. I think that is okay and I would rather not have cases where the client and service can get out of sync. What do you think, @rahil-c, @amogh-jahagirdar, @danielcweeks? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org