rdblue commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1687182174
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3642,6 +3781,173 @@ components:
             type: integer
           description: "List of equality field IDs"
 
+    PreplanTableRequest:
+      type: object
+      required:
+        - table-scan-context
+      properties:
+        table-scan-context:
+          $ref: '#/components/schemas/TableScanContext'
+
+    PlanTableRequest:
+      type: object
+      required:
+        - table-scan-context
+      properties:
+        table-scan-context:
+          $ref: '#/components/schemas/TableScanContext'
+        plan-task:
+          $ref: '#/components/schemas/PlanTask'
+        stats-fields:
+          description:
+            A list of fields that the client requests the server to send statistics
+            in each `FileScanTask` returned in the response
+          type: array
+          items:
+            $ref: '#/components/schemas/FieldName'
+
+    TableScanContext:
+      anyOf:
+        - $ref: '#/components/schemas/SnapshotScanContext'
+        - $ref: '#/components/schemas/IncrementalSnapshotScanContext'
+
+    BaseTableScanContext:
+      discriminator:
+        propertyName: table-scan-type
+        mapping:
+          snapshot-scan: '#/components/schemas/SnapshotScanContext'
+          incremental-snapshot-scan: '#/components/schemas/IncrementalSnapshotScanContext'
+      type: object
+      required:
+        - table-scan-type
+      properties:
+        table-scan-type:
+          type: string
+
+    SnapshotScanContext:
+      description: context for scanning data in a specific snapshot
+      type: object
+      allOf:
+        - $ref: '#/components/schemas/BaseTableScanContext'
+      required:
+        - table-scan-type
+      properties:
+        table-scan-type:
+          type: string
+          enum: ["snapshot-scan"]
+        select:
+          $ref: '#/components/schemas/SelectedFieldNames'
+        filter:
+          $ref: '#/components/schemas/Filter'
+        case-sensitive:
+          description: If field selection and filtering should be case sensitive
+          type: boolean
+          default: true
+        snapshot-id:
+          description:
+            The ID of the snapshot to use for the table scan.
+            If not specified, the snapshot at the main branch head will be used.
+          type: integer
+          format: int64
+        use-snapshot-schema:
+          description:
+            If the schema of the specific snapshot should be used instead of the table schema.
+          type: boolean
+          default: false
+
+    IncrementalSnapshotScanContext:
+      description:
+        Context for scanning data appended in a range of snapshots.
+        The scan always follows the schema of the snapshot at the main branch head.
+      type: object
+      allOf:
+        - $ref: '#/components/schemas/BaseTableScanContext'
+      required:
+        - table-scan-type
+        - start-snapshot-id
+      properties:
+        table-scan-type:
+          type: string
+          enum: ["incremental-snapshot-scan"]
+        select:
+          $ref: '#/components/schemas/SelectedFieldNames'
+        filter:
+          $ref: '#/components/schemas/Filter'
+        case-sensitive:
+          description: If field selection and filtering should be case sensitive
+          type: boolean
+          default: true
+        start-snapshot-id:
+          description: The ID of the starting snapshot of the incremental scan
+          type: integer
+          format: int64
+        inclusive-start:
+          description: If the data appended in the start snapshot should be included in the scan
+          type: boolean
+          default: false
+        end-snapshot-id:
+          description:
+            The ID of the inclusive ending snapshot of the incremental scan.
+            If not specified, the snapshot at the main branch head will be used as the end snapshot.
+          type: integer
+          format: int64
+
+    FieldName:

Review Comment:
@syun64, @jackye1995, @amogh-jahagirdar, flattening with `.` is not a problem for Iceberg tables because ambiguous names like `a.b` in the example type that Sung posted are not allowed. This is how `Schema` has worked from very early on ([tests added in 2019](https://github.com/apache/iceberg/commit/8b41ee5e34b5d79e7a04b3c810d7073cfed01a3a)) and why `findField` has always used the name to ID index.

The rationale for this is that there are two ways to handle conflicts like this. First, you could use escaping and have extra complexity when parsing structures.
Or second, you can disallow ambiguous names so that you always have a clear mapping from the flattened string. Iceberg opted for the second option for column names so that there is no need for escaping.

This has really helped simplify a lot of APIs! For instance, `Schema.select(String...)` can select multiple columns easily rather than needing to have a complicated signature like `Schema.select(List<String>...)`.

A couple more things to clarify:

> Flattening the name actually contrasts our approach documented in Column Projection section of the docs, where we note that a name may contain '.' but that this refers to a literal name, and not a nested field

This actually does not conflict. This section is documenting what the field name in a column mapping means, and this is saying that the name string identifies an immediate child and not a further nested child. This is because we do not index sub-sections of these tree structures. Iceberg only indexes fields by name in `Schema` and not in each `StructType`, for example.

> I think the reason Namespace has a principle on flat representation is because there's a GET endpoint that requires us to use a flat representation.

Actually, I originally advocated for the same approach for namespaces in URLs. I think we should have used `.` as the delimiter because it is easier to work with. The trade-off is that REST catalogs would not be able to support ambiguous names, like `["a", "b"]` in the same catalog as `["a.b"]`. That's not what the community decided to go forward with, so we have escape characters when sending namespaces in URLs. But also note that namespaces in the REST spec use a JSON array of strings whenever that is possible.
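
To make the "disallow ambiguous names" option concrete, here is a minimal Python sketch of a flattened name-to-ID index that rejects collisions. It is an illustration only, not Iceberg's Java implementation: `Field`, `index_by_name`, and `select` are hypothetical names invented for this example, and the error message format is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Field:
    """Hypothetical stand-in for a schema field; `children` is non-empty for structs."""
    id: int
    name: str
    children: List["Field"] = field(default_factory=list)


def index_by_name(fields: List[Field], prefix: str = "") -> Dict[str, int]:
    """Flatten a field tree into {dotted name: field ID}.

    Raises ValueError if two different fields flatten to the same dotted name,
    e.g. a struct `a` with child `b` next to a top-level field literally named `a.b`.
    """
    index: Dict[str, int] = {}
    for f in fields:
        full_name = prefix + f.name
        # Collect this field plus everything nested under it.
        entries = {full_name: f.id}
        entries.update(index_by_name(f.children, prefix=full_name + "."))
        for name, field_id in entries.items():
            if name in index:
                raise ValueError(
                    f"Ambiguous field name {name!r}: IDs {index[name]} and {field_id}"
                )
            index[name] = field_id
    return index


def select(index: Dict[str, int], *names: str) -> List[int]:
    """Resolve plain dotted strings to field IDs; no escaping or list-of-parts needed."""
    return [index[n] for n in names]


# Usage: a struct `a` with child `b`, plus a top-level `c`.
schema = [Field(1, "a", [Field(2, "b")]), Field(3, "c")]
index = index_by_name(schema)
print(select(index, "a.b", "c"))  # [2, 3]
```

Because the index refuses to build when a literal `a.b` sits next to a nested `a` → `b`, every flattened string maps to at most one field, which is what lets a varargs-of-strings selection API stay unambiguous in this sketch.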