amogh-jahagirdar commented on code in PR #9695:
URL: https://github.com/apache/iceberg/pull/9695#discussion_r1690171513
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3647,6 +3818,176 @@ components:
type: integer
description: "List of equality field IDs"
+ PreplanTableRequest:
+ type: object
+ required:
+ - table-scan-context
+ properties:
+ table-scan-context:
+ $ref: '#/components/schemas/TableScanContext'
+
+ PlanTableRequest:
+ type: object
+ required:
+ - table-scan-context
+ properties:
+ table-scan-context:
+ $ref: '#/components/schemas/TableScanContext'
+ plan-task:
+ $ref: '#/components/schemas/PlanTask'
+ stats-fields:
+ description:
+ A list of fields that the client requests the server to send statistics
+ in each `FileScanTask` returned in the response
+ type: array
+ items:
+ $ref: '#/components/schemas/FieldName'
+
+ TableScanContext:
+ anyOf:
+ - $ref: '#/components/schemas/SnapshotScanContext'
+ - $ref: '#/components/schemas/IncrementalSnapshotScanContext'
+
+ BaseTableScanContext:
+ discriminator:
+ propertyName: type
+ mapping:
+ snapshot-scan: '#/components/schemas/SnapshotScanContext'
+ incremental-snapshot-scan: '#/components/schemas/IncrementalSnapshotScanContext'
+ type: object
+ required:
+ - type
+ properties:
+ type:
+ type: string
+
+ SnapshotScanContext:
+ description: context for scanning data in a specific snapshot
+ type: object
+ allOf:
+ - $ref: '#/components/schemas/BaseTableScanContext'
+ required:
+ - type
+ properties:
+ type:
+ type: string
+ enum: ["snapshot-scan"]
+ select:
+ $ref: '#/components/schemas/SelectedFieldNames'
+ filter:
+ $ref: '#/components/schemas/Filter'
+ case-sensitive:
+ description: If field selection and filtering should be case sensitive
+ type: boolean
+ default: true
+ snapshot-id:
+ description:
+ The ID of the snapshot to use for the table scan.
+ If not specified, the snapshot at the main branch head will be used.
+ type: integer
+ format: int64
+ use-snapshot-schema:
+ description:
+ If the schema of the specific snapshot should be used instead of the table schema.
+ type: boolean
+ default: false
+
+ IncrementalSnapshotScanContext:
+ description:
+ Context for scanning data appended in a range of snapshots.
+ The scan always follows the schema of the snapshot at the main branch head.
+ type: object
+ allOf:
+ - $ref: '#/components/schemas/BaseTableScanContext'
+ required:
+ - type
+ - start-snapshot-id
+ properties:
+ type:
+ type: string
+ enum: ["incremental-snapshot-scan"]
+ select:
+ $ref: '#/components/schemas/SelectedFieldNames'
+ filter:
+ $ref: '#/components/schemas/Filter'
+ case-sensitive:
+ description: If field selection and filtering should be case sensitive
+ type: boolean
+ default: true
+ start-snapshot-id:
+ description: The ID of the starting snapshot of the incremental scan
+ type: integer
+ format: int64
+ inclusive-start:
+ description: If the data appended in the start snapshot should be included in the scan
+ type: boolean
+ default: false
+ end-snapshot-id:
+ description:
+ The ID of the inclusive ending snapshot of the incremental scan.
+ If not specified, the snapshot at the main branch head will be used as the end snapshot.
+ type: integer
+ format: int64
+
+ FieldName:
+ description:
+ A field name that follows the Iceberg naming standard, and can be used in APIs like
+ Java `Schema#findField(String name)`.
+
+ The nested field name follows these rules
+ - nested struct fields are named by concatenating field names at each struct level using dot (`.`) delimiter,
+ e.g. employer.contact_info.address.zip_code
+ - nested fields in a map key are named using the keyword `key`, e.g. employee_address_map.key.first_name
+ - nested fields in a map value are named using the keyword `value`, e.g. employee_address_map.value.zip_code
+ - nested fields in a list are named using the keyword `element`, e.g. employees.element.first_name
+ type: string
+
+ SelectedFieldNames:
+ description:
+ A list of fields in schema that are selected in a table scan.
+ When not specified, all columns in the requested schema should be selected.
+ type: array
+ items:
+ $ref: '#/components/schemas/FieldName'
+
+ Filter:
Review Comment:
@rahil-c Since the residual filter will just be an expression, and that
expression is optional (see
https://github.com/apache/iceberg/pull/9695/files#r1689067372), I don't think
we need to define a default value. The benefit of keeping it optional is that
clients can compute residuals on their own as an optimization, since
server-side computation of residuals may be too expensive.
So in brief:
If a server sends back a residual -> the client should use it for filtering
data.
If a server doesn't send back a residual -> the client can compute the residual
on its own or apply the original filter.
Given this, we don't need the extra type, since it's just an expression at
this point.
I think that's what @rdblue was getting at, and I agree with it.
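
To make the two cases concrete, here is a rough sketch of the client-side choice. It is only an illustration: `FileScanTask`, its `residual` field, and `effective_filter` are hypothetical names, expressions are modeled as plain Python predicates rather than real Iceberg expressions, and the fallback simply applies the original filter instead of computing a residual locally.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-in for an expression: a predicate over a row.
Expression = Callable[[dict], bool]


@dataclass
class FileScanTask:
    """Hypothetical shape of a task returned by the server."""
    data_file: str
    # Optional: the server may omit the residual, and no default is defined.
    residual: Optional[Expression] = None


def effective_filter(task: FileScanTask, original_filter: Expression) -> Expression:
    """Choose the filter the client applies to rows read for this task."""
    if task.residual is not None:
        # The server sent a residual, so the client uses it.
        return task.residual
    # No residual was sent. A client could compute one itself here;
    # this sketch falls back to the original filter.
    return original_filter
```

With this shape, an absent residual needs no dedicated type or default value: the client checks for its presence and falls back.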
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]