Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]

via GitHub Wed, 24 Jul 2024 10:42:56 -0700


amogh-jahagirdar commented on code in PR #9695:
URL: https://github.com/apache/iceberg/pull/9695#discussion_r1690203265



##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3647,6 +3818,176 @@ components:
             type: integer
           description: "List of equality field IDs"
 
+    PreplanTableRequest:
+      type: object
+      required:
+        - table-scan-context
+      properties:
+        table-scan-context:
+          $ref: '#/components/schemas/TableScanContext'
+
+    PlanTableRequest:
+      type: object
+      required:
+        - table-scan-context
+      properties:
+        table-scan-context:
+          $ref: '#/components/schemas/TableScanContext'
+        plan-task:
+          $ref: '#/components/schemas/PlanTask'
+        stats-fields:
+          description:
+            A list of fields that the client requests the server to send statistics
+            in each `FileScanTask` returned in the response
+          type: array
+          items:
+            $ref: '#/components/schemas/FieldName'
+
+    TableScanContext:
+      anyOf:
+        - $ref: '#/components/schemas/SnapshotScanContext'
+        - $ref: '#/components/schemas/IncrementalSnapshotScanContext'
+
+    BaseTableScanContext:
+      discriminator:
+        propertyName: type
+        mapping:
+          snapshot-scan: '#/components/schemas/SnapshotScanContext'
+          incremental-snapshot-scan: '#/components/schemas/IncrementalSnapshotScanContext'
+      type: object
+      required:
+        - type
+      properties:
+        type:
+          type: string
+
+    SnapshotScanContext:
+      description: context for scanning data in a specific snapshot
+      type: object
+      allOf:
+        - $ref: '#/components/schemas/BaseTableScanContext'
+      required:
+        - type
+      properties:
+        type:
+          type: string
+          enum: ["snapshot-scan"]
+        select:
+          $ref: '#/components/schemas/SelectedFieldNames'
+        filter:
+          $ref: '#/components/schemas/Filter'
+        case-sensitive:
+          description: If field selection and filtering should be case sensitive
+          type: boolean
+          default: true
+        snapshot-id:
+          description:
+            The ID of the snapshot to use for the table scan.
+            If not specified, the snapshot at the main branch head will be used.
+          type: integer
+          format: int64
+        use-snapshot-schema:
+          description:
+            If the schema of the specific snapshot should be used instead of the table schema.
+          type: boolean
+          default: false
+
+    IncrementalSnapshotScanContext:
+      description:
+        Context for scanning data appended in a range of snapshots.
+        The scan always follows the schema of the snapshot at the main branch head.
+      type: object
+      allOf:
+        - $ref: '#/components/schemas/BaseTableScanContext'
+      required:
+        - type
+        - start-snapshot-id
+      properties:
+        type:
+          type: string
+          enum: ["incremental-snapshot-scan"]
+        select:
+          $ref: '#/components/schemas/SelectedFieldNames'
+        filter:
+          $ref: '#/components/schemas/Filter'
+        case-sensitive:
+          description: If field selection and filtering should be case sensitive
+          type: boolean
+          default: true
+        start-snapshot-id:
+          description: The ID of the starting snapshot of the incremental scan
+          type: integer
+          format: int64
+        inclusive-start:
+          description: If the data appended in the start snapshot should be included in the scan
+          type: boolean
+          default: false
+        end-snapshot-id:
+          description:
+            The ID of the inclusive ending snapshot of the incremental scan.
+            If not specified, the snapshot at the main branch head will be used as the end snapshot.
+          type: integer
+          format: int64
+
+    FieldName:
+      description:
+        A field name that follows the Iceberg naming standard, and can be used in APIs like
+        Java `Schema#findField(String name)`.
+
+        The nested field name follows these rules
+        - nested struct fields are named by concatenating field names at each struct level using dot (`.`) delimiter,
+        e.g. employer.contact_info.address.zip_code
+        - nested fields in a map key are named using the keyword `key`, e.g. employee_address_map.key.first_name
+        - nested fields in a map value are named using the keyword `value`, e.g. employee_address_map.value.zip_code
+        - nested fields in a list are named using the keyword `element`, e.g. employees.element.first_name
+      type: string
+
+    SelectedFieldNames:
+      description:
+        A list of fields in schema that are selected in a table scan.
+        When not specified, all columns in the requested schema should be selected.
+      type: array
+      items:
+        $ref: '#/components/schemas/FieldName'
+
+    Filter:
+      description:
+        an unbounded expression to describe the filters to apply to a table scan,
+        default to `TrueExpression` meaning that nothing is filtered.
+      allOf:
+        - $ref: '#/components/schemas/Expression'
+      default: { "type": "true" }
+
+    PlanTask:
+      description:
+        An opaque JSON object that contains information provided by the REST server
+        to be utilized by clients for distributed table scan planning; should be supplied
+        as input in `PlanTable` operation.
+      type: object
+
+    FileScanTask:
+      type: object
+      required:
+        - data-file
+        - residual-filter
+      properties:
+        data-file:
+          $ref: '#/components/schemas/DataFile'
+        position-delete-files:

Review Comment:
   @rahil-c Essentially yes! The only thing I'd clarify is that in your first code snippet, `DeleteFile` should be a `oneOf` over the positional and equality delete file types. Then, if a new delete file type comes along later, all that needs to change is that it gets added to that list and, as @rdblue said, a client can choose whether or not it can handle that new delete file type.
   
   The reference idea is something I didn't think of, but I really agree with it. It makes the response much lighter weight and opens the door for certain optimizations. I'd probably take the stance that the references should not be IDs but rather indices into the array of delete files. That seems like a simpler protocol and also avoids having to get into mappings, the type of the IDs, etc.
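
A minimal client-side sketch of the index-based reference idea, in Python. Everything named here is an assumption for illustration and does not come from the spec diff above: the response fields `delete-files`, `file-scan-tasks`, and `delete-file-references`, the `content` values, the sample paths, and the helper `resolve_delete_files` are all hypothetical. It only shows how a client might resolve indices into a shared delete file array and decide for itself whether it can handle a given delete file type.

```python
from typing import Any

# Hypothetical set of delete file types this client knows how to apply.
KNOWN_DELETE_TYPES = {"position-deletes", "equality-deletes"}


def resolve_delete_files(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Pair each scan task with its delete files, looked up by array index."""
    delete_files = response.get("delete-files", [])
    resolved = []
    for task in response["file-scan-tasks"]:
        deletes = []
        # A task without references simply has no delete files to apply.
        for idx in task.get("delete-file-references", []):
            if not 0 <= idx < len(delete_files):
                raise ValueError(f"delete file index out of range: {idx}")
            delete_file = delete_files[idx]
            # The client decides whether it supports this delete file type.
            if delete_file["content"] not in KNOWN_DELETE_TYPES:
                raise NotImplementedError(
                    f"unsupported delete file type: {delete_file['content']}"
                )
            deletes.append(delete_file)
        resolved.append({"data-file": task["data-file"], "deletes": deletes})
    return resolved


# Hypothetical response: two delete files shared by three scan tasks.
SAMPLE_RESPONSE = {
    "delete-files": [
        {"content": "position-deletes", "file-path": "pos-1.parquet"},
        {"content": "equality-deletes", "file-path": "eq-1.parquet"},
    ],
    "file-scan-tasks": [
        {"data-file": {"file-path": "a.parquet"}, "delete-file-references": [0, 1]},
        {"data-file": {"file-path": "b.parquet"}, "delete-file-references": [0]},
        {"data-file": {"file-path": "c.parquet"}},
    ],
}

if __name__ == "__main__":
    for task in resolve_delete_files(SAMPLE_RESPONSE):
        paths = [d["file-path"] for d in task["deletes"]]
        print(task["data-file"]["file-path"], paths)
```

In this sketch each delete file appears once in the response no matter how many tasks refer to it, and plain integer indices mean no separate ID type or ID-to-file mapping has to be defined.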


