Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]

via GitHub Thu, 18 Jul 2024 22:32:33 -0700


amogh-jahagirdar commented on code in PR #9695:
URL: https://github.com/apache/iceberg/pull/9695#discussion_r1683757259



##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3647,6 +3786,173 @@ components:
             type: integer
           description: "List of equality field IDs"
 
+    PreplanTableRequest:
+      type: object
+      required:
+        - table-scan-context
+      properties:
+        table-scan-context:
+          $ref: '#/components/schemas/TableScanContext'
+
+    PlanTableRequest:
+      type: object
+      required:
+        - table-scan-context
+      properties:
+        table-scan-context:
+          $ref: '#/components/schemas/TableScanContext'
+        plan-task:
+          $ref: '#/components/schemas/PlanTask'
+        stats-fields:
+          description:
+            A list of fields that the client requests the server to send 
statistics
+            in each `FileScanTask` returned in the response
+          type: array
+          items:
+            $ref: '#/components/schemas/FieldName'
+
+    TableScanContext:
+      anyOf:

Review Comment:
   I see, I think it's actually quite the same case but I think technically 
`TableUpdate` may need to be `oneOf` since those refs are mutually exclusive. I 
missed the discriminator notation, so that also makes the different types 
mutually exclusive, at least logically it's the same spec. But the example in 
https://swagger.io/docs/specification/data-models/inheritance-and-polymorphism/ 
shows `oneOf` as well.
   
   I think this may be a case where it should be `oneOf` on a technicality but 
to a spec reader and a client it's still fairly clear due to the discriminator 
usage, so arguably not worth going through a whole spec change since it doesn't 
really add much value. I'm fine as it is.
   
   cc @nastra @rdblue @danielcweeks in case they had any opinions on this



##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3642,6 +3781,173 @@ components:
             type: integer
           description: "List of equality field IDs"
 
+    PreplanTableRequest:
+      type: object
+      required:
+        - table-scan-context
+      properties:
+        table-scan-context:
+          $ref: '#/components/schemas/TableScanContext'
+
+    PlanTableRequest:
+      type: object
+      required:
+        - table-scan-context
+      properties:
+        table-scan-context:
+          $ref: '#/components/schemas/TableScanContext'
+        plan-task:
+          $ref: '#/components/schemas/PlanTask'
+        stats-fields:
+          description:
+            A list of fields that the client requests the server to send 
statistics
+            in each `FileScanTask` returned in the response
+          type: array
+          items:
+            $ref: '#/components/schemas/FieldName'
+
+    TableScanContext:
+      anyOf:
+        - $ref: '#/components/schemas/SnapshotScanContext'
+        - $ref: '#/components/schemas/IncrementalSnapshotScanContext'
+
+    BaseTableScanContext:
+      discriminator:
+        propertyName: table-scan-type
+        mapping:
+          snapshot-scan: '#/components/schemas/SnapshotScanContext'
+          incremental-snapshot-scan: 
'#/components/schemas/IncrementalSnapshotScanContext'
+      type: object
+      required:
+        - table-scan-type
+      properties:
+        table-scan-type:
+          type: string
+
+    SnapshotScanContext:
+      description: context for scanning data in a specific snapshot
+      type: object
+      allOf:
+        - $ref: '#/components/schemas/BaseTableScanContext'
+      required:
+        - table-scan-type
+      properties:
+        table-scan-type:
+          type: string
+          enum: ["snapshot-scan"]
+        select:
+          $ref: '#/components/schemas/SelectedFieldNames'
+        filter:
+          $ref: '#/components/schemas/Filter'
+        case-sensitive:
+          description: If field selection and filtering should be case 
sensitive
+          type: boolean
+          default: true
+        snapshot-id:
+          description:
+            The ID of the snapshot to use for the table scan.
+            If not specified, the snapshot at the main branch head will be 
used.
+          type: integer
+          format: int64
+        use-snapshot-schema:
+          description:
+            If the schema of the specific snapshot should be used instead of 
the table schema.
+          type: boolean
+          default: false
+
+    IncrementalSnapshotScanContext:
+      description:
+        Context for scanning data appended in a range of snapshots.
+        The scan always follows the schema of the snapshot at the main branch 
head.
+      type: object
+      allOf:
+        - $ref: '#/components/schemas/BaseTableScanContext'
+      required:
+        - table-scan-type
+        - start-snapshot-id
+      properties:
+        table-scan-type:
+          type: string
+          enum: ["incremental-snapshot-scan"]
+        select:
+          $ref: '#/components/schemas/SelectedFieldNames'
+        filter:
+          $ref: '#/components/schemas/Filter'
+        case-sensitive:
+          description: If field selection and filtering should be case 
sensitive
+          type: boolean
+          default: true
+        start-snapshot-id:
+          description: The ID of the starting snapshot of the incremental scan
+          type: integer
+          format: int64
+        inclusive-start:
+          description: If the data appended in the start snapshot should be 
included in the scan
+          type: boolean
+          default: false
+        end-snapshot-id:
+          description:
+            The ID of the inclusive ending snapshot of the incremental scan.
+            If not specified, the snapshot at the main branch head will be 
used as the end snapshot.
+          type: integer
+          format: int64
+
+    FieldName:

Review Comment:
   This is a great point @syun64 , since the protocol relies on field names 
instead of field IDs (which makes sense, otherwise it's too clunky for clients) 
then we need something to distinguish nested vs fields with a ".".
   1. logical separator ".". @jackye1995 mentioned `%1f`? 
   2. Some sort of function to wrap: I don't think I'd use a quoting function 
or any function which wraps any other special characters since that's a pain 
for users with those characters.
   3. Send the nested structure as an array of arrays. This would work but I 
think makes it a bit more difficult for clients (at least more difficult than 
approach 1 in my mind)
   
   I can get behind %1f since it's a standard delimiter and we already use it 
for namespace and it seems the easiest for clients to handle. I'm also OK with 
3.
   
    I'll think about it more though if there are other options.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]

Reply via email to