Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]

via GitHub Tue, 23 Jul 2024 14:10:53 -0700


rdblue commented on code in PR #9695:
URL: https://github.com/apache/iceberg/pull/9695#discussion_r1687128848



##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3647,6 +3818,176 @@ components:
             type: integer
           description: "List of equality field IDs"
 
+    PreplanTableRequest:
+      type: object
+      required:
+        - table-scan-context
+      properties:
+        table-scan-context:
+          $ref: '#/components/schemas/TableScanContext'
+
+    PlanTableRequest:
+      type: object
+      required:
+        - table-scan-context
+      properties:
+        table-scan-context:
+          $ref: '#/components/schemas/TableScanContext'
+        plan-task:
+          $ref: '#/components/schemas/PlanTask'
+        stats-fields:
+          description:
+            A list of fields that the client requests the server to send statistics
+            in each `FileScanTask` returned in the response
+          type: array
+          items:
+            $ref: '#/components/schemas/FieldName'
+
+    TableScanContext:
+      anyOf:
+        - $ref: '#/components/schemas/SnapshotScanContext'
+        - $ref: '#/components/schemas/IncrementalSnapshotScanContext'
+
+    BaseTableScanContext:
+      discriminator:
+        propertyName: type
+        mapping:
+          snapshot-scan: '#/components/schemas/SnapshotScanContext'
+          incremental-snapshot-scan: '#/components/schemas/IncrementalSnapshotScanContext'
+      type: object
+      required:
+        - type
+      properties:
+        type:
+          type: string
+
+    SnapshotScanContext:
+      description: context for scanning data in a specific snapshot
+      type: object
+      allOf:
+        - $ref: '#/components/schemas/BaseTableScanContext'
+      required:
+        - type

Review Comment:
   Right now, we don't require `snapshot-id`. I originally suggested leaving it optional to avoid an extra call to the service to load the table and find the snapshot. However, I think that not requiring it can lead to odd behavior when the client and server are out of sync about which snapshot is the latest.
   
   The client might think (and could possibly log) that it is reading the latest snapshot it knows about, `s1`, while the service knows about a newer snapshot, `s2`, and plans with that instead. That could lead to a confusing situation where a user does not know which version of the table was actually read.
   
   That kind of problem is avoided by requiring a specific snapshot ID in the request. Because snapshot IDs are unique identifiers rather than something the client can infer, this would require the client to have loaded the table metadata first. I think that is okay, and I would rather not have cases where the client and service can get out of sync. What do you think, @rahil-c, @amogh-jahagirdar, @danielcweeks?
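   For concreteness, here is a rough sketch of what requiring it might look like in `SnapshotScanContext`. This is a suggestion, not final wording: the hunk above does not show whether a `snapshot-id` property is already declared, so the `properties` entry below is an assumption, and the `int64` format simply mirrors how snapshot IDs are typed elsewhere in the REST spec.

```yaml
    SnapshotScanContext:
      description: context for scanning data in a specific snapshot
      type: object
      allOf:
        - $ref: '#/components/schemas/BaseTableScanContext'
      required:
        - type
        # proposed (assumption): the client must name the exact snapshot it
        # intends to read, so the service cannot silently plan a newer one
        - snapshot-id
      properties:
        # assumed definition; snapshot IDs are 64-bit integers in the REST spec
        snapshot-id:
          type: integer
          format: int64
```

   With this shape, a client that only knows about `s1` sends `s1`'s ID, and the service plans exactly that snapshot even if it has already seen `s2`.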



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org