jackye1995 commented on code in PR #9695:
URL: https://github.com/apache/iceberg/pull/9695#discussion_r1623316927


##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -537,6 +537,124 @@ paths:
         5XX:
           $ref: '#/components/responses/ServerErrorResponse'
 
+  /v1/{prefix}/namespaces/{namespace}/tables/{table}/preplan:
+    parameters:
+      - $ref: '#/components/parameters/prefix'
+      - $ref: '#/components/parameters/namespace'
+      - $ref: '#/components/parameters/table'
+    post:
+      tags:
+        - Catalog API
+      summary: Prepare a list of tasks that can be used to distribute table 
scan planning
+      description:
+        Prepare a list of tasks that can be used to distribute table scan 
planning based on a set of table scan criteria 
+        such as selected columns, filters, snapshot range, case sensitivity, 
etc.
+        
+        This API returns a list of `plan-task`s, and each of them can be used 
in the `PlanTable` API
+        to request a subset of all file scan tasks in a table scan.
+        This mechanism allows clients to distribute and parallelize the entire 
table scan planning process.
+      operationId: PreplanTable
+      requestBody:
+        content:
+          application/json:
+            schema:
+              $ref: '#/components/schemas/PreplanTableRequest'
+      responses:
+        200:
+          $ref: '#/components/responses/PreplanTableResponse'
+        400:
+          $ref: '#/components/responses/BadRequestErrorResponse'
+        401:
+          $ref: '#/components/responses/UnauthorizedResponse'
+        403:
+          $ref: '#/components/responses/ForbiddenResponse'
+        404:
+          description:
+            Not Found
+            - NoSuchTableException, the table does not exist
+            - NoSuchNamespaceException, the namespace does not exist
+          content:
+            application/json:
+              schema:
+                $ref: '#/components/schemas/ErrorModel'
+              examples:
+                TableDoesNotExist:
+                  $ref: '#/components/examples/NoSuchTableError'
+                NamespaceDoesNotExist:
+                  $ref: '#/components/examples/NoSuchNamespaceError'
+        406:
+          $ref: '#/components/responses/UnsupportedOperationResponse'
+        419:
+          $ref: '#/components/responses/AuthenticationTimeoutResponse'
+        503:
+          $ref: '#/components/responses/ServiceUnavailableResponse'
+        5XX:
+          $ref: '#/components/responses/ServerErrorResponse'
+
+  /v1/{prefix}/namespaces/{namespace}/tables/{table}/plan:
+    parameters:
+      - $ref: '#/components/parameters/prefix'
+      - $ref: '#/components/parameters/namespace'
+      - $ref: '#/components/parameters/table'
+    post:
+      tags:
+        - Catalog API
+      summary: Perform scan planning against a table
+      operationId: PlanTable
+      description:
+        Perform scan planning against a table based on a set of table scan 
criteria such as selected columns, filters, 
+        snapshot range, case sensitivity, etc.
+        
+        An optional `plan-task` can be provided to request only a subset of 
file scan tasks.
+        The `plan-task` can be retrieved by invoking the `PreplanTable` 
endpoint.
+      
+        If preplanning using the `PreplanTable` endpoint is required before 
hitting this endpoint but the client fails 
+        to supply a `plan-task` in the request, then a `421 Misdirected 
Request` response should be returned to 
+        indicate this requirement.
+      
+        If planning a table scan produces too many file scan tasks and the 
server is unable to return them within its
+        response size limit, then a `422 Unprocessable Content` response 
should be returned to indicate that the client 
+        should first attempt to preplan the specific table scan to distribute 
the planning process and make the content
+        processable by the server.
+      requestBody:
+        content:
+          application/json:
+            schema:
+              $ref: '#/components/schemas/PlanTableRequest'
+      responses:
+        200:
+          $ref: '#/components/responses/PlanTableResponse'
+        400:
+          $ref: '#/components/responses/BadRequestErrorResponse'
+        401:
+          $ref: '#/components/responses/UnauthorizedResponse'
+        403:
+          $ref: '#/components/responses/ForbiddenResponse'
+        404:
+          description:
+            Not Found
+            - NoSuchTableException, the table does not exist
+            - NoSuchNamespaceException, the namespace does not exist
+          content:
+            application/json:
+              schema:
+                $ref: '#/components/schemas/ErrorModel'
+              examples:
+                TableDoesNotExist:
+                  $ref: '#/components/examples/NoSuchTableError'
+                NamespaceDoesNotExist:
+                  $ref: '#/components/examples/NoSuchNamespaceError'
+        419:
+          $ref: '#/components/responses/AuthenticationTimeoutResponse'
+        421:

Review Comment:
   looks like there are currently multiple ways to express errors:
   
   1. the API section points to a response model in the responses section, 
which is an error response dedicated for a specific error code. The error 
response model directly uses the generic IcebergErrorResponse schema, the error 
code is not strictly enforced in the model, but described through example. Most 
of the error responses are like that, for example:
   
   In API: 
   ```yaml
   419:
       $ref: '#/components/responses/AuthenticationTimeoutResponse'
   ```
   
   In Response schema:
   
   ```yaml
       AuthenticationTimeoutResponse:
         description:
           Credentials have timed out. If possible, the client should refresh 
credentials and retry.
         content:
           application/json:
             schema:
               $ref: '#/components/schemas/IcebergErrorResponse'
             example: {
               "error": {
                 "message": "Credentials have timed out",
                 "type": "AuthenticationTimeoutException",
                 "code": 419
               }
             }
   ```
   
   2. the APi section has a specific description of the exceptions of an error 
code, and it's schema is the IcebergErrorResponse, and there are examples 
describing specific error shapes. These typically map to the final Java 
exceptions. For example:
   
   ```yaml
           404:
             description:
               Not Found
               - NoSuchTableException, the table does not exist
               - NoSuchNamespaceException, the namespace does not exist
             content:
               application/json:
                 schema:
                   $ref: '#/components/schemas/IcebergErrorResponse'
                 examples:
                   TableDoesNotExist:
                     $ref: '#/components/examples/NoSuchTableError'
                   NamespaceDoesNotExist:
                     $ref: '#/components/examples/NoSuchNamespaceError'
   ```
   
   I am now just following the convention for 404, and use approach 1 to 
describe other errors including the new ones 421 and 422.
   
   But I feel both ways being not really ideal. 
   
   In approach 1, the error response model created for each error code does not 
enforce it, as the comment suggests "The fields `message` and `type` as 
indicated here are not presently prescriptive." Why didn't we use a 
discriminator to enforce the error code for each response model?
   
   In approach 2, I feel at least we should use something like:
   
   ```yaml
   content:
               application/json:
                 schema:
                   $ref: '#/components/schemas/ResourceNotFoundResponse'
   ```
   
   Instead of just a generic IcebergErrorResponse.
   
   I also think we should have more of approach 2 instead of approach 1, spend 
more efforts documenting known error types that maps to known Java exceptions. 
Seems like today only 404 uses approach 2. 
   
   For the PlanTable API here, I think we will definitely throw dedicated 
exceptions like `PreplanTableRequiredException` for 421, 
`TooManyScanTasksException` for 422 in the implementation, that can be 
documented.
   
   If we agree with these points, I can start to raise separated PRs to address 
those issues.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to