rdblue commented on code in PR #9695:
URL: https://github.com/apache/iceberg/pull/9695#discussion_r1494924023


##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -2068,6 +2162,145 @@ components:
           items:
             $ref: '#/components/schemas/PartitionStatisticsFile'
 
+    PlanTask:
+      description:
+        A JSON object that contains information provided by the server,
+        to be utilized by clients for distributed planning, should be supplied
+        as is for input in PlanTable operation.
+      type: object
+
+    FileScanTask:
+      type: object
+      required:
+        - schema
+        - spec
+        - start
+        - length
+        - data-file
+      properties:
+        data-file:
+          $ref: '#/components/schemas/ContentFile'
+        partition:
+          type: object
+          additionalProperties:
+            type: string
+        size-bytes:
+          type: number
+        start:
+          type: number
+        length:
+          type: number
+        estimated-rows-count:
+          type: number
+        delete-files:
+          type: array
+          items:
+            $ref: '#/components/schemas/ContentFile'
+        schema:
+          $ref: '#/components/schemas/Schema'
+        spec:
+          $ref: '#/components/schemas/PartitionSpec'
+        residual-filter:
+          $ref: '#/components/schemas/Expression'

Review Comment:
   I'm not sure that we want to do this on the service. Computing a residual 
for each task would make the service run fairly expensive analysis and would 
inflate the response size. In some cases the response could get really large, 
for instance if you send an IN predicate with a large key set.
   
   Residuals are also not widely used. In Spark, the original predicate is run 
for every task instead of the residual.
   
   I think the solution here is to document that `residual-filter` is optional. 
If it is not present, the residual should be calculated on the client or the 
original filter should be used to filter the rows from the file. That way the 
service can decide whether it makes sense to send residuals.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

