rdblue commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1494924023
########## open-api/rest-catalog-open-api.yaml: ##########
@@ -2068,6 +2162,145 @@ components:
           items:
             $ref: '#/components/schemas/PartitionStatisticsFile'
 
+    PlanTask:
+      description:
+        A JSON object that contains information provided by the server,
+        to be utilized by clients for distributed planning, should be supplied
+        as is for input in PlanTable operation.
+      type: object
+
+    FileScanTask:
+      type: object
+      required:
+        - schema
+        - spec
+        - start
+        - length
+        - data-file
+      properties:
+        data-file:
+          $ref: '#/components/schemas/ContentFile'
+        partition:
+          type: object
+          additionalProperties:
+            type: string
+        size-bytes:
+          type: number
+        start:
+          type: number
+        length:
+          type: number
+        estimated-rows-count:
+          type: number
+        delete-files:
+          type: array
+          items:
+            $ref: '#/components/schemas/ContentFile'
+        schema:
+          $ref: '#/components/schemas/Schema'
+        spec:
+          $ref: '#/components/schemas/PartitionSpec'
+        residual-filter:
+          $ref: '#/components/schemas/Expression'

Review Comment:
I'm not sure that we want to do this on the service. Computing residuals would cause the service to run fairly expensive analysis and would inflate the response size. In some cases that response could get really large, for instance if you send an IN predicate with a large key set, because the residual is repeated for every task.

Residual filters are also not widely used. In Spark, the original predicate is run for every task instead of the residual.

I think the solution here is to document that `residual-filter` is optional. If it is not present, the client should either calculate the residual itself or use the original filter to filter the rows from the file. That way the service can decide whether it makes sense to send residuals.
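A minimal Python sketch of the client-side fallback proposed above, assuming `residual-filter` becomes optional. The expression dicts are a simplified, hypothetical encoding loosely modeled on the spec's `Expression` schema (only `and`, `or`, `eq`, `in`, `true`, `false` are handled), and the residual step assumes identity-partitioned columns whose partition field names match the column names. `residual` and `filter_for_task` are illustrative helpers, not Iceberg APIs.

```python
from typing import Any, Dict

# Hypothetical, simplified expression encoding (not the exact REST schema).
Expr = Dict[str, Any]
TRUE: Expr = {"type": "true"}
FALSE: Expr = {"type": "false"}


def residual(expr: Expr, partition: Dict[str, Any]) -> Expr:
    """Simplify expr against a task's partition values.

    Assumes identity partitioning: a predicate on a column present in
    `partition` is decided per task; anything else is kept unchanged,
    which is always safe because it is still applied to the rows.
    """
    kind = expr["type"]
    if kind in ("true", "false"):
        return expr
    if kind in ("and", "or"):
        left = residual(expr["left"], partition)
        right = residual(expr["right"], partition)
        if kind == "and":
            if left == FALSE or right == FALSE:
                return FALSE
            if left == TRUE:
                return right
            if right == TRUE:
                return left
        else:
            if left == TRUE or right == TRUE:
                return TRUE
            if left == FALSE:
                return right
            if right == FALSE:
                return left
        return {"type": kind, "left": left, "right": right}
    term = expr["term"]
    if term not in partition:
        return expr  # not a partition column: must be evaluated on rows
    value = partition[term]  # partition values are strings in the schema
    if kind == "eq":
        return TRUE if value == expr["value"] else FALSE
    if kind == "in":
        return TRUE if value in expr["values"] else FALSE
    return expr  # unsupported predicate kind: keep it, still correct


def filter_for_task(task: Dict[str, Any], scan_filter: Expr,
                    compute_residual: bool = True) -> Expr:
    """Choose the row filter to apply when reading one FileScanTask."""
    if "residual-filter" in task:
        return task["residual-filter"]  # the service chose to send one
    if compute_residual:
        return residual(scan_filter, task.get("partition", {}))
    return scan_filter  # original filter: less precise, but always correct


if __name__ == "__main__":
    scan_filter = {
        "type": "and",
        "left": {"type": "eq", "term": "region", "value": "us"},
        "right": {"type": "in", "term": "id", "values": ["1", "2"]},
    }
    print(filter_for_task({"partition": {"region": "us"}}, scan_filter))
    print(filter_for_task({"partition": {"region": "eu"}}, scan_filter))
```

For a task in partition `region=us`, the residual reduces to the `id IN (...)` predicate alone. That predicate survives in every task's residual, so a service that serializes residuals repeats the whole key set once per task, which is the response-size concern raised above.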