rdblue commented on code in PR #9728: URL: https://github.com/apache/iceberg/pull/9728#discussion_r1490201923
##########
format/spec.md:
##########

@@ -1239,15 +1239,34 @@ Content file (data or delete) is serialized as a JSON object according to the fo
 ### File Scan Task Serialization
 
-File scan task is serialized as a JSON object according to the following table.
-
-| Metadata field |JSON representation|Example|
-|--------------------------|--- |--- |
-| **`schema`** |`JSON object`|`See above, read schemas instead`|
-| **`spec`** |`JSON object`|`See above, read partition specs instead`|
-| **`data-file`** |`JSON object`|`See above, read content file instead`|
-| **`delete-files`** |`JSON list of objects`|`See above, read content file instead`|
-| **`residual-filter`** |`JSON object: residual filter expression`|`{"type":"eq","term":"id","value":1}`|
+There could be different implementations of file scan task,
+e.g., `BaseFileScanTask` and `StaticDataTask` in Java.
+A enum `task-type` field is needed to distinguish different task types.
+
+| Metadata field | JSON representation | Example |
+|-----------------|---------------------|---------------------------------------------------------------------------------------|
+| **`task-type`** | `JSON string` | `base-file-task`, `static-data-task`. Absence of this field should be interpreted as `base-file-task` |

Review Comment:
   This should not have "base" in it. That's a Java-specific prefix we use for classes that are the base implementation of an interface.

   I'm not sure about having a task type here if this is a file scan task. A file scan task should be distinct from a data task. In the Java API, a data task is a file scan task, but that is just for API compatibility. They are conceptually separate so we should keep them separate here.

   Maybe that just means this section should be "Task Serialization" and we have a different set of fields for "file-scan-task" or "data-task"?
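To make the last suggestion in the review comment concrete, here is a minimal sketch of what a reader of a combined "Task Serialization" section might do: branch on `task-type` and apply a separate field set to each kind of task. The strings `file-scan-task` and `data-task` are taken from the comment, but their use as final `task-type` values is an assumption, as is treating a missing `task-type` as a file scan task (the PR text proposes that default under a different name). The function `parse_task` is hypothetical and not part of any Iceberg library, and the data task field set is left open because the thread does not define it.

```python
import json

# Assumed task-type values, following the names floated in the review comment.
FILE_SCAN_TASK = "file-scan-task"
DATA_TASK = "data-task"

# Fields listed in the existing "File Scan Task Serialization" table in the diff.
FILE_SCAN_FIELDS = {"schema", "spec", "data-file", "delete-files", "residual-filter"}


def parse_task(text):
    """Parse one serialized task and return (task_type, remaining_fields).

    Hypothetical helper for illustration only.
    """
    obj = json.loads(text)
    # Assumption: a missing task-type is read as a file scan task.
    task_type = obj.pop("task-type", FILE_SCAN_TASK)

    if task_type == FILE_SCAN_TASK:
        unknown = set(obj) - FILE_SCAN_FIELDS
        if unknown:
            raise ValueError(f"unexpected file scan task fields: {sorted(unknown)}")
        return task_type, obj

    if task_type == DATA_TASK:
        # The data task fields are not defined in this thread, so they pass through unchecked.
        return task_type, obj

    raise ValueError(f"unknown task-type: {task_type}")
```

Keeping the two branches separate mirrors the point that a data task and a file scan task are conceptually distinct, even though the Java API models one as a subtype of the other.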