rdblue commented on code in PR #9728:
URL: https://github.com/apache/iceberg/pull/9728#discussion_r1495060431


##########
format/spec.md:
##########
@@ -1237,17 +1237,36 @@ Content file (data or delete) is serialized as a JSON object according to the fo
 | **`equality-ids`**       |`JSON list of int: Field ids used to determine row equality in equality delete files`|`[1]`|
 | **`sort-order-id`**      |`JSON int`|`1`|
 
-### File Scan Task Serialization
-
-File scan task is serialized as a JSON object according to the following table.
-
-| Metadata field       |JSON representation|Example|
-|--------------------------|--- |--- |
-| **`schema`**          |`JSON object`|`See above, read schemas instead`|
-| **`spec`**            |`JSON object`|`See above, read partition specs instead`|
-| **`data-file`**       |`JSON object`|`See above, read content file instead`|
-| **`delete-files`**    |`JSON list of objects`|`See above, read content file instead`|
-| **`residual-filter`** |`JSON object: residual filter expression`|`{"type":"eq","term":"id","value":1}`|
+### Task Serialization

Review Comment:
   I would definitely prefer it if this were _not_ part of the spec. I don't 
think there is any reason to require a specific JSON serialization and I was 
surprised to see it in the spec. It's great to have documentation on exactly 
what the parsers produce, but we have many parsers that are not covered by the 
table spec and are instead in other documents like the Puffin spec, View spec, 
or REST spec.
   
   To me, state serialization is a concern internal to Flink. It's harder to 
adhere to a spec for that while also guaranteeing forward and backward 
compatibility. And without context for how this is used and why it is here, we 
can't make decisions about how to evolve it. For example, if you wanted to 
remove a field that Flink doesn't use, how would you know whether that is safe 
in the table spec? What does it mean for this to evolve "safely"?
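
To make the field-removal question concrete, here is a minimal Python sketch. It is not how Iceberg or Flink parse tasks. The field names come from the table removed in the diff above; the placeholder values, the `strict_read` and `tolerant_read` helpers, and the choice of which fields count as required are all assumptions made up for illustration.

```python
import json

# Field names are taken from the removed spec table; the values are placeholders.
OLD_TASK = {
    "schema": {},
    "spec": {},
    "data-file": {},
    "delete-files": [],
    "residual-filter": {"type": "eq", "term": "id", "value": 1},
}

# Assumption for illustration only: which fields a reader cannot work without.
REQUIRED_FIELDS = ("schema", "spec", "data-file")


def strict_read(payload: str) -> dict:
    """Hypothetical reader that treats every documented field as mandatory."""
    task = json.loads(payload)
    missing = sorted(set(OLD_TASK) - set(task))
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return task


def tolerant_read(payload: str) -> dict:
    """Hypothetical reader that only insists on REQUIRED_FIELDS."""
    task = json.loads(payload)
    missing = [name for name in REQUIRED_FIELDS if name not in task]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Optional fields fall back to defaults when a writer omits them.
    task.setdefault("delete-files", [])
    task.setdefault("residual-filter", None)
    return task


# A newer writer that stops emitting "residual-filter".
new_payload = json.dumps(
    {name: value for name, value in OLD_TASK.items() if name != "residual-filter"}
)

print(tolerant_read(new_payload)["residual-filter"])
try:
    strict_read(new_payload)
except ValueError as err:
    print(err)
```

Whether dropping the field is "safe" depends entirely on which kind of reader exists in the wild, and that is information the table spec does not carry.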



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

