szehon-ho commented on code in PR #9661: URL: https://github.com/apache/iceberg/pull/9661#discussion_r1484906498
##########
format/spec.md:
##########
@@ -301,12 +301,14 @@ Tables are configured with a **partition spec** that defines how to produce a tu
 * A **transform** that is applied to the source column(s) to produce a partition value
 * A **partition name**

-The source column, selected by id, must be a primitive type and cannot be contained in a map or list, but may be nested in a struct. For details on how to serialize a partition spec to JSON, see Appendix C.
+The source column(s), selected by id(s), must be a primitive type and cannot be contained in a map or list, but may be nested in a struct. The ability to have multiple source columns is added in V3, with a flag to allow tables in V2 to use this feature. For serialization and backward compatibility details, see Appendix C.

 Partition specs capture the transform from table data to partition values. This is used to transform predicates to partition predicates, in addition to transforming data values. Deriving partition predicates from column predicates on the table data is used to separate the logical queries from physical storage: the partitioning can change and the correct partition filters are always derived from column predicates. This simplifies queries because users don’t have to supply both logical predicates and partition predicates. For more information, see Scan Planning below.

 Two partition specs are considered equivalent with each other if they have the same number of fields and for each corresponding field, the fields have the same source column ID, transform definition and partition name. Writers must not create a new parition spec if there already exists a compatible partition spec defined in the table.

+

Review Comment:
Removed

##########
format/spec.md:
##########
@@ -1130,14 +1144,11 @@ Each partition field in the fields list is stored as an object. See the table fo
 |**`hour`**|`JSON string: "hour"`|`"hour"`|
 |**`Partition Field`** [1,2]|`JSON object: {`<br /> `"source-id": <id int>,`<br /> `"field-id": <field id int>,`<br /> `"name": <name string>,`<br /> `"transform": <transform JSON>`<br />`}`|`{`<br /> `"source-id": 1,`<br /> `"field-id": 1000,`<br /> `"name": "id_bucket",`<br /> `"transform": "bucket[16]"`<br />`}`|

-In some cases partition specs are stored using only the field list instead of the object format that includes the spec ID, like the deprecated `partition-spec` field in table metadata. The object format should be used unless otherwise noted in this spec.
-
-The `field-id` property was added for each partition field in v2. In v1, the reference implementation assigned field ids sequentially in each spec starting at 1,000. See Partition Evolution for more details.
-
 Notes:
-
-1. For partition fields with a transform with a single argument, the ID of the source field is set on `source-id`, and `source-ids` is omitted.
-2. For partition fields with a transform of multiple arguments, the IDs of the source fields are set on `source-ids`. To preserve backward compatibility, `source-id` is set to -1.
+1. In some cases partition specs are stored using only the field list instead of the object format that includes the spec ID, like the deprecated `partition-spec` field in table metadata. The object format should be used unless otherwise noted in this spec.
+2. The `field-id` property was added for each partition field in v2. In v1, the reference implementation assigned field ids sequentially in each spec starting at 1,000. See Partition Evolution for more details.
+3. For partition fields with a transform with a single argument, the ID of the source field is set on `source-id`, and `source-ids` is omitted.
+2. For partition fields with a transform with multiple arguments, the IDs of the source fields are set on `source-ids`, and `source-id` is set to -1.
This is only allowed in tables of version >= V3, or in tables of version >= V2 where compatibility.multi-arg-transform.enabled is true. In the latter case, no guarantees are made that all implementations will successfully read/write this table metadata.

Review Comment:
Good catch

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org