aihuaxu commented on code in PR #12658:
URL: https://github.com/apache/iceberg/pull/12658#discussion_r2051252315

##########
format/spec.md:
##########
@@ -648,6 +648,21 @@ Notes:
 5. The `content_offset` and `content_size_in_bytes` fields are used to reference a specific blob for direct access to a deletion vector. For deletion vectors, these values are required and must exactly match the `offset` and `length` stored in the Puffin footer for the deletion vector blob.
 6. The following field ids are reserved on `data_file`: 141.
+For Variant, values in the `lower_bounds` and `upper_bounds` maps store the a serialized Variant object that contains lower and upper bounds for fields within the Variant. The object keys are normalized JSON path expressions that uniquely identify a Variant field. The object values are primitive Variant representations of the lower or upper bound for the field. Including bounds for any field is optional and the bounds must have the same Variant type.
+
+Bounds for a field must be accurate for all non-null values of the field in a data file. Bounds for values within arrays must be accurate all values in the array. Bounds must not be written to describe values with mixed Variant types (other than null). For example, a "measurement" field that contains int64 and null values may have bounds, but a string value such as "n/a" or "0" in any record would cause the bounds to be skipped.
+
+The Variant bounds objects are serialized by concatenating the [Variant encoding](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md) of the metadata (containing the normalized field paths) and the bounds object.
+Field paths follow the JSON path format to use normalized path, such as `$['location']['latitude']` or `$['user.name']`. The special path `$` represents bounds for the variant root, indicating that the variant data consists of uniform primitive types, such as strings.
+
+Examples of valid field paths using normalized JSON path format are:
+
+* `$` -- the Variant root value
+* `$['user.name']` -- the field `"user.name"` in the root value that is a Variant object
+* `$['location']['latitude']` -- the field `latitude` in a nested `location` object
+* `$['ids']` -- the `ids` array

Review Comment:
Let me do that to match as much as possible.
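
For reference, here is a rough Python sketch of two rules from the quoted text: building normalized field paths, and skipping bounds when a field holds mixed types. It is an illustration only, not code from the PR or from Iceberg. The names `normalized_path` and `field_bounds` are hypothetical, the quote-escaping rule is an assumption (the quoted text does not specify one), and plain Python types stand in for Variant primitive types.

```python
from typing import Any, Iterable, Optional, Tuple


def normalized_path(*fields: str) -> str:
    """Build a bracket-notation path such as $['location']['latitude'].

    With no field names, this returns "$", the path for the Variant root.
    """
    parts = []
    for name in fields:
        # Assumed escaping: backslash-escape backslashes and single quotes.
        escaped = name.replace("\\", "\\\\").replace("'", "\\'")
        parts.append(f"['{escaped}']")
    return "$" + "".join(parts)


def field_bounds(values: Iterable[Any]) -> Optional[Tuple[Any, Any]]:
    """Return (lower, upper) over non-null values, or None to skip bounds."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        # Only nulls: there is nothing to bound.
        return None
    if len({type(v) for v in non_null}) != 1:
        # Mixed types (other than null): bounds must not be written.
        return None
    return min(non_null), max(non_null)
```

With these helpers, `normalized_path("location", "latitude")` yields `$['location']['latitude']`, and a "measurement" field holding `[3, None, 7]` gets bounds `(3, 7)`, while `[3, None, "n/a"]` gets none.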