aihuaxu commented on code in PR #12658:
URL: https://github.com/apache/iceberg/pull/12658#discussion_r2051252315


##########
format/spec.md:
##########
@@ -648,6 +648,21 @@ Notes:
 5. The `content_offset` and `content_size_in_bytes` fields are used to 
reference a specific blob for direct access to a deletion vector. For deletion 
vectors, these values are required and must exactly match the `offset` and 
`length` stored in the Puffin footer for the deletion vector blob.
 6. The following field ids are reserved on `data_file`: 141.
 
+For Variant, values in the `lower_bounds` and `upper_bounds` maps store a
serialized Variant object that contains lower and upper bounds for fields
within the Variant. The object keys are normalized JSON path expressions that
uniquely identify a Variant field. The object values are primitive Variant
representations of the lower or upper bound for the field. Including bounds for
any field is optional, and the lower and upper bounds for a field must have the
same Variant type.
+
+Bounds for a field must be accurate for all non-null values of the field in a
data file. Bounds for values within arrays must be accurate for all values in
the array. Bounds must not be written to describe values with mixed Variant
types (other than null). For example, a "measurement" field that contains int64
and null values may have bounds, but a string value such as "n/a" or "0" in any
record would cause the bounds to be skipped.
+
+The Variant bounds objects are serialized by concatenating the [Variant
encoding](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md)
of the metadata (containing the normalized field paths) and the bounds object.
+Field paths use the normalized JSON path format, such as
`$['location']['latitude']` or `$['user.name']`. The special path `$`
represents bounds for the Variant root, indicating that the Variant values
are all primitives of a single type, such as strings.
+
+Examples of valid field paths using normalized JSON path format are:
+
+* `$` -- the Variant root value
+* `$['user.name']` -- the field `"user.name"` in the root value that is a 
Variant object
+* `$['location']['latitude']` -- the field `latitude` in a nested `location` 
object
+* `$['ids']` -- the `ids` array
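To make the bounds rules in the proposed text concrete, here is a minimal sketch of how a writer might collect per-path bounds. It assumes decoded Variant values are represented as plain Python values (dict for an object, list for an array, None for null) and uses Python's `type()` as a rough stand-in for the Variant type, so it does not distinguish int8 from int64. Path escaping covers only `\` and `'`, which is simpler than full normalized-path escaping. Array elements share the array's path, which is one reading of the `$['ids']` example. `normalized_path` and `collect_bounds` are hypothetical helpers, not an Iceberg API.

```python
from typing import Any, Dict, List, Tuple


def normalized_path(keys: List[str]) -> str:
    """Build a normalized JSON path such as $['location']['latitude']."""
    parts = ["$"]
    for key in keys:
        # Simplified escaping: only backslash and single quote are escaped.
        escaped = key.replace("\\", "\\\\").replace("'", "\\'")
        parts.append("['" + escaped + "']")
    return "".join(parts)


_MIXED = object()  # sentinel: the path saw more than one non-null type


def collect_bounds(records: List[Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {normalized path: (lower, upper)} for primitive fields only."""
    stats: Dict[str, Any] = {}  # path -> (kind, lower, upper) or _MIXED

    def observe(keys: Tuple[str, ...], value: Any) -> None:
        if value is None:
            return  # nulls never widen or invalidate bounds
        if isinstance(value, list):
            # Array elements share the array's path; bounds cover every element.
            for element in value:
                observe(keys, element)
            return
        path = normalized_path(list(keys))
        kind = "object" if isinstance(value, dict) else type(value)
        current = stats.get(path)
        if current is None:
            stats[path] = (kind, value, value)
        elif current is not _MIXED:
            if current[0] != kind:
                stats[path] = _MIXED  # mixed types: bounds must be skipped
            elif kind != "object":
                stats[path] = (kind, min(current[1], value), max(current[2], value))
        if isinstance(value, dict):
            for key, child in value.items():
                observe(keys + (key,), child)

    for record in records:
        observe((), record)
    # Drop mixed-type paths and object-valued paths (bounds are primitives only).
    return {
        path: (s[1], s[2])
        for path, s in stats.items()
        if s is not _MIXED and s[0] != "object"
    }
```

With records where `measurement` holds both `10` and `"n/a"`, the sketch emits no bounds for `$['measurement']`, matching the "n/a" example, while `$['ids']` gets bounds over all array elements.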
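The serialization rule (Variant-encoded metadata holding the paths, followed by the Variant-encoded bounds object) can be sketched as below. This follows one reading of the linked Parquet VariantEncoding.md and should be checked against it: metadata header with version 1 and `sorted_strings` set, an object with 1-byte field ids and offsets, int64 as primitive type id 6, and short strings under 64 bytes. Only int64 and short-string bounds with 1-byte sizes are handled. `encode_metadata`, `encode_primitive`, and `encode_bounds` are hypothetical helpers, not an Iceberg or Parquet API.

```python
import struct
from typing import Dict, List, Union

Bound = Union[int, str]


def encode_metadata(keys: List[str]) -> bytes:
    """Variant metadata: header, dictionary_size, offsets, then UTF-8 key bytes."""
    encoded = [k.encode("utf-8") for k in keys]
    offsets = [0]
    for e in encoded:
        offsets.append(offsets[-1] + len(e))
    if len(keys) > 255 or offsets[-1] > 255:
        raise ValueError("sketch only supports 1-byte sizes and offsets")
    # version = 1 (bits 0-3), sorted_strings = 1 (bit 4), offset_size_minus_one = 0
    header = 0x01 | 0x10
    return bytes([header, len(keys)] + offsets) + b"".join(encoded)


def encode_primitive(value: Bound) -> bytes:
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int):
        # basic_type 0 (primitive), primitive type id 6 (int64), little-endian payload
        return bytes([6 << 2]) + struct.pack("<q", value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        if len(data) > 63:
            raise ValueError("only short strings (< 64 bytes) are handled")
        # basic_type 1 (short string); the 6-bit header holds the byte length
        return bytes([(len(data) << 2) | 1]) + data
    raise TypeError(f"unsupported bound type: {type(value).__name__}")


def encode_bounds(bounds: Dict[str, Bound]) -> bytes:
    """Concatenate Variant metadata (the paths) with a Variant object of bounds."""
    keys = sorted(bounds)  # sorted, unique keys, so field id i names keys[i]
    values = [encode_primitive(bounds[k]) for k in keys]
    offsets = [0]
    for v in values:
        offsets.append(offsets[-1] + len(v))
    if len(keys) > 255 or offsets[-1] > 255:
        raise ValueError("sketch only supports 1-byte field ids and offsets")
    # basic_type 2 (object); is_large = 0, 1-byte field ids, 1-byte field offsets
    header = 0x02
    field_ids = list(range(len(keys)))
    obj = bytes([header, len(keys)] + field_ids + offsets) + b"".join(values)
    return encode_metadata(keys) + obj
```

Sorting the paths once lets the same order serve both the metadata dictionary (which must be sorted for `sorted_strings`) and the object's field ids (which must follow field-name order).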

Review Comment:
   Let me do that to match as much as possible.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

