rdblue commented on code in PR #10831:
URL: https://github.com/apache/iceberg/pull/10831#discussion_r1815741890


##########
format/spec.md:
##########
@@ -178,6 +178,11 @@ A **`list`** is a collection of values with some element 
type. The element field
 
 A **`map`** is a collection of key-value pairs with a key type and a value 
type. Both the key field and value field each have an integer id that is unique 
in the table schema. Map keys are required and map values can be either 
optional or required. Both map keys and map values may be any type, including 
nested types.
 
+### Semi-structured Types
+
+A **`variant`** is a type to represent semi-structured data. A variant value 
can store a value of any other type, including `null`, any primitive, struct, 
list or map value. The variant encoding is defined the [Apache Parquet 
Project](https://github.com/apache/parquet-format/blob/4f208158dba80ff4bff4afaa4441d7270103dff6/VariantEncoding.md).
 Variant type is added in [v3](#version-3).

Review Comment:
   The other cases are more specific about what is present, rather than what is 
being represented. Also, I don't think that the description is accurate. A 
variant cannot store maps. I would rather state clearly what a variant stores 
so that there is no ambiguity.
   
   How about this instead?
   > A **`variant`** is a binary value that encodes semi-structured data. The structure and data types in a variant are not necessarily consistent across rows in a table or data file. The variant type and binary encoding are defined in the Parquet project. Support for Variant is added in Iceberg v3.
   >
   > Variants are similar to JSON with a wider set of primitive values, including date, timestamp, timestamptz, binary, and floating-point numbers.
   >
   > Variant values may contain nested types:
   > * An array is an ordered collection of variant values
   > * An object is a collection of fields, each consisting of a string key and a variant value
   >
   > As a semi-structured type, there are important differences between variant and Iceberg's other types:
   > * Variant arrays are similar to lists, but may contain any variant value rather than a fixed element type
   > * Variant objects are similar to structs, but may contain variable fields identified by name, and field values may be any variant value rather than a fixed field type
   > * Variant primitives are narrower than Iceberg's primitive types: uuid, time, fixed(L), and nanosecond precision timestamp(tz) are not supported
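
   To make the distinction concrete, here is a rough sketch of the value model described above. It is only an illustration, not part of the proposed spec text and not the Parquet encoding. The function name `is_variant_value` and the mapping from Python types to variant primitives are assumptions made for this example.

```python
from datetime import date, datetime, time
from decimal import Decimal
import uuid

# Hypothetical Python stand-ins for variant primitives (an assumption for
# illustration; the authoritative list is in the Parquet variant encoding).
PRIMITIVES = (type(None), bool, int, float, Decimal, str, bytes, date, datetime)


def is_variant_value(value):
    """Return True if value fits the variant model sketched in this comment."""
    if isinstance(value, PRIMITIVES):
        return True
    if isinstance(value, list):
        # Array: ordered collection of variant values, no fixed element type.
        return all(is_variant_value(v) for v in value)
    if isinstance(value, dict):
        # Object: fields with string keys and variant values.
        # A dict with non-string keys would be a map, which is not allowed.
        return all(
            isinstance(k, str) and is_variant_value(v) for k, v in value.items()
        )
    return False


mixed_array = [1, "a", {"k": [None, 2.5]}]   # elements of different types
map_like = {1: "a"}                          # non-string key, so not an object
```

   Under these assumptions, `is_variant_value(mixed_array)` holds, while `map_like`, a `uuid.UUID`, and a `datetime.time` are all rejected, which mirrors the "no maps" and "narrower primitives" points.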
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

