rdblue commented on code in PR #10831:
URL: https://github.com/apache/iceberg/pull/10831#discussion_r1815755187


##########
format/spec.md:
##########
@@ -1025,28 +1033,29 @@ Values should be stored in Parquet using the types and 
logical type annotations
 
 Lists must use the [3-level 
representation](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists).
 
-| Type               | Parquet physical type                                   
           | Logical type                                | Notes                
                                          |
-|--------------------|--------------------------------------------------------------------|---------------------------------------------|----------------------------------------------------------------|
-| **`unknown`**      | None                                                    
           |                                             | Omit from data files 
                                          |
-| **`boolean`**      | `boolean`                                               
           |                                             |                      
                                          |
-| **`int`**          | `int`                                                   
           |                                             |                      
                                          |
-| **`long`**         | `long`                                                  
           |                                             |                      
                                          |
-| **`float`**        | `float`                                                 
           |                                             |                      
                                          |
-| **`double`**       | `double`                                                
           |                                             |                      
                                          |
-| **`decimal(P,S)`** | `P <= 9`: `int32`,<br />`P <= 18`: `int64`,<br 
/>`fixed` otherwise | `DECIMAL(P,S)`                              | Fixed must 
use the minimum number of bytes that can store `P`. |
-| **`date`**         | `int32`                                                 
           | `DATE`                                      | Stores days from 
1970-01-01.                                   |
-| **`time`**         | `int64`                                                 
           | `TIME_MICROS` with `adjustToUtc=false`      | Stores microseconds 
from midnight.                             |
-| **`timestamp`**    | `int64`                                                 
           | `TIMESTAMP_MICROS` with `adjustToUtc=false` | Stores microseconds 
from 1970-01-01 00:00:00.000000.           |
-| **`timestamptz`**  | `int64`                                                 
           | `TIMESTAMP_MICROS` with `adjustToUtc=true`  | Stores microseconds 
from 1970-01-01 00:00:00.000000 UTC.       |
-| **`timestamp_ns`**   | `int64`                                               
           | `TIMESTAMP_NANOS` with `adjustToUtc=false`  | Stores nanoseconds 
from 1970-01-01 00:00:00.000000000.         |
-| **`timestamptz_ns`** | `int64`                                               
           | `TIMESTAMP_NANOS` with `adjustToUtc=true`   | Stores nanoseconds 
from 1970-01-01 00:00:00.000000000 UTC.     |
-| **`string`**       | `binary`                                                
           | `UTF8`                                      | Encoding must be 
UTF-8.                                        |
-| **`uuid`**         | `fixed_len_byte_array[16]`                              
           | `UUID`                                      |                      
                                          |
-| **`fixed(L)`**     | `fixed_len_byte_array[L]`                               
           |                                             |                      
                                          |
-| **`binary`**       | `binary`                                                
           |                                             |                      
                                          |
-| **`struct`**       | `group`                                                 
           |                                             |                      
                                          |
-| **`list`**         | `3-level list`                                          
           | `LIST`                                      | See Parquet docs for 
3-level representation.                   |
-| **`map`**          | `3-level map`                                           
           | `MAP`                                       | See Parquet docs for 
3-level representation.                   |
+| Type                 | Parquet physical type                                 
             | Logical type                          | Notes                    
                                      |
+|----------------------|--------------------------------------------------------------------|---------------------------------------|----------------------------------------------------------------|
+| **`unknown`**        | None                                                  
             |                                       | Omit from data files     
                                      |
+| **`boolean`**        | `boolean`                                             
             |                                       |                          
                                      |
+| **`int`**            | `int`                                                 
             |                                       |                          
                                      |
+| **`long`**           | `long`                                                
             |                                       |                          
                                      |
+| **`float`**          | `float`                                               
             |                                       |                          
                                      |
+| **`double`**         | `double`                                              
             |                                       |                          
                                      |
+| **`decimal(P,S)`**   | `P <= 9`: `int32`,<br />`P <= 18`: `int64`,<br 
/>`fixed` otherwise | `DECIMAL(P,S)`                        | Fixed must use 
the minimum number of bytes that can store `P`. |
+| **`date`**           | `int32`                                               
             | `DATE`                                | Stores days from 
1970-01-01.                                   |
+| **`time`**           | `int64`                                               
             | `TIME_MICROS` with `adjustToUtc=false` | Stores microseconds 
from midnight.                             |
+| **`timestamp`**      | `int64`                                               
             | `TIMESTAMP_MICROS` with `adjustToUtc=false` | Stores 
microseconds from 1970-01-01 00:00:00.000000.           |
+| **`timestamptz`**    | `int64`                                               
             | `TIMESTAMP_MICROS` with `adjustToUtc=true` | Stores microseconds 
from 1970-01-01 00:00:00.000000 UTC.       |
+| **`timestamp_ns`**   | `int64`                                               
             | `TIMESTAMP_NANOS` with `adjustToUtc=false` | Stores nanoseconds 
from 1970-01-01 00:00:00.000000000.         |
+| **`timestamptz_ns`** | `int64`                                               
             | `TIMESTAMP_NANOS` with `adjustToUtc=true` | Stores nanoseconds 
from 1970-01-01 00:00:00.000000000 UTC.     |
+| **`string`**         | `binary`                                              
             | `UTF8`                                | Encoding must be UTF-8.  
                                      |
+| **`uuid`**           | `fixed_len_byte_array[16]`                            
             | `UUID`                                |                          
                                      |
+| **`fixed(L)`**       | `fixed_len_byte_array[L]`                             
             |                                       |                          
                                      |
+| **`binary`**         | `binary`                                              
             |                                       |                          
                                      |
+| **`struct`**         | `group`                                               
             |                                       |                          
                                      |
+| **`list`**           | `3-level list`                                        
             | `LIST`                                | See Parquet docs for 
3-level representation.                   |
+| **`map`**            | `3-level map`                                         
             | `MAP`                                 | See Parquet docs for 
3-level representation.                   |
+| **`variant`**        | `group` with `Data` and `Metadata` fields of `binary` 
type         |                                       | See Parquet docs for 
Variant encoding.                         |

Review Comment:
   This is inaccurate: the field names are case sensitive and one is named 
`value` not `data`.
   
   I think this should link to the Parquet docs, one for shredded and one for 
non-shredded values. The physical type should be:
   > `group` of `metadata` and `value` fields
   
   The logical type should be `VARIANT` (See 
https://github.com/apache/parquet-format/pull/460).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to