RussellSpitzer commented on code in PR #10981:
URL: https://github.com/apache/iceberg/pull/10981#discussion_r1767640808


##########
format/spec.md:
##########
@@ -1000,59 +1010,66 @@ Values should be stored in Parquet using the types and 
logical type annotations
 
 Lists must use the [3-level 
representation](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists).
 
-| Type               | Parquet physical type                                   
           | Logical type                                | Notes                
                                          |
-|--------------------|--------------------------------------------------------------------|---------------------------------------------|----------------------------------------------------------------|
-| **`boolean`**      | `boolean`                                               
           |                                             |                      
                                          |
-| **`int`**          | `int`                                                   
           |                                             |                      
                                          |
-| **`long`**         | `long`                                                  
           |                                             |                      
                                          |
-| **`float`**        | `float`                                                 
           |                                             |                      
                                          |
-| **`double`**       | `double`                                                
           |                                             |                      
                                          |
-| **`decimal(P,S)`** | `P <= 9`: `int32`,<br />`P <= 18`: `int64`,<br 
/>`fixed` otherwise | `DECIMAL(P,S)`                              | Fixed must 
use the minimum number of bytes that can store `P`. |
-| **`date`**         | `int32`                                                 
           | `DATE`                                      | Stores days from 
1970-01-01.                                   |
-| **`time`**         | `int64`                                                 
           | `TIME_MICROS` with `adjustToUtc=false`      | Stores microseconds 
from midnight.                             |
-| **`timestamp`**    | `int64`                                                 
           | `TIMESTAMP_MICROS` with `adjustToUtc=false` | Stores microseconds 
from 1970-01-01 00:00:00.000000.           |
-| **`timestamptz`**  | `int64`                                                 
           | `TIMESTAMP_MICROS` with `adjustToUtc=true`  | Stores microseconds 
from 1970-01-01 00:00:00.000000 UTC.       |
-| **`timestamp_ns`**   | `int64`                                               
           | `TIMESTAMP_NANOS` with `adjustToUtc=false`  | Stores nanoseconds 
from 1970-01-01 00:00:00.000000000.         |
-| **`timestamptz_ns`** | `int64`                                               
           | `TIMESTAMP_NANOS` with `adjustToUtc=true`   | Stores nanoseconds 
from 1970-01-01 00:00:00.000000000 UTC.     |
-| **`string`**       | `binary`                                                
           | `UTF8`                                      | Encoding must be 
UTF-8.                                        |
-| **`uuid`**         | `fixed_len_byte_array[16]`                              
           | `UUID`                                      |                      
                                          |
-| **`fixed(L)`**     | `fixed_len_byte_array[L]`                               
           |                                             |                      
                                          |
-| **`binary`**       | `binary`                                                
           |                                             |                      
                                          |
-| **`struct`**       | `group`                                                 
           |                                             |                      
                                          |
-| **`list`**         | `3-level list`                                          
           | `LIST`                                      | See Parquet docs for 
3-level representation.                   |
-| **`map`**          | `3-level map`                                           
           | `MAP`                                       | See Parquet docs for 
3-level representation.                   |
+| Type               | Parquet physical type                                   
           | Logical type                                | Notes                
                                                                                
   |
+|--------------------|--------------------------------------------------------------------|---------------------------------------------|---------------------------------------------------------------------------------------------------------|
+| **`boolean`**      | `boolean`                                               
           |                                             |                      
                                                                                
   |
+| **`int`**          | `int`                                                   
           |                                             |                      
                                                                                
   |
+| **`long`**         | `long`                                                  
           |                                             |                      
                                                                                
   |
+| **`float`**        | `float`                                                 
           |                                             |                      
                                                                                
   |
+| **`double`**       | `double`                                                
           |                                             |                      
                                                                                
   |
+| **`decimal(P,S)`** | `P <= 9`: `int32`,<br />`P <= 18`: `int64`,<br 
/>`fixed` otherwise | `DECIMAL(P,S)`                              | Fixed must 
use the minimum number of bytes that can store `P`.                             
             |
+| **`date`**         | `int32`                                                 
           | `DATE`                                      | Stores days from 
1970-01-01.                                                                     
       |
+| **`time`**         | `int64`                                                 
           | `TIME_MICROS` with `adjustToUtc=false`      | Stores microseconds 
from midnight.                                                                  
    |
+| **`timestamp`**    | `int64`                                                 
           | `TIMESTAMP_MICROS` with `adjustToUtc=false` | Stores microseconds 
from 1970-01-01 00:00:00.000000.                                                
    |
+| **`timestamptz`**  | `int64`                                                 
           | `TIMESTAMP_MICROS` with `adjustToUtc=true`  | Stores microseconds 
from 1970-01-01 00:00:00.000000 UTC.                                            
    |
+| **`timestamp_ns`**   | `int64`                                               
           | `TIMESTAMP_NANOS` with `adjustToUtc=false`  | Stores nanoseconds 
from 1970-01-01 00:00:00.000000000.                                             
     |
+| **`timestamptz_ns`** | `int64`                                               
           | `TIMESTAMP_NANOS` with `adjustToUtc=true`   | Stores nanoseconds 
from 1970-01-01 00:00:00.000000000 UTC.                                         
     |
+| **`string`**       | `binary`                                                
           | `UTF8`                                      | Encoding must be 
UTF-8.                                                                          
       |
+| **`uuid`**         | `fixed_len_byte_array[16]`                              
           | `UUID`                                      |                      
                                                                                
   |
+| **`fixed(L)`**     | `fixed_len_byte_array[L]`                               
           |                                             |                      
                                                                                
   |
+| **`binary`**       | `binary`                                                
           |                                             |                      
                                                                                
   |
+| **`struct`**       | `group`                                                 
           |                                             |                      
                                                                                
   |
+| **`list`**         | `3-level list`                                          
           | `LIST`                                      | See Parquet docs for 
3-level representation.                                                         
   |
+| **`map`**          | `3-level map`                                           
           | `MAP`                                       | See Parquet docs for 
3-level representation.                                                         
   |
+| **`geometry`**     | `binary`                                                
           | `GEOMETRY`                                  | WKB format, see 
Appendix G. Logical type annotation optional for supported Parquet format 
versions [1]. |
+
+Notes:
 
+1. 
[https://github.com/apache/parquet-format/pull/240](https://github.com/apache/parquet-format/pull/240))
 
 ### ORC
 
 **Data Type Mappings**
 
-| Type               | ORC type            | ORC type attributes               
                   | Notes                                                      
                             |
-|--------------------|---------------------|------------------------------------------------------|-----------------------------------------------------------------------------------------|
-| **`boolean`**      | `boolean`           |                                   
                   |                                                            
                             |
-| **`int`**          | `int`               |                                   
                   | ORC `tinyint` and `smallint` would also map to **`int`**.  
                             |
-| **`long`**         | `long`              |                                   
                   |                                                            
                             |
-| **`float`**        | `float`             |                                   
                   |                                                            
                             |
-| **`double`**       | `double`            |                                   
                   |                                                            
                             |
-| **`decimal(P,S)`** | `decimal`           |                                   
                   |                                                            
                             |
-| **`date`**         | `date`              |                                   
                   |                                                            
                             |
-| **`time`**         | `long`              | `iceberg.long-type`=`TIME`        
                   | Stores microseconds from midnight.                         
                             |
-| **`timestamp`**    | `timestamp`         | `iceberg.timestamp-unit`=`MICROS` 
                   | Stores microseconds from 2015-01-01 00:00:00.000000. [1], 
[2]                           |
-| **`timestamptz`**  | `timestamp_instant` | `iceberg.timestamp-unit`=`MICROS` 
                   | Stores microseconds from 2015-01-01 00:00:00.000000 UTC. 
[1], [2]                       |
-| **`timestamp_ns`** | `timestamp`         | `iceberg.timestamp-unit`=`NANOS`  
                   | Stores nanoseconds from 2015-01-01 00:00:00.000000000. [1] 
                             |
-| **`timestamptz_ns`** | `timestamp_instant` | 
`iceberg.timestamp-unit`=`NANOS`                   | Stores nanoseconds from 
2015-01-01 00:00:00.000000000 UTC. [1]                          |
-| **`string`**       | `string`            |                                   
                   | ORC `varchar` and `char` would also map to **`string`**.   
                             |
-| **`uuid`**         | `binary`            | `iceberg.binary-type`=`UUID`      
                   |                                                            
                             |
-| **`fixed(L)`**     | `binary`            | `iceberg.binary-type`=`FIXED` & 
`iceberg.length`=`L` | The length would not be checked by the ORC reader and 
should be checked by the adapter. |
-| **`binary`**       | `binary`            |                                   
                   |                                                            
                             |
-| **`struct`**       | `struct`            |                                   
                   |                                                            
                             |
-| **`list`**         | `array`             |                                   
                   |                                                            
                             |
-| **`map`**          | `map`               |                                   
                   |                                                            
                             |
+
+| Type               | ORC type            | ORC type attributes               
                   | Notes                                                      
                              |
+|--------------------|---------------------|------------------------------------------------------|------------------------------------------------------------------------------------------|
+| **`boolean`**      | `boolean`           |                                   
                   |                                                            
                              |
+| **`int`**          | `int`               |                                   
                   | ORC `tinyint` and `smallint` would also map to **`int`**.  
                              |
+| **`long`**         | `long`              |                                   
                   |                                                            
                              |
+| **`float`**        | `float`             |                                   
                   |                                                            
                              |
+| **`double`**       | `double`            |                                   
                   |                                                            
                              |
+| **`decimal(P,S)`** | `decimal`           |                                   
                   |                                                            
                              |
+| **`date`**         | `date`              |                                   
                   |                                                            
                              |
+| **`time`**         | `long`              | `iceberg.long-type`=`TIME`        
                   | Stores microseconds from midnight.                         
                              |
+| **`timestamp`**    | `timestamp`         | `iceberg.timestamp-unit`=`MICROS` 
                   | Stores microseconds from 2015-01-01 00:00:00.000000. [1], 
[2]                            |
+| **`timestamptz`**  | `timestamp_instant` | `iceberg.timestamp-unit`=`MICROS` 
                   | Stores microseconds from 2015-01-01 00:00:00.000000 UTC. 
[1], [2]                        |
+| **`timestamp_ns`** | `timestamp`         | `iceberg.timestamp-unit`=`NANOS`  
                   | Stores nanoseconds from 2015-01-01 00:00:00.000000000. [1] 
                              |
+| **`timestamptz_ns`** | `timestamp_instant` | 
`iceberg.timestamp-unit`=`NANOS`                   | Stores nanoseconds from 
2015-01-01 00:00:00.000000000 UTC. [1]                           |
+| **`string`**       | `string`            |                                   
                   | ORC `varchar` and `char` would also map to **`string`**.   
                              |
+| **`uuid`**         | `binary`            | `iceberg.binary-type`=`UUID`      
                   |                                                            
                              |
+| **`fixed(L)`**     | `binary`            | `iceberg.binary-type`=`FIXED` & 
`iceberg.length`=`L` | The length would not be checked by the ORC reader and 
should be checked by the adapter.  |
+| **`binary`**       | `binary`            |                                   
                   |                                                            
                              |
+| **`struct`**       | `struct`            |                                   
                   |                                                            
                              |
+| **`list`**         | `array`             |                                   
                   |                                                            
                              |
+| **`map`**          | `map`               |                                   
                   |                                                            
                              |
+| **`geometry`**     | `binary`            |                                   
                   | WKB format, see Appendix G; Optional geometry type for 
supported ORC format versions [3] |
 
 Notes:
 
 1. ORC's 
[TimestampColumnVector](https://orc.apache.org/api/hive-storage-api/org/apache/hadoop/hive/ql/exec/vector/TimestampColumnVector.html)
 consists of a time field (milliseconds since epoch) and a nanos field 
(nanoseconds within the second). Hence the milliseconds within the second are 
reported twice; once in the time field and again in the nanos field. The read 
adapter should only use milliseconds within the second from one of these 
fields. The write adapter should also report milliseconds within the second 
twice; once in the time field and again in the nanos field. ORC writer is 
expected to correctly consider millis information from one of the fields. More 
details at https://issues.apache.org/jira/browse/ORC-546
 2. ORC `timestamp` and `timestamp_instant` values store nanosecond precision. 
Iceberg ORC writers for Iceberg types `timestamp` and `timestamptz` **must** 
truncate nanoseconds to microseconds. `iceberg.timestamp-unit` is assumed to be 
`MICROS` if not present.
+3 
[https://github.com/apache/orc-format/pull/18](https://github.com/apache/orc-format/pull/18)

Review Comment:
   nit: missing . after 3



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to