xxchan commented on code in PR #11749:
URL: https://github.com/apache/iceberg/pull/11749#discussion_r1881671025


##########
format/spec.md:
##########
@@ -454,7 +454,7 @@ Partition field IDs must be reused if an existing partition 
spec contains an equ
 | **`truncate[W]`** | Value truncated to width `W` (see below)                 
    | `int`, `long`, `decimal`, `string`, `binary`                              
                                | Source type |
 | **`year`**        | Extract a date or timestamp year, as years from 1970     
    | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`      
                                | `int`       |
 | **`month`**       | Extract a date or timestamp month, as months from 
1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, 
`timestamptz_ns`                                      | `int`       |
-| **`day`**         | Extract a date or timestamp day, as days from 1970-01-01 
    | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`      
                                | `int`       |
+| **`day`**         | Extract a date or timestamp day, as days from 1970-01-01 
    | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`      
                                | `int` (the physical type should be an `int`, 
but the the logical type should be a `date`) |

Review Comment:
   > I think the spec is clear here for day transform. days from 1970-01-01 is 
a int.
   
   I believe this is not clear enough, and it has led to problems repeatedly in 
the wild, such as https://github.com/apache/iceberg-rust/issues/478.
   
   As also mentioned by Fokko, what is now persisted is really an "Avro Date". 
Parsing it on the assumption that it is a plain Avro Int will lead to errors.
   
   > When it inserts data, the reference Java Iceberg implementation writes the 
Avro manifest files, using an Avro type of Date for the partition struct value.
   
   ---
   
   Actually, this looks like a case of **abstraction leak** to me: we never 
specified that `date` is an `int` (`days from 1970-01-01`). 
   
   But the `day` transform here requires:
   1. The value is `int` (`days from 1970-01-01`)
   2. The value should be serialized/displayed as `Date` (This is not mentioned 
in the spec here, but is in the reference implementation.)
   
   This implicitly forces `date` to be `int`. (And then the `day` transform's 
result type should also be `date`.)
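
To make the distinction concrete, here is a minimal sketch in Python. The helper names (`day_transform`, `as_date`) are hypothetical and not part of any Iceberg library. The two schema fragments follow the Avro specification, where `date` is a logical type annotating an `int`. The sketch only illustrates that the stored value is the same integer either way, and that the schema annotation decides whether a reader sees an `int` or a `date`.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH = date(1970, 1, 1)

# Avro schema fragments: a plain int versus an int carrying the date logical type.
# A strict reader that expects AVRO_INT may reject a field declared as AVRO_DATE.
AVRO_INT = {"type": "int"}
AVRO_DATE = {"type": "int", "logicalType": "date"}


def day_transform(ts: datetime) -> int:
    """Hypothetical helper: days from 1970-01-01 for a timezone-aware timestamp."""
    return (ts.astimezone(timezone.utc).date() - EPOCH).days


def as_date(days: int) -> date:
    """Hypothetical helper: interpret the physical int as a logical date."""
    return EPOCH + timedelta(days=days)


# The physical value is an int; the logical view of the same value is a date.
days = day_transform(datetime(2024, 12, 12, 10, 30, tzinfo=timezone.utc))
logical = as_date(days)
```

Both schema fragments share the same physical type, which is why a reader that ignores `logicalType` still decodes the right integer, while a reader that matches schemas strictly can fail.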



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

