xxchan commented on code in PR #11749:
URL: https://github.com/apache/iceberg/pull/11749#discussion_r1881671025
##########
format/spec.md:
##########
@@ -454,7 +454,7 @@ Partition field IDs must be reused if an existing partition spec contains an equ
 | **`truncate[W]`** | Value truncated to width `W` (see below) | `int`, `long`, `decimal`, `string`, `binary` | Source type |
 | **`year`** | Extract a date or timestamp year, as years from 1970 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` |
 | **`month`** | Extract a date or timestamp month, as months from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` |
-| **`day`** | Extract a date or timestamp day, as days from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` |
+| **`day`** | Extract a date or timestamp day, as days from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` (the physical type should be an `int`, but the the logical type should be a `date`) |

Review Comment:

> I think the spec is clear here for day transform. days from 1970-01-01 is a int.

I believe this is not clear enough, and it has repeatedly led to problems in the wild, such as https://github.com/apache/iceberg-rust/issues/478. As Fokko also mentioned, what is persisted today is really an "Avro Date". Parsing it on the assumption that it is a plain Avro `int` leads to errors.

> When it inserts data, the reference Java Iceberg implementation writes the Avro manifest files, using an Avro type of Date for the partition struct value.

---

Actually, this looks like a case of a leaky abstraction to me: we never specified that `date` is an `int` (`days from 1970-01-01`), but the `day` transform here requires that:

1. The value is an `int` (`days from 1970-01-01`)
2. The value should be serialized/displayed as `Date`

This implicitly forces `date` to be an `int`. (And then the `day` transform's return type should also be `date`.)
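To make the physical/logical distinction concrete, here is a minimal Python sketch. It is not taken from any Iceberg implementation: the function names and the two schema fragments are hypothetical and only illustrate the point. The one thing it relies on is that Avro's `date` logical type annotates an `int` holding days from the Unix epoch.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

def day_transform(d: date) -> int:
    """Hypothetical helper: the physical value, days from 1970-01-01."""
    return (d - EPOCH).days

# Two Avro field types that carry the same physical int on disk.
# These fragments are illustrative, not copied from a real manifest.
PLAIN_INT = {"type": "int"}
AVRO_DATE = {"type": "int", "logicalType": "date"}

def decode_partition_value(raw: int, avro_type: dict):
    """Hypothetical reader: interpret the raw int according to the schema."""
    if avro_type.get("logicalType") == "date":
        # Logical type `date`: surface the value as a calendar date.
        return EPOCH + timedelta(days=raw)
    # No logical type: the value stays a bare int.
    return raw

raw = day_transform(date(1970, 1, 2))
print(decode_partition_value(raw, PLAIN_INT))  # bare int
print(decode_partition_value(raw, AVRO_DATE))  # datetime.date
```

Both schemas store the same bytes, but a reader that expects `PLAIN_INT` and is handed a field declared as `AVRO_DATE` (or the reverse) ends up with a value of the wrong type, which is roughly the kind of mismatch described in the linked issue.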