rdblue commented on code in PR #10981: URL: https://github.com/apache/iceberg/pull/10981#discussion_r1768943555
########## format/spec.md: ########## @@ -323,16 +327,17 @@ Partition field IDs must be reused if an existing partition spec contains an equ #### Partition Transforms -| Transform name | Description | Source types | Result type | -|-------------------|--------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|-------------| -| **`identity`** | Source value, unmodified | Any | Source type | -| **`bucket[N]`** | Hash of value, mod `N` (see below) | `int`, `long`, `decimal`, `date`, `time`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`, `string`, `uuid`, `fixed`, `binary` | `int` | -| **`truncate[W]`** | Value truncated to width `W` (see below) | `int`, `long`, `decimal`, `string`, `binary` | Source type | -| **`year`** | Extract a date or timestamp year, as years from 1970 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` | -| **`month`** | Extract a date or timestamp month, as months from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` | -| **`day`** | Extract a date or timestamp day, as days from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` | -| **`hour`** | Extract a timestamp hour, as hours from 1970-01-01 00:00:00 | `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` | -| **`void`** | Always produces `null` | Any | Source type or `int` | +| Transform name | Description | Source types | Result type | +|-------------------|--------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|----------------------| +| **`identity`** | Source value, unmodified | Any | Source type | +| **`bucket[N]`** | Hash of value, mod `N` (see below) | `int`, `long`, `decimal`, `date`, `time`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`, `string`, `uuid`, `fixed`, `binary` | `int` | Review Comment: We may want to change this to be like `identity`, using `Any except [...]`. I would not include geo as a source column for bucketing because there is not a clear definition of equality for geo. The hash would depend on the structure of the object and weird things happen when two objects are "equal" (for some definition) but have different hash values. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org