flyrain commented on code in PR #10981:
URL: https://github.com/apache/iceberg/pull/10981#discussion_r1724083771
##########
format/spec.md:
##########
@@ -323,16 +327,17 @@ Partition field IDs must be reused if an existing partition spec contains an equ
 #### Partition Transforms
-| Transform name | Description | Source types | Result type |
-|-------------------|--------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|-------------|
-| **`identity`** | Source value, unmodified | Any | Source type |
-| **`bucket[N]`** | Hash of value, mod `N` (see below) | `int`, `long`, `decimal`, `date`, `time`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`, `string`, `uuid`, `fixed`, `binary` | `int` |
-| **`truncate[W]`** | Value truncated to width `W` (see below) | `int`, `long`, `decimal`, `string`, `binary` | Source type |
-| **`year`** | Extract a date or timestamp year, as years from 1970 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` |
-| **`month`** | Extract a date or timestamp month, as months from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` |
-| **`day`** | Extract a date or timestamp day, as days from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` |
-| **`hour`** | Extract a timestamp hour, as hours from 1970-01-01 00:00:00 | `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` |
-| **`void`** | Always produces `null` | Any | Source type or `int` |
+| Transform name | Description | Source types | Result type |
+|-------------------|--------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|----------------------|
+| **`identity`** | Source value, unmodified | Any | Source type |

Review Comment:
Maybe that's fine if it is comparable, but practically people will always use
`xz2`, right? I'm not sure, but I wonder whether there are implications, e.g., it being too expensive or producing very high cardinality, that would make us not recommend using the original GEO value in the partition spec.
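To make the cardinality concern concrete, here is a rough sketch, not Iceberg code. It compares the number of distinct partition values when partitioning on raw point values (identity) against a coarse cell id. The `grid_cell` helper is hypothetical: it is a simple Z-order grid used as a stand-in, not the `xz2` transform under discussion, and the random points are synthetic.

```python
import random


def grid_cell(lon: float, lat: float, bits: int = 4) -> int:
    """Hypothetical coarse cell id: interleave `bits` bits of lon and lat.

    A plain Z-order grid used only for illustration; this is NOT XZ2.
    """
    n = 1 << bits
    # Map coordinates onto an n x n grid, clamping the upper edge.
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    cell = 0
    for i in range(bits):
        cell |= ((x >> i) & 1) << (2 * i)
        cell |= ((y >> i) & 1) << (2 * i + 1)
    return cell


# Synthetic data: 10,000 random points (an assumption, not real GEO values).
rng = random.Random(0)
points = [(rng.uniform(-180, 180), rng.uniform(-90, 90)) for _ in range(10_000)]

# Identity: every distinct source value becomes its own partition value.
identity_partitions = len(set(points))
# Coarse cell: at most 4 ** bits distinct partition values.
cell_partitions = len({grid_cell(lon, lat) for lon, lat in points})

print("identity:", identity_partitions, "cells:", cell_partitions)
```

With identity, the number of partition values grows with the number of distinct geometries, while a cell-based transform caps it at the number of cells. That is the trade-off the question above is about.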