advancedxy commented on code in PR #8579:
URL: https://github.com/apache/iceberg/pull/8579#discussion_r1456810557

##########
format/spec.md:
##########
@@ -1060,6 +1076,27 @@ The types below are not currently valid for bucketing, and so are not hashed. Ho
 | **`float`** | `hashLong(doubleToLongBits(double(v))` [4]| `1.0F` → `-142385009`, `0.0F` → `1669671676`, `-0.0F` → `1669671676` |
 | **`double`** | `hashLong(doubleToLongBits(v))` [4]| `1.0D` → `-142385009`, `0.0D` → `1669671676`, `-0.0D` → `1669671676` |
+For multiple arguments, hashBytes() is applied on the concatenated byte representation of each argument:
+
+| Primitive type       | Bytes representation                           |
+|----------------------|------------------------------------------------|
+| **`int`**            | `littleEndianBytes(long(v))`                   |
+| **`long`**           | `littleEndianBytes(v)`                         |
+| **`decimal(P,S)`**   | `minBigEndian(unscaled(v))`                    |
+| **`date`**           | `littleEndianBytes(daysFromUnixEpoch(v))`      |
+| **`time`**           | `littleEndianBytes(microsecsFromMidnight(v))`  |
+| **`timestamp`**      | `littleEndianBytes(microsecsFromUnixEpoch(v))` |
+| **`timestamptz`**    | `littleEndianBytes(microsecsFromUnixEpoch(v))` |
+| **`timestamp_ns`**   | `littleEndianBytes(nanosecsFromUnixEpoch(v))`  |
+| **`timestamptz_ns`** | `littleEndianBytes(nanosecsFromUnixEpoch(v))`  |
+| **`string`**         | `utf8Bytes(v)`                                 |
+| **`uuid`**           | `uuidBytes(v)`                                 |
+| **`fixed(L)`**       | `v`                                            |
+| **`binary`**         | `v`                                            |
+
+For example, the hash representation of `(a:int, b:string)` will be `hashBytes(concatenation(littleEndianBytes(long(v)), utf8Bytes(b))`

Review Comment:
   Thanks, fixed.

##########
format/spec.md:
##########
@@ -1060,6 +1076,27 @@ The types below are not currently valid for bucketing, and so are not hashed. Ho
 | **`float`** | `hashLong(doubleToLongBits(double(v))` [4]| `1.0F` → `-142385009`, `0.0F` → `1669671676`, `-0.0F` → `1669671676` |
 | **`double`** | `hashLong(doubleToLongBits(v))` [4]| `1.0D` → `-142385009`, `0.0D` → `1669671676`, `-0.0D` → `1669671676` |
+For multiple arguments, hashBytes() is applied on the concatenated byte representation of each argument:

Review Comment:
   addressed.

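As a rough illustration of the multi-argument rule proposed in the hunk above (this is not code from the PR), the byte concatenation for `(a:int, b:string)` could be sketched in Python as follows. The helper names `to_bytes` and `concat_arguments` are hypothetical. Only the `int`/`long`, `string`, and `binary`/`fixed` rows of the table are covered, and the final `hashBytes()` step is deliberately left out.

```python
import struct


def to_bytes(value):
    """Byte representation of one argument, per the table in the quoted hunk.

    Hypothetical helper: covers only int/long, string, and binary/fixed.
    """
    if isinstance(value, bool):
        # bool is a subclass of int in Python; the quoted table has no boolean row.
        raise TypeError("boolean is not covered by the quoted table")
    if isinstance(value, int):
        # int and long: littleEndianBytes(long(v)) -> 8 bytes, signed, little-endian
        return struct.pack("<q", value)
    if isinstance(value, str):
        # string: utf8Bytes(v)
        return value.encode("utf-8")
    if isinstance(value, (bytes, bytearray)):
        # fixed(L) and binary: the bytes are used as-is
        return bytes(value)
    raise TypeError(f"type not covered by this sketch: {type(value).__name__}")


def concat_arguments(*values):
    """Concatenate the per-argument byte representations in argument order."""
    return b"".join(to_bytes(v) for v in values)


# (a:int, b:string) with a = 34, b = "iceberg"
payload = concat_arguments(34, "iceberg")
print(payload.hex())  # 220000000000000069636562657267
```

Under the proposed rule, the resulting `payload` is what would be passed to `hashBytes()`. Because an `int` is widened to a `long` before encoding, it always contributes 8 bytes to the concatenation.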
##########
format/spec.md:
##########
@@ -1119,21 +1156,30 @@ Partition specs are serialized as a JSON object with the following fields:
 Each partition field in the fields list is stored as an object. See the table for more detail:
-|Transform or Field|JSON representation|Example|
-|--- |--- |--- |
-|**`identity`**|`JSON string: "identity"`|`"identity"`|
-|**`bucket[N]`**|`JSON string: "bucket[<N>]"`|`"bucket[16]"`|
-|**`truncate[W]`**|`JSON string: "truncate[<W>]"`|`"truncate[20]"`|
-|**`year`**|`JSON string: "year"`|`"year"`|
-|**`month`**|`JSON string: "month"`|`"month"`|
-|**`day`**|`JSON string: "day"`|`"day"`|
-|**`hour`**|`JSON string: "hour"`|`"hour"`|
-|**`Partition Field`**|`JSON object: {`<br />&nbsp;&nbsp;`"source-id": <id int>,`<br />&nbsp;&nbsp;`"field-id": <field id int>,`<br />&nbsp;&nbsp;`"name": <name string>,`<br />&nbsp;&nbsp;`"transform": <transform JSON>`<br />`}`|`{`<br />&nbsp;&nbsp;`"source-id": 1,`<br />&nbsp;&nbsp;`"field-id": 1000,`<br />&nbsp;&nbsp;`"name": "id_bucket",`<br />&nbsp;&nbsp;`"transform": "bucket[16]"`<br />`}`|
+| Transform or Field | JSON representation | Example |

Review Comment:
   fixed.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

For additional commands, e-mail: issues-h...@iceberg.apache.org