wgtmac commented on code in PR #12644: URL: https://github.com/apache/iceberg/pull/12644#discussion_r2015866843
##########
format/spec.md:
##########
@@ -540,7 +540,7 @@ Notes:
 2. The width, `W`, used to truncate decimal values is applied using the scale of the decimal column to avoid additional (and potentially conflicting) parameters.
 3. Strings are truncated to a valid UTF-8 string with no more than `L` code points.
 4. In contrast to strings, binary values do not have an assumed encoding and are truncated to `L` bytes.
-
+5. For multi-argument bucketing, the hashes are `xor`'ed: `hash(col1) ⊕ hash(col2) ⊕ ... ⊕ hash(colN)) % W`.

Review Comment:
   Is there a specific reason to choose `xor` over other arithmetic operations?

Review Comment:
   Since `bucket(null)` returns null, should the entire result be null if any input value is null? I agree that this should be made explicit.
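
To make the two questions above concrete, here is a minimal Python sketch of the proposed multi-argument bucketing. Everything in it is illustrative: `stand_in_hash` and `multi_bucket` are hypothetical names, CRC32 is only a stand-in for the hash function the spec actually mandates, the null-propagation rule is the behavior suggested in the review comment (not something the diff states), and the 31-bit mask is assumed to carry over from the single-argument bucket transform.

```python
import zlib
from functools import reduce
from operator import xor
from typing import Optional, Sequence


def stand_in_hash(value: bytes) -> int:
    """Placeholder 32-bit hash; NOT the hash function required by the spec."""
    return zlib.crc32(value)


def multi_bucket(values: Sequence[Optional[bytes]], width: int) -> Optional[int]:
    """Hypothetical multi-argument bucket: xor the per-column hashes, then take mod width."""
    # Assumption from the review discussion: any null input makes the result null,
    # mirroring bucket(null) = null for the single-argument transform.
    if any(v is None for v in values):
        return None
    # xor all per-column hashes together (0 is the identity element for xor).
    combined = reduce(xor, (stand_in_hash(v) for v in values), 0)
    # Assumption: mask to a non-negative 31-bit value before the modulo,
    # as the single-argument bucket transform does.
    return (combined & 0x7FFFFFFF) % width


# Example calls with made-up column values.
print(multi_bucket([b"a", b"b"], 16))
print(multi_bucket([b"a", None], 16))
```

Two properties of `xor` may be relevant when comparing it with other combining operations: it is commutative, so `multi_bucket([a, b], W)` equals `multi_bucket([b, a], W)`, and it is self-cancelling, so two columns holding the same value always combine to 0 regardless of that value.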