Fokko commented on code in PR #12644: URL: https://github.com/apache/iceberg/pull/12644#discussion_r2016178968
##########
format/spec.md:
##########
@@ -540,7 +540,7 @@ Notes:
 2. The width, `W`, used to truncate decimal values is applied using the scale of the decimal column to avoid additional (and potentially conflicting) parameters.
 3. Strings are truncated to a valid UTF-8 string with no more than `L` code points.
 4. In contrast to strings, binary values do not have an assumed encoding and are truncated to `L` bytes.
-
+5. For multi-argument bucketing, the hashes are `xor`'ed: `hash(col1) ⊕ hash(col2) ⊕ ... ⊕ hash(colN)) % W`.

Review Comment:
   Great to meet you as well @sfc-gh-bhannel, thanks for jumping in here.

   > It also might be worth adding parentheses for those who don't recall the precedence order between XOR and modulo.

   Thanks, there was a bracket missing actually ;)

   The spec states (just below the table):

   > All transforms must return `null` for a `null` input value.

   I think we can amend this for the multi-arg transforms so that we leave out `null` values but still hash the other non-null fields.

   ```suggestion
   5. For multi-argument bucketing, the hashes for the not-null input values are `xor`'ed: `(hash(col1) ⊕ hash(col2) ⊕ ... ⊕ hash(colN)) % W`. The transform will return `null` when all input values are `null`.
   ```

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
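For illustration only, here is a minimal Python sketch of the rule as worded in the suggestion above: skip `null` inputs, XOR the remaining hashes, then take the result modulo `W`, returning `null` only when every input is `null`. The names `stand_in_hash` and `multi_arg_bucket` are hypothetical, and the CRC-32 hash is just a placeholder, not the hash function the spec defines for bucketing.

```python
import zlib
from functools import reduce
from operator import xor
from typing import Optional, Sequence


def stand_in_hash(value: object) -> int:
    """Hypothetical placeholder hash (CRC-32 of repr); NOT the spec's hash."""
    return zlib.crc32(repr(value).encode("utf-8"))


def multi_arg_bucket(values: Sequence[Optional[object]], width: int) -> Optional[int]:
    """Sketch of (hash(col1) ^ ... ^ hash(colN)) % W over the non-null inputs."""
    # Leave out null inputs, as the suggestion proposes.
    hashes = [stand_in_hash(v) for v in values if v is not None]
    if not hashes:
        # Every input was null, so the transform returns null.
        return None
    # XOR all hashes first, then apply the modulo to the combined value.
    return reduce(xor, hashes) % width


if __name__ == "__main__":
    print(multi_arg_bucket(["2024-01-01", 42], 16))
    print(multi_arg_bucket([None, 42], 16))
    print(multi_arg_bucket([None, None], 16))
```

The parentheses discussed in the comment matter in code as well: in Python and Java, `%` binds more tightly than `^`, so `h1 ^ h2 % W` would parse as `h1 ^ (h2 % W)`. Because XOR is commutative, the sketch gives the same bucket regardless of argument order.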