Fokko commented on code in PR #12644:
URL: https://github.com/apache/iceberg/pull/12644#discussion_r2016178968


##########
format/spec.md:
##########
@@ -540,7 +540,7 @@ Notes:
 2. The width, `W`, used to truncate decimal values is applied using the scale 
of the decimal column to avoid additional (and potentially conflicting) 
parameters.
 3. Strings are truncated to a valid UTF-8 string with no more than `L` code 
points.
 4. In contrast to strings, binary values do not have an assumed encoding and 
are truncated to `L` bytes.
-
+5. For multi-argument bucketing, the hashes are `xor`'ed: `hash(col1) ⊕ 
hash(col2) ⊕ ... ⊕ hash(colN)) % W`.

Review Comment:
   Great to meet you as well @sfc-gh-bhannel, thanks for jumping in here.
   
   > It also might be worth adding parentheses for those who don't recall the 
precedence order between XOR and modulo.
   
   Thanks, there was a bracket missing actually ;)
   
   The spec states (just below the table):
   
   > All transforms must return `null` for a `null` input value.
   
   I think we can amend this for the multi-arg transforms so that `null` input 
values are skipped, while the remaining non-null fields are still hashed.
   
   ```suggestion
   5. For multi-argument bucketing, the hashes for the not-null input values 
are `xor`'ed: `(hash(col1) ⊕ hash(col2) ⊕ ... ⊕ hash(colN)) % W`. The transform 
will return `null` when all input values are `null`.
   ```
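
To make the proposed semantics concrete, here is a minimal sketch of how I read the suggestion above. It is not the reference implementation. `stand_in_hash` and `multi_arg_bucket` are hypothetical names, and CRC32 is only a placeholder for the per-type hash the spec defines for the bucket transform.

```python
import zlib
from typing import Optional, Sequence


def stand_in_hash(value: object) -> int:
    """Hypothetical placeholder hash, NOT the hash function the spec defines."""
    return zlib.crc32(repr(value).encode("utf-8"))


def multi_arg_bucket(values: Sequence[Optional[object]], width: int) -> Optional[int]:
    """XOR the hashes of the non-null inputs, then take the result modulo `width`."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        # Every input is null, so the transform itself returns null.
        return None
    combined = 0
    for v in non_null:
        combined ^= stand_in_hash(v)
    # The parentheses in the spec text matter: XOR first, modulo last.
    return combined % width
```

Since XOR is commutative, the result in this sketch does not depend on the order of the columns, and a `null` column simply contributes nothing to the combined hash.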



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

