aokolnychyi commented on PR #8579:
URL: https://github.com/apache/iceberg/pull/8579#issuecomment-1899444592

   @rdblue recently pointed me to the Bloom filter 
[spec](https://github.com/apache/parquet-format/blob/master/BloomFilter.md) in 
Parquet. I think it contains a few interesting ideas that may be applicable to 
us. First, we should evaluate hash functions other than Murmur3. 
Parquet, for instance, uses xxHash, which is supposed to be much faster. Second, 
Parquet avoids the modulo operator for performance reasons. Given all this, 
I suggest we make this PR about multi-arg transforms in general 
(how they are stored, how they are serialized, what happens during schema 
evolution, compatibility, etc.) and submit a separate PR with `bucketV2` that 
will not only support multiple input elements but also be faster. Once we merge 
a general change about multi-arg transforms, we can start working on changes to 
the expression API.
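
   To illustrate the modulo point, here is a rough sketch that compares the existing `(hash & Integer.MAX_VALUE) % N` bucket computation with a multiply-and-shift range reduction similar to the one the Parquet spec uses to pick a Bloom filter block. The class and method names are hypothetical and the hash value is taken as given; this is not a proposed `bucketV2` implementation.

```java
// Hypothetical sketch: names are illustrative and not part of Iceberg or Parquet.
final class BucketSketch {

  // Existing-style bucketing: clear the sign bit, then take the remainder.
  static int bucketWithModulo(int hash, int numBuckets) {
    return (hash & Integer.MAX_VALUE) % numBuckets;
  }

  // Multiply-shift range reduction: treat the hash as an unsigned 32-bit
  // value, multiply by the bucket count, and keep the upper 32 bits.
  // The result is always in [0, numBuckets) and needs no division.
  static int bucketWithMultiplyShift(int hash, int numBuckets) {
    long unsignedHash = hash & 0xFFFFFFFFL;
    return (int) ((unsignedHash * numBuckets) >>> 32);
  }

  public static void main(String[] args) {
    int numBuckets = 16;
    int[] hashes = {0, 1, -1, Integer.MIN_VALUE, 123456789};
    for (int hash : hashes) {
      System.out.println(
          hash
              + " -> modulo: " + bucketWithModulo(hash, numBuckets)
              + ", multiply-shift: " + bucketWithMultiplyShift(hash, numBuckets));
    }
  }
}
```

   The two methods generally assign the same hash to different buckets, so the multiply-shift variant would not be a drop-in replacement for the current transform. That is one more reason to keep it in a separate `bucketV2`.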
   
   @advancedxy @szehon-ho, how does this sound?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

