I think I've found a neat trick for making smaller bloom filters:
https://github.com/apache/arrow-rs/pull/9628

The idea is that you choose a largish initial bloom filter size and, once
you're done populating it, you shrink it by folding it onto itself if it
turns out to be sparse.

Does anyone know if this trick is used in any other Parquet implementation?
As far as I can tell it is compatible with the spec and should cause no
issues, but I haven't heard of anyone doing this before.
