Re: Compressing bloom filters

Steve Loughran Tue, 31 Mar 2026 08:37:02 -0700

Assuming it compresses before writing, you wouldn't be able to tell when
you read a file how it was actually created, would you?


On Tue, 31 Mar 2026 at 00:57, Micah Kornfield <[email protected]> wrote:

> Hi Adrian,
> Very interesting idea, I don't recall seeing this used in any of the
> reference implementations.  On the surface I agree it looks compatible but
> I need to think a little bit more deeply about it.
>
> Cheers,
> Micah
>
> On Mon, Mar 30, 2026 at 3:27 PM Adrian Garcia Badaracco <
> [email protected]>
> wrote:
>
> > I think I've found a neat trick for making smaller bloom filters:
> > https://github.com/apache/arrow-rs/pull/9628
> >
> > The idea is that you choose a largeish initial bloom filter size and once
> > you're done populating it you compress it by folding it onto itself if it
> > is sparse.
> >
> > Does anyone know if this trick is used in any other Parquet
> > implementation?
> > As far as I can tell it is compatible with the spec and should cause no
> > issues, but I haven't heard of anyone doing this before.
> >
>
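
The folding trick described in the quoted message can be illustrated with a
minimal sketch. It is not the code from the linked PR and it does not use
Parquet's split block Bloom filter layout. It assumes a toy filter whose bit
count is a power of two and whose bit index is `hash mod num_bits`. The names
`BloomFilter`, `fold`, `compress`, `max_fill` and `min_bits` are hypothetical,
and the 0.3 fill threshold is an arbitrary choice for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: num_bits is a power of two, bit index = hash mod num_bits."""

    def __init__(self, num_bits, num_hashes=3):
        assert num_bits > 0 and num_bits & (num_bits - 1) == 0, "power of two required"
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _hashes(self, item):
        # 64-bit hash values that do not depend on the filter size.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(item, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(digest, "little")

    def add(self, item):
        for h in self._hashes(item):
            self.bits |= 1 << (h % self.num_bits)

    def might_contain(self, item):
        return all((self.bits >> (h % self.num_bits)) & 1 for h in self._hashes(item))

    def fill_ratio(self):
        return bin(self.bits).count("1") / self.num_bits

    def fold(self):
        """Return a filter of half the size: upper half OR-ed onto lower half."""
        half = self.num_bits // 2
        folded = BloomFilter(half, self.num_hashes)
        folded.bits = (self.bits & ((1 << half) - 1)) | (self.bits >> half)
        return folded


def compress(bf, max_fill=0.3, min_bits=64):
    """Keep folding while the folded filter stays at or below max_fill (assumed heuristic)."""
    while bf.num_bits // 2 >= min_bits:
        candidate = bf.fold()
        if candidate.fill_ratio() > max_fill:
            break
        bf = candidate
    return bf
```

Under these assumptions, `h mod (n/2)` equals `(h mod n) mod (n/2)`, so a folded
filter has exactly the same bits as one built directly at the smaller size from
the same hashes. A reader therefore sees an ordinary filter and cannot tell
that folding took place. Parquet's split block Bloom filter selects blocks with
its own scheme, so a real writer would need a fold step that matches that
scheme; the linked PR is the place to check how that is handled.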
