Assuming it compresses the filter before writing, you wouldn't be able to tell, when reading a file, how the filter was actually created, would you?
On Tue, 31 Mar 2026 at 00:57, Micah Kornfield <[email protected]> wrote:

> Hi Adrian,
> Very interesting idea, I don't recall seeing this used in any of the
> reference implementations. On the surface I agree it looks compatible but
> I need to think a little bit more deeply about it.
>
> Cheers,
> Micah
>
> On Mon, Mar 30, 2026 at 3:27 PM Adrian Garcia Badaracco <[email protected]> wrote:
>
> > I think I've found a neat trick for making smaller bloom filters:
> > https://github.com/apache/arrow-rs/pull/9628
> >
> > The idea is that you choose a largeish initial bloom filter size and once
> > you're done populating it you compress it by folding it onto itself if it
> > is sparse.
> >
> > Does anyone know if this trick is used in any other Parquet implementation?
> > As far as I can tell it is compatible with the spec and should cause no
> > issues, but I haven't heard of anyone doing this before.
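For illustration only, here is a minimal Python sketch of folding. It is not taken from the PR. It models Parquet's split block bloom filter (SBBF): 256-bit blocks of eight 32-bit words, the spec's eight salt constants, and the block index ((hash >> 32) * num_blocks) >> 32. As assumptions, blake2b stands in for xxHash64, and the names SplitBlockBloomFilter, folded and fold_while_sparse, plus the max_fill threshold, are hypothetical. Because the index is computed by multiply-shift, halving the block count sends block i to block i // 2. A fold therefore ORs adjacent block pairs, and in this model the result is bit-for-bit the filter you would have built at the smaller size.

```python
import hashlib

# Salt constants from the Parquet split block bloom filter (SBBF) spec.
SALT = (0x47B6137B, 0x44974D91, 0x8824AD5B, 0xA2B7289D,
        0x705495C7, 0x2DF1424B, 0x9EFC4947, 0x5C6BFB31)
MASK32 = 0xFFFFFFFF


def hash64(value: bytes) -> int:
    # Assumption: stand-in for xxHash64 (which Parquet uses) to stay stdlib-only.
    return int.from_bytes(hashlib.blake2b(value, digest_size=8).digest(), "little")


def block_mask(h: int) -> list[int]:
    # One bit per 32-bit word, derived from the low 32 bits of the hash.
    x = h & MASK32
    return [1 << (((x * s) & MASK32) >> 27) for s in SALT]


class SplitBlockBloomFilter:  # hypothetical name
    def __init__(self, num_blocks: int):
        self.blocks = [[0] * 8 for _ in range(num_blocks)]

    def block_index(self, h: int) -> int:
        # Multiply-shift on the high 32 bits, as in the SBBF spec.
        return ((h >> 32) * len(self.blocks)) >> 32

    def insert(self, value: bytes) -> None:
        h = hash64(value)
        block = self.blocks[self.block_index(h)]
        for i, m in enumerate(block_mask(h)):
            block[i] |= m

    def check(self, value: bytes) -> bool:
        h = hash64(value)
        block = self.blocks[self.block_index(h)]
        return all(block[i] & m for i, m in enumerate(block_mask(h)))

    def fill_ratio(self) -> float:
        set_bits = sum(bin(w).count("1") for b in self.blocks for w in b)
        return set_bits / (len(self.blocks) * 256)

    def folded(self) -> "SplitBlockBloomFilter":
        # With multiply-shift indexing, block i of an n-block filter lands in
        # block i // 2 of an n/2-block filter, so OR adjacent pairs together.
        n = len(self.blocks)
        if n % 2:
            raise ValueError("can only halve an even number of blocks")
        out = SplitBlockBloomFilter(n // 2)
        for j in range(n // 2):
            out.blocks[j] = [a | b for a, b in
                             zip(self.blocks[2 * j], self.blocks[2 * j + 1])]
        return out


def fold_while_sparse(bf: SplitBlockBloomFilter,
                      max_fill: float = 0.5) -> SplitBlockBloomFilter:
    # Hypothetical policy: keep halving while the folded filter stays sparse.
    while len(bf.blocks) > 1 and len(bf.blocks) % 2 == 0:
        smaller = bf.folded()
        if smaller.fill_ratio() > max_fill:
            break
        bf = smaller
    return bf
```

Folding never introduces false negatives. Every bit an inserted value set in block i ends up in block i // 2, and that is exactly where a lookup in the smaller filter goes. The trade-off is a higher fill ratio, and with it a higher false-positive rate, which is why the sketch stops once the folded filter would exceed max_fill.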
