I think that from a reader's perspective there would be no indication of how the bloom filters were created. The folded versions are identical to filters that started at that size in the first place.
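
To make the "identical" claim concrete, here is a minimal sketch, not the code from the PR. It assumes the block index is chosen as in the Parquet split-block bloom filter spec, `((h >> 32) * num_blocks) >> 32`, in which case halving the filter maps blocks 2i and 2i+1 onto block i. The helper names (`hash64`, `build`, `fold`) are hypothetical, `blake2b` is only a stand-in for xxHash64, and the salt constants are quoted from memory of the spec, so treat them as illustrative.

```python
import hashlib

# Per-word salts for the 8 x 32-bit words of a block (illustrative; quoted
# from memory of the Parquet bloom filter spec, please verify before reuse).
SALT = (0x47B6137B, 0x44974D91, 0x8824AD5B, 0xA2B7289D,
        0x705495C7, 0x2DF1424B, 0x9EFC4947, 0x5C6BFB31)


def hash64(value: bytes) -> int:
    # Stand-in 64-bit hash; Parquet itself uses xxHash64.
    return int.from_bytes(hashlib.blake2b(value, digest_size=8).digest(), "little")


def block_index(h: int, num_blocks: int) -> int:
    # Multiply-shift on the upper 32 bits of the hash.
    return ((h >> 32) * num_blocks) >> 32


def block_mask(h: int) -> tuple:
    # One bit per word, derived from the lower 32 bits of the hash.
    # Note that the mask does not depend on the filter size.
    x = h & 0xFFFFFFFF
    return tuple(1 << (((x * s) & 0xFFFFFFFF) >> 27) for s in SALT)


def build(values, num_blocks: int) -> list:
    # Populate a filter with a fixed number of blocks.
    blocks = [[0] * 8 for _ in range(num_blocks)]
    for v in values:
        h = hash64(v)
        block = blocks[block_index(h, num_blocks)]
        for i, bit in enumerate(block_mask(h)):
            block[i] |= bit
    return blocks


def fold(blocks: list) -> list:
    # Halve the filter by OR-ing each adjacent pair of blocks together.
    assert len(blocks) % 2 == 0
    return [[a | b for a, b in zip(blocks[2 * i], blocks[2 * i + 1])]
            for i in range(len(blocks) // 2)]


values = [f"key-{i}".encode() for i in range(100)]
big = build(values, 64)       # start large
folded = fold(fold(big))      # fold twice: 64 -> 32 -> 16 blocks
direct = build(values, 16)    # build at the final size from the start
assert folded == direct       # same bits, so a reader cannot tell them apart
```

The equality holds because `floor(floor(a * N / 2**32) / 2) == floor(a * (N // 2) / 2**32)` for even `N`, so every value lands in the same block it would have hit at the smaller size, with the same mask.
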

> On Mar 31, 2026, at 10:36 AM, Steve Loughran <[email protected]> wrote:
>
> Assuming it compresses before writing, you wouldn't be able to tell when
> you read a file how it was actually created, would you?
>
> On Tue, 31 Mar 2026 at 00:57, Micah Kornfield <[email protected]> wrote:
>
>> Hi Adrian,
>> Very interesting idea, I don't recall seeing this used in any of the
>> reference implementations. On the surface I agree it looks compatible but
>> I need to think a little bit more deeply about it.
>>
>> Cheers,
>> Micah
>>
>> On Mon, Mar 30, 2026 at 3:27 PM Adrian Garcia Badaracco <[email protected]> wrote:
>>
>>> I think I've found a neat trick for making smaller bloom filters:
>>> https://github.com/apache/arrow-rs/pull/9628
>>>
>>> The idea is that you choose a largeish initial bloom filter size and once
>>> you're done populating it you compress it by folding it onto itself if it
>>> is sparse.
>>>
>>> Does anyone know if this trick is used in any other Parquet implementation?
>>> As far as I can tell it is compatible with the spec and should cause no
>>> issues, but I haven't heard of anyone doing this before.
