I think that, from a reader's perspective, there would be no indication of how
the bloom filters were created. The folded versions are identical to filters
that were built at that size in the first place.
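
A minimal sketch of why that holds, assuming a plain Bloom filter whose bit
position is hash mod m with m a power of two. This is not Parquet's split-block
layout and not the code from the PR; every name below is hypothetical. Since
h mod (m/2) equals (h mod m) mod (m/2), OR-ing the upper half onto the lower
half yields the same bits as building at m/2 directly:

```python
import hashlib

def bit_positions(item: bytes, m: int, k: int = 3):
    """Yield k bit positions in [0, m) for item; m must be a power of two."""
    for i in range(k):
        digest = hashlib.sha256(bytes([i]) + item).digest()
        yield int.from_bytes(digest[:8], "little") % m

def build(items, m, k=3):
    """Build an m-bit filter, stored as a Python int used as a bitset."""
    bits = 0
    for item in items:
        for pos in bit_positions(item, m, k):
            bits |= 1 << pos
    return bits

def fold(bits, m):
    """Halve the filter by OR-ing its upper half onto its lower half."""
    half = m // 2
    mask = (1 << half) - 1
    return (bits & mask) | (bits >> half), half

def might_contain(bits, m, item, k=3):
    """Standard membership probe: every one of the k bits must be set."""
    return all((bits >> pos) & 1 for pos in bit_positions(item, m, k))

items = [f"value-{n}".encode() for n in range(20)]
big = build(items, 1024)          # start large
folded, m = fold(big, 1024)       # compress after populating
direct = build(items, 512)        # what a writer starting at 512 bits produces
```

Here folded and direct are the same bitset, so a reader probing with
might_contain cannot distinguish them. A filter that selects blocks with a
multiply-shift instead of a modulo would need to merge different pairs of
blocks, but the same argument should carry over.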

> On Mar 31, 2026, at 10:36 AM, Steve Loughran <[email protected]> wrote:
> 
> Assuming it compresses before writing, you wouldn't be able to tell when
> you read a file how it was actually created, would you?
> 
> On Tue, 31 Mar 2026 at 00:57, Micah Kornfield <[email protected]> wrote:
> 
>> Hi Adrian,
>> Very interesting idea, I don't recall seeing this used in any of the
>> reference implementations.  On the surface I agree it looks compatible but
>> I need to think a little bit more deeply about it.
>> 
>> Cheers,
>> Micah
>> 
>> On Mon, Mar 30, 2026 at 3:27 PM Adrian Garcia Badaracco <
>> [email protected]>
>> wrote:
>> 
>>> I think I've found a neat trick for making smaller bloom filters:
>>> https://github.com/apache/arrow-rs/pull/9628
>>> 
>>> The idea is that you choose a largeish initial bloom filter size and once
>>> you're done populating it you compress it by folding it onto itself if it
>>> is sparse.
>>> 
>>> Does anyone know if this trick is used in any other Parquet
>>> implementation?
>>> As far as I can tell it is compatible with the spec and should cause no
>>> issues, but I haven't heard of anyone doing this before.
>>> 
>> 
