GitHub user gfphoenix78 added a comment to the discussion: PAX Storage: 
Questions for PAX developers

Issue 1

> 1.  Is there a bloom filter metadata structure that grows non-linearly with:
> Number of bloom filters?

Yes, but the growth is linear: bloom filter storage grows linearly with the
number of columns that use a bloom filter.

> Data type complexity (TEXT vs VARCHAR vs UUID)?

No, bloom filters are stored the same way for all types.

> Cardinality ranges?

No. The memory size is fixed up front, so the number of bits does not change
with cardinality. Filters are not resized dynamically.
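
To illustrate the fixed-size point, here is a minimal sketch of a Bloom filter whose bit array is allocated once and never resized. It is not PAX code: the class name, the 1024-bit default, the three hash probes, and the use of BLAKE2b are all assumptions chosen for illustration.

```python
import hashlib


class FixedBloomFilter:
    """Toy Bloom filter (hypothetical, not the PAX implementation)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # The bit array is allocated once here and never grows.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, data: bytes):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, data: bytes) -> None:
        for pos in self._positions(data):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, data: bytes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(data)
        )

    def size_in_bytes(self) -> int:
        return len(self.bits)


# Low-cardinality and high-cardinality inputs end up with the same filter size.
low = FixedBloomFilter()
high = FixedBloomFilter()
for i in range(10):
    low.add(str(i).encode())
for i in range(10_000):
    high.add(str(i).encode())
print(low.size_in_bytes(), high.size_in_bytes())  # 128 128
```

In a fixed-size filter like this, higher cardinality sets more bits and raises the false-positive rate, but it does not change the storage footprint.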

> Does Z-order clustering store bloom filter data differently than no-cluster 
> variant?

No, it depends only on how many bloom filter metadata structures are stored.
FYI, bloom filters are currently not compressed.
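
As a rough back-of-the-envelope sketch of what this implies, the total footprint can be estimated as a simple product. The function name and the numbers are hypothetical, and the sketch assumes one uncompressed, fixed-size filter per bloom-filter column per stored metadata structure, which this thread does not confirm in detail.

```python
def estimated_bloom_bytes(
    filter_columns: int, metadata_entries: int, bytes_per_filter: int
) -> int:
    """Estimate total Bloom filter storage under the assumptions stated above."""
    # Uncompressed and fixed-size, so the estimate is a plain product.
    return filter_columns * metadata_entries * bytes_per_filter


# Hypothetical numbers: 100 metadata entries and 128-byte filters.
one_column = estimated_bloom_bytes(1, 100, 128)
three_columns = estimated_bloom_bytes(3, 100, 128)
print(one_column, three_columns)  # 12800 38400
```

Tripling the number of bloom-filter columns triples the estimate, which is the linear growth described above.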


> 2. Why does clustering amplify bloom filter overhead?
> Is metadata duplicated during clustering?

No.

> Are TEXT/UUID bloom filters implemented differently from VARCHAR?

No, the input to a bloom filter is treated as a byte stream.
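
A small sketch of what "treated as a byte stream" means in practice: the hashing step sees only raw bytes and never the column type. The helper name, the hash function, and the way each value is serialized to bytes are assumptions for illustration, not the PAX implementation.

```python
import hashlib
import uuid


def bloom_positions(data: bytes, num_bits: int = 1024, num_hashes: int = 3) -> list[int]:
    """Map raw bytes to bit positions; the column type is never consulted."""
    positions = []
    for seed in range(num_hashes):
        digest = hashlib.blake2b(
            data, digest_size=8, salt=seed.to_bytes(8, "little")
        ).digest()
        positions.append(int.from_bytes(digest, "little") % num_bits)
    return positions


# Assumed serializations: the same string stored as TEXT or VARCHAR yields the
# same bytes, and a UUID is represented by its 16 raw bytes.
text_value = "cloudberry".encode("utf-8")
varchar_value = "cloudberry".encode("utf-8")
uuid_value = uuid.UUID("12345678-1234-5678-1234-567812345678").bytes

same_positions = bloom_positions(text_value) == bloom_positions(varchar_value)
uuid_positions = bloom_positions(uuid_value)
print(same_positions, len(uuid_positions))  # True 3
```

Under these assumptions, identical bytes always map to identical bit positions, whatever type they came from.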

> 3. Why do they cause 45-58% overhead vs 0.2% for VARCHAR?
> Can TEXT bloom filter implementation be optimized?

Not at the moment.

> 4. What is the recommended bloom filter limit?
> Is there a hard limit before bloat becomes unacceptable?

There is no hard limit at the moment. We'll add this issue to our action items.

> Should PAX enforce a limit (e.g., max 3 bloom filters)?

Not planned right now.

GitHub link: 
https://github.com/apache/cloudberry/discussions/1421#discussioncomment-14827031
