GitHub user gfphoenix78 added a comment to the discussion: PAX Storage: Questions for PAX developers
Issue 1

> 1. Is there a bloom filter metadata structure that grows non-linearly with:
> Number of bloom filters?

Yes, bloom filter storage grows linearly with the number of columns that use a bloom filter.

> Data type complexity (TEXT vs VARCHAR vs UUID)?

No, bloom filters are stored the same way for all types.

> Cardinality ranges?

No. The memory size is fixed, so the number of bits does not change. It is not adjusted dynamically.

> Does Z-order clustering store bloom filter data differently than no-cluster variant?

No, it depends only on how many bloom filter metadata structures are stored. FYI, bloom filters are not compressed at the moment.

> 2. Why does clustering amplify bloom filter overhead?
> Is metadata duplicated during clustering?

No.

> Are TEXT/UUID bloom filters implemented differently from VARCHAR?

No, the input to a bloom filter is treated as a byte stream.

> 3. Why do they cause 45-58% overhead vs 0.2% for VARCHAR?
> Can TEXT bloom filter implementation be optimized?

Not at the moment.

> 4. What is the recommended bloom filter limit?
> Is there a hard limit before bloat becomes unacceptable?

There is no hard limit right now. We'll add this issue to our action items.

> Should PAX enforce a limit (e.g., max 3 bloom filters)?

Not planned right now.

GitHub link: https://github.com/apache/cloudberry/discussions/1421#discussioncomment-14827031
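To illustrate the points above about fixed size and byte-stream input, here is a minimal Python sketch. It is not PAX's implementation: the class name `FixedBloomFilter`, the helper functions, the 8192-bit size, the four hash probes, and the use of BLAKE2b are all assumptions chosen for illustration.

```python
import hashlib


class FixedBloomFilter:
    """Toy bloom filter whose bit array size is fixed at construction (illustrative only)."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 4):
        # num_bits and num_hashes are assumed values, not PAX defaults.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, data: bytes):
        # Double hashing: derive all probe positions from two halves of one digest.
        digest = hashlib.blake2b(data, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # keep the step non-zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, data: bytes) -> None:
        for pos in self._positions(data):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, data: bytes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(data)
        )

    def size_in_bytes(self) -> int:
        return len(self.bits)


def to_bytes(value) -> bytes:
    # Hypothetical serialization: every value is reduced to bytes before hashing,
    # so the filter itself never sees the column type.
    if isinstance(value, bytes):
        return value
    return str(value).encode("utf-8")


def total_filter_bytes(num_columns: int, num_bits: int = 8192) -> int:
    # One fixed-size filter per column: total storage is linear in the column count.
    return sum(FixedBloomFilter(num_bits).size_in_bytes() for _ in range(num_columns))
```

In this sketch the filter occupies the same number of bytes no matter how many values are added or what type they had before serialization, and the total grows only with the number of columns that carry a filter.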
