2010YOUY01 commented on PR #21817: URL: https://github.com/apache/datafusion/pull/21817#issuecomment-4310190781
This is great! I have some questions.

> 1. O(1) membership testing - Roaring bitmaps offer fast `contains()` lookups

- Doesn't that require a binary search over the containers first, and then a probe into the matching container? It's not obvious to me why a roaring bitmap can be faster than a hash set. I think a code comment explaining the idea would help a lot, for example a TL;DR of the idea behind roaring bitmaps and their tradeoffs compared to a hash map.
- Is there any pathological case for this index? For example, are there key distributions for which it becomes slower or uses more memory?

Additionally, it would be great to show some SQL micro-benchmarks to demonstrate the improvement. Perhaps we can add some target workloads to https://github.com/apache/datafusion/blob/main/benchmarks/src/hj.rs
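
To make the first question concrete, here is a minimal sketch of the two-step lookup it describes, assuming the usual roaring layout (keys split into high and low 16 bits, sparse chunks stored as sorted arrays, dense chunks as fixed 8 KiB bitmaps). `SketchBitmap`, `Container`, `ARRAY_MAX`, and the helper methods are hypothetical names for illustration only; this is neither the `roaring` crate's API nor the code in this PR.

```rust
// Hypothetical, simplified model of a roaring-style bitmap over u32 keys.
// Illustration only: not the `roaring` crate's API and not this PR's code.

/// Holds the low 16 bits of every key that shares the same high 16 bits.
enum Container {
    /// Sparse chunk: sorted low-16-bit values, probed by binary search.
    Array(Vec<u16>),
    /// Dense chunk: 65,536 bits, probed with one shift and one mask.
    Bitmap(Box<[u64; 1024]>),
}

/// Assumed promotion threshold: past 4096 entries a sorted u16 array
/// would be larger than the fixed 8 KiB bitmap.
const ARRAY_MAX: usize = 4096;

struct SketchBitmap {
    /// Containers kept sorted by the high 16 bits of the key.
    chunks: Vec<(u16, Container)>,
}

impl SketchBitmap {
    fn new() -> Self {
        Self { chunks: Vec::new() }
    }

    fn insert(&mut self, key: u32) {
        let (hi, lo) = ((key >> 16) as u16, key as u16);
        // Find the container for `hi`, creating an empty array container if needed.
        let idx = match self.chunks.binary_search_by_key(&hi, |c| c.0) {
            Ok(i) => i,
            Err(i) => {
                self.chunks.insert(i, (hi, Container::Array(Vec::new())));
                i
            }
        };
        // Insert `lo`; return a replacement container if the array grew too large.
        let promoted = match &mut self.chunks[idx].1 {
            Container::Array(vals) => {
                if let Err(pos) = vals.binary_search(&lo) {
                    vals.insert(pos, lo);
                }
                if vals.len() > ARRAY_MAX {
                    let mut bits = Box::new([0u64; 1024]);
                    for &v in vals.iter() {
                        bits[(v >> 6) as usize] |= 1u64 << (v & 63);
                    }
                    Some(Container::Bitmap(bits))
                } else {
                    None
                }
            }
            Container::Bitmap(bits) => {
                bits[(lo >> 6) as usize] |= 1u64 << (lo & 63);
                None
            }
        };
        if let Some(container) = promoted {
            self.chunks[idx].1 = container;
        }
    }

    fn contains(&self, key: u32) -> bool {
        let (hi, lo) = ((key >> 16) as u16, key as u16);
        // Step 1: binary search over the sorted container keys.
        match self.chunks.binary_search_by_key(&hi, |c| c.0) {
            Err(_) => false,
            // Step 2: probe inside the container.
            Ok(i) => match &self.chunks[i].1 {
                Container::Array(vals) => vals.binary_search(&lo).is_ok(),
                Container::Bitmap(bits) => {
                    (bits[(lo >> 6) as usize] & (1u64 << (lo & 63))) != 0
                }
            },
        }
    }

    /// Number of containers, i.e. the size of the step-1 search space.
    fn container_count(&self) -> usize {
        self.chunks.len()
    }

    /// True if the chunk for `hi` exists and is stored as a bitmap.
    fn is_dense(&self, hi: u16) -> bool {
        match self.chunks.binary_search_by_key(&hi, |c| c.0) {
            Ok(i) => matches!(self.chunks[i].1, Container::Bitmap(_)),
            Err(_) => false,
        }
    }
}
```

In this model a lookup is only constant-time when there are few containers and the matching one is a bitmap. Keys that each land in a different high-16-bit chunk (for example multiples of 65,536) produce one container per key, which looks like a candidate for the pathological distribution asked about above.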
