2010YOUY01 commented on PR #21817:
URL: https://github.com/apache/datafusion/pull/21817#issuecomment-4310190781

   This is great! I have some questions.
   
   > 1. O(1) membership testing - Roaring bitmaps offer fast contains() lookups
   
   - Doesn't that first require a binary search over the containers, then a 
probe into the matching container? It's not obvious to me why a roaring bitmap 
can be faster than a hash set.
   I think a code comment explaining the idea would help a lot, for example 
a TL;DR of the idea behind roaring bitmaps and their tradeoffs versus a hash 
map.
   
   - Is there any pathological case for this index? For example, are there 
key distributions for which it becomes slower or uses more memory?
   
   Additionally, it would be great to show some SQL micro-benchmarks to 
demonstrate the improvement. Perhaps we can add some target workloads to 
https://github.com/apache/datafusion/blob/main/benchmarks/src/hj.rs
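
   To make the first question concrete, here is a rough sketch of the lookup 
path as I understand it. It is a simplified model, not the `roaring` crate's 
API and not the code in this PR: `MiniRoaring`, `Container`, `ARRAY_MAX`, and 
`chunk_count` are hypothetical names, and run containers are left out. The 
4096-entry array limit follows the standard roaring layout.

```rust
/// Simplified roaring-style set over u32 keys (illustrative only).
enum Container {
    /// Sparse chunk: sorted low 16 bits, probed with a binary search.
    Array(Vec<u16>),
    /// Dense chunk: 65,536-bit bitmap, probed with a single bit test.
    Bitmap(Box<[u64; 1024]>),
}

/// Above this many entries an array chunk is converted to a bitmap.
const ARRAY_MAX: usize = 4096;

struct MiniRoaring {
    /// One entry per distinct high-16-bit prefix, sorted by that prefix.
    chunks: Vec<(u16, Container)>,
}

impl MiniRoaring {
    fn new() -> Self {
        Self { chunks: Vec::new() }
    }

    fn insert(&mut self, v: u32) {
        let (hi, lo) = ((v >> 16) as u16, v as u16);
        // Find the chunk for this prefix, creating an empty one if needed.
        let idx = match self.chunks.binary_search_by_key(&hi, |(k, _)| *k) {
            Ok(i) => i,
            Err(i) => {
                self.chunks.insert(i, (hi, Container::Array(Vec::new())));
                i
            }
        };
        let slot = &mut self.chunks[idx].1;
        let promoted = match slot {
            Container::Array(a) => {
                if let Err(pos) = a.binary_search(&lo) {
                    a.insert(pos, lo);
                }
                if a.len() > ARRAY_MAX {
                    // Too dense for an array: copy the entries into a bitmap.
                    let mut bits = Box::new([0u64; 1024]);
                    for &x in a.iter() {
                        bits[(x >> 6) as usize] |= 1u64 << (x & 63);
                    }
                    Some(bits)
                } else {
                    None
                }
            }
            Container::Bitmap(b) => {
                b[(lo >> 6) as usize] |= 1u64 << (lo & 63);
                None
            }
        };
        if let Some(bits) = promoted {
            *slot = Container::Bitmap(bits);
        }
    }

    fn contains(&self, v: u32) -> bool {
        let (hi, lo) = ((v >> 16) as u16, v as u16);
        // Step 1: binary search over the sorted chunk prefixes.
        match self.chunks.binary_search_by_key(&hi, |(k, _)| *k) {
            Err(_) => false,
            // Step 2: probe inside the matching chunk.
            Ok(i) => match &self.chunks[i].1 {
                Container::Array(a) => a.binary_search(&lo).is_ok(),
                Container::Bitmap(b) => {
                    (b[(lo >> 6) as usize] & (1u64 << (lo & 63))) != 0
                }
            },
        }
    }

    /// Number of chunks; grows with the number of distinct prefixes.
    fn chunk_count(&self) -> usize {
        self.chunks.len()
    }
}
```

   In this model a lookup is two dependent steps rather than a single hash 
probe, and only the bitmap case ends in a constant-time bit test. It also 
hints at the distribution question: inserting keys of the form `i << 16` 
gives every key its own chunk, while a dense run such as `0..10_000` 
collapses into one bitmap chunk.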


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

