lakshmanan-v opened a new issue #7014: URL: https://github.com/apache/incubator-pinot/issues/7014
The accuracy and memory footprint of DISTINCTCOUNTHLL can be improved by adopting a more recent HLL algorithm. We have two options: replace the existing implementation with a better one, or leave DISTINCTCOUNTHLL on the original HLL and add a separate function (e.g. DISTINCTCOUNTHLLPLUSPLUS).

[Google's HLL++](http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/40671.pdf), a popular algorithm in the community, offers several improvements over the original HLL. There are multiple Java implementations of HLL++, and most of them differ in performance because of register size and other implementation choices. Clearspring's [stream-lib](https://github.com/addthis/stream-lib), which backs the current HyperLogLog function, implements HLL++ as [HyperLogLogPlus](https://github.com/addthis/stream-lib/commits/master/src/main/java/com/clearspring/analytics/stream/cardinality/HyperLogLogPlus.java).
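To make the register-size trade-off concrete, here is a minimal sketch of a dense HyperLogLog in plain Java. It is not Pinot's or stream-lib's code, and it is the original HLL estimator (with a 64-bit hash and the small-range linear-counting correction), not HLL++; as far as I understand the paper, HLL++ layers a sparse representation and empirical bias correction on top of this. The class name `SimpleHll`, the `mix64` hash, and the one-byte-per-register layout are all assumptions made for illustration.

```java
// Illustrative only: a dense HyperLogLog with 2^p one-byte registers.
final class SimpleHll {
    private final int p;            // precision: number of index bits
    private final byte[] registers; // m = 2^p registers

    SimpleHll(int p) {
        if (p < 7 || p > 18) {
            throw new IllegalArgumentException("p must be in [7, 18]");
        }
        this.p = p;
        this.registers = new byte[1 << p];
    }

    // 64-bit finalizer in the style of SplitMix64, used as a stand-in hash.
    static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    void add(long item) {
        long h = mix64(item + 0x9e3779b97f4a7c15L);
        int idx = (int) (h >>> (64 - p));  // top p bits select the register
        long rest = h << p;                // remaining 64 - p bits
        // Rank = position of the first 1-bit in the remaining bits, capped.
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - p + 1);
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    long estimate() {
        int m = registers.length;
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.scalb(1.0, -r);    // 2^(-register)
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m); // valid for m >= 128
        double raw = alpha * m * (double) m / sum;
        if (raw <= 2.5 * m && zeros > 0) {
            // Small-range correction: fall back to linear counting.
            return Math.round(m * Math.log((double) m / zeros));
        }
        return Math.round(raw);
    }

    int sizeInBytes() {
        return registers.length;
    }

    // Theoretical relative standard error of the raw HLL estimator.
    double standardError() {
        return 1.04 / Math.sqrt(registers.length);
    }

    public static void main(String[] args) {
        for (int p : new int[] {10, 12, 14}) {
            SimpleHll hll = new SimpleHll(p);
            for (long i = 0; i < 100_000; i++) {
                hll.add(i);
            }
            System.out.printf("p=%d bytes=%d stderr=%.4f estimate=%d%n",
                    p, hll.sizeInBytes(), hll.standardError(), hll.estimate());
        }
    }
}
```

Each extra bit of precision doubles the register array and shrinks the theoretical error by a factor of about 1.41, which is one reason implementations with different register sizes and encodings report different accuracy and memory numbers.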