tangyong opened a new issue #6420:
URL: https://github.com/apache/incubator-pinot/issues/6420


   The following is a discussion with Mayank on Slack:
   
   Mark: Hi Team, I have seen that Pinot 0.4.0 implemented an initial 
version of a theta-sketch based distinct count aggregation function, using 
the Apache DataSketches library. The latest Druid release also includes a 
DataSketches extension (Theta sketch, Tuple sketch, Quantiles sketch, HLL 
sketch). Does Pinot have any plan to implement sketches other than the 
Theta sketch? Thanks.
   
   Mayank: Pinot already supports HLL and TDigest based percentiles. If there's 
a specific case where you would find DataSketch based implementations more 
useful, we can definitely explore that. If so, would recommend filing an issue 
for that.
   
   Mayank: For HLL we use 
com.clearspring.analytics.stream.cardinality.HyperLogLog, and for TDigest we 
use com.tdunning.math.stats.TDigest.
   
   Mark: We may need to pay attention to KLL sketch vs. t-digest (the Pinot 
implementation); see the following comparison by DataSketches: 
https://datasketches.apache.org/docs/Quantiles/KllSketchVsTDigest.html
   
   Mayank: Thanks for sharing @Mark.Tang. We can definitely explore adding 
these if needed.
   
   Mark: Appendix, HLL 
(https://github.com/apache/datasketches-website/blob/master/docs/pdf/DataSketches_deck.pdf):
   
![pinot1](https://user-images.githubusercontent.com/187414/103863413-c65e1500-50fb-11eb-9c6a-b1b9677b69a7.png)
   
   Also note that DataSketches includes the more recent CPC sketch, 
described as estimating stream cardinalities more efficiently than the 
well-known HLL sketch; see https://arxiv.org/pdf/1708.06839.pdf




