cbalci opened a new pull request, #10643: URL: https://github.com/apache/pinot/pull/10643
Introducing a new approximate percentile calculation function, `PercentileKLL`, and its variations (MV & Raw), built on the KLL sketch from the Apache DataSketches library.

This is part of a proposal to improve Apache DataSketches support in Pinot: [(Google Docs Link) [Proposal] Improved Apache DataSketches Support in Pinot](https://docs.google.com/document/d/1ctmKVRi67lpO6x1RYKDvDYf05EZx2Vbs2OnUudYP-bU/edit)

Some advantages listed and discussed in the linked document:
- [Well defined](https://datasketches.apache.org/docs/KLL/KLLAccuracyAndSize.html) error bound ([comparison](https://datasketches.apache.org/docs/QuantilesStudies/KllSketchVsTDigest.html) to t-Digest)
- Faster updates and serialization/deserialization
- Binary compatibility with external systems, hence the ability to use Pinot as a sketch store
- Ability to compute ‘Rank’ and ‘Histogram’ besides ‘Percentile’
- Feature parity with Druid

Please leave design-related comments on the linked document and code-related comments in this PR.

**Testing**
- Added unit tests to cover basic use cases that call `PercentileKLL`, `PercentileKLLMV`, `PercentileRawKLL`, `PercentileRawKLLMV`
- Added tests to cover group-by scenarios
- Manually tested ingesting raw (externally generated) data sketches

`feature` `performance`

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: commits-h...@pinot.apache.org