cbalci opened a new pull request, #11098: URL: https://github.com/apache/pinot/pull/11098
Introducing two approximate aggregation functions, `FrequentStringsSketch` and `FrequentLongsSketch`, for estimating the frequencies of items in a dataset in a memory-efficient way. The functions are based on the [Apache Datasketches](https://datasketches.apache.org/docs/Frequency/FrequencySketchesOverview.html) library.

**Signature:**
```
FREQUENTLONGSSKETCH(col, maxMapSize=256) -> Base64 encoded sketch object
FREQUENTSTRINGSSKETCH(col, maxMapSize=256) -> Base64 encoded sketch object
```

**Example usage:**
```
select FREQUENTSTRINGSSKETCH(AirlineID, 16) from airlineStats
```

| frequentstringssketch(AirlineID) |
| ----------- |
| BAEKAwMAAAADAAAAAA... |

The result can be consumed, for example, in Java as:

```java
import java.util.Base64;
import org.apache.datasketches.common.ArrayOfStringsSerDe; // package location varies by datasketches-java version
import org.apache.datasketches.frequencies.ErrorType;
import org.apache.datasketches.frequencies.ItemsSketch;
import org.apache.datasketches.memory.Memory;

// encodedSketch is the Base64 string returned by the query above
byte[] byteArr = Base64.getDecoder().decode(encodedSketch);
ItemsSketch<String> sketch = ItemsSketch.getInstance(Memory.wrap(byteArr), new ArrayOfStringsSerDe());
ItemsSketch.Row<String>[] items = sketch.getFrequentItems(ErrorType.NO_FALSE_NEGATIVES);
for (ItemsSketch.Row<String> item : items) {
  System.out.printf("Airline: %s, Frequency: %d%n", item.getItem(), item.getEstimate());
}
```

**Testing:** Basic aggregation and group-by query tests are included in the PR.

**Design:** This is part of a larger effort to improve Datasketches support in Pinot, as discussed in [this document](https://docs.google.com/document/d/1ctmKVRi67lpO6x1RYKDvDYf05EZx2Vbs2OnUudYP-bU/edit#heading=h.nctch2wugvub). Feel free to add design-related comments on the document as well.

`feature` `performance`
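For intuition about what a bound such as `maxMapSize` buys, here is a minimal sketch of the classic Misra-Gries frequent-items algorithm using only the Java standard library. It is an illustration under stated assumptions, not the Datasketches implementation used by this PR, whose algorithm differs in its details. The class `BoundedFrequencyCounter` and its methods are hypothetical names invented for this example.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Simplified Misra-Gries counter. Illustrative only; not the Datasketches algorithm. */
final class BoundedFrequencyCounter {
    // Hypothetical analogue of the maxMapSize argument: the most items tracked at once.
    private final int maxMapSize;
    private final Map<String, Long> counts = new HashMap<>();

    BoundedFrequencyCounter(int maxMapSize) {
        this.maxMapSize = maxMapSize;
    }

    void update(String item) {
        Long current = counts.get(item);
        if (current != null) {
            counts.put(item, current + 1);
        } else if (counts.size() < maxMapSize) {
            counts.put(item, 1L);
        } else {
            // Map is full: decrement every counter, evict those that reach zero,
            // and drop the incoming item.
            Iterator<Map.Entry<String, Long>> it = counts.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Long> entry = it.next();
                if (entry.getValue() == 1L) {
                    it.remove();
                } else {
                    entry.setValue(entry.getValue() - 1);
                }
            }
        }
    }

    /** Lower-bound estimate of the item's frequency; 0 if the item is not tracked. */
    long estimate(String item) {
        return counts.getOrDefault(item, 0L);
    }

    int trackedItems() {
        return counts.size();
    }

    public static void main(String[] args) {
        BoundedFrequencyCounter counter = new BoundedFrequencyCounter(2);
        for (String airline : new String[] {"AA", "AA", "DL", "AA", "UA", "AA"}) {
            counter.update(airline);
        }
        System.out.println("AA estimate: " + counter.estimate("AA"));
    }
}
```

In this toy run the true count of `"AA"` is 4 while the estimate is 3: memory stays bounded by the map size, at the cost of undercounting. A larger map tightens the error, which is the same trade-off the `maxMapSize` argument exposes.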
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org