cbalci opened a new pull request, #11098:
URL: https://github.com/apache/pinot/pull/11098

   Introduces two approximate aggregation functions, `FrequentStringsSketch` and `FrequentLongsSketch`, for estimating the frequencies of items in a dataset in a memory-efficient way. Both functions are based on the [Apache DataSketches](https://datasketches.apache.org/docs/Frequency/FrequencySketchesOverview.html) library.
   
   **Signature:**
   ```
   FREQUENTLONGSSKETCH(col, maxMapSize=256) -> Base64 encoded sketch object
   FREQUENTSTRINGSSKETCH(col, maxMapSize=256) -> Base64 encoded sketch object
   ```
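
To give some intuition for the memory/accuracy trade-off behind a bounded map size, here is a minimal sketch of a Misra-Gries-style frequent-items counter using only the JDK. It is not the DataSketches or Pinot implementation; the class `MisraGriesSketch` and its methods are hypothetical names chosen for illustration, and its `maxMapSize` field is only a loose analogue of the argument in the signature above.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Simplified Misra-Gries counter (illustration only, NOT the DataSketches code).
class MisraGriesSketch {
    private final int maxMapSize;                       // hypothetical analogue of maxMapSize
    private final Map<String, Long> counters = new HashMap<>();
    private long maxError = 0;                          // number of decrement rounds so far

    MisraGriesSketch(int maxMapSize) {
        this.maxMapSize = maxMapSize;
    }

    void update(String item) {
        counters.merge(item, 1L, Long::sum);
        if (counters.size() > maxMapSize) {
            // Over capacity: decrement every counter and drop those that reach zero.
            maxError++;
            Iterator<Map.Entry<String, Long>> it = counters.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Long> e = it.next();
                if (e.getValue() <= 1) {
                    it.remove();
                } else {
                    e.setValue(e.getValue() - 1);
                }
            }
        }
    }

    // Never overestimates: the stored counter is a lower bound on the true count.
    long lowerBound(String item) {
        return counters.getOrDefault(item, 0L);
    }

    // Each decrement round removes at most 1 from any counter, so this is an upper bound.
    long upperBound(String item) {
        return lowerBound(item) + maxError;
    }

    int trackedItems() {
        return counters.size();
    }

    public static void main(String[] args) {
        MisraGriesSketch sketch = new MisraGriesSketch(2);
        for (String item : new String[] {"a", "b", "a", "c", "a", "d", "a", "b", "a"}) {
            sketch.update(item);
        }
        // "a" truly occurs 5 times; the sketch brackets it between 3 and 5.
        System.out.println("a: [" + sketch.lowerBound("a") + ", " + sketch.upperBound("a") + "]");
    }
}
```

A larger map size means fewer decrement rounds and therefore tighter bounds, at the cost of more memory per sketch.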
   
   **Example usage:**
   ```
   select FREQUENTSTRINGSSKETCH(AirlineID, 16) from airlineStats
   ```
   
   | frequentstringssketch(AirlineID)      |
   | ----------- |
   | BAEKAwMAAAADAAAAAA...      |
   
   The returned sketch can then be decoded and queried on the client side, for example in Java:
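   ```java
   import java.util.Base64;
   // Package of ArrayOfStringsSerDe may differ depending on the datasketches-java version.
   import org.apache.datasketches.common.ArrayOfStringsSerDe;
   import org.apache.datasketches.frequencies.ErrorType;
   import org.apache.datasketches.frequencies.ItemsSketch;
   import org.apache.datasketches.memory.Memory;

   // encodedSketch is the Base64 string returned by the query above.
   byte[] byteArr = Base64.getDecoder().decode(encodedSketch);
   ItemsSketch<String> sketch = ItemsSketch.getInstance(Memory.wrap(byteArr), new ArrayOfStringsSerDe());

   // List the frequent items together with their estimated counts.
   ItemsSketch.Row<String>[] items = sketch.getFrequentItems(ErrorType.NO_FALSE_NEGATIVES);
   for (ItemsSketch.Row<String> item : items) {
     System.out.printf("Airline: %s, Frequency: %d%n", item.getItem(), item.getEstimate());
   }
   ```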
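
The `ErrorType` argument controls which candidates are reported. As a rough, JDK-only illustration of the idea (not the library's code): each candidate carries a lower and an upper bound on its true count, `NO_FALSE_NEGATIVES` keeps candidates whose upper bound clears a threshold, and `NO_FALSE_POSITIVES` keeps only those whose lower bound does. The `ErrorTypeDemo` and `Candidate` classes, the sample bounds, and the threshold below are all made up for this sketch, and the exact comparison the library applies may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration of the two selection rules (hypothetical types, not the DataSketches API).
class ErrorTypeDemo {
    // Hypothetical stand-in for a sketch row: an item plus bounds on its true count.
    static final class Candidate {
        final String item;
        final long lowerBound;
        final long upperBound;

        Candidate(String item, long lowerBound, long upperBound) {
            this.item = item;
            this.lowerBound = lowerBound;
            this.upperBound = upperBound;
        }
    }

    // noFalseNegatives = true compares the upper bound, otherwise the lower bound.
    static List<String> select(List<Candidate> rows, long threshold, boolean noFalseNegatives) {
        List<String> selected = new ArrayList<>();
        for (Candidate c : rows) {
            long bound = noFalseNegatives ? c.upperBound : c.lowerBound;
            if (bound > threshold) {
                selected.add(c.item);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Candidate> rows = Arrays.asList(
            new Candidate("AA", 90, 100),
            new Candidate("DL", 8, 18),
            new Candidate("UA", 0, 5));
        long threshold = 10; // made-up error threshold
        System.out.println(select(rows, threshold, true));  // [AA, DL]
        System.out.println(select(rows, threshold, false)); // [AA]
    }
}
```

The first mode may include items that are not truly frequent but should not miss any that are; the second mode is stricter and may omit borderline items.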
   
   **Testing:**
   Basic aggregation and group-by query tests are included in the PR.
   
   **Design:**
   This is part of a larger effort to improve DataSketches support in Pinot, as discussed in [this document](https://docs.google.com/document/d/1ctmKVRi67lpO6x1RYKDvDYf05EZx2Vbs2OnUudYP-bU/edit#heading=h.nctch2wugvub). Feel free to add design-related comments on the document as well.
   
   
   `feature` `performance`



