[GitHub] [incubator-doris] e0c9 opened a new issue #6419: Support exact percentile aggregate function

GitBox Tue, 10 Aug 2021 05:36:21 -0700


e0c9 opened a new issue #6419:
URL: https://github.com/apache/incubator-doris/issues/6419



   **Is your feature request related to a problem? Please describe.**
   Doris currently supports only approximate percentile calculation, but 
some business scenarios require exact percentiles. Hive, Spark and Alicloud 
MaxCompute all support an exact percentile aggregate function.
   https://spark.apache.org/docs/latest/sql-ref-functions-builtin.html
   https://help.aliyun.com/document_detail/48975.html#title-x4d-jao-van
   
   **Describe the solution you'd like**
   Refer to Hive's `UDAFPercentile` implementation: 
https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/java/org/apache/hadoop/hive/ql/udf/UDAFPercentile.java
   1. Count the number of occurrences of each value, producing 
`<value, count>` pairs.
   > 19,2,1,1,7,5,7,9,9,1 => <1,3> <2,1> <5,1> <7,2> <9,2> <19,1>
   2. Sort the pairs by value and replace each count with the cumulative count (rank).
   > <1,3> <2,4> <5,5> <7,7> <9,9> <19,10>
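   3. Scan the sorted pairs linearly to find the exact percentile, using 
linear interpolation between adjacent values when the target position is fractional.
   > 0-based position = 0.25 * (10 - 1) = 2.25, which lies between positions 2 and 3 (values 1 and 2), so
   > percentile(value, 0.25) = (3 - 2.25)*1 + (2.25 - 2)*2 = 1.25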
   ```python
   import numpy as np

   # Same data as above, sorted; NumPy's default method is linear interpolation.
   a = np.array([1, 1, 1, 2, 5, 7, 7, 9, 9, 19])
   print(np.percentile(a, 25))  # prints 1.25
   ```
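
The three steps above can be sketched in plain Python as follows. This is only an illustration of the idea, not the Hive code or a proposed Doris implementation. The function name `exact_percentile` is hypothetical, and the sketch assumes the 0-based position convention `p * (n - 1)` used in the worked example above, which matches NumPy's default.

```python
from bisect import bisect_right
from collections import Counter
from itertools import accumulate


def exact_percentile(values, p):
    """Exact percentile of `values` for p in [0, 1], with linear interpolation."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")

    # Step 1: count occurrences of each distinct value.
    counts = Counter(values)

    # Step 2: sort by value and turn counts into cumulative counts.
    keys = sorted(counts)
    cumulative = list(accumulate(counts[k] for k in keys))
    total = cumulative[-1]

    # Step 3: find the fractional 0-based position and interpolate.
    position = p * (total - 1)
    lower = int(position)              # floor, because position >= 0
    upper = min(lower + 1, total - 1)  # clamp at the last element

    def value_at(rank):
        # The first key whose cumulative count exceeds `rank` holds that rank.
        return keys[bisect_right(cumulative, rank)]

    low_value = value_at(lower)
    high_value = value_at(upper)
    return low_value + (position - lower) * (high_value - low_value)


data = [19, 2, 1, 1, 7, 5, 7, 9, 9, 1]
print(exact_percentile(data, 0.25))  # 1.25, same as the NumPy result
```

Only the distinct values are sorted, so memory and sort cost scale with the number of distinct values rather than the number of rows. The sketch uses a binary search over the cumulative counts where the description above uses a linear scan; both find the same pair.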
   **Describe alternatives you've considered**
   A clear and concise description of any alternative solutions or features 
you've considered.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



