e0c9 opened a new issue #6419: URL: https://github.com/apache/incubator-doris/issues/6419
**Is your feature request related to a problem? Please describe.** Doris currently supports approximate percentage calculations, but there are some business scenarios that require accurate percentage calculation. Hive, Spark and Alicloud MaxCompute all support exact percentile aggregate. https://spark.apache.org/docs/latest/sql-ref-functions-builtin.html https://help.aliyun.com/document_detail/48975.html#title-x4d-jao-van **Describe the solution you'd like** refer to: https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/java/org/apache/hadoop/hive/ql/udf/UDAFPercentile.java 1. calculate the cumulative number of occurrences of each value. `<Value, count>` > 19,2,1,1,7,5,7,9,9,1 => <1,3> <2,1> <5,1> <7,2> <9,2> <19,1> 2. sort by value and calculate cumulative rank > <1,3> <2,4> <5,5> <7,7> <9,9> <19,10> 3. Linear exploration to calculate the exact percentile (linear interpolation calculation if necessary) > percentile(value, 0.25) = (3-2.25)*1 + (2.25 - 2)*2 = 1.25 ```python import numpy as np a = np.array([1,1,1,2,5,7,7,9,9,19]) print(np.percentile(a, 25)) 1.25 ``` **Describe alternatives you've considered** A clear and concise description of any alternative solutions or features you've considered. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org