[jira] [Commented] (STATISTICS-94) Support median and quantile for long datatype

Alex Herbert (Jira) Fri, 03 Apr 2026 11:33:09 -0700


    [ 
https://issues.apache.org/jira/browse/STATISTICS-94?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18070889#comment-18070889
 ]


Alex Herbert commented on STATISTICS-94:
----------------------------------------

I have created an implementation for the long[] version of median and quantile.
h2. API

The API is:
{code:java}
Median
    public StatisticResult evaluate(long[] values)
    public StatisticResult evaluateRange(long[] values, int from, int to)

Quantile
    public StatisticResult evaluate(long[] values, double p)
    public StatisticResult evaluateRange(long[] values, int from, int to, 
double p)
    public StatisticResult[] evaluate(long[] values, double... p)
    public StatisticResult[] evaluateRange(long[] values, int from, int to, 
double... p)
    // pre-sorted data
    public StatisticResult evaluateAsLong(int n, IntToLongFunction values, 
double p)
    public StatisticResult[] evaluateAsLong(int n, IntToLongFunction values, 
double... p)
{code}
Note that the method for a sorted array uses a different name from the double 
version:
{code:java}
    public double evaluate(int n, IntToDoubleFunction values, double p)
    public double evaluate(int n, IntToDoubleFunction values, double... p)
{code}
The sorted methods are called with a function that converts an index into a 
double/long value. This would typically be a lambda function: i -> 
sortedArray[i]. If the long variant uses the same name then all source code 
using a lambda for the existing method will be source incompatible due to an 
ambiguous method for the lambda. This would have to be fixed by specifying the 
type of the lambda, for example:
{code:java}
short[] data = ...
Arrays.sort(data);
double x evaluate(data.length, (IntToDoubleFunction) i -> data, 0.25);
{code}
I used the AsLong naming convention found in the JDKs naming for functions 
using primitive types. The double variant could be renamed to evaluateAsDouble 
but I have not done this to avoid API clutter as the existing method will have 
to be maintained for binary compatibility anyway.
h2. Implementation

Given the selection of sorted indices is in Commons Numbers the Statistics 
component is mainly concerned with the interpolation.
h3. Median

The median of two long values is (a + b) / 2.

When the result is an integer there is a recipe for this in the [Hackers 
Delight|https://en.wikipedia.org/wiki/Hacker%27s_Delight] for computing the 
ceiling, floor or rounded-down-to-zero mean. To fit with the behaviour of 
StatisticResult for double values this requires using the ceiling 
implementation for ties to infinity.

The double value result can be obtained by multiplying the sum by 0.5. This 
works until (a + b) overflows. Overflow can be detected and the mean composed 
using the high and low 32-bits of the sum divided by 2, then added together. 
The high 32 bits must have the sign bit appropriately restored after the 
overflow.

Both implementations are fast and can be computed using standard long and 
double arithmetic.
h3. Quantile

This requires the interpolation: a + t * (b - a) with a <= b as longs and t a 
double. The positive delta (b - a) has 64-bits and so the product with a 53-bit 
double is potentially 117-bits. This is added to a signed 64-bit value.

The interpolation can be performed with double arithmetic when the delta is 
less than 2^53 with a few ULP error on the double value. However rounding to a 
long is frequently incorrect. Using random 52-bit long values and a random 
interpolant t the rounding error frequency is approximately 9%.

The simple solution to the extended bit range of the intermediate is to use 
BigDecimal to support exact computation in extended precision. Benchmarks show 
this is a significant overhead of a quantile computation even if only one of 
the long or double value results is computed.

I used the double-double (DD) class from Commons Numbers to support extended 
precision computation. This is not always exact but is much faster than using 
BigDecimal. When the delta is less than 2^53 the multiplication by the 
interpolant t is exact to 106-bits. In this case the DD computation can support 
exact rounding to the long result. Test cases do not show any error in the 
double result but there is a potential for error given the addition of a DD 
representing t * (b - a) to another DD representing a is not exact.

When the delta is more than 2^53 the multiplication cannot represent the 
117-bit result. However the bits in 107-117 do not effect rounding of ties at 
0.5. The fractional part of the ties result can be represented as a DD value 
equal to (0.5, 0). This is a 1-bit followed by 105 zeros. It is not possible 
with a multiplication of a 64-bit and 53-bit number to have 105 zeros and then 
a non-zero bit in bits 107-117. The maximum gap between 1 bits is 62 zeros.
h2. Benchmarking

I created various interpolation implementations for benchmarking:
||Method||(b - a) < 2^53||(b - a) >= 2^53||
|BigDecimal|a + t * (b - a) |Same|
|DD|a + t * (b - a)|Same|
|DD2|t * (b - a). Split to integer and fraction parts.
Add integer part to a and round using fraction part. |Same.|
|DD3|t * (b - a). Split to integer and fraction parts.
Add integer part to a and round using fraction part f.
double value is integer parts plus f.hi (53-bit).|Splits (b - a) into two 
parts. Computes each multiplied by t exactly. The integer parts are extracted 
and the remaining fraction parts combined to a double-double (106-bits) for use 
in rounding with loss of bits 107-117.|
|DD4|t * (b - a). Split to integer and fraction parts.
Add integer part to a and round using fraction part f.
double value is integer parts plus f (106-bit).|Splits (b - a) into two parts. 
Computes each multiplied by t exactly. The integer parts are extracted and the 
remaining fraction parts combined to an exact triple-double (159-bits) for use 
in rounding.|
|DD5|t * (b - a) as a double.|As DD3|
|Partial|Same as DD3|Same as BigDecimal but the result is stored as a 
BigDecimal and not yet rounded.|
|double|Same as Partial|Same as Partial then converted to a double result|
|long|Same as Partial|Same as Partial then converted to a long result|
h3. Notes

The DD implementation computes a + t * (b - a). The integer part of this is 
subtracted from the result to obtain the fraction part to use in rounding. This 
is inexact.

The DD2 implementation splits the computation to the result t * (b - a) and 
attempts to add the integer parts to a while maintaining the remaining fraction 
part exactly. The result may be inexact if the delta (b - a) is above 2^53.

The DD3 implementation improves on DD2 by handling the case of < 106-bits 
exactly. The other case may be inexact but by splitting the multiplication t * 
(b - a) to two multiplications the errors are reduced.

The DD4 implementation should be exact for all cases of rounding to a long 
value. It may be 1 ULP inexact in the double value. This is the limit of DD 
computation without a special routine to sum double values with a carry bit for 
exact rounding when adding values of increasing magnitude.

The DD5 implementation uses double arithmetic for reference. It is less precise 
than other methods.

The partial implementation does some of the computation in BigDecimal. The 
result could be used in lazy evaluation of the long or double result. The 
benchmark omits the time of this evaluation.

The long or double implementations include the lazy evaluation cost of the 
partial implementation.
h3. Results

The benchmark creates data to sample the range of possible interpolations. It 
creates a positive delta (b - a) with a random long of a given bit depth. This 
interval is then placed in a position in the full range of a long. Thus a and b 
can be negative or positive. The data here is generated per invocation of the 
interpolate method. A method that does no interpolation computation has been 
used to subtract the JMH overhead for the timings.

Result on OpenJDK 25.0.2 using an Mac M2 Pro (results in ns per operation):

!evaluate.png|width=800,height=600!

Selected data
||Bit Depth||BigDecimal||long||double||Partial||DD||DD4||DD2||DD3||DD5||
|45|434.3|13.6|13.2|13.0|34.4|16.9|20.5|14.3|1.3|
|54|438.3|162.5|168.9|96.6|33.8|23.8|29.9|20.9|17.1|
|63|421.2|322.1|322.1|179.2|33.1|33.3|18.4|25.8|25.6|
h3.  Notes

The BigDecimal method is slow.

The double method (DD5) is fast for small delta.

There is a shift in performance for the methods that switch the implementation 
when (b - a) > 2^53.

Computing the partial result as a BigDecimal shows that the exact rounding to 
either a long or a double is a significant part of the BigDecimal computation. 
Given that the consumer of the result will likely require either a double or a 
long but not both this method will allow a performance gain over fully 
evaluating the result but at the memory cost of storing the BigDecimal 
intermediate.

The simple DD implementation has constant performance across all bit depths. It 
is the least precise and slowest DD implementation.

The DD2 implementation is faster than the DD method. It has a noticeable 
slowdown when the bit depth is just above 2^53. This is point where the method 
has its only branch statement. Thus when the branch is predictable it runs 
fastest. When the branch is closer to a random 50:50 decision the code is 
slowest.

The best possible DD implementation DD4 is the same speed as the simple DD 
implementation when delta > 2^53. However this implementation uses code that is 
not in the public API of the DD class. I extracted parts of the extended 
precision routines in the package-private DD codebase to create this method.

The DD3 implementation has exact rounding of the long result. It should be with 
1 ULP for the double result. This is faster than DD4 and uses the public API of 
the DD class.
h2. Performance of Median/Quantile

I implemented the new quantile method using the DD3 method for delta above 2^53 
and DD4 for delta below 2^53. This will always generate the exact long value. I 
added a long[] benchmark to the existing median and quantile benchmarks. These 
benchmarks use the Bentley and McIlroy (BM) data by default. This is a suite of 
data that is very hard for single pivot sorting or selection routines. The 
Commons Numbers selection routine uses a composite method with a dual-pivot 
selection that falls back to heap select or an optimised median-of-medians for 
difficult data. It handles the BM dataset without issue. For length 1000 and 
100000 the BM data has 624 and 1044 arrays. I also used 20 arrays filled with 
random data.

The methods are shown for a full sort (JDK) or using the Commons Statistics 
implementation.

Results are shown for OpenJDK 25.0.2 using an Mac M2 Pro (results in ns per 
array).
h3. Median
| | |JDK| | |Statistics| | |
|Distribution|Length|double|int|long|double|int|long|
|bm|1000|12594|10058|10193|3928|2894|3185|
|bm|100000|1404608|1145256|1166235|332187|239822|277255|
|random|1000|28869|24050|24152|3424|1990|2970|
|random|100000|5441266|4490290|4549560|682022|482376|504323|
h3. Quantile

Computing various quantiles:
| | | |JDK| | |Statistics| | |
|Data|Length|Quantiles|double|int|long|double|int|long|
|bm|1000|0.001:0.005:0.01:0.02:0.05:0.1:0.5|12660|10043|10273|5839|4581|4753|
|bm|100000|0.001:0.005:0.01:0.02:0.05:0.1:0.5|1415699|1138653|1161232|431495|319206|338749|
|bm|1000|0.01:0.05:0.1:0.5:0.9:0.95:0.99|12615|10060|10153|7185|5576|5871|
|bm|100000|0.01:0.05:0.1:0.5:0.9:0.95:0.99|1421606|1141469|1167143|537938|398421|425427|
|bm|1000|0.01:0.99|12722|10282|10225|3361|2505|2644|
|bm|100000|0.01:0.99|1424965|1139162|1162230|318431|222719|246482|
|bm|1000|0.25:0.5:0.75|12905|10026|10244|6379|4948|5292|
|bm|100000|0.25:0.5:0.75|1410152|1140742|1146925|483497|363503|393353|
|bm|1000|0.25:0.75|12702|10033|10408|5338|4173|4376|
|bm|100000|0.25:0.75|1407330|1147472|1156427|412597|303660|331209|
|bm|1000|1e-100:1.0|12643|10055|10279|2678|1904|2046|
|bm|100000|1e-100:1.0|1413822|1157652|1160009|241537|163857|174586|
|random|1000|0.001:0.005:0.01:0.02:0.05:0.1:0.5|28656|24262|24107|8258|6565|6719|
|random|100000|0.001:0.005:0.01:0.02:0.05:0.1:0.5|5384698|4479761|4562645|1161895|889640|902307|
|random|1000|0.01:0.05:0.1:0.5:0.9:0.95:0.99|28442|23940|23996|11884|9313|9579|
|random|100000|0.01:0.05:0.1:0.5:0.9:0.95:0.99|5403978|4519200|4532510|1451195|1141546|1130822|
|random|1000|0.01:0.99|28396|24301|23939|3815|3446|3250|
|random|100000|0.01:0.99|5415234|4509842|4519895|706670|530774|558530|
|random|1000|0.25:0.5:0.75|28537|23919|24088|9890|7249|7953|
|random|100000|0.25:0.5:0.75|5379536|4545991|4529099|1275707|980957|983066|
|random|1000|0.25:0.75|28596|24024|24111|7449|5691|6290|
|random|100000|0.25:0.75|5399353|4521529|4526954|1062445|810876|809598|
|random|1000|1e-100:1.0|28714|24038|24142|2621|2438|2084|
|random|100000|1e-100:1.0|5406622|4519134|4555534|595941|433261|428265|

Computing 100 uniform quantiles:
| |JDK|JDK| | |Statistics| | |
|Data|Length|doubleQuantileRange|intQuantileRange|longQuantileRange|doubleQuantileRange|intQuantileRange|longQuantileRange|
|bm|1000|12615.50401|10179.66066|10406.46085|15197.27513|12488.50133|13250.94753|
|bm|100000|1394867.122|1133394.117|1160270.131|1072559.682|859074.7205|905359.89|
|random|1000|29497.65023|26576.47445|24184.19089|31921.38|26556.05462|27779.45443|
|random|100000|5415234.661|4547732.872|4538116.304|3211675.887|2492517.823|2564023.876|
h3. Notes

Do not fully sort the data unless required. This is slower than a partial sort 
for most data. Only data with extended runs of ascending or descending data are 
sorted faster than the Commons Numbers selection routine as it does not detect 
and then merge sort sequence runs as is done by the JDK sort method.

The long implementation is similar in speed to the int implementation. 
Conversion of a long array to a double array would be slower, add an overhead 
to the user, and generate less accurate results if the long values exceed 
53-bits of precision.

Given the input data would be integers of a 32-bit range and the long quantile 
uses the DD4 implementation and the estimated interpolation time for each of 
the 100 quantiles would be 16.9ns for a total of 1690ns. This is a small 
fraction of the run time for length 1000 and negligible for length 100000. 
Switching to a faster and less precise interpolation method would not provide a 
noticeable performance gain.

 

 

> Support median and quantile for long datatype
> ---------------------------------------------
>
>                 Key: STATISTICS-94
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-94
>             Project: Commons Statistics
>          Issue Type: Improvement
>          Components: descriptive
>            Reporter: Alex Herbert
>            Assignee: Alex Herbert
>            Priority: Minor
>         Attachments: evaluate.png
>
>
> Add support to the {{Median}} and {{Quantile}} class for the {{long}} 
> datatype. Current support is for {{double[]}} and {{{}int[]{}}}. Via 
> primitive conversion these support float, short, char and byte if an array is 
> appropriately converted. However {{long}} cannot be supported by primitive 
> conversion.
> The current median and quantile result is returned as a {{{}double{}}}. This 
> is not appropriate for the result on a long array as the closest 
> representable double may be outside the range of the input data if the data 
> have more than 53-bits in their integer representation. However if the 
> integer value is representable as less than 53-bits then the double will 
> provide a fractional part that cannot be returned if the result was a long. I 
> suggest using the {{StatisticResult}} for the return value allowing both 
> double and long representations:
> {code:java}
> StatisticResult r = Median.of(new long[] {Long.MIN_VALUE, Long.MAX_VALUE});
> r.getAsDouble();  // -0.5
> r.getAsLong();    // 0{code}
> h2. Rounding
> The current behaviour for {{StatisticResult}} is to return the integer 
> representations using the closest representation of the floating-point value 
> with ties towards positive infinity. So in the example above the median as a 
> long is 0.
> This behaviour could be changed with a rounding mode parameter. Unfortunately 
> the current rounding behaviour is not represented by any mode in 
> [java.math.RoundingMode (Javadoc JDK 
> 21)|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/math/RoundingMode.html]
>  so this enum cannot be reused to set the mode. The rounding behaviour 
> matches [java.lang.Math.round(double) (Javadoc JDK 
> 21)|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Math.html#round(double)].
>  The closest match is RoundingMode.HALF_UP when above zero and 
> RoundingMode.HALF_DOWN when below zero. If rounding to nearest is only 
> concerned with the 0.5 boundary then support for rounding could add modes for 
> rounding ties towards: negative infinity; positive infinity: or zero 
> (floating-point truncation).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (STATISTICS-94) Support median and quantile for long datatype

Reply via email to