[
https://issues.apache.org/jira/browse/STATISTICS-94?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18070889#comment-18070889
]
Alex Herbert commented on STATISTICS-94:
----------------------------------------
I have created an implementation for the long[] version of median and quantile.
h2. API
The API is:
{code:java}
Median
public StatisticResult evaluate(long[] values)
public StatisticResult evaluateRange(long[] values, int from, int to)
Quantile
public StatisticResult evaluate(long[] values, double p)
public StatisticResult evaluateRange(long[] values, int from, int to,
double p)
public StatisticResult[] evaluate(long[] values, double... p)
public StatisticResult[] evaluateRange(long[] values, int from, int to,
double... p)
// pre-sorted data
public StatisticResult evaluateAsLong(int n, IntToLongFunction values,
double p)
public StatisticResult[] evaluateAsLong(int n, IntToLongFunction values,
double... p)
{code}
Note that the method for a sorted array uses a different name from the double
version:
{code:java}
public double evaluate(int n, IntToDoubleFunction values, double p)
public double evaluate(int n, IntToDoubleFunction values, double... p)
{code}
The sorted methods are called with a function that converts an index into a
double/long value. This would typically be a lambda function: i ->
sortedArray[i]. If the long variant uses the same name then all source code
using a lambda for the existing method will be source incompatible due to an
ambiguous method for the lambda. This would have to be fixed by specifying the
type of the lambda, for example:
{code:java}
short[] data = ...
Arrays.sort(data);
double x evaluate(data.length, (IntToDoubleFunction) i -> data, 0.25);
{code}
I used the AsLong naming convention found in the JDKs naming for functions
using primitive types. The double variant could be renamed to evaluateAsDouble
but I have not done this to avoid API clutter as the existing method will have
to be maintained for binary compatibility anyway.
h2. Implementation
Given the selection of sorted indices is in Commons Numbers the Statistics
component is mainly concerned with the interpolation.
h3. Median
The median of two long values is (a + b) / 2.
When the result is an integer there is a recipe for this in the [Hackers
Delight|https://en.wikipedia.org/wiki/Hacker%27s_Delight] for computing the
ceiling, floor or rounded-down-to-zero mean. To fit with the behaviour of
StatisticResult for double values this requires using the ceiling
implementation for ties to infinity.
The double value result can be obtained by multiplying the sum by 0.5. This
works until (a + b) overflows. Overflow can be detected and the mean composed
using the high and low 32-bits of the sum divided by 2, then added together.
The high 32 bits must have the sign bit appropriately restored after the
overflow.
Both implementations are fast and can be computed using standard long and
double arithmetic.
h3. Quantile
This requires the interpolation: a + t * (b - a) with a <= b as longs and t a
double. The positive delta (b - a) has 64-bits and so the product with a 53-bit
double is potentially 117-bits. This is added to a signed 64-bit value.
The interpolation can be performed with double arithmetic when the delta is
less than 2^53 with a few ULP error on the double value. However rounding to a
long is frequently incorrect. Using random 52-bit long values and a random
interpolant t the rounding error frequency is approximately 9%.
The simple solution to the extended bit range of the intermediate is to use
BigDecimal to support exact computation in extended precision. Benchmarks show
this is a significant overhead of a quantile computation even if only one of
the long or double value results is computed.
I used the double-double (DD) class from Commons Numbers to support extended
precision computation. This is not always exact but is much faster than using
BigDecimal. When the delta is less than 2^53 the multiplication by the
interpolant t is exact to 106-bits. In this case the DD computation can support
exact rounding to the long result. Test cases do not show any error in the
double result but there is a potential for error given the addition of a DD
representing t * (b - a) to another DD representing a is not exact.
When the delta is more than 2^53 the multiplication cannot represent the
117-bit result. However the bits in 107-117 do not effect rounding of ties at
0.5. The fractional part of the ties result can be represented as a DD value
equal to (0.5, 0). This is a 1-bit followed by 105 zeros. It is not possible
with a multiplication of a 64-bit and 53-bit number to have 105 zeros and then
a non-zero bit in bits 107-117. The maximum gap between 1 bits is 62 zeros.
h2. Benchmarking
I created various interpolation implementations for benchmarking:
||Method||(b - a) < 2^53||(b - a) >= 2^53||
|BigDecimal|a + t * (b - a) |Same|
|DD|a + t * (b - a)|Same|
|DD2|t * (b - a). Split to integer and fraction parts.
Add integer part to a and round using fraction part. |Same.|
|DD3|t * (b - a). Split to integer and fraction parts.
Add integer part to a and round using fraction part f.
double value is integer parts plus f.hi (53-bit).|Splits (b - a) into two
parts. Computes each multiplied by t exactly. The integer parts are extracted
and the remaining fraction parts combined to a double-double (106-bits) for use
in rounding with loss of bits 107-117.|
|DD4|t * (b - a). Split to integer and fraction parts.
Add integer part to a and round using fraction part f.
double value is integer parts plus f (106-bit).|Splits (b - a) into two parts.
Computes each multiplied by t exactly. The integer parts are extracted and the
remaining fraction parts combined to an exact triple-double (159-bits) for use
in rounding.|
|DD5|t * (b - a) as a double.|As DD3|
|Partial|Same as DD3|Same as BigDecimal but the result is stored as a
BigDecimal and not yet rounded.|
|double|Same as Partial|Same as Partial then converted to a double result|
|long|Same as Partial|Same as Partial then converted to a long result|
h3. Notes
The DD implementation computes a + t * (b - a). The integer part of this is
subtracted from the result to obtain the fraction part to use in rounding. This
is inexact.
The DD2 implementation splits the computation to the result t * (b - a) and
attempts to add the integer parts to a while maintaining the remaining fraction
part exactly. The result may be inexact if the delta (b - a) is above 2^53.
The DD3 implementation improves on DD2 by handling the case of < 106-bits
exactly. The other case may be inexact but by splitting the multiplication t *
(b - a) to two multiplications the errors are reduced.
The DD4 implementation should be exact for all cases of rounding to a long
value. It may be 1 ULP inexact in the double value. This is the limit of DD
computation without a special routine to sum double values with a carry bit for
exact rounding when adding values of increasing magnitude.
The DD5 implementation uses double arithmetic for reference. It is less precise
than other methods.
The partial implementation does some of the computation in BigDecimal. The
result could be used in lazy evaluation of the long or double result. The
benchmark omits the time of this evaluation.
The long or double implementations include the lazy evaluation cost of the
partial implementation.
h3. Results
The benchmark creates data to sample the range of possible interpolations. It
creates a positive delta (b - a) with a random long of a given bit depth. This
interval is then placed in a position in the full range of a long. Thus a and b
can be negative or positive. The data here is generated per invocation of the
interpolate method. A method that does no interpolation computation has been
used to subtract the JMH overhead for the timings.
Result on OpenJDK 25.0.2 using an Mac M2 Pro (results in ns per operation):
!evaluate.png|width=800,height=600!
Selected data
||Bit Depth||BigDecimal||long||double||Partial||DD||DD4||DD2||DD3||DD5||
|45|434.3|13.6|13.2|13.0|34.4|16.9|20.5|14.3|1.3|
|54|438.3|162.5|168.9|96.6|33.8|23.8|29.9|20.9|17.1|
|63|421.2|322.1|322.1|179.2|33.1|33.3|18.4|25.8|25.6|
h3. Notes
The BigDecimal method is slow.
The double method (DD5) is fast for small delta.
There is a shift in performance for the methods that switch the implementation
when (b - a) > 2^53.
Computing the partial result as a BigDecimal shows that the exact rounding to
either a long or a double is a significant part of the BigDecimal computation.
Given that the consumer of the result will likely require either a double or a
long but not both this method will allow a performance gain over fully
evaluating the result but at the memory cost of storing the BigDecimal
intermediate.
The simple DD implementation has constant performance across all bit depths. It
is the least precise and slowest DD implementation.
The DD2 implementation is faster than the DD method. It has a noticeable
slowdown when the bit depth is just above 2^53. This is point where the method
has its only branch statement. Thus when the branch is predictable it runs
fastest. When the branch is closer to a random 50:50 decision the code is
slowest.
The best possible DD implementation DD4 is the same speed as the simple DD
implementation when delta > 2^53. However this implementation uses code that is
not in the public API of the DD class. I extracted parts of the extended
precision routines in the package-private DD codebase to create this method.
The DD3 implementation has exact rounding of the long result. It should be with
1 ULP for the double result. This is faster than DD4 and uses the public API of
the DD class.
h2. Performance of Median/Quantile
I implemented the new quantile method using the DD3 method for delta above 2^53
and DD4 for delta below 2^53. This will always generate the exact long value. I
added a long[] benchmark to the existing median and quantile benchmarks. These
benchmarks use the Bentley and McIlroy (BM) data by default. This is a suite of
data that is very hard for single pivot sorting or selection routines. The
Commons Numbers selection routine uses a composite method with a dual-pivot
selection that falls back to heap select or an optimised median-of-medians for
difficult data. It handles the BM dataset without issue. For length 1000 and
100000 the BM data has 624 and 1044 arrays. I also used 20 arrays filled with
random data.
The methods are shown for a full sort (JDK) or using the Commons Statistics
implementation.
Results are shown for OpenJDK 25.0.2 using an Mac M2 Pro (results in ns per
array).
h3. Median
| | |JDK| | |Statistics| | |
|Distribution|Length|double|int|long|double|int|long|
|bm|1000|12594|10058|10193|3928|2894|3185|
|bm|100000|1404608|1145256|1166235|332187|239822|277255|
|random|1000|28869|24050|24152|3424|1990|2970|
|random|100000|5441266|4490290|4549560|682022|482376|504323|
h3. Quantile
Computing various quantiles:
| | | |JDK| | |Statistics| | |
|Data|Length|Quantiles|double|int|long|double|int|long|
|bm|1000|0.001:0.005:0.01:0.02:0.05:0.1:0.5|12660|10043|10273|5839|4581|4753|
|bm|100000|0.001:0.005:0.01:0.02:0.05:0.1:0.5|1415699|1138653|1161232|431495|319206|338749|
|bm|1000|0.01:0.05:0.1:0.5:0.9:0.95:0.99|12615|10060|10153|7185|5576|5871|
|bm|100000|0.01:0.05:0.1:0.5:0.9:0.95:0.99|1421606|1141469|1167143|537938|398421|425427|
|bm|1000|0.01:0.99|12722|10282|10225|3361|2505|2644|
|bm|100000|0.01:0.99|1424965|1139162|1162230|318431|222719|246482|
|bm|1000|0.25:0.5:0.75|12905|10026|10244|6379|4948|5292|
|bm|100000|0.25:0.5:0.75|1410152|1140742|1146925|483497|363503|393353|
|bm|1000|0.25:0.75|12702|10033|10408|5338|4173|4376|
|bm|100000|0.25:0.75|1407330|1147472|1156427|412597|303660|331209|
|bm|1000|1e-100:1.0|12643|10055|10279|2678|1904|2046|
|bm|100000|1e-100:1.0|1413822|1157652|1160009|241537|163857|174586|
|random|1000|0.001:0.005:0.01:0.02:0.05:0.1:0.5|28656|24262|24107|8258|6565|6719|
|random|100000|0.001:0.005:0.01:0.02:0.05:0.1:0.5|5384698|4479761|4562645|1161895|889640|902307|
|random|1000|0.01:0.05:0.1:0.5:0.9:0.95:0.99|28442|23940|23996|11884|9313|9579|
|random|100000|0.01:0.05:0.1:0.5:0.9:0.95:0.99|5403978|4519200|4532510|1451195|1141546|1130822|
|random|1000|0.01:0.99|28396|24301|23939|3815|3446|3250|
|random|100000|0.01:0.99|5415234|4509842|4519895|706670|530774|558530|
|random|1000|0.25:0.5:0.75|28537|23919|24088|9890|7249|7953|
|random|100000|0.25:0.5:0.75|5379536|4545991|4529099|1275707|980957|983066|
|random|1000|0.25:0.75|28596|24024|24111|7449|5691|6290|
|random|100000|0.25:0.75|5399353|4521529|4526954|1062445|810876|809598|
|random|1000|1e-100:1.0|28714|24038|24142|2621|2438|2084|
|random|100000|1e-100:1.0|5406622|4519134|4555534|595941|433261|428265|
Computing 100 uniform quantiles:
| |JDK|JDK| | |Statistics| | |
|Data|Length|doubleQuantileRange|intQuantileRange|longQuantileRange|doubleQuantileRange|intQuantileRange|longQuantileRange|
|bm|1000|12615.50401|10179.66066|10406.46085|15197.27513|12488.50133|13250.94753|
|bm|100000|1394867.122|1133394.117|1160270.131|1072559.682|859074.7205|905359.89|
|random|1000|29497.65023|26576.47445|24184.19089|31921.38|26556.05462|27779.45443|
|random|100000|5415234.661|4547732.872|4538116.304|3211675.887|2492517.823|2564023.876|
h3. Notes
Do not fully sort the data unless required. This is slower than a partial sort
for most data. Only data with extended runs of ascending or descending data are
sorted faster than the Commons Numbers selection routine as it does not detect
and then merge sort sequence runs as is done by the JDK sort method.
The long implementation is similar in speed to the int implementation.
Conversion of a long array to a double array would be slower, add an overhead
to the user, and generate less accurate results if the long values exceed
53-bits of precision.
Given the input data would be integers of a 32-bit range and the long quantile
uses the DD4 implementation and the estimated interpolation time for each of
the 100 quantiles would be 16.9ns for a total of 1690ns. This is a small
fraction of the run time for length 1000 and negligible for length 100000.
Switching to a faster and less precise interpolation method would not provide a
noticeable performance gain.
> Support median and quantile for long datatype
> ---------------------------------------------
>
> Key: STATISTICS-94
> URL: https://issues.apache.org/jira/browse/STATISTICS-94
> Project: Commons Statistics
> Issue Type: Improvement
> Components: descriptive
> Reporter: Alex Herbert
> Assignee: Alex Herbert
> Priority: Minor
> Attachments: evaluate.png
>
>
> Add support to the {{Median}} and {{Quantile}} class for the {{long}}
> datatype. Current support is for {{double[]}} and {{{}int[]{}}}. Via
> primitive conversion these support float, short, char and byte if an array is
> appropriately converted. However {{long}} cannot be supported by primitive
> conversion.
> The current median and quantile result is returned as a {{{}double{}}}. This
> is not appropriate for the result on a long array as the closest
> representable double may be outside the range of the input data if the data
> have more than 53-bits in their integer representation. However if the
> integer value is representable as less than 53-bits then the double will
> provide a fractional part that cannot be returned if the result was a long. I
> suggest using the {{StatisticResult}} for the return value allowing both
> double and long representations:
> {code:java}
> StatisticResult r = Median.of(new long[] {Long.MIN_VALUE, Long.MAX_VALUE});
> r.getAsDouble(); // -0.5
> r.getAsLong(); // 0{code}
> h2. Rounding
> The current behaviour for {{StatisticResult}} is to return the integer
> representations using the closest representation of the floating-point value
> with ties towards positive infinity. So in the example above the median as a
> long is 0.
> This behaviour could be changed with a rounding mode parameter. Unfortunately
> the current rounding behaviour is not represented by any mode in
> [java.math.RoundingMode (Javadoc JDK
> 21)|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/math/RoundingMode.html]
> so this enum cannot be reused to set the mode. The rounding behaviour
> matches [java.lang.Math.round(double) (Javadoc JDK
> 21)|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Math.html#round(double)].
> The closest match is RoundingMode.HALF_UP when above zero and
> RoundingMode.HALF_DOWN when below zero. If rounding to nearest is only
> concerned with the 0.5 boundary then support for rounding could add modes for
> rounding ties towards: negative infinity; positive infinity: or zero
> (floating-point truncation).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)