andygrove opened a new issue, #4719:
URL: https://github.com/apache/datafusion-comet/issues/4719

   ### Describe the bug
   
   Comet's native `percentile` aggregate (PR #4542) maps to DataFusion's 
`percentile_cont`, which computes the linear interpolation weight with a 
quantization step:
   
   ```rust
   const INTERPOLATION_PRECISION: f64 = 1_000_000.0;
   let fraction = index - (lower_index as f64);
   let scaled = (fraction * INTERPOLATION_PRECISION) as usize;
   let weight = scaled as f64 / INTERPOLATION_PRECISION;
   let interpolated_f = lower_f + (upper_f - lower_f) * weight;
   ```
   
   The interpolation weight is truncated to 6 decimal places. Spark's exact 
`Percentile` interpolates with the full-precision fraction
(`(position - lower) * higherValue + (higher - position) * lowerValue`), so an
interpolated result can differ from Spark's by up to roughly
`(upper - lower) * 1e-6`, where `upper - lower` is the gap between the two
neighboring values.
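To make the gap concrete, here is a small standalone Rust sketch that reproduces the quantized weight from the snippet above next to Spark's full-precision formula. The function names `quantized_interpolate` and `spark_interpolate` are hypothetical (they are not DataFusion or Spark APIs). Both take the values at `floor(position)` and `ceil(position)` as inputs:

```rust
/// Hypothetical helper mirroring the quantized interpolation quoted above
/// from DataFusion's `percentile_cont` (weight truncated to 6 decimals).
fn quantized_interpolate(lower_f: f64, upper_f: f64, index: f64) -> f64 {
    const INTERPOLATION_PRECISION: f64 = 1_000_000.0;
    let lower_index = index.floor();
    let fraction = index - lower_index;
    // `as usize` truncates toward zero, dropping digits past the 6th decimal.
    let scaled = (fraction * INTERPOLATION_PRECISION) as usize;
    let weight = scaled as f64 / INTERPOLATION_PRECISION;
    lower_f + (upper_f - lower_f) * weight
}

/// Hypothetical helper following Spark's full-precision formula:
/// (higher - position) * lowerValue + (position - lower) * higherValue.
fn spark_interpolate(lower_f: f64, upper_f: f64, position: f64) -> f64 {
    let lower = position.floor();
    let higher = position.ceil();
    if lower == higher {
        // Integral position: no interpolation needed.
        return lower_f;
    }
    (higher - position) * lower_f + (position - lower) * upper_f
}

fn main() {
    // Neighbors 0.0 and 1_000_000.0, position with a 7th decimal digit.
    let (lower_f, upper_f, position) = (0.0, 1_000_000.0, 0.1234567);
    let quantized = quantized_interpolate(lower_f, upper_f, position);
    let spark = spark_interpolate(lower_f, upper_f, position);
    println!("quantized = {quantized}, spark = {spark}, diff = {}", spark - quantized);
    // The gap stays below (upper - lower) * 1e-6.
    assert!((spark - quantized).abs() < (upper_f - lower_f) * 1e-6);
}
```

With these inputs the quantized path drops the seventh decimal of the weight, so the two results differ by roughly `0.7`, just under the `(upper - lower) * 1e-6 = 1.0` bound.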
   
   ### Affected versions
   
   Spark 3.4 / 3.5 / 4.0 / 4.1, wherever `percentile(col, p)` (or `median`, or 
`percentile_cont ... WITHIN GROUP`) maps to the native path.
   
   ### Impact
   
   Minor. The difference only appears when `p * (n - 1)` has a fractional part 
not representable in 6 decimal places, and is bounded by `(upper - lower) * 
1e-6`. The cases tested in `percentile.sql` match Spark exactly.
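A quick way to check whether a given `p` and row count `n` hit this case is to compute the weight lost to truncation directly. `weight_truncation_error` is a hypothetical helper written for this sketch, and it assumes the position is `p * (n - 1)` as described above. Multiplying its result by `(upper - lower)` gives the value error:

```rust
/// Hypothetical helper: the part of the interpolation fraction for
/// percentile `p` over `n` rows that 6-decimal truncation throws away.
fn weight_truncation_error(p: f64, n: usize) -> f64 {
    let position = p * (n as f64 - 1.0);
    let fraction = position - position.floor();
    // Same truncation as the quoted DataFusion code.
    let weight = (fraction * 1_000_000.0) as usize as f64 / 1_000_000.0;
    fraction - weight
}

fn main() {
    // Median of 5 rows: position 2.0, no interpolation at all.
    assert_eq!(weight_truncation_error(0.5, 5), 0.0);
    // 25th percentile of 3 rows: fraction 0.5 survives truncation.
    assert_eq!(weight_truncation_error(0.25, 3), 0.0);
    // Fraction with a 7th decimal digit: a small, nonzero loss.
    let err = weight_truncation_error(0.1234567, 2);
    assert!(err > 0.0 && err < 1e-6);
}
```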
   
   ### Possible fix
   
   Either contribute a higher-precision (or unquantized) interpolation upstream 
to DataFusion's `percentile_cont`, or implement a Comet-specific accumulator 
that matches Spark's interpolation exactly.
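For the Comet-specific route, the final evaluation step could follow Spark's `Percentile` logic directly. The sketch below assumes the group's values are already materialized and sorted ascending (Spark itself works from accumulated value counts), and it leaves out accumulator state and merging. `spark_percentile` is a hypothetical name:

```rust
/// Hypothetical evaluation step: percentile over ascending-sorted values,
/// using Spark's full-precision interpolation (no quantization).
fn spark_percentile(sorted: &[f64], p: f64) -> Option<f64> {
    if sorted.is_empty() || !(0.0..=1.0).contains(&p) {
        return None;
    }
    // Spark's position: (count - 1) * percentile.
    let position = (sorted.len() - 1) as f64 * p;
    let lower = position.floor();
    let higher = position.ceil();
    let lower_value = sorted[lower as usize];
    let higher_value = sorted[higher as usize];
    // No interpolation when the position is integral or the neighbors are equal.
    if lower == higher || lower_value == higher_value {
        return Some(lower_value);
    }
    Some((higher - position) * lower_value + (position - lower) * higher_value)
}

fn main() {
    let values = [1.0, 2.0, 3.0, 4.0];
    // position = 3 * 0.5 = 1.5, halfway between 2.0 and 3.0
    println!("{:?}", spark_percentile(&values, 0.5));
}
```

Keeping the fraction in full `f64` precision is what removes the `(upper - lower) * 1e-6` discrepancy; any remaining difference from Spark would come only from ordinary floating-point rounding.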
   
   Surfaced by the `percentile` audit accompanying #4542.

