Re: [PR] feat: Support Spark expression: percentile_cont [datafusion-comet]

via GitHub Thu, 02 Apr 2026 07:52:09 -0700


andygrove commented on PR #3757:
URL: 
https://github.com/apache/datafusion-comet/pull/3757#issuecomment-4178458553


   Some test suggestions for edge cases that could reveal incompatibilities. 
The main risk is that Comet casts all inputs to f64 before accumulation, while 
Spark stores original typed values in its HashMap.
   
   **Large integers that lose f64 precision** - Spark keeps these as distinct 
keys, whereas Comet would merge them into a single key (and a single count) 
after the f64 cast:
   
   ```sql
   SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY v)
   FROM (VALUES (9007199254740992), (9007199254740993)) AS t(v)
   ```
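
To illustrate the precision concern, here is a minimal Python sketch (not Comet or Spark code). It assumes only that a cast to double behaves like Python's `float()`, which is also an IEEE 754 binary64 value. The two `Counter` objects are hypothetical stand-ins for a typed-key map and a map keyed after the cast:

```python
from collections import Counter

# 2**53 and 2**53 + 1: the second value has no exact double representation.
values = [9007199254740992, 9007199254740993]

# Keys kept as exact integers (stand-in for a map with typed keys).
typed_counts = Counter(values)

# Keys after casting every input to double before accumulation.
cast_counts = Counter(float(v) for v in values)

print(len(typed_counts))  # 2
print(len(cast_counts))   # 1
```

Whether the final percentile differs depends on how each engine converts keys during interpolation, so the query above is still the real check.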
   
   **NaN handling** - Spark includes NaN in the sort, where it orders above 
every other value via Java's `Double.compare`. It is worth checking that Comet 
matches:
   
   ```sql
   SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY v)
   FROM (VALUES (1.0), (2.0), (CAST('NaN' AS DOUBLE))) AS t(v)
   ```
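
As a rough sketch of what the NaN case would produce if NaN sorts last, the Python below emulates only the NaN-sorts-high part of that ordering and applies plain linear interpolation between nearest ranks. The helper names `nan_high_key` and `percentile_cont` are hypothetical, and this is not taken from either engine's implementation:

```python
import math

def nan_high_key(x):
    # Sort key that places NaN after every other value, including +Infinity.
    # It does not model any other detail of Java's Double.compare.
    return (math.isnan(x), 0.0 if math.isnan(x) else x)

def percentile_cont(values, p):
    # Linear interpolation between the two nearest ranks of the sorted input.
    ordered = sorted(values, key=nan_high_key)
    position = p * (len(ordered) - 1)
    lower, upper = math.floor(position), math.ceil(position)
    if lower == upper:
        return ordered[lower]
    frac = position - lower
    return (1 - frac) * ordered[lower] + frac * ordered[upper]

result = percentile_cont([1.0, 2.0, float("nan")], 0.5)
print(result)  # 2.0
```

Under that assumption the median lands exactly on the middle element, so a result other than 2.0 would suggest NaN is being ordered or filtered differently.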
   
   **Negative zero**:
   
   ```sql
   SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY v)
   FROM (VALUES (CAST(-0.0 AS DOUBLE)), (CAST(0.0 AS DOUBLE)), (1.0)) AS t(v)
   ```
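
The reason negative zero is worth a test: IEEE 754 equality treats -0.0 and 0.0 as equal, while Java's `Double.compare` orders -0.0 before 0.0. The Python sketch below only demonstrates the IEEE side of that difference, using a plain `dict` as a hypothetical stand-in for a count map keyed by double:

```python
import math

neg_zero, pos_zero = -0.0, 0.0

# Equal under IEEE 754 comparison...
print(neg_zero == pos_zero)  # True

# ...but the sign bit is still distinguishable.
sign = math.copysign(1.0, neg_zero)
print(sign)  # -1.0

# A map keyed by value equality merges the two zeros into one entry.
counts = {}
for v in (neg_zero, pos_zero):
    counts[v] = counts.get(v, 0) + 1
print(len(counts))  # 1
```

An engine that keys or sorts by bit pattern would keep two entries here, which is the kind of divergence the query above could surface.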
   
   **Infinity**:
   
   ```sql
   SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY v)
   FROM (VALUES (1.0), (CAST('Infinity' AS DOUBLE)), (CAST('-Infinity' AS DOUBLE))) AS t(v)
   ```
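
One way infinities can expose a difference is through the interpolation arithmetic itself. The Python sketch below compares two algebraically equivalent formulas; both function names are hypothetical, and no claim is made here about which form Spark or Comet actually uses. It uses a percentile of 0.25 rather than 0.5 so that the position falls between two elements:

```python
import math

ordered = [float("-inf"), 1.0, float("inf")]

def interp_weighted(lo, hi, frac):
    # Weighted sum of the two neighbours.
    return (1 - frac) * lo + frac * hi

def interp_delta(lo, hi, frac):
    # Lower neighbour plus a fraction of the gap.
    return lo + frac * (hi - lo)

# p = 0.25 over three values gives position 0.5, between -inf and 1.0.
weighted = interp_weighted(ordered[0], ordered[1], 0.5)
delta = interp_delta(ordered[0], ordered[1], 0.5)

print(weighted)  # -inf
print(delta)     # nan
```

With a percentile of 0.5 the position lands exactly on the middle element, so the query above may pass even when the formulas differ; a variant with a non-integer position would be a stricter test.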
   
   If any of these fail, they could be added as `ignore` tests with a linked 
issue, and `getSupportLevel` could be updated to note the incompatibility.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

