andygrove commented on PR #3757:
URL:
https://github.com/apache/datafusion-comet/pull/3757#issuecomment-4178458553
Some test suggestions for edge cases that could reveal incompatibilities.
The main risk is that Comet casts all inputs to f64 before accumulation, while
Spark stores original typed values in its HashMap.
**Large integers that lose f64 precision** - Spark keeps these as distinct
keys, while Comet would merge them into a single key after the f64 cast,
since 9007199254740993 (2^53 + 1) is not exactly representable as f64:
```sql
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY v)
FROM (VALUES (9007199254740992), (9007199254740993)) AS t(v)
```
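To illustrate why this pair is interesting, here is a minimal Python sketch (Python floats are IEEE 754 doubles, the same representation as f64). The `count_keys` helper is hypothetical and only mimics a value-to-count map; it is not Comet's or Spark's actual code.

```python
from collections import Counter

def count_keys(values, cast_to_f64):
    """Build a value -> count map, optionally casting each value to a double first."""
    keys = [float(v) for v in values] if cast_to_f64 else list(values)
    return Counter(keys)

values = [9007199254740992, 9007199254740993]  # 2**53 and 2**53 + 1

typed = count_keys(values, cast_to_f64=False)  # keys stay as exact integers
as_f64 = count_keys(values, cast_to_f64=True)  # keys are rounded to doubles

print(len(typed))   # 2 distinct keys
print(len(as_f64))  # 1 key: 2**53 + 1 rounds to 2**53 as a double
```

Whether the merged key changes the final percentile depends on how each engine interpolates, which is what the SQL test above would check.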
**NaN handling** - Spark includes NaN in the sort, where it sorts above
every other value (via Java's Double.compare). Worth checking that Comet
matches:
```sql
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY v)
FROM (VALUES (1.0), (2.0), (CAST('NaN' AS DOUBLE))) AS t(v)
```
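As a rough sketch of the ordering described here, the hypothetical `nan_last_key` below places NaN after every other value, which is how Java's `Double.compare` orders it. This only models the sort order; it is not the code path of either engine.

```python
import math

def nan_last_key(x):
    """Sort key that places NaN after every other double."""
    if math.isnan(x):
        return (1, 0.0)
    return (0, x)

values = [float("nan"), 1.0, 2.0]
ordered = sorted(values, key=nan_last_key)

# With NaN sorted last, the middle of the three rows is 2.0.
median = ordered[len(ordered) // 2]
print(ordered, median)
```

An engine that dropped NaN or sorted it first would land on a different middle value, so the query above should distinguish those behaviors.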
**Negative zero**:
```sql
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY v)
FROM (VALUES (CAST(-0.0 AS DOUBLE)), (CAST(0.0 AS DOUBLE)), (1.0)) AS t(v)
```
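The comment does not spell out the concern for negative zero. One plausible reason, offered here as an assumption, is that `-0.0` and `0.0` compare equal numerically but have different bit patterns, so a map keyed by value and a map keyed by bits disagree on the number of keys. The `f64_bits` helper is illustrative only.

```python
import struct

def f64_bits(x):
    """Return the raw 64-bit IEEE 754 pattern of a double as an int."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

neg_zero, pos_zero = -0.0, 0.0

print(neg_zero == pos_zero)                      # True: equal under IEEE 754 comparison
print(f64_bits(neg_zero) == f64_bits(pos_zero))  # False: the sign bit differs

# A map keyed by numeric value sees one key; a map keyed by bit pattern sees two.
by_value = {neg_zero: 1, pos_zero: 1}
by_bits = {f64_bits(neg_zero): 1, f64_bits(pos_zero): 1}
print(len(by_value), len(by_bits))  # 1 2
```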
**Infinity**:
```sql
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY v)
FROM (VALUES (1.0), (CAST('Infinity' AS DOUBLE)), (CAST('-Infinity' AS DOUBLE))) AS t(v)
```
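For context on how infinities can interact with interpolation, here is a generic linear-interpolation sketch. The `percentile_cont` function below is a hypothetical stand-in, not taken from Spark or Comet. With three rows and a percentile of 0.5 the position falls exactly on the middle row, so no arithmetic on the infinite neighbours happens; other percentiles or an even row count do exercise it.

```python
import math

def percentile_cont(sorted_values, p):
    """Linear interpolation between the two nearest ranks (a generic sketch)."""
    position = (len(sorted_values) - 1) * p
    lower, higher = math.floor(position), math.ceil(position)
    if lower == higher:
        return sorted_values[lower]  # exact rank: no arithmetic on neighbours
    lo, hi = sorted_values[lower], sorted_values[higher]
    return (higher - position) * lo + (position - lower) * hi

rows = [-math.inf, 1.0, math.inf]
print(percentile_cont(rows, 0.5))                   # 1.0, the exact middle row
print(percentile_cont(rows, 0.25))                  # -inf, between -inf and 1.0
print(percentile_cont([-math.inf, math.inf], 0.5))  # nan, since -inf + inf is undefined
```

If the three-row query passes, a variant with a non-median percentile may be a useful follow-up case.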
If any of these fail, they could be added as `ignore` tests with a linked
issue, and `getSupportLevel` could be updated to note the incompatibility.