andygrove opened a new issue, #21514:
URL: https://github.com/apache/datafusion/issues/21514

   ### Describe the bug
   
   The `datafusion-spark` `mod` and `pmod` functions return `NaN` for 
floating-point modulo by zero, while Apache Spark returns `NULL`. The integer 
case is handled correctly (returns NULL), but the float case falls through to 
IEEE 754 behavior.
   
   ### To Reproduce
   
   **PySpark (correct behavior):**
   ```sql
   SELECT MOD(CAST(10.5 AS DOUBLE), CAST(0.0 AS DOUBLE));  -- NULL
   SELECT MOD(CAST(10.5 AS FLOAT), CAST(0.0 AS FLOAT));    -- NULL
   SELECT MOD(CAST(10 AS INT), CAST(0 AS INT));             -- NULL
   SELECT pmod(CAST(10.5 AS DOUBLE), CAST(0.0 AS DOUBLE)); -- NULL
   ```
   
   Spark returns NULL for all types consistently.
   
   **DataFusion-spark (incorrect behavior):**
   ```sql
   SELECT MOD(10.5::DOUBLE, 0.0::DOUBLE);  -- NaN (should be NULL)
   SELECT MOD(10.5::FLOAT, 0.0::FLOAT);    -- NaN (should be NULL)
   SELECT MOD(10::INT, 0::INT);             -- NULL (correct)
   SELECT pmod(10.5::DOUBLE, 0.0::DOUBLE); -- NaN (should be NULL)
   ```
   
   ### Expected behavior
   
   `mod` and `pmod` should return NULL for a zero divisor across all numeric 
types, matching Spark behavior.
   
   ### Additional context
   
   **Root cause:** In `datafusion/spark/src/function/math/modulus.rs`, the 
`try_rem` function (line 35) handles divide-by-zero by catching 
`ArrowError::DivideByZero` and nulling out zero divisors. However, Arrow's 
`rem` for float types does not return a `DivideByZero` error; it returns `NaN` 
per IEEE 754. So the zero-check path is never triggered for floats, and `NaN` 
passes through as the result.
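
   The asymmetry can be illustrated without Arrow. The following is a minimal 
sketch in plain Rust using only the standard library; the helper names 
`int_rem` and `float_rem` are hypothetical and do not exist in `modulus.rs`. 
It is a standard-library analogue of the behavior described above, not Arrow's 
actual code path: an integer zero divisor is detectable as a failure, while a 
float zero divisor silently yields `NaN`.

   ```rust
   /// Hypothetical helper: integer remainder where a zero divisor is
   /// reported as `None`, so a caller can turn it into NULL.
   fn int_rem(a: i32, b: i32) -> Option<i32> {
       a.checked_rem(b)
   }

   /// Hypothetical helper: float remainder. IEEE 754 defines the result
   /// for a zero divisor as NaN, so no failure is signalled.
   fn float_rem(a: f64, b: f64) -> f64 {
       a % b
   }

   fn main() {
       // Integer case: the zero divisor is visible to the caller.
       println!("int:   {:?}", int_rem(10, 0));
       // Float case: the result is NaN and nothing flags the zero divisor.
       println!("float: {}", float_rem(10.5, 0.0));
   }
   ```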
   
   **Fix:** The function needs to explicitly check for zero divisors in float 
columns before or after calling `rem`, rather than relying on the 
`DivideByZero` error which only fires for integer types. One approach is to 
always null out zero-divisor positions before calling `rem`, regardless of type.
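
   As a rough illustration of the "null out zero divisors first" approach, the 
sketch below models a nullable float column as a slice of `Option<f64>` 
instead of an Arrow array. The function name `spark_mod_f64` and this 
representation are assumptions made for the example; the real fix would 
operate on Arrow arrays inside `try_rem`.

   ```rust
   /// Hypothetical helper: element-wise remainder over two nullable f64
   /// columns, returning NULL (`None`) wherever the divisor is zero.
   fn spark_mod_f64(left: &[Option<f64>], right: &[Option<f64>]) -> Vec<Option<f64>> {
       left.iter()
           .zip(right.iter())
           .map(|(l, r)| match (*l, *r) {
               // Zero divisor: produce NULL instead of letting `%` return NaN.
               // Note that `== 0.0` also matches -0.0.
               (Some(_), Some(d)) if d == 0.0 => None,
               // Ordinary case: IEEE 754 remainder.
               (Some(n), Some(d)) => Some(n % d),
               // Any NULL input propagates as NULL.
               _ => None,
           })
           .collect()
   }

   fn main() {
       let left = [Some(10.5), Some(10.5), None];
       let right = [Some(0.0), Some(3.0), Some(2.0)];
       println!("{:?}", spark_mod_f64(&left, &right));
   }
   ```

   Checking the divisor explicitly, rather than inspecting the result for 
`NaN`, keeps a legitimate `NaN` input distinguishable from a zero-divisor 
case.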
   
   The `.slt` tests in `math/mod.slt` and `math/pmod.slt` also have incorrect 
expected values (`NaN` instead of `NULL`) for float div-by-zero cases and will 
need to be updated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

