andygrove opened a new issue, #21514: URL: https://github.com/apache/datafusion/issues/21514
### Describe the bug

The `datafusion-spark` `mod` and `pmod` functions return `NaN` for floating-point modulo by zero, while Apache Spark returns `NULL`. The integer case is handled correctly (returns `NULL`), but the float case falls through to IEEE 754 behavior.

### To Reproduce

**PySpark (correct behavior):**

```sql
SELECT MOD(CAST(10.5 AS DOUBLE), CAST(0.0 AS DOUBLE));  -- NULL
SELECT MOD(CAST(10.5 AS FLOAT), CAST(0.0 AS FLOAT));    -- NULL
SELECT MOD(CAST(10 AS INT), CAST(0 AS INT));            -- NULL
SELECT pmod(CAST(10.5 AS DOUBLE), CAST(0.0 AS DOUBLE)); -- NULL
```

Spark returns `NULL` consistently for all types.

**DataFusion-spark (incorrect behavior):**

```sql
SELECT MOD(10.5::DOUBLE, 0.0::DOUBLE);  -- NaN (should be NULL)
SELECT MOD(10.5::FLOAT, 0.0::FLOAT);    -- NaN (should be NULL)
SELECT MOD(10::INT, 0::INT);            -- NULL (correct)
SELECT pmod(10.5::DOUBLE, 0.0::DOUBLE); -- NaN (should be NULL)
```

### Expected behavior

`mod` and `pmod` should return `NULL` for modulo by zero across all numeric types, matching Spark behavior.

### Additional context

**Root cause:** In `datafusion/spark/src/function/math/modulus.rs`, the `try_rem` function (line 35) handles divide-by-zero by catching `ArrowError::DivideByZero` and nulling out zero divisors. However, Arrow's `rem` for float types does not return a `DivideByZero` error; it returns `NaN` per IEEE 754. The zero-check path is therefore never triggered for floats, and `NaN` passes through as the result.

**Fix:** The function needs to check explicitly for zero divisors in float columns before or after calling `rem`, rather than relying on the `DivideByZero` error, which only fires for integer types. One approach is to always null out zero-divisor positions before calling `rem`, regardless of type.

The `.slt` tests in `math/mod.slt` and `math/pmod.slt` also have incorrect expected values (`NaN` instead of `NULL`) for the float divide-by-zero cases and will need to be updated.
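The intended semantics can be sketched in plain Rust without any Arrow dependency. This is only an illustration of "null out zero divisors before taking the remainder", not the actual `try_rem` code: the names `spark_mod`, `spark_pmod`, and `mod_column` are hypothetical, `Option<f64>` stands in for a nullable float column value, and the `pmod` formula `(r + d) % d` is an assumption about how Spark normalizes negative remainders.

```rust
/// Hypothetical helper: Spark-style `mod` on nullable doubles.
/// A zero divisor yields NULL (`None`) instead of the IEEE 754 `NaN`
/// that Rust's `%` operator produces for `x % 0.0`.
fn spark_mod(dividend: Option<f64>, divisor: Option<f64>) -> Option<f64> {
    match (dividend, divisor) {
        // `0.0 == -0.0` under IEEE 754, so this also catches negative zero.
        (Some(_), Some(d)) if d == 0.0 => None,
        (Some(n), Some(d)) => Some(n % d),
        // NULL in, NULL out.
        _ => None,
    }
}

/// Hypothetical helper: Spark-style `pmod` (positive modulus).
/// Assumes a negative remainder is shifted by the divisor and reduced again.
fn spark_pmod(dividend: Option<f64>, divisor: Option<f64>) -> Option<f64> {
    let r = spark_mod(dividend, divisor)?; // already NULL for zero divisors
    let d = divisor?;
    if r < 0.0 { Some((r + d) % d) } else { Some(r) }
}

/// Hypothetical column-wise version: applies the zero-divisor check at
/// every position, regardless of whether a kernel would have errored.
fn mod_column(dividends: &[Option<f64>], divisors: &[Option<f64>]) -> Vec<Option<f64>> {
    dividends
        .iter()
        .zip(divisors)
        .map(|(&n, &d)| spark_mod(n, d))
        .collect()
}
```

Because the zero check runs unconditionally, the same path covers integer and float inputs, which is the property the proposed fix is after. In the real function the equivalent step would presumably operate on the divisor array's null mask rather than on individual `Option` values.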
