kosiew commented on code in PR #21710:
URL: https://github.com/apache/datafusion/pull/21710#discussion_r3128712978
##########
datafusion/spark/src/function/math/ceil.rs:
##########
@@ -301,4 +443,95 @@ mod tests {
         };
         assert_eq!(result, ScalarValue::Int64(Some(48)));
     }
+
+    #[test]
Review Comment:
These tests call `spark_ceil()` directly, which is great for coverage, but
they don’t verify that the declared schema matches the produced values.
It would be really helpful to add a small `return_type()` test or an
end-to-end planner test, especially for cases like `ceil(..., 0)` and decimal
inputs (rough sketch below). That would make it much easier to catch type
contract regressions like the ones above.
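
Something along these lines is what I have in mind; this is only a sketch, and it assumes this module's existing test imports plus a `SparkCeil::new()` constructor (the invocation of `spark_ceil` mirrors the tests above):

```rust
#[test]
fn return_type_matches_runtime_type_for_two_arg_float() -> Result<()> {
    let ceil = SparkCeil::new(); // constructor name assumed

    // The type the planner will record for ceil(Float64, Int32 scale).
    let planned = ceil.return_type(&[DataType::Float64, DataType::Int32])?;

    // The type actually produced at runtime for ceil(1.5, 0).
    let args = vec![
        ColumnarValue::Scalar(ScalarValue::Float64(Some(1.5))),
        ColumnarValue::Scalar(ScalarValue::Int32(Some(0))),
    ];
    let result = spark_ceil(&args)?;

    // With the current code this should fail: planned is Float64,
    // runtime is Int64.
    assert_eq!(planned, result.data_type());
    Ok(())
}
```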
##########
datafusion/spark/src/function/math/ceil.rs:
##########
@@ -79,6 +125,8 @@ impl ScalarUDFImpl for SparkCeil {
                 Ok(DataType::Decimal128(*p, *s))
             }
         }
+        DataType::Float32 if has_scale => Ok(DataType::Float32),
Review Comment:
`return_type()` now reports `Float32` or `Float64` for all 2-argument float
calls, but the execution path still returns `Int64` when `scale == 0` in
`spark_ceil_scalar` and `spark_ceil_array`.
That means something like `ceil(col, 0)` will plan with one schema and
produce another at runtime, which will surface as schema-mismatch failures
(or silently wrong types) downstream.
Could we either make the 2-arg form return a stable type regardless of
scale, or remove the special-casing for zero in the execution path? A rough
sketch of the first option is below.
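
Minimal sketch of that option; the `Option<i32>` plumbing is hypothetical, and `ceil_float` is the helper this PR already adds:

```rust
// Sketch only: thread the scale through as Option<i32> so "no scale
// argument" and "scale argument of 0" stay distinguishable at runtime.
fn spark_ceil_scalar(value: &ScalarValue, scale: Option<i32>) -> Result<ColumnarValue> {
    let result = match (value, scale) {
        // 1-arg form keeps the original Float -> Int64 contract.
        (ScalarValue::Float64(v), None) => {
            ScalarValue::Int64(v.map(|x| x.ceil() as i64))
        }
        // 2-arg form always preserves the float type, even for scale = 0,
        // matching what return_type() now advertises when has_scale is true.
        (ScalarValue::Float64(v), Some(s)) => {
            ScalarValue::Float64(v.map(|x| ceil_float(x, s)))
        }
        // Float32 and the remaining branches would get the same split.
        _ => todo!(),
    };
    Ok(ColumnarValue::Scalar(result))
}
```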
##########
datafusion/spark/src/function/math/ceil.rs:
##########
@@ -121,11 +224,21 @@ fn decimal128_ceil_precision(precision: u8, scale: i8) -> u8 {
     ((precision as i64) - (scale as i64) + 1).clamp(1, 38) as u8
 }
-fn spark_ceil_scalar(value: &ScalarValue) -> Result<ColumnarValue> {
+fn spark_ceil_scalar(value: &ScalarValue, scale: i32) -> Result<ColumnarValue> {
     let result = match value {
-        ScalarValue::Float32(v) => ScalarValue::Int64(v.map(|x| x.ceil() as i64)),
-        ScalarValue::Float64(v) => ScalarValue::Int64(v.map(|x| x.ceil() as i64)),
+        // Floats without scale (scale=0) -> Int64 (original behaviour)
+        ScalarValue::Float32(v) if scale == 0 => {
+            ScalarValue::Int64(v.map(|x| x.ceil() as i64))
+        }
+        ScalarValue::Float64(v) if scale == 0 => {
+            ScalarValue::Int64(v.map(|x| x.ceil() as i64))
+        }
+        // Floats with scale -> preserve type
+        ScalarValue::Float32(v) => ScalarValue::Float32(v.map(|x| ceil_float(x, scale))),
+        ScalarValue::Float64(v) => ScalarValue::Float64(v.map(|x| ceil_float(x, scale))),
+        // Integers: negative scale rounds, positive is no-op
         v if v.data_type().is_integer() => v.cast_to(&DataType::Int64)?,
Review Comment:
The new signature advertises `ceil(integer, scale)`, but this branch ignores
`scale` entirely and just casts to `Int64`.
So calls like `ceil(3345, -2)` behave exactly like the 1-arg version,
instead of rounding at the requested decimal position. That makes part of the
new API effectively non-functional right now.
We should either implement the scale-aware behavior here (rough sketch
below) or avoid exposing this signature until it is supported.
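
For reference, the arithmetic itself is small; `ceil_i64_with_scale` is a hypothetical helper name, and the `10_i64.pow` overflow check (e.g. via `checked_pow`) is elided:

```rust
// Sketch, not this PR's code: ceil an i64 toward +infinity at the requested
// power of ten. Zero or positive scale is a no-op for integers.
fn ceil_i64_with_scale(x: i64, scale: i32) -> i64 {
    if scale >= 0 {
        return x;
    }
    let factor = 10_i64.pow((-scale) as u32); // overflow check elided
    // Ceiling division: div_euclid rounds toward -infinity for a positive
    // divisor, so bump the quotient whenever a remainder exists.
    let q = x.div_euclid(factor) + (x.rem_euclid(factor) != 0) as i64;
    q * factor
}

#[test]
fn integer_ceil_respects_negative_scale() {
    assert_eq!(ceil_i64_with_scale(3345, -2), 3400); // Spark: ceil(3345, -2) = 3400
    assert_eq!(ceil_i64_with_scale(-3345, -2), -3300);
}
```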
##########
datafusion/spark/src/function/math/ceil.rs:
##########
@@ -97,12 +145,67 @@ impl ScalarUDFImpl for SparkCeil {
     }
 }
+/// Extract the scale (decimal places) from the second argument.
+/// Returns `Some(0)` if no second argument is provided.
+/// Returns `None` if the scale argument is NULL (Spark returns NULL for `round(expr, NULL)`).
+fn get_scale(args: &[ColumnarValue]) -> Result<Option<i32>> {
Review Comment:
`get_scale` looks like a direct copy of the helper from `round.rs`,
including the "round scale" error message.
Since we are already touching this area, would it make sense to extract it
into a shared helper (rough sketch below)? That would keep `round` and `ceil`
consistent and reduce the chance of copy/paste drift later.
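
Roughly what I'm picturing; the module path is hypothetical and the validation body is a guess at what `round.rs` does today, with the function name parameterized so each caller reports itself:

```rust
// e.g. datafusion/spark/src/function/math/utils.rs (location hypothetical)
use datafusion_common::{exec_err, Result, ScalarValue};
use datafusion_expr::ColumnarValue;

/// Shared by `round` and `ceil` (and `floor`, if it ever grows a scale
/// argument). `fn_name` keeps the error message accurate per caller instead
/// of hard-coding "round scale". The accepted scale types are an assumption.
pub(crate) fn get_scale(args: &[ColumnarValue], fn_name: &str) -> Result<Option<i32>> {
    match args.get(1) {
        // No second argument: default scale of 0.
        None => Ok(Some(0)),
        // Int32(None) is a NULL scale, for which Spark returns NULL.
        Some(ColumnarValue::Scalar(ScalarValue::Int32(v))) => Ok(*v),
        Some(other) => exec_err!("{fn_name} scale must be a literal Int32, got {other:?}"),
    }
}
```

Call sites would then just be `get_scale(args, "ceil")?` and `get_scale(args, "round")?`.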
##########
datafusion/spark/src/function/math/ceil.rs:
##########
@@ -121,11 +224,21 @@ fn decimal128_ceil_precision(precision: u8, scale: i8) -> u8 {
     ((precision as i64) - (scale as i64) + 1).clamp(1, 38) as u8
 }
-fn spark_ceil_scalar(value: &ScalarValue) -> Result<ColumnarValue> {
+fn spark_ceil_scalar(value: &ScalarValue, scale: i32) -> Result<ColumnarValue> {
     let result = match value {
-        ScalarValue::Float32(v) => ScalarValue::Int64(v.map(|x| x.ceil() as i64)),
-        ScalarValue::Float64(v) => ScalarValue::Int64(v.map(|x| x.ceil() as i64)),
+        // Floats without scale (scale=0) -> Int64 (original behaviour)
+        ScalarValue::Float32(v) if scale == 0 => {
+            ScalarValue::Int64(v.map(|x| x.ceil() as i64))
+        }
+        ScalarValue::Float64(v) if scale == 0 => {
+            ScalarValue::Int64(v.map(|x| x.ceil() as i64))
+        }
+        // Floats with scale -> preserve type
+        ScalarValue::Float32(v) => ScalarValue::Float32(v.map(|x| ceil_float(x, scale))),
+        ScalarValue::Float64(v) => ScalarValue::Float64(v.map(|x| ceil_float(x, scale))),
+        // Integers: negative scale rounds, positive is no-op
         v if v.data_type().is_integer() => v.cast_to(&DataType::Int64)?,
+        // Decimal128 with positive scale
         ScalarValue::Decimal128(v, p, s) if *s > 0 => {
Review Comment:
The decimal path is still using the input type’s native scale (`*s`) instead
of the provided second argument.
As a result, `ceil(decimal_expr, scale)` behaves like the 1-arg version and
always rounds to scale 0. This also conflicts with `return_type()` for 2-arg
decimals, which advertises the original decimal type.
This probably needs a dedicated implementation that applies the requested
scale, including negative values (rough sketch below). Otherwise it might be
safer to leave the 2-arg decimal signature disabled for now.
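
If we do implement it, the core arithmetic could look roughly like this; `ceil_decimal128_to_scale` is a hypothetical name, and precision adjustment plus overflow checks are deliberately left out:

```rust
// Sketch, not this PR's code: drop the fractional digits beyond the
// requested target scale, ceiling toward +infinity. The caller would
// rebuild ScalarValue::Decimal128 with the adjusted precision/scale.
fn ceil_decimal128_to_scale(v: i128, input_scale: i8, target_scale: i32) -> i128 {
    let digits = input_scale as i32 - target_scale;
    if digits <= 0 {
        // Target scale >= input scale: nothing to drop.
        return v;
    }
    let factor = 10_i128.pow(digits as u32); // overflow check elided
    // e.g. 3.451 as (v = 3451, scale 3) with target scale 1 yields 35, i.e. 3.5.
    v.div_euclid(factor) + (v.rem_euclid(factor) != 0) as i128
}
```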
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]