kosiew commented on code in PR #21710:
URL: https://github.com/apache/datafusion/pull/21710#discussion_r3128712978
##########
datafusion/spark/src/function/math/ceil.rs:
##########
@@ -301,4 +443,95 @@ mod tests {
         };
         assert_eq!(result, ScalarValue::Int64(Some(48)));
     }
+
+    #[test]
Review Comment:
These tests call `spark_ceil()` directly, which is great for coverage, but
they don’t verify that the declared schema matches the produced values.
It would be really helpful to add a small `return_type()` test or an
end-to-end planner test, especially for cases like `ceil(..., 0)` and decimal
inputs (rough sketch below). That would make it much easier to catch type
contract regressions like the ones above.
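
Something along these lines is what I have in mind; this is only a sketch, and it assumes this module's existing test imports plus a `SparkCeil::new()` constructor (the invocation of `spark_ceil` mirrors the tests above):

```rust
#[test]
fn return_type_matches_runtime_type_for_two_arg_float() -> Result<()> {
    let ceil = SparkCeil::new(); // constructor name assumed

    // The type the planner will record for ceil(Float64, Int32 scale).
    let planned = ceil.return_type(&[DataType::Float64, DataType::Int32])?;

    // The type actually produced at runtime for ceil(1.5, 0).
    let args = vec![
        ColumnarValue::Scalar(ScalarValue::Float64(Some(1.5))),
        ColumnarValue::Scalar(ScalarValue::Int32(Some(0))),
    ];
    let result = spark_ceil(&args)?;

    // With the current code this should fail: planned is Float64,
    // runtime is Int64.
    assert_eq!(planned, result.data_type());
    Ok(())
}
```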
##########
datafusion/spark/src/function/math/ceil.rs:
##########
@@ -79,6 +125,8 @@ impl ScalarUDFImpl for SparkCeil {
                 Ok(DataType::Decimal128(*p, *s))
             }
         }
+        DataType::Float32 if has_scale => Ok(DataType::Float32),
Review Comment:
`return_type()` now reports `Float32` or `Float64` for all 2-argument float
calls, but the execution path still returns `Int64` when `scale == 0` in
`spark_ceil_scalar` and `spark_ceil_array`.
That means something like `ceil(col, 0)` will plan with one schema and
produce another at runtime, which will surface as schema-mismatch failures
(or silently wrong types) downstream.
Could we either make the 2-arg form return a stable type regardless of
scale, or remove the special-casing for zero in the execution path? A rough
sketch of the first option is below.
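
Minimal sketch of that option; the `Option<i32>` plumbing is hypothetical, and `ceil_float` is the helper this PR already adds:

```rust
// Sketch only: thread the scale through as Option<i32> so "no scale
// argument" and "scale argument of 0" stay distinguishable at runtime.
fn spark_ceil_scalar(value: &ScalarValue, scale: Option<i32>) -> Result<ColumnarValue> {
    let result = match (value, scale) {
        // 1-arg form keeps the original Float -> Int64 contract.
        (ScalarValue::Float64(v), None) => {
            ScalarValue::Int64(v.map(|x| x.ceil() as i64))
        }
        // 2-arg form always preserves the float type, even for scale = 0,
        // matching what return_type() now advertises when has_scale is true.
        (ScalarValue::Float64(v), Some(s)) => {
            ScalarValue::Float64(v.map(|x| ceil_float(x, s)))
        }
        // Float32 and the remaining branches would get the same split.
        _ => todo!(),
    };
    Ok(ColumnarValue::Scalar(result))
}
```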
##########
datafusion/spark/src/function/math/ceil.rs:
##########
@@ -121,11 +224,21 @@ fn decimal128_ceil_precision(precision: u8, scale: i8) -> u8 {
     ((precision as i64) - (scale as i64) + 1).clamp(1, 38) as u8
 }
-fn spark_ceil_scalar(value: &ScalarValue) -> Result<ColumnarValue> {
+fn spark_ceil_scalar(value: &ScalarValue, scale: i32) -> Result<ColumnarValue> {
     let result = match value {
-        ScalarValue::Float32(v) => ScalarValue::Int64(v.map(|x| x.ceil() as i64)),
-        ScalarValue::Float64(v) => ScalarValue::Int64(v.map(|x| x.ceil() as i64)),
+        // Floats without scale (scale=0) -> Int64 (original behaviour)
+        ScalarValue::Float32(v) if scale == 0 => {
+            ScalarValue::Int64(v.map(|x| x.ceil() as i64))
+        }
+        ScalarValue::Float64(v) if scale == 0 => {
+            ScalarValue::Int64(v.map(|x| x.ceil() as i64))
+        }
+        // Floats with scale -> preserve type
+        ScalarValue::Float32(v) => ScalarValue::Float32(v.map(|x| ceil_float(x, scale))),
+        ScalarValue::Float64(v) => ScalarValue::Float64(v.map(|x| ceil_float(x, scale))),
+        // Integers: negative scale rounds, positive is no-op
         v if v.data_type().is_integer() => v.cast_to(&DataType::Int64)?,
Review Comment:
The new signature advertises `ceil(integer, scale)`, but this branch ignores
`scale` entirely and just casts to `Int64`.
So calls like `ceil(3345, -2)` behave exactly like the 1-arg version,
instead of rounding at the requested decimal position. That makes part of the
new API effectively non-functional right now.
We should either implement the scale-aware behavior here (rough sketch
below) or avoid exposing this signature until it is supported.
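
For reference, the arithmetic itself is small; `ceil_i64_with_scale` is a hypothetical helper name, and the `10_i64.pow` overflow check (e.g. via `checked_pow`) is elided:

```rust
// Sketch, not this PR's code: ceil an i64 toward +infinity at the requested
// power of ten. Zero or positive scale is a no-op for integers.
fn ceil_i64_with_scale(x: i64, scale: i32) -> i64 {
    if scale >= 0 {
        return x;
    }
    let factor = 10_i64.pow((-scale) as u32); // overflow check elided
    // Ceiling division: div_euclid rounds toward -infinity for a positive
    // divisor, so bump the quotient whenever a remainder exists.
    let q = x.div_euclid(factor) + (x.rem_euclid(factor) != 0) as i64;
    q * factor
}

#[test]
fn integer_ceil_respects_negative_scale() {
    assert_eq!(ceil_i64_with_scale(3345, -2), 3400); // Spark: ceil(3345, -2) = 3400
    assert_eq!(ceil_i64_with_scale(-3345, -2), -3300);
}
```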
##########
datafusion/spark/src/function/math/ceil.rs:
##########
@@ -97,12 +145,67 @@ impl ScalarUDFImpl for SparkCeil {
     }
 }
+/// Extract the scale (decimal places) from the second argument.
+/// Returns `Some(0)` if no second argument is provided.
+/// Returns `None` if the scale argument is NULL (Spark returns NULL for `round(expr, NULL)`).
+fn get_scale(args: &[ColumnarValue]) -> Result<Option<i32>> {
Review Comment:
`get_scale` looks like a direct copy of the helper from `round.rs`,
including the "round scale" error message.
Since we are already touching this area, would it make sense to extract it
into a shared helper (rough sketch below)? That would keep `round` and `ceil`
consistent and reduce the chance of copy/paste drift later.
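
Roughly what I'm picturing; the module path is hypothetical and the validation body is a guess at what `round.rs` does today, with the function name parameterized so each caller reports itself:

```rust
// e.g. datafusion/spark/src/function/math/utils.rs (location hypothetical)
use datafusion_common::{exec_err, Result, ScalarValue};
use datafusion_expr::ColumnarValue;

/// Shared by `round` and `ceil` (and `floor`, if it ever grows a scale
/// argument). `fn_name` keeps the error message accurate per caller instead
/// of hard-coding "round scale". The accepted scale types are an assumption.
pub(crate) fn get_scale(args: &[ColumnarValue], fn_name: &str) -> Result<Option<i32>> {
    match args.get(1) {
        // No second argument: default scale of 0.
        None => Ok(Some(0)),
        // Int32(None) is a NULL scale, for which Spark returns NULL.
        Some(ColumnarValue::Scalar(ScalarValue::Int32(v))) => Ok(*v),
        Some(other) => exec_err!("{fn_name} scale must be a literal Int32, got {other:?}"),
    }
}
```

Call sites would then just be `get_scale(args, "ceil")?` and `get_scale(args, "round")?`.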
##########
datafusion/spark/src/function/math/ceil.rs:
##########
@@ -121,11 +224,21 @@ fn decimal128_ceil_precision(precision: u8, scale: i8) -> u8 {
     ((precision as i64) - (scale as i64) + 1).clamp(1, 38) as u8
 }
-fn spark_ceil_scalar(value: &ScalarValue) -> Result<ColumnarValue> {
+fn spark_ceil_scalar(value: &ScalarValue, scale: i32) -> Result<ColumnarValue> {
     let result = match value {
-        ScalarValue::Float32(v) => ScalarValue::Int64(v.map(|x| x.ceil() as i64)),
-        ScalarValue::Float64(v) => ScalarValue::Int64(v.map(|x| x.ceil() as i64)),
+        // Floats without scale (scale=0) -> Int64 (original behaviour)
+        ScalarValue::Float32(v) if scale == 0 => {
+            ScalarValue::Int64(v.map(|x| x.ceil() as i64))
+        }
+        ScalarValue::Float64(v) if scale == 0 => {
+            ScalarValue::Int64(v.map(|x| x.ceil() as i64))
+        }
+        // Floats with scale -> preserve type
+        ScalarValue::Float32(v) => ScalarValue::Float32(v.map(|x| ceil_float(x, scale))),
+        ScalarValue::Float64(v) => ScalarValue::Float64(v.map(|x| ceil_float(x, scale))),
+        // Integers: negative scale rounds, positive is no-op
         v if v.data_type().is_integer() => v.cast_to(&DataType::Int64)?,
+        // Decimal128 with positive scale
         ScalarValue::Decimal128(v, p, s) if *s > 0 => {
Review Comment:
The decimal path is still using the input type’s native scale (`*s`) instead
of the provided second argument.
As a result, `ceil(decimal_expr, scale)` behaves like the 1-arg version and
always rounds to scale 0. This also conflicts with `return_type()` for 2-arg
decimals, which advertises the original decimal type.
This probably needs a dedicated implementation that applies the requested
scale, including negative values (rough sketch below). Otherwise it might be
safer to leave the 2-arg decimal signature disabled for now.
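
If we do implement it, the core arithmetic could look roughly like this; `ceil_decimal128_to_scale` is a hypothetical name, and precision adjustment plus overflow checks are deliberately left out:

```rust
// Sketch, not this PR's code: drop the fractional digits beyond the
// requested target scale, ceiling toward +infinity. The caller would
// rebuild ScalarValue::Decimal128 with the adjusted precision/scale.
fn ceil_decimal128_to_scale(v: i128, input_scale: i8, target_scale: i32) -> i128 {
    let digits = input_scale as i32 - target_scale;
    if digits <= 0 {
        // Target scale >= input scale: nothing to drop.
        return v;
    }
    let factor = 10_i128.pow(digits as u32); // overflow check elided
    // e.g. 3.451 as (v = 3451, scale 3) with target scale 1 yields 35, i.e. 3.5.
    v.div_euclid(factor) + (v.rem_euclid(factor) != 0) as i128
}
```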
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]