andygrove opened a new issue, #4729:
URL: https://github.com/apache/datafusion-comet/issues/4729

   ### Describe the bug
   
   `SUM(<decimal>)` over a **sliding** window frame (any frame whose lower 
bound is not `UNBOUNDED PRECEDING`) returns a wrapped, out-of-range 
`Decimal128` value when the running sum overflows the result decimal precision, 
instead of Spark's `NULL`.
   
   Native execution routes sliding-frame decimal `SUM` to DataFusion's built-in 
`sum`, whose accumulator uses `add_wrapping` / `sub_wrapping` 
(`datafusion-functions-aggregate` `sum.rs`) and never applies Spark's 
decimal-overflow-to-`NULL` semantics. Ever-expanding frames (lower bound 
`UNBOUNDED PRECEDING`) correctly use Comet's `SumDecimal` UDAF and return 
`NULL` on overflow, matching Spark.
   
   The Rust code already acknowledges this gap in a comment in 
`process_agg_func` (`native/core/src/execution/planner.rs`):
   
   > For sliding frames, those UDAFs can't be used (no retract_batch), so 
delegate to DataFusion's built-in `sum`, which supports retract but doesn't 
enforce Spark's decimal precision overflow-to-NULL.
   
   but `CometWindowExec.convert` does not fall back to Spark for this case, so 
the wrong result reaches the user. The wrapped value is even outside the 
declared `Decimal(38, 0)` range, which makes Spark's result decoding throw 
`EXPRESSION_DECODING_FAILED` / `NUMERIC_VALUE_OUT_OF_RANGE` when collecting the 
Comet result.
   
   This is decimal-specific. Bigint sliding `SUM` overflow matches Spark (both 
wrap in two's complement under non-ANSI), and ever-expanding decimal `SUM` 
overflow matches Spark (both `NULL`).
   
   ### Steps to reproduce
   
   ```sql
   statement
   CREATE TABLE dec_ovf(g int, v decimal(38,0)) USING parquet
   
   statement
   INSERT INTO dec_ovf VALUES
     (1, 90000000000000000000000000000000000000),
     (1, 90000000000000000000000000000000000000),
     (1, 90000000000000000000000000000000000000)
   
   -- sliding frame (lower bound = CURRENT ROW): routes to DataFusion built-in 
sum
   query
   SELECT v, SUM(v) OVER (PARTITION BY g ORDER BY v
     ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS tail_sum
   FROM dec_ovf
   ```
   
   Comet produces a wrapped value such as 
`-160282366920938463463374607431768211456`, which then fails result decoding:
   
   ```
   org.apache.spark.SparkRuntimeException: [EXPRESSION_DECODING_FAILED] ...
   Caused by: org.apache.spark.SparkArithmeticException: 
[NUMERIC_VALUE_OUT_OF_RANGE.WITHOUT_SUGGESTION]
     The -160282366920938463463374607431768211456 rounded half up ... cannot be 
represented as Decimal(38, 0).
   ```
   
   ### Expected behavior
   
   Comet returns `NULL` for window rows whose decimal sum overflows the result 
precision, matching Spark non-ANSI semantics (and throwing under ANSI). The 
most direct fix is to fall back to Spark for decimal `SUM` with a 
non-ever-expanding frame in `CometWindowExec`, mirroring the existing 
RANGE-frame DATE/DECIMAL fallbacks.
   
   ### Additional context
   
   - Surfaced by an audit of the native window support added in #4209.
   - The non-overflow sliding decimal `SUM` path is correct and should stay 
native; only the overflow case diverges.
   - The same DataFusion-built-in routing applies to sliding decimal `AVG`, but 
on Spark 4.x decimal `AVG` over a window currently falls back to Spark for an 
unrelated serde reason, so it is not reachable there. It may be reachable on 
Spark 3.x and should be checked as part of the fix.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to