andygrove opened a new issue, #4731:
URL: https://github.com/apache/datafusion-comet/issues/4731
### Describe the bug
`AVG(<decimal>)` over a window falls back to Spark on Spark 4.x, even though
`CometWindowExec` reports `AVG` as a natively supported window aggregate and
`process_agg_func` has an `AvgDecimal` window branch.
On Spark 4.x, `Average` of a decimal is represented in the physical plan as
a `Cast(avg(UnscaledValue(child)) / pow10 AS decimal(p, s))` wrapping the
`WindowExpression`, for example:
```
Window [cast((avg(UnscaledValue(v#39)) windowspecdefinition(...) / 100.0) as
decimal(14,6)) AS run_avg#58]
```
`CometWindowExec.convert` only unwraps two shapes:
```scala
case Alias(w: WindowExpression, _) => w
case Alias(MakeDecimal(w: WindowExpression, _, _, _), _) => w
case other => withFallbackReason(...); return None
```
The `Cast(Divide(...))` form matches neither, so the whole `WindowExec`
falls back to Spark. Results stay correct (Spark computes them), but:
- The `AvgDecimal` native window branch in `process_agg_func`
(`native/core/src/execution/planner.rs`) is effectively dead on Spark 4.x.
- `AVG` decimal window aggregates never run natively on Spark 4.x, contrary
to the coverage implied by #4209.
### Steps to reproduce
```sql
statement
CREATE TABLE dec_avg(g int, v decimal(10,2)) USING parquet
statement
INSERT INTO dec_avg VALUES (1, 10.10), (1, 20.25), (1, 30.33), (1, 41.00)
-- runs natively on integer/double, falls back on decimal under Spark 4.x
SELECT v, AVG(v) OVER (PARTITION BY g ORDER BY v
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS run_avg
FROM dec_avg
```
Comet falls back: `Unsupported window expression:
cast((avg(UnscaledValue(v)) ... / 100.0) as decimal(14,6))`.
### Expected behavior
`AVG(<decimal>)` over an ever-expanding window frame runs natively via the
`AvgDecimal` UDAF and matches Spark, the same way `SUM(<decimal>)` already
does. This needs `CometWindowExec.convert` to recognize the Spark 4.x
`Cast(Divide(WindowExpression, ...))` average shape (in addition to the
existing `MakeDecimal` shape).
### Additional context
- Surfaced by an audit of the native window support added in #4209.
- The `MakeDecimal` unwrap suggests decimal `AVG` is reachable on some Spark
versions (likely 3.x); the divergent shape is the 4.x `Cast(Divide(...))` form.
Behavior should be confirmed across 3.4 / 3.5 / 4.0 / 4.1 when fixing.
- Once native decimal `AVG` is reachable, the sliding-frame overflow guard
added for #4729 (which also covers `AVG`) becomes the relevant safeguard for
sliding frames.
- This is a coverage gap, not a correctness divergence: fallback produces
correct results.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]