andygrove commented on PR #4538:
URL:
https://github.com/apache/datafusion-comet/pull/4538#issuecomment-4626837504
## Follow-up: make codegen-dispatch enrollment opt-in (commit 3a9ec2d41)
While reviewing the codegen-dispatch wiring I found a soundness issue with
how expressions were enrolled onto the dispatcher, and reworked it.
### The issue
Routing an `Incompatible` result through the JVM codegen dispatcher was
**opt-out**: `CometExpressionSerde.allowIncompatCodegenDispatch` defaulted to
`true`, so *every* expression that reports `Incompatible` (and is not
explicitly allowed via `allowIncompatible`) was routed through the dispatcher
unless it remembered to override the flag to `false`. Only 5 serdes did so.
That default silently enrolled a dozen-plus pre-existing native expressions
that this PR never intended to touch. Checked against Spark source for real
`doGenCode` (the dispatcher refuses `CodegenFallback`), the implicitly-enrolled
set included `array_except`, `array_intersect`, `array_join`, `concat`,
`from_utc_timestamp`, `to_utc_timestamp`, `convert_timezone`, and `sort_array`.
None had deliberate test coverage, the change was invisible in review, and it
was most concerning for the timezone-sensitive expressions where dispatch
correctness depends on state surviving closure serialization.
### The new approach
Enrollment is now **opt-in** via a marker trait:
```scala
trait CodegenDispatchFallback { self: CometExpressionSerde[_] => }
```
The `QueryPlanSerde` `Incompatible` branch routes through the dispatcher
only when the handler mixes in `CodegenDispatchFallback`, otherwise it falls
back to Spark. The `allowIncompatCodegenDispatch` flag and its 5 `= false`
overrides are removed (those serdes simply do not opt in, so their behavior is
unchanged).
The 8 serdes this PR deliberately dispatches are enrolled explicitly, each
backed by the test coverage added here:
| Serde | Incompatible case routed to dispatch |
| --- | --- |
| `CometHour`, `CometMinute`, `CometSecond` | `TimestampNTZ` |
| `CometFromUnixTime` | always |
| `CometMapFromEntries` | `BinaryType` key/value |
| `CometReverse` | binary-element arrays, non-UTF8_BINARY collation |
| `CometTruncDate`, `CometTruncTimestamp` | non-literal format |
The accidentally-enrolled expressions above revert to Spark fallback,
matching `main`. The expressions that have no native implementation and extend
`CometCodegenDispatch` (`hypot`, `bround`, `sequence`, `elt`, ...) are
unaffected: they report `Compatible` and never reach this branch.
### Validation
- `CometArrayExpressionSuite`, `CometMapExpressionSuite`,
`CometTemporalExpressionSuite` all green (cover `reverse`/`map_from_entries`
binary and `trunc`/`date_trunc` dispatch).
- `CometSqlFileTestSuite` failures are identical to the pre-change branch
(pre-existing, unrelated), so no regressions from this change.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]