andygrove opened a new issue, #4596: URL: https://github.com/apache/datafusion-comet/issues/4596
### Background PR #4538 introduced an opt-in `CodegenDispatchFallback` marker trait. When an expression reports `Incompatible` and the user has not enabled `allowIncompatible`, mixing in the trait routes it through the JVM codegen dispatcher (Spark's own `doGenCode` inside the Comet pipeline) so the projection stays native while matching Spark exactly, instead of falling the whole projection back to Spark. Before that, enrollment was opt-out, which silently routed a number of native `Incompatible` expressions through the dispatcher without deliberate review or test coverage. Those were reverted to Spark fallback (matching `main`) in PR #4538. This issue tracks deliberately opting them back in, one at a time, each with explicit test coverage proving the dispatched path matches Spark. ### Candidate expressions These report `Incompatible` for some inputs, have a real Spark `doGenCode` (so the dispatcher can accept them), and currently fall back to Spark: - [ ] `concat` (`CometConcat`) - non-UTF8_BINARY collation - [ ] `array_intersect` (`CometArrayIntersect`) - [ ] `array_except` (`CometArrayExcept`) - [ ] `array_join` (`CometArrayJoin`) - [ ] `from_utc_timestamp` (`CometFromUTCTimestamp`) - [ ] `to_utc_timestamp` (`CometToUTCTimestamp`) - [ ] `convert_timezone` (`CometConvertTimezone`) - [ ] `sort_array` (`CometSortArray`) - verify dispatcher eligibility first; this one may be `CodegenFallback` in Spark, in which case the dispatcher refuses it and it should stay on Spark fallback ### What to do per expression 1. Confirm the dispatcher accepts it (not `CodegenFallback`, supported input/output types per `CometBatchKernelCodegen.canHandle`). If refused, leave it as Spark fallback and note it. 2. Mix `CodegenDispatchFallback` into the serde that is **registered** in `QueryPlanSerde` (not a delegate; see how `Reverse` registers `CometReverse`, which delegates to `CometArrayReverse`). 3. Add tests asserting native execution that matches Spark for the `Incompatible` inputs (`checkSparkAnswerAndOperator` / `query` in SQL file tests, replacing the `expect_fallback` assertion). 4. For the timezone-sensitive expressions, test across multiple session time zones, since dispatch correctness depends on the resolved `timeZoneId` surviving closure serialization. ### Notes The expressions that have no native implementation and extend `CometCodegenDispatch` (`hypot`, `bround`, `sequence`, `elt`, ...) are out of scope: they report `Compatible` and already dispatch. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
