andygrove opened a new pull request, #4538: URL: https://github.com/apache/datafusion-comet/pull/4538
## Which issue does this PR close?

Closes #.

## Rationale for this change

Comet has a JVM codegen dispatcher (`CometCodegenDispatch` / `CometScalaUDF.emitJvmCodegenDispatch`) that runs a Spark expression's own `doGenCode` inside the Comet native pipeline. This lets Comet keep a query native even when there is no Rust implementation for an expression, while guaranteeing behavior matches Spark exactly across all supported Spark versions. When the dispatcher is disabled the operator falls back to Spark cleanly.

A number of Spark built-in scalar expressions had no native Comet support and previously forced a fallback to Spark. Routing them through the dispatcher keeps those plans native without any compatibility risk.

## What changes are included in this PR?

Registers `CometCodegenDispatch` serdes for the following scalar expressions and wires them into `QueryPlanSerde`:

- math: `hypot`, `nanvl`, `bround`, `conv`, `log1p`, `pmod`, `width_bucket`, `positive`
- string: `levenshtein`, `elt`, `find_in_set`, `format_number`, `format_string`, `overlay`, `soundex`, `locate`, `unbase64`, `to_char`, `to_number`
- array: `sequence`

Scope notes:

- JSON and regexp functions are intentionally excluded (separate PRs are open for those).
- Expressions that cannot use the dispatcher were excluded: generators, `RuntimeReplaceable` expressions (rewritten before serde), `CodegenFallback` expressions (e.g. xpath), higher-order functions, interval/null output types, and folded-at-plan-time expressions (`current_*`).
- A few candidates were dropped after testing surfaced real issues: `map_concat` (dispatcher returns a wrong map key), `try_to_number` (throws instead of returning NULL on invalid input), and `encode` (lowers to `StaticInvoke` in the running Spark version so the class is never seen). These are noted as follow-ups.

## How are these changes tested?

Each expression has a SQL file test under `spark/src/test/resources/sql-tests/expressions/` run by `CometSqlFileTestSuite`.
The `query` mode uses `checkSparkAnswerAndOperator`, which asserts both answer parity with Spark and that the expression executed natively (a fallback fails the test). All new fixtures pass against Spark 3.5.
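
For readers unfamiliar with the dispatcher, the routing described in the rationale can be pictured as a three-way decision. The following is only a conceptual model in Python, not Comet code: the `Route` enum, the `route` function, the two sets, and the `some_native_fn` placeholder are all hypothetical names invented for illustration. The expression names in `DISPATCH_SERDES` are a small sample of those registered in this PR.

```python
from enum import Enum


class Route(Enum):
    NATIVE = "native Rust implementation"
    JVM_DISPATCH = "Spark doGenCode run through the codegen dispatcher"
    SPARK_FALLBACK = "operator falls back to Spark"


# Hypothetical stand-ins for illustration only; these are not Comet's registries.
NATIVE_IMPLS = {"some_native_fn"}                       # placeholder name
DISPATCH_SERDES = {"hypot", "levenshtein", "sequence"}  # sample from this PR


def route(expr_name: str, dispatcher_enabled: bool) -> Route:
    """Pick an execution path for one scalar expression (conceptual model)."""
    if expr_name in NATIVE_IMPLS:
        return Route.NATIVE
    if dispatcher_enabled and expr_name in DISPATCH_SERDES:
        return Route.JVM_DISPATCH
    # No native implementation and no usable dispatcher serde.
    return Route.SPARK_FALLBACK


if __name__ == "__main__":
    for name, enabled in [("hypot", True), ("hypot", False), ("map_concat", True)]:
        print(f"{name} (dispatcher enabled={enabled}): {route(name, enabled).name}")
```

In this model, `map_concat` has no dispatcher entry because it was dropped from the PR, so it falls back to Spark even with the dispatcher enabled.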
