andygrove opened a new pull request, #4538:
URL: https://github.com/apache/datafusion-comet/pull/4538

   ## Which issue does this PR close?
   
   Closes #.
   
   ## Rationale for this change
   
   Comet has a JVM codegen dispatcher (`CometCodegenDispatch` / 
`CometScalaUDF.emitJvmCodegenDispatch`) that runs a Spark expression's own 
`doGenCode` inside the Comet native pipeline. This lets Comet keep a query 
native even when there is no Rust implementation for an expression, while 
guaranteeing behavior matches Spark exactly across all supported Spark 
versions. When the dispatcher is disabled, the operator falls back to Spark 
cleanly.
   
   A number of Spark built-in scalar expressions had no native Comet support 
and previously forced a fallback to Spark. Routing them through the dispatcher 
keeps those plans native without any compatibility risk.
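
The routing described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not Comet's code: `Expr`, `Plan`, `choose`, and the flag names are hypothetical, and the ordering (Rust implementation first, then the dispatcher, then Spark) is assumed from the rationale rather than taken from `QueryPlanSerde`.

```scala
// Toy model only: none of these types or names exist in Comet.
object DispatchSketch {
  sealed trait Plan
  case object NativeRust extends Plan          // a Rust implementation exists
  case object JvmCodegenDispatch extends Plan  // Spark's generated code runs inside the native pipeline
  case object SparkFallback extends Plan       // the operator is handed back to Spark

  // Hypothetical description of an expression; field names are illustrative.
  final case class Expr(name: String, hasRustImpl: Boolean, dispatchEligible: Boolean)

  // Assumed ordering: prefer Rust, then the dispatcher, otherwise fall back.
  def choose(expr: Expr, dispatcherEnabled: Boolean): Plan =
    if (expr.hasRustImpl) NativeRust
    else if (dispatcherEnabled && expr.dispatchEligible) JvmCodegenDispatch
    else SparkFallback

  def main(args: Array[String]): Unit = {
    val hypot = Expr("hypot", hasRustImpl = false, dispatchEligible = true)
    // CodegenFallback expressions such as xpath cannot use the dispatcher.
    val xpath = Expr("xpath", hasRustImpl = false, dispatchEligible = false)
    println(choose(hypot, dispatcherEnabled = true))
    println(choose(hypot, dispatcherEnabled = false))
    println(choose(xpath, dispatcherEnabled = true))
  }
}
```

In this model, an eligible expression with no Rust implementation stays native only while the dispatcher is enabled. Everything else takes the Spark fallback path.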
   
   ## What changes are included in this PR?
   
   Registers `CometCodegenDispatch` serdes for the following scalar expressions 
and wires them into `QueryPlanSerde`:
   
   - math: `hypot`, `nanvl`, `bround`, `conv`, `log1p`, `pmod`, `width_bucket`, 
`positive`
   - string: `levenshtein`, `elt`, `find_in_set`, `format_number`, 
`format_string`, `overlay`, `soundex`, `locate`, `unbase64`, `to_char`, 
`to_number`
   - array: `sequence`
   
   Scope notes:
   
   - JSON and regexp functions are intentionally excluded (separate PRs are 
open for those).
   - Expressions that cannot use the dispatcher were excluded: generators, 
`RuntimeReplaceable` expressions (rewritten before serde), `CodegenFallback` 
expressions (e.g. xpath), higher-order functions, interval/null output types, 
and folded-at-plan-time expressions (`current_*`).
   - A few candidates were dropped after testing surfaced real issues: 
`map_concat` (dispatcher returns a wrong map key), `try_to_number` (throws 
instead of returning NULL on invalid input), and `encode` (lowers to 
`StaticInvoke` in the running Spark version, so the serde never sees the 
original expression class). These are noted as follow-ups.
   
   ## How are these changes tested?
   
   Each expression has a SQL file test under 
`spark/src/test/resources/sql-tests/expressions/` run by 
`CometSqlFileTestSuite`. The `query` mode uses `checkSparkAnswerAndOperator`, 
which asserts both answer parity with Spark and that the expression executed 
natively (a fallback fails the test). All new fixtures pass against Spark 3.5.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

