andygrove opened a new pull request, #4321:
URL: https://github.com/apache/datafusion-comet/pull/4321

   ## Which issue does this PR close?
   
   Part of #4311.
   
   ## Rationale for this change
   
   Comet's native Rust implementations of `hour`, `minute`, and `second` 
incorrectly apply timezone conversion to `TimestampNTZType` inputs (#3180). 
Today this means Comet falls back to Spark for the NTZ case, losing native 
execution for the rest of the stage.
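   To make the failure mode concrete, here is a minimal `java.time` sketch. The actual kernels are in Rust; the class and method names below are illustrative, not from the codebase. An NTZ timestamp encodes a wall clock, so extracting a field must not shift it by the session timezone, while the buggy path treats the value as an instant and shifts it:

```java
import java.time.*;

public class NtzHourDemo {
    // Decode TimestampNTZ micros as a pure wall clock (the correct reading).
    static int ntzHour(long micros) {
        return LocalDateTime.ofEpochSecond(
            Math.floorDiv(micros, 1_000_000L),
            (int) Math.floorMod(micros, 1_000_000L) * 1_000,
            ZoneOffset.UTC).getHour();
    }

    // The buggy reading: treat the micros as an instant and shift it into
    // the session timezone, as the native path currently does for NTZ input.
    static int buggyHour(long micros, ZoneId sessionTz) {
        return Instant.ofEpochSecond(
                Math.floorDiv(micros, 1_000_000L),
                Math.floorMod(micros, 1_000_000L) * 1_000L)
            .atZone(sessionTz).getHour();
    }

    public static void main(String[] args) {
        // Wall clock 2024-06-01 23:30, stored as NTZ micros.
        long micros = LocalDateTime.of(2024, 6, 1, 23, 30)
            .toEpochSecond(ZoneOffset.UTC) * 1_000_000L;
        System.out.println(ntzHour(micros));                                     // 23
        System.out.println(buggyHour(micros, ZoneId.of("America/Los_Angeles"))); // 16
    }
}
```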
   
   This PR prototypes the approach outlined in #4311: add an opt-in JVM UDF 
path that produces bit-exact Spark results for the affected expressions, in 
exchange for a JNI round-trip per native batch. It mirrors the engine-selection 
idiom proposed in the open regex PR (#4239), keeping the framing consistent 
across expression families. The default stays `rust`; users opt into `java` for full Spark 
compatibility.
   
   The scope is intentionally narrow: three closely related expressions sharing 
a serde shape and a UDF base class. Once this engine-config model is reviewed, 
it extends naturally to `TruncTimestamp`, `DateFormat`, and the 
field-extraction expressions in a follow-up.
   
   ## What changes are included in this PR?
   
   - New config `spark.comet.exec.datetime.engine` with values `rust` (default) 
and `java`. Lives in `CometConf.scala`.
   - New `DateTimeFieldUDF` abstract base in 
`common/src/main/scala/org/apache/comet/udf/`, implementing the Arrow vector 
iteration, null preservation, and TZ/NTZ branching using `java.time`. Concrete 
subclasses `HourUDF`, `MinuteUDF`, `SecondUDF` each supply a one-line 
`extract(LocalDateTime): Int`.
   - Fork in `CometHour`/`CometMinute`/`CometSecond` (in 
`spark/src/main/scala/org/apache/comet/serde/datetime.scala`): when 
`engine=java`, the serde builds a `JvmScalarUdf` proto pointing at the matching 
UDF class; when `engine=rust`, the existing native path is unchanged. A shared 
private `DateTimeFieldUdfHelper.buildProto` factors the proto construction.
   - The `engine=java` branch in `getSupportLevel` matches explicitly on 
`TimestampType | TimestampNTZType` and returns `Unsupported` for anything else, 
so a future input-type change in Spark falls back cleanly instead of crashing 
the UDF at runtime.
   - Compat doc: new "Engine Selection (experimental)" section in 
`docs/source/user-guide/latest/compatibility/expressions/datetime.md`.
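   As a sketch of the TZ/NTZ branching and null preservation the base class performs, here is a hedged Java rendering. The real code is Scala, iterates Arrow vectors directly, and lives in `DateTimeFieldUDF`; the per-value `eval` signature and the `*Sketch` names below are illustrative assumptions:

```java
import java.time.*;

// Hypothetical stand-in for DateTimeFieldUDF: subclasses supply the
// one-line field accessor; the base class handles decoding and branching.
abstract class DateTimeFieldUdfSketch {
    protected abstract int extract(LocalDateTime dt);

    // One nullable micros value in, one nullable field value out.
    Integer eval(Long micros, ZoneId sessionTz, boolean isNtz) {
        if (micros == null) return null;  // null preservation
        long sec = Math.floorDiv(micros, 1_000_000L);
        int nanos = (int) Math.floorMod(micros, 1_000_000L) * 1_000;
        LocalDateTime local = isNtz
            // NTZ: the stored value *is* the wall clock; no zone shift.
            ? LocalDateTime.ofEpochSecond(sec, nanos, ZoneOffset.UTC)
            // TZ: the stored value is an instant; render it in the session zone.
            : Instant.ofEpochSecond(sec, nanos).atZone(sessionTz).toLocalDateTime();
        return extract(local);
    }
}

class HourUdfSketch extends DateTimeFieldUdfSketch {
    @Override protected int extract(LocalDateTime dt) { return dt.getHour(); }
}

public class UdfSketchDemo {
    public static void main(String[] args) {
        HourUdfSketch hour = new HourUdfSketch();
        long micros = Instant.parse("2024-06-01T23:30:00Z").getEpochSecond() * 1_000_000L;
        System.out.println(hour.eval(micros, ZoneId.of("Asia/Tokyo"), false)); // 8 (08:30 JST, next day)
        System.out.println(hour.eval(micros, ZoneId.of("Asia/Tokyo"), true));  // 23 (wall clock kept)
        System.out.println(hour.eval(null, ZoneId.of("UTC"), false));          // null
    }
}
```

   Users would opt in via `--conf spark.comet.exec.datetime.engine=java`; under the default `rust`, the serde path described above is untouched.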
   
   No native Rust changes; the `JvmScalarUdf` proto and `JvmScalarUdfExpr` 
planner integration were already merged in #4232.
   
   This PR was scaffolded with the project's `brainstorming` and 
`writing-plans` skills.
   
   ## How are these changes tested?
   
   Three test suites, totaling 17 new tests:
   
   - `DateTimeFieldUDFSuite` (Scala unit, in the spark module): 8 tests 
exercising each UDF directly on Arrow vectors. Covers TZ + NTZ, multiple 
session timezones, and null preservation.
   - `CometDatetimeJvmSuite` (Spark integration): 8 tests with `engine=java` 
set in `sparkConf`. Each of `hour`/`minute`/`second` is checked across UTC, 
`America/Los_Angeles`, and `Asia/Tokyo` on both `TimestampType` and 
`TimestampNTZType`, plus a DST fall-back boundary test in LA and a plan-inspection 
test asserting native execution. All use `checkSparkAnswerAndOperator` to 
verify both correctness against Spark and that the operator ran natively.
   - `CometDatetimeEngineDefaultSuite` (Spark integration): 1 test asserting 
the config defaults to `rust` and that the rust path still runs natively under 
defaults. Regression guard for the existing path.
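   The DST fall-back case the LA test targets can be illustrated directly with `java.time` (the 2024-11-03 fixtures here are my own choice; the suite's exact values may differ). Two instants an hour apart render as the same wall-clock time, which is exactly the kind of ambiguity the TZ branch must reproduce bit-exactly:

```java
import java.time.*;

public class DstFallBackDemo {
    public static void main(String[] args) {
        ZoneId la = ZoneId.of("America/Los_Angeles");
        // 2024-11-03: clocks fall back 02:00 PDT -> 01:00 PST, so two
        // distinct instants map to the same 01:30 local wall clock.
        ZonedDateTime first  = Instant.parse("2024-11-03T08:30:00Z").atZone(la); // 01:30 PDT
        ZonedDateTime second = Instant.parse("2024-11-03T09:30:00Z").atZone(la); // 01:30 PST
        System.out.println(first.getHour() + ":" + first.getMinute());   // 1:30
        System.out.println(second.getHour() + ":" + second.getMinute()); // 1:30
    }
}
```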
   
   Also verified locally: `spotless:apply` is a no-op, `cargo clippy 
--all-targets --workspace -- -D warnings` is clean, and 
`CometTemporalExpressionSuite` (27 tests) still passes unchanged, confirming 
the rust default path is untouched.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

