andygrove opened a new pull request, #4321: URL: https://github.com/apache/datafusion-comet/pull/4321
## Which issue does this PR close? Part of #4311. ## Rationale for this change Comet's native Rust implementations of `hour`, `minute`, and `second` incorrectly apply timezone conversion to `TimestampNTZType` inputs (#3180). Today this means Comet falls back to Spark for the NTZ case, losing native execution for the rest of the stage. This PR prototypes the approach outlined in #4311: add an opt-in JVM UDF path that produces bit-exact Spark results for the affected expressions, in exchange for a JNI round-trip per native batch. It mirrors the engine-selection idiom proposed in the open regex PR (#4239), keeping the framing consistent across families. The default stays `rust`; users opt into `java` for full Spark compatibility. The scope is intentionally narrow: three closely related expressions sharing a serde shape and a UDF base class. Once this engine-config model is reviewed, it extends naturally to `TruncTimestamp`, `DateFormat`, and the field-extraction expressions in a follow-up. ## What changes are included in this PR? - New config `spark.comet.exec.datetime.engine` with values `rust` (default) and `java`. Lives in `CometConf.scala`. - New `DateTimeFieldUDF` abstract base in `common/src/main/scala/org/apache/comet/udf/`, implementing the Arrow vector iteration, null preservation, and TZ/NTZ branching using `java.time`. Concrete subclasses `HourUDF`, `MinuteUDF`, `SecondUDF` each supply a one-line `extract(LocalDateTime): Int`. - Fork in `CometHour`/`CometMinute`/`CometSecond` (in `spark/src/main/scala/org/apache/comet/serde/datetime.scala`): when `engine=java`, the serde builds a `JvmScalarUdf` proto pointing at the matching UDF class; when `engine=rust`, the existing native path is unchanged. A shared private `DateTimeFieldUdfHelper.buildProto` factors the proto construction. - The `engine=java` branch in `getSupportLevel` matches explicitly on `TimestampType | TimestampNTZType` and returns `Unsupported` for anything else, so a future input-type change in Spark falls back cleanly instead of crashing the UDF at runtime. - Compat doc: new "Engine Selection (experimental)" section in `docs/source/user-guide/latest/compatibility/expressions/datetime.md`. No native Rust changes; the `JvmScalarUdf` proto and `JvmScalarUdfExpr` planner integration were already merged in #4232. This PR was scaffolded with the project's `brainstorming` and `writing-plans` skills. ## How are these changes tested? Three test suites, totaling 17 new tests: - `DateTimeFieldUDFSuite` (Scala unit, in the spark module): 8 tests exercising each UDF directly on Arrow vectors. Covers TZ + NTZ, multiple session timezones, and null preservation. - `CometDatetimeJvmSuite` (Spark integration): 8 tests with `engine=java` set in `sparkConf`. Each of `hour`/`minute`/`second` is checked across UTC, `America/Los_Angeles`, and `Asia/Tokyo` on both `TimestampType` and `TimestampNTZ`, plus a DST fall-back boundary test in LA and a plan-inspection test asserting native execution. All use `checkSparkAnswerAndOperator` to verify both correctness against Spark and that the operator ran natively. - `CometDatetimeEngineDefaultSuite` (Spark integration): 1 test asserting the config defaults to `rust` and that the rust path still runs natively under defaults. Regression guard for the existing path. Also verified locally: `spotless:apply` is a no-op, `cargo clippy --all-targets --workspace -- -D warnings` is clean, and `CometTemporalExpressionSuite` (27 tests) still passes unchanged, confirming the rust default path is untouched. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
