andygrove opened a new pull request, #4181: URL: https://github.com/apache/datafusion-comet/pull/4181
## Summary

Adds a new test suite (`CometTimeParserPolicySuite`) that verifies Comet matches Spark under non-default values of `spark.sql.legacy.timeParserPolicy` (`CORRECTED` and `LEGACY`) for: `CAST(string AS date/timestamp)`, `to_date`, `to_timestamp`, `unix_timestamp`, `from_unixtime`, and `date_format`.

This is **draft / exploratory**: the goal is to document current behavior gaps rather than fix them. It's part of a broader audit of Spark configs whose non-default values may produce divergent results in Comet.

## Findings

Running on Spark 4.0:

| Test | Result |
|---|---|
| `CAST(string AS date)` | pass |
| `CAST(string AS timestamp)` | **fail** → `ignore`d |
| `to_date(s)` without pattern | pass |
| `to_date(s, pattern)` | pass (falls back to Spark, no `ParseToDate` serde) |
| `to_timestamp(s)` without pattern | **fail** → `ignore`d |
| `to_timestamp(s, pattern)` | pass (falls back) |
| `unix_timestamp(s, pattern)` | pass (falls back) |
| `from_unixtime(long, pattern)` | pass (`Incompatible(None)` default fallback) |
| `date_format(date, pattern)` | pass (format allowlist) |

The two `ignore`d tests show concrete divergence:

| Input | Spark (LEGACY) | Comet |
|---|---|---|
| `2020-1-1 1:2:3` | `2020-01-01 01:02:03.0` | `null` |

Comet's native ISO parser in `native/spark-expr/src/conversion_funcs/string.rs` rejects the single-digit month/day/hour/minute/second formats that Spark's `SimpleDateFormat` accepts under LEGACY. The config is not read anywhere in Comet.

Passing tests mostly represent cases where Comet falls back to Spark (pattern-based functions with no serde handler, or `Incompatible(None)` default). They're still useful as regression guards.

## Context

Follow-up to a broader audit of Spark configs whose non-default values can silently produce wrong results in Comet. Other candidates: `parquet.datetimeRebaseModeInRead`, `parquet.int96RebaseModeInRead`, `parquet.binaryAsString`, `mapKeyDedupPolicy`.
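The divergence reported under Findings can be illustrated with a small standalone Rust sketch. This is not the code in `native/spark-expr/src/conversion_funcs/string.rs`; `parse_field`, `parse_timestamp`, and the `strict` flag are hypothetical names introduced here for illustration. Strict mode assumes fixed-width fields (the behavior the findings attribute to the native parser), while lenient mode accepts the single-digit fields that Spark accepts under LEGACY. The sketch only handles the `date time` shape and does no calendar validation.

```rust
/// Date-time fields as (year, month, day, hour, minute, second).
type Fields = (u32, u32, u32, u32, u32, u32);

/// Parses one numeric field (hypothetical helper).
/// Strict mode requires exactly `width` digits; lenient mode allows 1..=width.
fn parse_field(text: &str, width: usize, strict: bool) -> Option<u32> {
    let len_ok = if strict {
        text.len() == width
    } else {
        (1..=width).contains(&text.len())
    };
    if !len_ok || !text.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    text.parse().ok()
}

/// Parses `year-month-day hour:minute:second` (hypothetical helper).
/// Returns `None` where a SQL engine would produce `null`.
fn parse_timestamp(s: &str, strict: bool) -> Option<Fields> {
    let (date, time) = s.trim().split_once(' ')?;
    let d: Vec<&str> = date.split('-').collect();
    let t: Vec<&str> = time.split(':').collect();
    if d.len() != 3 || t.len() != 3 {
        return None;
    }
    let year = parse_field(d[0], 4, strict)?;
    let month = parse_field(d[1], 2, strict)?;
    let day = parse_field(d[2], 2, strict)?;
    let hour = parse_field(t[0], 2, strict)?;
    let minute = parse_field(t[1], 2, strict)?;
    let second = parse_field(t[2], 2, strict)?;
    // Coarse range checks only; days per month are not validated.
    if !(1..=12).contains(&month)
        || !(1..=31).contains(&day)
        || hour > 23
        || minute > 59
        || second > 59
    {
        return None;
    }
    Some((year, month, day, hour, minute, second))
}

fn main() {
    let input = "2020-1-1 1:2:3";
    println!("strict:  {:?}", parse_timestamp(input, true));
    println!("lenient: {:?}", parse_timestamp(input, false));
}
```

With the input from the divergence table, `2020-1-1 1:2:3`, strict mode returns `None` (which would surface as `null`), while lenient mode returns the fields corresponding to `2020-01-01 01:02:03`.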
## Test plan

- [x] Compile against `-Pspark-4.0 -Pscala-2.13`
- [x] Run `CometTimeParserPolicySuite` locally: 7 pass, 2 `ignore`
- [ ] Decide whether the 2 `ignore`d tests drive a fix (honor the policy) or explicit fallback (`timeParserPolicy != CORRECTED` → fall back to Spark)

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
