andygrove opened a new pull request, #4181:
URL: https://github.com/apache/datafusion-comet/pull/4181

   ## Summary
   
   Adds a new test suite (`CometTimeParserPolicySuite`) that verifies Comet 
matches Spark under non-default values of `spark.sql.legacy.timeParserPolicy` 
(`CORRECTED` and `LEGACY`) for: `CAST(string AS date/timestamp)`, `to_date`, 
`to_timestamp`, `unix_timestamp`, `from_unixtime`, and `date_format`.
   
   This PR is **draft / exploratory**: the goal is to document current 
behavior gaps rather than fix them. It is part of a broader audit of Spark 
configs whose non-default values may produce divergent results in Comet.
   
   ## Findings
   
   Running on Spark 4.0:
   
   | Test | Result |
   |---|---|
   | `CAST(string AS date)` | pass |
   | `CAST(string AS timestamp)` | **fail** → `ignore`d |
   | `to_date(s)` without pattern | pass |
   | `to_date(s, pattern)` | pass (falls back to Spark; no `ParseToDate` serde) |
   | `to_timestamp(s)` without pattern | **fail** → `ignore`d |
   | `to_timestamp(s, pattern)` | pass (falls back) |
   | `unix_timestamp(s, pattern)` | pass (falls back) |
   | `from_unixtime(long, pattern)` | pass (`Incompatible(None)` default fallback) |
   | `date_format(date, pattern)` | pass (format allowlist) |
   
   The two `ignore`d tests show concrete divergence:
   
   | Input | Spark (LEGACY) | Comet |
   |---|---|---|
   | `2020-1-1 1:2:3` | `2020-01-01 01:02:03.0` | `null` |
   
   Comet's native ISO parser in 
`native/spark-expr/src/conversion_funcs/string.rs` rejects single-digit 
month, day, hour, minute, and second fields, which Spark's `SimpleDateFormat` 
accepts under LEGACY. Comet does not read 
`spark.sql.legacy.timeParserPolicy` anywhere.
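
The lenient-versus-strict difference described above can be reproduced with 
plain JDK classes. The sketch below is illustrative only: the class and method 
names are made up for this example, and neither method is Spark's or Comet's 
actual code path. It assumes `java.text.SimpleDateFormat` stands in for the 
LEGACY behavior and `java.time.format.DateTimeFormatter` for a strict parser.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.Optional;

// Hypothetical demo class; not part of Spark or Comet.
final class ParserLeniencyDemo {
    private static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

    // Legacy-style parse: when parsing, SimpleDateFormat ignores the pattern
    // letter count for numeric fields that are separated by literals, so a
    // single digit "1" is accepted where the pattern says "MM".
    static Optional<String> parseLegacy(String s) {
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN, Locale.ROOT);
        try {
            // Re-format with the same formatter to get a normalized string.
            return Optional.of(fmt.format(fmt.parse(s)));
        } catch (ParseException e) {
            return Optional.empty();
        }
    }

    // Strict parse: DateTimeFormatter requires exactly two digits for
    // "MM", "dd", "HH", "mm", and "ss", so single-digit fields are rejected.
    static Optional<String> parseStrict(String s) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(PATTERN, Locale.ROOT);
        try {
            return Optional.of(LocalDateTime.parse(s, fmt).format(fmt));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String input = "2020-1-1 1:2:3";
        System.out.println("legacy: " + parseLegacy(input));
        System.out.println("strict: " + parseStrict(input));
    }
}
```

For the input from the table, the legacy-style method yields 
`2020-01-01 01:02:03` while the strict method yields an empty result, which 
mirrors the Spark versus Comet columns.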
   
   Passing tests mostly represent cases where Comet falls back to Spark 
(pattern-based functions with no serde handler, or `Incompatible(None)` 
default). They're still useful as regression guards.
   
   ## Context
   
   This follows up on a broader audit of Spark configs whose non-default 
values can silently produce wrong results in Comet. Other candidates: 
`parquet.datetimeRebaseModeInRead`, `parquet.int96RebaseModeInRead`, 
`parquet.binaryAsString`, `mapKeyDedupPolicy`.
   
   ## Test plan
   
   - [x] Compile against `-Pspark-4.0 -Pscala-2.13`
   - [x] Run `CometTimeParserPolicySuite` locally — 7 pass, 2 `ignore`
   - [ ] Decide whether the 2 `ignore`d tests drive a fix (honor the policy) or 
an explicit fallback (`timeParserPolicy != CORRECTED` → fall back to Spark)
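
The explicit-fallback option in the last item could look roughly like the 
following. This is a minimal sketch, not Comet code: `TimeParserPolicyGate`, 
`shouldFallBackToSpark`, and the way the raw config value is passed in are all 
hypothetical. Only the config key and the three policy names (`EXCEPTION`, 
`CORRECTED`, `LEGACY`) come from Spark.

```java
import java.util.Locale;

// Hypothetical helper; the names here are invented for illustration.
final class TimeParserPolicyGate {
    static final String CONF_KEY = "spark.sql.legacy.timeParserPolicy";

    // The three values Spark accepts for this config.
    enum Policy { EXCEPTION, CORRECTED, LEGACY }

    // Normalize case and whitespace before mapping to the enum.
    // Throws IllegalArgumentException for an unknown value.
    static Policy parse(String raw) {
        return Policy.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }

    // Fall back to Spark for every policy other than CORRECTED.
    static boolean shouldFallBackToSpark(String rawPolicy) {
        return parse(rawPolicy) != Policy.CORRECTED;
    }

    public static void main(String[] args) {
        for (String p : new String[] {"CORRECTED", "legacy", "EXCEPTION"}) {
            System.out.println(CONF_KEY + "=" + p
                + " -> fall back: " + shouldFallBackToSpark(p));
        }
    }
}
```

A gate of this kind trades native coverage for correctness: every 
non-`CORRECTED` session would run these expressions in Spark, including inputs 
that Comet already handles identically.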


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

