andygrove opened a new pull request, #4154:
URL: https://github.com/apache/datafusion-comet/pull/4154

   ## Which issue does this PR close?
   
   Related to #3720.
   
   ## Rationale for this change
   
   Issue #3720 documents that the `native_datafusion` scan can silently return 
incorrect timestamp values when a Parquet file stores INT96 timestamps and the 
read schema requests `TimestampNTZ`. Spark itself raises an error in this case 
(SPARK-36182) to prevent the unsafe LTZ-to-NTZ reinterpretation. There is no 
regression test on `main` that captures the silent miscomputation, so future 
changes could mask or unmask it without anyone noticing.
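
   To make the hazard concrete, here is a minimal Python sketch of the 
arithmetic, independent of Spark and Comet. It assumes the unsafe read simply 
treats the stored UTC instant as a wall-clock value; the exact value returned 
by `native_datafusion` is not stated in this PR, so the `20:00` result below 
is illustrative only. The fixed UTC-8 offset stands in for 
`America/Los_Angeles` on the date used by the test.

```python
from datetime import datetime, timedelta, timezone

# America/Los_Angeles is UTC-8 (PST) on 2020-01-01; a fixed offset keeps the
# sketch free of any time zone database dependency.
SESSION_TZ = timezone(timedelta(hours=-8), "PST")

# The value written by the session has local-time-zone (LTZ) semantics:
# it identifies an instant, not a wall-clock reading.
written_local = datetime(2020, 1, 1, 12, 0, 0, tzinfo=SESSION_TZ)

# LTZ values are normalised to UTC before being stored.
stored_instant = written_local.astimezone(timezone.utc)

# Assumed unsafe reinterpretation: drop the zone and treat the UTC fields
# as if they were a zone-less (NTZ) wall-clock value.
reinterpreted_ntz = stored_instant.replace(tzinfo=None)

original_wall_clock = written_local.replace(tzinfo=None)
print(original_wall_clock.isoformat())  # 2020-01-01T12:00:00
print(reinterpreted_ntz.isoformat())    # 2020-01-01T20:00:00
```

   Under this assumption the two values differ by the session offset, which is 
why a reader must either convert explicitly or refuse the read.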
   
   This PR adds a single targeted test that demonstrates the bug as it exists 
on `main`, so we have a reproducer recorded in the test suite that PR #4087 (or 
any future fix) can convert into a correctness assertion.
   
   ## What changes are included in this PR?
   
   A new `ParquetInt96NtzCorrectnessSuite` containing one test:
   
   1. Configures `SESSION_LOCAL_TIMEZONE=America/Los_Angeles`, 
`PARQUET_OUTPUT_TIMESTAMP_TYPE=INT96`, and `USE_V1_SOURCE_LIST=parquet`.
   2. Writes `2020-01-01 12:00:00` as `TimestampType` (encoded as INT96).
   3. With Comet disabled, asserts that 
`spark.read.schema("ts timestamp_ntz").parquet(...)` raises `SparkException` 
(Spark's reference behavior).
   4. With `spark.comet.scan.impl=native_datafusion`, reads the same file as 
`TimestampNTZ` and asserts the returned `LocalDateTime` does not equal the 
original `2020-01-01T12:00:00`, capturing the silent wall-clock divergence.
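
   For readers unfamiliar with the encoding in step 2, the following Python 
sketch shows the conventional INT96 timestamp layout: 8 little-endian bytes of 
nanoseconds within the day followed by 4 little-endian bytes of Julian day 
number, holding a UTC-normalised instant. The helper names `encode_int96` and 
`decode_int96` are hypothetical and are not part of Spark, Comet, or the new 
suite.

```python
import struct
from datetime import datetime, timedelta, timezone

JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588  # Julian day number of 1970-01-01
NANOS_PER_SECOND = 1_000_000_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_int96(instant: datetime) -> bytes:
    """Pack a timezone-aware instant into a 12-byte INT96 value (hypothetical helper)."""
    delta = instant.astimezone(timezone.utc) - EPOCH
    julian_day = JULIAN_DAY_OF_UNIX_EPOCH + delta.days
    nanos_of_day = delta.seconds * NANOS_PER_SECOND + delta.microseconds * 1000
    # "<qi": little-endian, 8-byte nanos-of-day, then 4-byte Julian day.
    return struct.pack("<qi", nanos_of_day, julian_day)


def decode_int96(raw: bytes) -> datetime:
    """Unpack a 12-byte INT96 value into a UTC instant (hypothetical helper)."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    return EPOCH + timedelta(days=days, microseconds=nanos_of_day // 1000)


# 2020-01-01 12:00:00 at UTC-8 (America/Los_Angeles on that date).
pst = timezone(timedelta(hours=-8))
written = datetime(2020, 1, 1, 12, 0, 0, tzinfo=pst)

raw = encode_int96(written)
decoded = decode_int96(raw)
print(len(raw))             # 12
print(decoded.isoformat())  # 2020-01-01T20:00:00+00:00
```

   The stored bytes carry no zone or wall-clock information of their own, so a 
reader asked for `TimestampNTZ` has nothing in the file from which to recover 
the original `12:00:00`.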
   
   ## How are these changes tested?
   
   The new suite is the test. Verified locally against `apache/main` at 
`050e1e2f7` with Spark 3.5: the test passes, confirming the divergence between 
Spark's behavior (throws) and `native_datafusion`'s behavior (returns a shifted 
wall-clock value).



