andygrove opened a new issue, #4218:
URL: https://github.com/apache/datafusion-comet/issues/4218

   ## Description
   
   When a Parquet file stores timestamps as INT96 (written from Spark's 
`TimestampType`, which has local-time-zone (LTZ) semantics: each value is a 
UTC-normalized instant) and the read schema requests `TimestampNTZ`, the 
`native_datafusion` scan silently returns wall-clock values that disagree 
with what was written.
   
   Spark itself raises an error in this scenario (SPARK-36182) to prevent 
silent reinterpretation of an LTZ instant as NTZ. Comet's native scan should 
either match Spark's behavior by raising an error, or correctly handle the 
timezone conversion.
   
   ## Steps to Reproduce
   
   ```scala
   val sessionTz = "America/Los_Angeles"
   val written = "2020-01-01 12:00:00"
   
   withSQLConf(
     SQLConf.SESSION_LOCAL_TIMEZONE.key -> sessionTz,
     SQLConf.PARQUET_OUTPUT_TIMESTAMP_TYPE.key -> "INT96",
     SQLConf.USE_V1_SOURCE_LIST.key -> "parquet") {
     withTempPath { dir =>
       val path = dir.getCanonicalPath
   
       // Write "2020-01-01 12:00:00" America/Los_Angeles as INT96.
       // The bits encode the UTC instant 2020-01-01 20:00:00.
       Seq(Timestamp.valueOf(written)).toDF("ts").write.parquet(path)
   
       // Spark refuses to read INT96 as TimestampNTZ (SPARK-36182)
       withSQLConf(CometConf.COMET_ENABLED.key -> "false") {
         intercept[SparkException] {
           spark.read.schema("ts timestamp_ntz").parquet(path).collect()
         }
       }
   
       // native_datafusion silently returns a shifted value
       withSQLConf(
         CometConf.COMET_NATIVE_SCAN_IMPL.key -> CometConf.SCAN_NATIVE_DATAFUSION) {
         val rows = spark.read.schema("ts timestamp_ntz").parquet(path).collect()
         val actual = rows.head.getAs[LocalDateTime](0)
         // actual != LocalDateTime.parse("2020-01-01T12:00:00")
         // The value is silently wrong — shifted by the timezone offset
       }
     }
   }
   ```
   
   ## Expected Behavior
   
   Comet should match Spark's behavior and raise an error when asked to read 
INT96 timestamps as TimestampNTZ, since the LTZ→NTZ reinterpretation cannot be 
done safely without explicit conversion.
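
   As a rough illustration of the kind of check this implies, the sketch below 
rejects the INT96/TimestampNTZ pairing before any rows are decoded. Every name 
in it (`PhysicalType`, `RequestedType`, `Int96NtzGuard`) is hypothetical and 
exists only for this example; it is not Comet's or Spark's actual API, and the 
real fix may live in a different layer.

   ```scala
   // Hypothetical model of the Parquet physical type of a column.
   sealed trait PhysicalType
   case object Int96 extends PhysicalType
   case object OtherPhysical extends PhysicalType

   // Hypothetical model of the timestamp type requested by the read schema.
   sealed trait RequestedType
   case object TimestampLtz extends RequestedType
   case object TimestampNtz extends RequestedType

   object Int96NtzGuard {
     // Returns Left(message) for the unsafe INT96 -> TimestampNTZ pairing.
     // All other combinations are out of scope for this sketch and pass through.
     def check(
         column: String,
         physical: PhysicalType,
         requested: RequestedType): Either[String, Unit] =
       (physical, requested) match {
         case (Int96, TimestampNtz) =>
           Left(s"Column '$column': refusing to read INT96 as TimestampNTZ (see SPARK-36182)")
         case _ =>
           Right(())
       }
   }
   ```

   Returning an `Either` keeps the sketch free of engine-specific exception 
types; a real implementation would presumably raise whatever error Spark 
surfaces for SPARK-36182 so that both engines fail in the same way.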
   
   ## Actual Behavior
   
   The native DataFusion scan returns a result without error, but the timestamp 
value is silently incorrect (shifted by the session timezone offset).
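
   The size of the shift can be worked out with plain `java.time`, independent 
of Spark or Comet. The sketch below reuses the values from the reproduction 
and assumes the failure mode is that the stored UTC instant is taken directly 
as the wall-clock value; the issue only states that the result is shifted, so 
that mechanism is an assumption. The object name `Int96NtzShift` is made up 
for this example.

   ```scala
   import java.time.{Instant, LocalDateTime, ZoneId, ZoneOffset}

   object Int96NtzShift {
     val sessionTz: ZoneId = ZoneId.of("America/Los_Angeles")
     val written: LocalDateTime = LocalDateTime.parse("2020-01-01T12:00:00")

     // On write, the session-local wall clock is normalized to a UTC instant.
     val storedInstant: Instant = written.atZone(sessionTz).toInstant

     // A correct LTZ read converts the instant back through the session timezone.
     val ltzRead: LocalDateTime = LocalDateTime.ofInstant(storedInstant, sessionTz)

     // Assumed failure mode: the UTC instant is reinterpreted as wall-clock time.
     val naiveNtzRead: LocalDateTime = LocalDateTime.ofInstant(storedInstant, ZoneOffset.UTC)

     def main(args: Array[String]): Unit = {
       println(s"stored instant: $storedInstant")
       println(s"LTZ read:       $ltzRead")
       println(s"naive NTZ read: $naiveNtzRead")
     }
   }
   ```

   Under that assumption, the LTZ read round-trips to the written value, while 
the naive NTZ read lands eight hours later, because America/Los_Angeles is at 
UTC-8 on that date.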
   
   ## Related
   
   - SPARK-36182
   - https://github.com/apache/datafusion-comet/issues/3720


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

