andygrove opened a new issue, #4089:
URL: https://github.com/apache/datafusion-comet/issues/4089

   ## Description
   
   When the `native_datafusion` scan reads a Parquet column written as 
`Decimal(10, 2)` under a requested read schema of `Decimal(5, 0)`, it silently 
succeeds. The lower precision/scale cannot represent values like `123.45`, so 
reading should either throw (matching Spark) or at minimum validate that values 
fit.
   
   ## Reproduction
   
   ```scala
   withSQLConf(
     CometConf.COMET_NATIVE_SCAN_IMPL.key -> CometConf.SCAN_NATIVE_DATAFUSION,
     SQLConf.USE_V1_SOURCE_LIST.key -> "parquet") {
     withTempPath { dir =>
       val path = dir.getCanonicalPath
       Seq(BigDecimal("123.45"), BigDecimal("67.89")).toDF("d")
         .selectExpr("cast(d as decimal(10,2)) as d")
         .write.parquet(path)
       val df = spark.read.schema("d decimal(5,0)").parquet(path)
       df.show() // succeeds; whether the returned values are sensible has not
       // been verified
     }
   }
   ```
   
   `native_iceberg_compat` correctly throws `SparkException` for this case 
(matches Spark, see SPARK-34212).
   
   ## Affected versions
   
   All supported Spark profiles (3.4, 3.5, 4.0). Reproduced on Comet `main` 
while building #4087.
   
   ## Expected behavior
   
   Either:
   1. Throw a `SparkException` matching Spark's vectorized reader behavior when 
the read precision/scale cannot represent the file's precision/scale (preferred 
for parity).
   2. Validate values at read time and throw on overflow.
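   
   The two options above can be sketched roughly as follows. This is an
   illustration only, not Comet or Spark code: `DecimalFitSketch`,
   `typeCanRepresent`, and `valueFits` are hypothetical names, and the
   type-level rule shown (no loss of fractional or integer digits) is an
   assumption that may differ from the check Spark's vectorized reader applies.
   
   ```scala
   import java.math.{BigDecimal => JBigDecimal, RoundingMode}
   import scala.util.Try
   
   // Hypothetical helper; names and rules are assumptions for illustration.
   object DecimalFitSketch {
   
     // Option 1 (type-level): the read type can hold every value of the file
     // type only if it keeps at least as many fractional digits (scale) and
     // at least as many integer digits (precision - scale).
     def typeCanRepresent(fileP: Int, fileS: Int, readP: Int, readS: Int): Boolean =
       readS >= fileS && (readP - readS) >= (fileP - fileS)
   
     // Option 2 (value-level): a value fits if it can be rescaled to the read
     // scale without rounding and the result has at most readP digits.
     // setScale with RoundingMode.UNNECESSARY throws ArithmeticException when
     // rounding would be required, which Try turns into a failure.
     def valueFits(v: JBigDecimal, readP: Int, readS: Int): Boolean =
       Try(v.setScale(readS, RoundingMode.UNNECESSARY)).toOption
         .exists(_.precision <= readP)
   }
   ```
   
   Under these assumptions, `typeCanRepresent(10, 2, 5, 0)` is `false` for the
   reproduction above, and `valueFits(new JBigDecimal("123.45"), 5, 0)` is
   also `false` because the fractional digits would be lost.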
   
   ## Test coverage
   
   Documented in `ParquetSchemaMismatchSuite` (added in #4087) under the test 
name `decimal(10,2) read as decimal(5,0): native_datafusion`. The test 
currently asserts only that the read succeeds; values are not validated. When 
this is fixed, both the assertion and the matrix in the file header must be 
updated.
   
   ## Parent issue
   
   Split from #3720.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

