andygrove opened a new issue, #4089:
URL: https://github.com/apache/datafusion-comet/issues/4089
## Description
When the `native_datafusion` scan reads a Parquet column written as
`Decimal(10, 2)` under a requested read schema of `Decimal(5, 0)`, it silently
succeeds. The lower precision/scale cannot represent values like `123.45`, so
reading should either throw (matching Spark) or at minimum validate that values
fit.
## Reproduction
```scala
withSQLConf(
    CometConf.COMET_NATIVE_SCAN_IMPL.key -> CometConf.SCAN_NATIVE_DATAFUSION,
    SQLConf.USE_V1_SOURCE_LIST.key -> "parquet") {
  withTempPath { dir =>
    val path = dir.getCanonicalPath
    Seq(BigDecimal("123.45"), BigDecimal("67.89")).toDF("d")
      .selectExpr("cast(d as decimal(10,2)) as d")
      .write.parquet(path)
    val df = spark.read.schema("d decimal(5,0)").parquet(path)
    // Succeeds; whether the returned values are sensible has not been verified.
    df.show()
  }
}
```
The `native_iceberg_compat` scan correctly throws a `SparkException` in this
case, matching Spark's vectorized reader behavior (see SPARK-34212).
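One hedged way to check both halves of this locally is to run the snippet below
inside the `withTempPath` block of the reproduction above: collect what
`native_datafusion` returns, then rerun the same read with Comet disabled, where
Spark's own reader is expected to throw. This is only a sketch; it assumes a
ScalaTest context (for `intercept`) and uses `CometConf.COMET_ENABLED` as the
enable flag.
```scala
// Sketch only: inspect what native_datafusion returns for the mismatched
// schema, then contrast with Spark's own reader (Comet disabled), which is
// expected to throw per SPARK-34212.
val cometRows = spark.read.schema("d decimal(5,0)").parquet(path).collect()
cometRows.foreach(r => println(r.getDecimal(0)))

withSQLConf(CometConf.COMET_ENABLED.key -> "false") {
  intercept[org.apache.spark.SparkException] {
    spark.read.schema("d decimal(5,0)").parquet(path).collect()
  }
}
```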
## Affected versions
All supported Spark profiles (3.4, 3.5, 4.0). Reproduced on Comet `main`
while building #4087.
## Expected behavior
Either:
1. Throw a `SparkException` matching Spark's vectorized reader behavior when
the read precision/scale cannot represent the file's precision/scale (preferred
for parity).
2. Validate values at read time and throw on overflow.
## Test coverage
Documented in `ParquetSchemaMismatchSuite` (added in #4087) under the test
name `decimal(10,2) read as decimal(5,0): native_datafusion`. The test
currently asserts only that the read succeeds; values are not validated. When
this is fixed, both the assertion and the matrix in the file header must be
updated.
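A possible shape for the updated assertion, assuming the fix follows option 1
above (the surrounding harness, and any error-message matching, would follow
whatever conventions `ParquetSchemaMismatchSuite` already uses):
```scala
// Sketch: once native_datafusion rejects the mismatched decimal schema,
// the test should expect the failure instead of asserting success.
val e = intercept[org.apache.spark.SparkException] {
  spark.read.schema("d decimal(5,0)").parquet(path).show()
}
// The exact error wording is not yet known; assert on it once it is settled,
// e.g. assert(e.getMessage.contains(...)).
```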
## Parent issue
Split from #3720.