andygrove opened a new pull request, #4090: URL: https://github.com/apache/datafusion-comet/pull/4090
## Which issue does this PR close? Closes #4089. ## Rationale for this change When the `native_datafusion` scan reads a Parquet column with a higher-precision decimal physical type under a requested read schema with smaller precision or different scale, the existing schema adapter falls through to Spark's `Cast` expression. `Cast` happily rescales or truncates, producing wrong values silently. Spark's vectorized reader rejects this with `SchemaColumnConvertNotSupportedException`, and `native_iceberg_compat` already does the same via `TypeUtil.checkParquetType`. The native scan should match. ## What changes are included in this PR? `native/core/src/parquet/schema_adapter.rs`: in `replace_with_spark_cast`, add a guard before the existing branches that returns a `DataFusionError::Plan` when both `physical_type` and `target_type` are `Decimal128` and either the target precision is smaller than the source precision or the scales differ. The rule mirrors Spark's `TypeUtil.isDecimalTypeMatched` (Spark 3.x path). The check is intentionally narrow: it only fires for decimal-to-decimal mismatches, leaving every other type path unchanged. The closed prior attempt at broad schema validation in #3311 broke unrelated tests; this one does not. Spark 4.0's decimal type widening (allowing strictly-larger target precision/scale per SPARK-37147) is not implemented here. For #4089's specific case (`Decimal(10,2)` read as `Decimal(5,0)`) Spark 4.0 also rejects, so behavior is correct on all supported versions. Implementing 4.0's wider widening is a follow-up. ## How are these changes tested? Added a focused test to `ParquetReadSuite`: `native_datafusion rejects incompatible decimal precision/scale`. It writes `Decimal(10, 2)` data, reads it under `Decimal(5, 0)`, forces `spark.comet.scan.impl=native_datafusion` and `spark.sql.sources.useV1SourceList=parquet`, and asserts that `collect()` raises `SparkException`. Verified against `ParquetReadV1Suite` (44 tests, all pass; 1 pre-existing test ignored). The behavior is also covered by the per-impl matrix added in #4087, which will need its `decimal(10,2) read as decimal(5,0): native_datafusion` assertion flipped from "succeeds" to "throws" in a follow-up once that PR merges. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
