andygrove opened a new issue, #4352:
URL: https://github.com/apache/datafusion-comet/issues/4352
## Description
Several Spark tests in `ParquetTypeWideningSuite` and one in
`ParquetQuerySuite` assert that Spark's parquet-mr reader silently truncates /
overflows / returns null when the requested schema is narrower than the file's
schema, and that this only happens on the *non-vectorized* path
(`PARQUET_VECTORIZED_READER_ENABLED = false`). When the vectorized reader is
enabled, the same conversions throw `SchemaColumnConvertNotSupportedException`.
`native_datafusion` always rejects these conversions (mirroring the
vectorized-reader branch via `schema_adapter.rs`), so the tests fail on the
non-vectorized branch where Spark's parquet-mr would have silently produced
wrong-but-tolerated output.
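To make the two policies concrete, here is a minimal Rust sketch of the `Decimal(5, 2) -> Decimal(3, 2)` case. It is not Comet's `schema_adapter.rs` or parquet-mr code. The names `read_strict`, `read_permissive`, and `ReadError` are hypothetical. The sketch assumes source and target share a scale and represents values as unscaled `i128` integers. It models only the null-on-overflow behaviour, not truncation. The key point is that the strict path rejects the *column conversion* before looking at any value. The permissive path reads every value and nulls the ones that do not fit.

```rust
/// Hypothetical sketch, not Comet or parquet-mr code. Decimals are modelled as
/// unscaled i128 integers (123.45 in Decimal(5, 2) is 12345); source and target
/// are assumed to share the same scale, so only precision changes.
#[derive(Debug, PartialEq)]
pub enum ReadError {
    /// Stand-in for Spark's SchemaColumnConvertNotSupportedException.
    UnsupportedConversion { from_precision: u32, to_precision: u32 },
}

/// Largest unscaled magnitude with `precision` digits, i.e. 10^precision - 1.
/// Assumes 1 <= precision <= 38 (Spark's maximum), so the result fits in u128.
fn max_unscaled(precision: u32) -> u128 {
    10u128.pow(precision) - 1
}

/// Strict policy (vectorized reader / native_datafusion): a narrowing column
/// conversion is rejected up front, whatever values the column holds.
pub fn read_strict(
    values: &[i128],
    from_precision: u32,
    to_precision: u32,
) -> Result<Vec<i128>, ReadError> {
    if to_precision < from_precision {
        return Err(ReadError::UnsupportedConversion { from_precision, to_precision });
    }
    Ok(values.to_vec())
}

/// Permissive policy (parquet-mr, non-vectorized): the read goes ahead and any
/// value that does not fit the narrower type comes back as null (None).
pub fn read_permissive(values: &[i128], to_precision: u32) -> Vec<Option<i128>> {
    let max = max_unscaled(to_precision);
    values
        .iter()
        .map(|&v| if v.unsigned_abs() <= max { Some(v) } else { None })
        .collect()
}

fn main() {
    // A Decimal(5, 2) column holding 1.23 and 123.45, read as Decimal(3, 2) (max 9.99).
    let column = [123_i128, 12_345];
    println!("strict:     {:?}", read_strict(&column, 5, 3));
    println!("permissive: {:?}", read_permissive(&column, 3));
}
```

Comet only implements the first function's behaviour. A test that asserts the second function's output therefore cannot pass under `native_datafusion`.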
This is an architectural difference, not a fixable bug in the rejection
logic: Comet has no parquet-mr-equivalent backend that produces
silent-overflow results. The schema-adapter changes in #4297, #4343, and #4344 are
correct; these tests just have to be ignored under `native_datafusion`
until/unless someone adds a permissive non-vectorized fallback.
## Affected tests (Spark 4.1.x)
`org.apache.spark.sql.execution.datasources.parquet.ParquetTypeWideningSuite`:
- `unsupported parquet conversion *Type -> DecimalType(...)`: 17 cases
  (`expectError = vectorized`)
- `parquet decimal precision change Decimal(X, 2) -> Decimal(Y, 2)`: 6
  narrowing cases
- `parquet decimal precision and scale change Decimal(X, Y) -> Decimal(A, B)`:
  12 cases
- `parquet decimal type change Decimal(5, 2) -> Decimal(3, 2) overflows with parquet-mr`:
  1 case (specifically tests parquet-mr's null-on-overflow)
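Whether a given `Decimal(X, Y) -> Decimal(A, B)` case counts as narrowing can be checked mechanically. Below is a rough Rust sketch of a conservative "no value can be lost" test. The helper `is_lossless_decimal_change` is hypothetical and is not Spark's or Comet's API. Spark's own acceptance rules for decimal widening may differ in details, for example depending on the Parquet physical type backing the decimal. The idea is that a conversion is safe only if neither the scale nor the number of integer digits (`precision - scale`) shrinks.

```rust
/// Hypothetical helper, not a Spark or Comet API: true when every value of
/// Decimal(src_p, src_s) is exactly representable as Decimal(dst_p, dst_s).
/// Assumes valid decimals (precision >= 1 and scale <= precision).
pub fn is_lossless_decimal_change(src_p: u32, src_s: u32, dst_p: u32, dst_s: u32) -> bool {
    // Fractional digits must not shrink, or values would lose trailing digits.
    let keeps_fraction = dst_s >= src_s;
    // Integer digits (precision - scale) must not shrink, or large values overflow.
    let keeps_integer_part = dst_p - dst_s >= src_p - src_s;
    keeps_fraction && keeps_integer_part
}

fn main() {
    let cases = [(5, 2, 7, 2), (7, 2, 5, 2), (5, 2, 7, 4), (10, 4, 10, 5), (5, 2, 3, 2)];
    for (sp, ss, dp, ds) in cases {
        let kind = if is_lossless_decimal_change(sp, ss, dp, ds) { "widening" } else { "narrowing" };
        println!("Decimal({sp}, {ss}) -> Decimal({dp}, {ds}): {kind}");
    }
}
```

For instance, `Decimal(5, 2) -> Decimal(3, 2)` keeps the scale but drops from 3 integer digits to 1. That is why a value such as 123.45 cannot survive the conversion.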
The Comet diffs for Spark 3.4.x / 3.5.x / 4.0.x already tag the first three groups with
`IgnoreCometNativeDataFusion("https://github.com/apache/datafusion-comet/issues/3720")`.
The 4.1.1 diff removed those tags prematurely as part of the schema-adapter work, so
they need to be re-added. The `overflows with parquet-mr` test is unannotated
in both 4.0.2 and 4.1.1 and needs the same treatment.
## Action
Re-add `IgnoreCometNativeDataFusion(<this issue's URL>)` to the affected
tests in `dev/diffs/4.1.1.diff`, and add it to the `parquet decimal type change
... overflows with parquet-mr` test in `dev/diffs/4.0.2.diff` and
`dev/diffs/4.1.1.diff`.
## Related
- #4297 / #4343 / #4344: the rejection classes whose schema-adapter
implementation is correct; these tests fail even though the rejection logic works.
- #3720: parent umbrella issue.