andygrove opened a new pull request, #4092:
URL: https://github.com/apache/datafusion-comet/pull/4092
## Which issue does this PR close?
Closes #3313.
Closes #3314.
## Rationale for this change
Issues #3313 and #3314 tracked four Spark 4 SQL tests ignored via
`IgnoreCometNativeDataFusion` because they failed with `native_datafusion`
in auto scan mode.
Three of the four now pass on `main` thanks to recent work:
- `DynamicPartitionPruningV1SuiteAEOff`: "Subquery reuse across the whole
plan" passes after the non-AQE DPP / `CometSubqueryBroadcastExec` work in
#4011 and the subquery reuse fix in #4053.
- `FileBasedDataSourceSuite`: "Enabling/disabling ignoreMissingFiles using
parquet" passes because `ShimSparkErrorConverter` translates the native
`FileNotFound` JSON payload into Spark's
`FAILED_READ_FILE.FILE_NOT_EXIST` error class.
- `SimpleSQLViewSuite`: "alter temporary view should follow current
storeAnalyzedPlanForView config" passes for the same reason.
The fourth test (`ParquetV1QuerySuite`: "Enabling/disabling ignoreCorruptFiles")
still failed because of a bug in `CometExecIterator` exposed by Spark 4's
strict error-parameter validation. When `native_datafusion` raises a parquet
error, the iterator wraps it with `_LEGACY_ERROR_TEMP_2254` and a `message`
parameter. That error class has zero placeholders, so Spark 4's
`SparkException` constructor raises `INTERNAL_ERROR` ("Found unused message
parameters of the error class '_LEGACY_ERROR_TEMP_2254'") before the intended
exception is ever thrown, hiding the cause-chain entry that carries
"is not a Parquet file" which the test asserts on.
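The failure mode can be sketched with a small stand-alone Scala model. This is illustrative only: the template text and the `validate` helper below are stand-ins, not Spark's actual error-framework implementation; the only property that matters is that `_LEGACY_ERROR_TEMP_2254`'s template contains no placeholders, so any supplied parameter is "unused".

```scala
// Minimal sketch (not actual Spark code) of Spark 4's strict check that every
// entry in messageParameters must match a <placeholder> in the error-class
// template. The template text here is made up; its lack of placeholders is the
// point.
val templates: Map[String, String] =
  Map("_LEGACY_ERROR_TEMP_2254" -> "Task failed while reading data.")

// Extract <name>-style placeholders from a template.
def placeholders(template: String): Set[String] =
  "<(\\w+)>".r.findAllMatchIn(template).map(_.group(1)).toSet

// Returns Left with an INTERNAL_ERROR-style message when a parameter has no
// matching placeholder, mirroring the behavior described above.
def validate(errorClass: String, params: Map[String, String]): Either[String, String] = {
  val unused = params.keySet -- placeholders(templates(errorClass))
  if (unused.nonEmpty)
    Left(s"INTERNAL_ERROR: Found unused message parameters of the error class '$errorClass': $unused")
  else
    Right(templates(errorClass))
}
```

With this model, passing a `message` parameter fails validation (the pre-fix behavior), while an empty parameter map passes (the fix).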
## What changes are included in this PR?
- `CometExecIterator.scala`: drop the unused `message` map entry so the
`SparkException` can be constructed under Spark 4's strict checks. The cause
chain, which already carried the underlying error and the
"File is not a Parquet file." marker, is preserved.
- `dev/diffs/4.0.1.diff`: remove the four `IgnoreCometNativeDataFusion` tags
for the now-passing tests, regenerated against `v4.0.1`.
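A hedged sketch of why dropping the parameter is safe: the test only looks for the "is not a Parquet file" marker in the cause chain, which the wrapper's cause still carries. The exception types and messages below are illustrative stand-ins, not the actual Comet or Spark classes:

```scala
// Illustrative only: stand-ins for the native parquet error and the wrapper
// that CometExecIterator throws. With no messageParameters attached, Spark 4's
// strict check is satisfied, and the marker stays reachable via getCause.
def causeChain(t: Throwable): Iterator[Throwable] =
  Iterator.iterate(t)(_.getCause).takeWhile(_ != null)

val nativeError = new RuntimeException("file.txt is not a Parquet file.")
val wrapper = new RuntimeException("_LEGACY_ERROR_TEMP_2254", nativeError)

val markerFound = causeChain(wrapper).exists { t =>
  Option(t.getMessage).exists(_.contains("is not a Parquet file"))
}
```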
## How are these changes tested?
Verified locally with Spark 4.0.1 source plus the regenerated diff, running
each test under `ENABLE_COMET=true ENABLE_COMET_ONHEAP=true` so
`native_datafusion` is used in auto scan mode:
- `sql/testOnly org.apache.spark.sql.DynamicPartitionPruningV1SuiteAEOff --
-z "Subquery reuse across the whole plan"`
- `sql/testOnly org.apache.spark.sql.FileBasedDataSourceSuite -- -z
"Enabling/disabling ignoreMissingFiles using parquet"`
- `sql/testOnly org.apache.spark.sql.execution.SimpleSQLViewSuite -- -z
"alter temporary view should follow current storeAnalyzedPlanForView config"`
- `sql/testOnly
org.apache.spark.sql.execution.datasources.parquet.ParquetV1QuerySuite -- -z
"Enabling/disabling ignoreCorruptFiles"`
- `sql/testOnly
org.apache.spark.sql.execution.datasources.parquet.ParquetV2QuerySuite -- -z
"Enabling/disabling ignoreCorruptFiles"`
All five pass. CI Spark SQL test runs cover the same matrix.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]