andygrove opened a new pull request, #4092:
URL: https://github.com/apache/datafusion-comet/pull/4092
## Which issue does this PR close?
Closes #3313.
Closes #3314.
## Rationale for this change
Issues #3313 and #3314 tracked four Spark 4 SQL tests ignored via
`IgnoreCometNativeDataFusion` because they failed with `native_datafusion`
in auto scan mode.
Three of the four now pass on `main` thanks to recent work:
- `DynamicPartitionPruningV1SuiteAEOff`: "Subquery reuse across the whole
plan" passes after the non-AQE DPP / `CometSubqueryBroadcastExec` work in
#4011 and the subquery reuse fix in #4053.
- `FileBasedDataSourceSuite`: "Enabling/disabling ignoreMissingFiles using
parquet" passes because `ShimSparkErrorConverter` translates the native
`FileNotFound` JSON payload into Spark's
`FAILED_READ_FILE.FILE_NOT_EXIST` error class.
- `SimpleSQLViewSuite`: "alter temporary view should follow current
storeAnalyzedPlanForView config" passes for the same reason.
The fourth test (`ParquetV1QuerySuite`: "Enabling/disabling ignoreCorruptFiles")
still failed because of a bug in `CometExecIterator` exposed by Spark 4's
strict error-parameter validation. When `native_datafusion` raises a parquet
error, the iterator wraps it with `_LEGACY_ERROR_TEMP_2254` and a `message`
parameter. That error class has zero placeholders, so Spark 4's
`SparkException` constructor raises `INTERNAL_ERROR` ("Found unused message
parameters of the error class '_LEGACY_ERROR_TEMP_2254'") before the intended
exception is ever thrown, hiding the cause-chain entry that carries
"is not a Parquet file" which the test asserts on.
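The failure mode can be sketched with a small stand-alone Scala model. This is illustrative only: the template text and the `validate` helper below are stand-ins, not Spark's actual error-framework implementation; the only property that matters is that `_LEGACY_ERROR_TEMP_2254`'s template contains no placeholders, so any supplied parameter is "unused".

```scala
// Minimal sketch (not actual Spark code) of Spark 4's strict check that every
// entry in messageParameters must match a <placeholder> in the error-class
// template. The template text here is made up; its lack of placeholders is the
// point.
val templates: Map[String, String] =
  Map("_LEGACY_ERROR_TEMP_2254" -> "Task failed while reading data.")

// Extract <name>-style placeholders from a template.
def placeholders(template: String): Set[String] =
  "<(\\w+)>".r.findAllMatchIn(template).map(_.group(1)).toSet

// Returns Left with an INTERNAL_ERROR-style message when a parameter has no
// matching placeholder, mirroring the behavior described above.
def validate(errorClass: String, params: Map[String, String]): Either[String, String] = {
  val unused = params.keySet -- placeholders(templates(errorClass))
  if (unused.nonEmpty)
    Left(s"INTERNAL_ERROR: Found unused message parameters of the error class '$errorClass': $unused")
  else
    Right(templates(errorClass))
}
```

With this model, passing a `message` parameter fails validation (the pre-fix behavior), while an empty parameter map passes (the fix).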
## What changes are included in this PR?
- `CometExecIterator.scala`: drop the unused `message` map entry so the
`SparkException` can be constructed under Spark 4's strict checks. The cause
chain, which already carried the underlying error and the
"File is not a Parquet file." marker, is preserved.
- `dev/diffs/4.0.1.diff`: remove the four `IgnoreCometNativeDataFusion` tags
for the now-passing tests, regenerated against `v4.0.1`.
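A hedged sketch of why dropping the parameter is safe: the test only looks for the "is not a Parquet file" marker in the cause chain, which the wrapper's cause still carries. The exception types and messages below are illustrative stand-ins, not the actual Comet or Spark classes:

```scala
// Illustrative only: stand-ins for the native parquet error and the wrapper
// that CometExecIterator throws. With no messageParameters attached, Spark 4's
// strict check is satisfied, and the marker stays reachable via getCause.
def causeChain(t: Throwable): Iterator[Throwable] =
  Iterator.iterate(t)(_.getCause).takeWhile(_ != null)

val nativeError = new RuntimeException("file.txt is not a Parquet file.")
val wrapper = new RuntimeException("_LEGACY_ERROR_TEMP_2254", nativeError)

val markerFound = causeChain(wrapper).exists { t =>
  Option(t.getMessage).exists(_.contains("is not a Parquet file"))
}
```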
## How are these changes tested?
Verified locally with Spark 4.0.1 source plus the regenerated diff, running
each test under `ENABLE_COMET=true ENABLE_COMET_ONHEAP=true` so
`native_datafusion` is used in auto scan mode:
- `sql/testOnly org.apache.spark.sql.DynamicPartitionPruningV1SuiteAEOff --
-z "Subquery reuse across the whole plan"`
- `sql/testOnly org.apache.spark.sql.FileBasedDataSourceSuite -- -z
"Enabling/disabling ignoreMissingFiles using parquet"`
- `sql/testOnly org.apache.spark.sql.execution.SimpleSQLViewSuite -- -z
"alter temporary view should follow current storeAnalyzedPlanForView config"`
- `sql/testOnly
org.apache.spark.sql.execution.datasources.parquet.ParquetV1QuerySuite -- -z
"Enabling/disabling ignoreCorruptFiles"`
- `sql/testOnly
org.apache.spark.sql.execution.datasources.parquet.ParquetV2QuerySuite -- -z
"Enabling/disabling ignoreCorruptFiles"`
All five pass. CI Spark SQL test runs cover the same matrix.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]