This is an automated email from the ASF dual-hosted git repository.
HyukjinKwon pushed a commit to branch branch-4.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.0 by this push:
new 04fcfb737da6 [SPARK-56584][PYTHON][4.0] Generalize
`RESULT_TYPE_MISMATCH_FOR_ARROW_UDF` error class and remove dead
`SCHEMA_MISMATCH_FOR_ARROW_PYTHON_UDF`
04fcfb737da6 is described below
commit 04fcfb737da6bb913914d7377d21be98cff64a43
Author: Yicong Huang <[email protected]>
AuthorDate: Sun May 10 18:02:06 2026 +0900
[SPARK-56584][PYTHON][4.0] Generalize `RESULT_TYPE_MISMATCH_FOR_ARROW_UDF`
error class and remove dead `SCHEMA_MISMATCH_FOR_ARROW_PYTHON_UDF`
### What changes were proposed in this pull request?
Backport of #55494 to branch-4.0.
The original change:
1. Renames error class `RESULT_TYPE_MISMATCH_FOR_ARROW_UDF` to
`RESULT_COLUMN_TYPES_MISMATCH` (parallel to `RESULT_COLUMN_NAMES_MISMATCH` /
`RESULT_COLUMN_SCHEMA_MISMATCH`).
2. Rewords the message from `Columns do not match in their data type:
<mismatch>.` to `Column types of the returned data do not match specified
schema. Mismatch: <mismatch>.` to align with sibling errors.
3. Removes the dead error class `SCHEMA_MISMATCH_FOR_ARROW_PYTHON_UDF`
(already absent on branch-4.0 — no-op for this branch).
Branch-4.1 backport: #55670.
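As a rough illustration of item 2, the sketch below builds the reworded message from a list of mismatches. The template string and the per-column format string are copied from the diff in this commit; the function name `format_type_mismatch` and the `(name, expected, actual)` tuple input are assumptions made for the example, not code from `worker.py`.

```python
# Message template copied from the error-conditions.json hunk in this commit.
TEMPLATE = (
    "Column types of the returned data do not match specified schema. "
    "Mismatch: <mismatch>."
)


def format_type_mismatch(mismatches):
    """Hypothetical helper: render the new message for (name, expected, actual) tuples."""
    mismatch = ", ".join(
        # Per-column format string as it appears in the worker.py hunk.
        "column '{}' (expected {}, actual {})".format(name, expected, actual)
        for name, expected, actual in mismatches
    )
    return TEMPLATE.replace("<mismatch>", mismatch)


message = format_type_mismatch([("id", "int32", "int64")])
print(message)
```

Only the error class name and the surrounding sentence change; the per-column `<mismatch>` detail keeps the same shape as before.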
### Why are the changes needed?
This restores message parity between the master server and the branch-4.0 client.
The scheduled cross-version Connect parity build was failing because master
raises the new `RESULT_COLUMN_TYPES_MISMATCH` text while the branch-4.0 client
tests still assert the old "Columns do not match in their data type" text:
https://github.com/apache/spark/actions/runs/25187494316
Backporting keeps the Arrow result-verify error class name and message
consistent across maintained branches and unblocks cross-version parity tests.
### Conflicts resolved
- `python/pyspark/errors/error-conditions.json`: kept `RETRIES_EXCEEDED`
entry (only present on branch-4.0).
- `python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py`: kept the
branch-4.0 `lambda table: table` direct call form (master uses a
`function_variations(...)` loop helper that is not present on branch-4.0); only
the assertion message text is updated.
### Does this PR introduce _any_ user-facing change?
Yes (same as #55494). User-visible error class name and message for result
column type mismatches in Arrow UDFs change on branch-4.0.
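Code that inspects the error text may need to accept both wordings while mixed versions are in use. A minimal sketch, assuming the caller already has the exception message as a string; the helper name `is_result_type_mismatch` is hypothetical, and the two text fragments are taken from the old and new messages quoted in this commit.

```python
# Fragments of the old and new messages, as quoted in this commit.
OLD_TEXT = "Columns do not match in their data type"
NEW_TEXT = "Column types of the returned data do not match specified schema"


def is_result_type_mismatch(message: str) -> bool:
    """Hypothetical helper: True if the text matches either wording of the error."""
    return OLD_TEXT in message or NEW_TEXT in message


old_message = "Columns do not match in their data type: column 'id' (expected int32, actual int64)."
new_message = (
    "Column types of the returned data do not match specified schema. "
    "Mismatch: column 'id' (expected int32, actual int64)."
)
print(is_result_type_mismatch(old_message), is_result_type_mismatch(new_message))
```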
### How was this patch tested?
Existing tests; updated 4 asserts in `test_arrow_grouped_map.py` /
`test_arrow_cogrouped_map.py` to match the new message.
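The updated assertions follow the `assertRaisesRegex` pattern shown in the diff. The sketch below reproduces that pattern without a Spark session: `FakePythonException` and `raise_type_mismatch` are stand-ins invented for this example, and the use of `re.escape` on the expected detail is an assumption about how the parentheses are kept literal.

```python
import re
import unittest


class FakePythonException(Exception):
    """Stand-in for the exception type asserted in the real tests (assumption)."""


def raise_type_mismatch():
    # Simulates a worker raising the new message text.
    raise FakePythonException(
        "Column types of the returned data do not match specified schema. "
        "Mismatch: column 'id' (expected int32, actual int64)."
    )


class MessageAssertionSketch(unittest.TestCase):
    def test_new_message(self):
        # Escape the detail so the parentheses are matched literally.
        expected = re.escape("column 'id' (expected int32, actual int64)")
        with self.assertRaisesRegex(
            FakePythonException,
            "Column types of the returned data do not match specified schema. "
            f"Mismatch: {expected}",
        ):
            raise_type_mismatch()


# Run the single test programmatically so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MessageAssertionSketch)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```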
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #55671 from Yicong-Huang/SPARK-56584-4.0.
Authored-by: Yicong Huang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
python/pyspark/errors/error-conditions.json | 8 ++++----
python/pyspark/sql/tests/arrow/test_arrow_cogrouped_map.py | 6 ++++--
python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py | 6 ++++--
python/pyspark/worker.py | 2 +-
4 files changed, 13 insertions(+), 9 deletions(-)
diff --git a/python/pyspark/errors/error-conditions.json b/python/pyspark/errors/error-conditions.json
index 49c5856934d3..a13b371c1c7f 100644
--- a/python/pyspark/errors/error-conditions.json
+++ b/python/pyspark/errors/error-conditions.json
@@ -890,14 +890,14 @@
"Number of columns of the returned data doesn't match specified schema. Expected: <expected> Actual: <actual>"
]
},
- "RESULT_ROWS_MISMATCH": {
+ "RESULT_COLUMN_TYPES_MISMATCH": {
"message": [
- "The number of output rows (<output_length>) must match the number of input rows (<input_length>)."
+ "Column types of the returned data do not match specified schema. Mismatch: <mismatch>."
]
},
- "RESULT_TYPE_MISMATCH_FOR_ARROW_UDF": {
+ "RESULT_ROWS_MISMATCH": {
"message": [
- "Columns do not match in their data type: <mismatch>."
+ "The number of output rows (<output_length>) must match the number of input rows (<input_length>)."
]
},
"RETRIES_EXCEEDED": {
diff --git a/python/pyspark/sql/tests/arrow/test_arrow_cogrouped_map.py b/python/pyspark/sql/tests/arrow/test_arrow_cogrouped_map.py
index bc45e59639d1..88e01c9d2bba 100644
--- a/python/pyspark/sql/tests/arrow/test_arrow_cogrouped_map.py
+++ b/python/pyspark/sql/tests/arrow/test_arrow_cogrouped_map.py
@@ -148,7 +148,8 @@ class CogroupedMapInArrowTestsMixin:
with self.quiet():
with self.assertRaisesRegex(
PythonException,
- f"Columns do not match in their data type: {expected}",
+ "Column types of the returned data do not match specified schema. "
+ f"Mismatch: {expected}",
):
self.cogrouped.applyInArrow(
lambda left, right: left, schema=schema
@@ -172,7 +173,8 @@ class CogroupedMapInArrowTestsMixin:
with self.quiet():
with self.assertRaisesRegex(
PythonException,
- f"Columns do not match in their data type: {expected}",
+ "Column types of the returned data do not match specified schema. "
+ f"Mismatch: {expected}",
):
self.cogrouped.applyInArrow(
lambda left, right: left, schema=schema
diff --git a/python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py b/python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py
index 251c60a27f22..94058977376b 100644
--- a/python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py
+++ b/python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py
@@ -133,7 +133,8 @@ class GroupedMapInArrowTestsMixin:
with self.quiet():
with self.assertRaisesRegex(
PythonException,
- f"Columns do not match in their data type: {expected}",
+ "Column types of the returned data do not match specified schema. "
+ f"Mismatch: {expected}",
):
df.groupby("id").applyInArrow(lambda table: table, schema=schema).collect()
@@ -157,7 +158,8 @@ class GroupedMapInArrowTestsMixin:
with self.quiet():
with self.assertRaisesRegex(
PythonException,
- f"Columns do not match in their data type: {expected}",
+ "Column types of the returned data do not match specified schema. "
+ f"Mismatch: {expected}",
):
df.groupby("id").applyInArrow(
lambda table: table, schema=schema
diff --git a/python/pyspark/worker.py b/python/pyspark/worker.py
index 7ff60bd0258b..2d2efed09c4f 100644
--- a/python/pyspark/worker.py
+++ b/python/pyspark/worker.py
@@ -475,7 +475,7 @@ def verify_arrow_result(table, assign_cols_by_name, expected_cols_and_types):
if type_mismatch:
raise PySparkRuntimeError(
- errorClass="RESULT_TYPE_MISMATCH_FOR_ARROW_UDF",
+ errorClass="RESULT_COLUMN_TYPES_MISMATCH",
messageParameters={
"mismatch": ", ".join(
"column '{}' (expected {}, actual {})".format(name, expected, actual)