This is an automated email from the ASF dual-hosted git repository.
HyukjinKwon pushed a commit to branch branch-4.1
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.1 by this push:
new 723067aead8d [SPARK-56584][PYTHON][4.1] Generalize
`RESULT_TYPE_MISMATCH_FOR_ARROW_UDF` error class and remove dead
`SCHEMA_MISMATCH_FOR_ARROW_PYTHON_UDF`
723067aead8d is described below
commit 723067aead8d12e71f6fa8e791025d1566da09f2
Author: Yicong Huang <[email protected]>
AuthorDate: Sun May 10 18:03:02 2026 +0900
[SPARK-56584][PYTHON][4.1] Generalize `RESULT_TYPE_MISMATCH_FOR_ARROW_UDF`
error class and remove dead `SCHEMA_MISMATCH_FOR_ARROW_PYTHON_UDF`
### What changes were proposed in this pull request?
Backport of #55494 to branch-4.1.
The original change:
1. Renames error class `RESULT_TYPE_MISMATCH_FOR_ARROW_UDF` to
`RESULT_COLUMN_TYPES_MISMATCH` (parallel to `RESULT_COLUMN_NAMES_MISMATCH` /
`RESULT_COLUMN_SCHEMA_MISMATCH`).
2. Rewords the message from `Columns do not match in their data type:
<mismatch>.` to `Column types of the returned data do not match specified
schema. Mismatch: <mismatch>.` to align with sibling errors.
3. Removes the dead error class `SCHEMA_MISMATCH_FOR_ARROW_PYTHON_UDF`.
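As a rough illustration of item 2, the sketch below renders the new message from the template text in `error-conditions.json` and the per-column format string visible in the `worker.py` hunk. The helper names `format_mismatch` and `render`, the `(name, expected, actual)` tuple shape, and the sample column are assumptions made for illustration; this is not the actual `PySparkRuntimeError` rendering path.

```python
# Template text copied from the RESULT_COLUMN_TYPES_MISMATCH entry in this commit.
TEMPLATE = (
    "Column types of the returned data do not match specified schema. "
    "Mismatch: <mismatch>."
)


def format_mismatch(type_mismatch):
    """Join per-column mismatches using the format string from worker.py.

    `type_mismatch` is assumed to be an iterable of
    (name, expected, actual) tuples (hypothetical shape).
    """
    return ", ".join(
        "column '{}' (expected {}, actual {})".format(name, expected, actual)
        for name, expected, actual in type_mismatch
    )


def render(type_mismatch):
    """Substitute the joined mismatch text into the message template."""
    return TEMPLATE.replace("<mismatch>", format_mismatch(type_mismatch))


# Hypothetical sample: column "id" was declared int64 but came back as string.
message = render([("id", "int64", "string")])
print(message)
```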
### Why are the changes needed?
This restores message parity between a master server and a branch-4.1 client.
The scheduled cross-version Connect parity build was failing because master
raises the new `RESULT_COLUMN_TYPES_MISMATCH` text while branch-4.1 (and
branch-4.0) clients still assert the old "Columns do not match in their data
type" text:
https://github.com/apache/spark/actions/runs/25187494316
Backporting keeps the Arrow result-verify error class name and message
consistent across maintained branches and unblocks cross-version parity tests.
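The failure mode can be sketched with plain `re.search`, which is what `assertRaisesRegex` uses to match the exception text. The sample message and the variable names below are illustrative assumptions, not output captured from the failing build.

```python
import re

# Text the old branch-4.1 tests asserted (from the removed test lines).
OLD_PATTERN = "Columns do not match in their data type"

# Prefix the updated tests assert (from the added test lines).
NEW_PATTERN = "Column types of the returned data do not match specified schema. Mismatch: "

# Hypothetical message raised by a server that already has the rename.
NEW_MESSAGE = (
    "Column types of the returned data do not match specified schema. "
    "Mismatch: column 'id' (expected int64, actual string)."
)

# The old pattern no longer occurs in the new message, so the old assertion fails.
old_matches = re.search(OLD_PATTERN, NEW_MESSAGE) is not None

# The updated pattern matches the new message.
new_matches = re.search(NEW_PATTERN, NEW_MESSAGE) is not None

print(old_matches, new_matches)
```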
### Does this PR introduce _any_ user-facing change?
Yes (same as #55494). User-visible error class name and message for result
column type mismatches in Arrow UDFs change on branch-4.1.
### How was this patch tested?
Existing tests; cherry-pick applied cleanly with no conflicts. Asserts in
`test_arrow_grouped_map.py` / `test_arrow_cogrouped_map.py` already match the
new message.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #55670 from Yicong-Huang/SPARK-56584-4.1.
Authored-by: Yicong Huang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
python/pyspark/errors/error-conditions.json | 13 ++++---------
python/pyspark/sql/tests/arrow/test_arrow_cogrouped_map.py | 6 ++++--
python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py | 6 ++++--
python/pyspark/worker.py | 2 +-
4 files changed, 13 insertions(+), 14 deletions(-)
diff --git a/python/pyspark/errors/error-conditions.json b/python/pyspark/errors/error-conditions.json
index 4cdddb7da3ef..3c264f851a2e 100644
--- a/python/pyspark/errors/error-conditions.json
+++ b/python/pyspark/errors/error-conditions.json
@@ -1002,14 +1002,14 @@
"Number of columns of the returned data doesn't match specified schema. Expected: <expected> Actual: <actual>"
]
},
- "RESULT_ROWS_MISMATCH": {
+ "RESULT_COLUMN_TYPES_MISMATCH": {
"message": [
- "The number of output rows (<output_length>) must match the number of input rows (<input_length>)."
+ "Column types of the returned data do not match specified schema. Mismatch: <mismatch>."
]
},
- "RESULT_TYPE_MISMATCH_FOR_ARROW_UDF": {
+ "RESULT_ROWS_MISMATCH": {
"message": [
- "Columns do not match in their data type: <mismatch>."
+ "The number of output rows (<output_length>) must match the number of input rows (<input_length>)."
]
},
"REUSE_OBSERVATION": {
@@ -1017,11 +1017,6 @@
"An Observation can be used with a DataFrame only once."
]
},
- "SCHEMA_MISMATCH_FOR_ARROW_PYTHON_UDF": {
- "message": [
- "Result vector from <udf_type> was not the required length: expected <expected>, got <actual>."
- ]
- },
"SCHEMA_MISMATCH_FOR_PANDAS_UDF": {
"message": [
"Result vector from <udf_type> was not the required length: expected <expected>, got <actual>."
diff --git a/python/pyspark/sql/tests/arrow/test_arrow_cogrouped_map.py b/python/pyspark/sql/tests/arrow/test_arrow_cogrouped_map.py
index 5f256ece6e91..c52a86265363 100644
--- a/python/pyspark/sql/tests/arrow/test_arrow_cogrouped_map.py
+++ b/python/pyspark/sql/tests/arrow/test_arrow_cogrouped_map.py
@@ -151,7 +151,8 @@ class CogroupedMapInArrowTestsMixin:
with self.quiet():
with self.assertRaisesRegex(
PythonException,
- f"Columns do not match in their data type: {expected}",
+ "Column types of the returned data do not match specified schema. "
+ f"Mismatch: {expected}",
):
self.cogrouped.applyInArrow(
lambda left, right: left, schema=schema
@@ -175,7 +176,8 @@ class CogroupedMapInArrowTestsMixin:
with self.quiet():
with self.assertRaisesRegex(
PythonException,
- f"Columns do not match in their data type: {expected}",
+ "Column types of the returned data do not match specified schema. "
+ f"Mismatch: {expected}",
):
self.cogrouped.applyInArrow(
lambda left, right: left, schema=schema
diff --git a/python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py b/python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py
index 09554a5fa9ea..a78dc4b1c8dc 100644
--- a/python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py
+++ b/python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py
@@ -175,7 +175,8 @@ class ApplyInArrowTestsMixin:
for func_variation in function_variations(lambda table: table):
with self.assertRaisesRegex(
PythonException,
- f"Columns do not match in their data type: {expected}",
+ "Column types of the returned data do not match specified schema. "
+ f"Mismatch: {expected}",
):
df.groupby("id").applyInArrow(func_variation, schema=schema).collect()
@@ -200,7 +201,8 @@ class ApplyInArrowTestsMixin:
for func_variation in function_variations(lambda table: table):
with self.assertRaisesRegex(
PythonException,
- f"Columns do not match in their data type: {expected}",
+ "Column types of the returned data do not match specified schema. "
+ f"Mismatch: {expected}",
):
df.groupby("id").applyInArrow(
func_variation, schema=schema
diff --git a/python/pyspark/worker.py b/python/pyspark/worker.py
--- a/python/pyspark/worker.py
+++ b/python/pyspark/worker.py
@@ -632,7 +632,7 @@ def verify_arrow_result(result, assign_cols_by_name, expected_cols_and_types):
if type_mismatch:
raise PySparkRuntimeError(
- errorClass="RESULT_TYPE_MISMATCH_FOR_ARROW_UDF",
+ errorClass="RESULT_COLUMN_TYPES_MISMATCH",
messageParameters={
"mismatch": ", ".join(
"column '{}' (expected {}, actual {})".format(name, expected, actual)