(spark) branch branch-4.1 updated: [SPARK-56584][PYTHON][4.1] Generalize `RESULT_TYPE_MISMATCH_FOR_ARROW_UDF` error class and remove dead `SCHEMA_MISMATCH_FOR_ARROW_PYTHON_UDF`

gurwls223 Sun, 10 May 2026 02:03:20 -0700

This is an automated email from the ASF dual-hosted git repository.

HyukjinKwon pushed a commit to branch branch-4.1
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-4.1 by this push:
     new 723067aead8d [SPARK-56584][PYTHON][4.1] Generalize `RESULT_TYPE_MISMATCH_FOR_ARROW_UDF` error class and remove dead `SCHEMA_MISMATCH_FOR_ARROW_PYTHON_UDF`
723067aead8d is described below

commit 723067aead8d12e71f6fa8e791025d1566da09f2
Author: Yicong Huang <[email protected]>
AuthorDate: Sun May 10 18:03:02 2026 +0900

    [SPARK-56584][PYTHON][4.1] Generalize `RESULT_TYPE_MISMATCH_FOR_ARROW_UDF` error class and remove dead `SCHEMA_MISMATCH_FOR_ARROW_PYTHON_UDF`
    
    ### What changes were proposed in this pull request?
    
    Backport of #55494 to branch-4.1.
    
    The original change:
    1. Renames error class `RESULT_TYPE_MISMATCH_FOR_ARROW_UDF` to `RESULT_COLUMN_TYPES_MISMATCH` (parallel to `RESULT_COLUMN_NAMES_MISMATCH` / `RESULT_COLUMN_SCHEMA_MISMATCH`).
    2. Rewords the message from `Columns do not match in their data type: <mismatch>.` to `Column types of the returned data do not match specified schema. Mismatch: <mismatch>.` to align with sibling errors.
    3. Removes the dead error class `SCHEMA_MISMATCH_FOR_ARROW_PYTHON_UDF`.
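
A minimal sketch of how the reworded message is assembled, assuming the `<mismatch>` parameter is built from (name, expected, actual) triples as the `worker.py` hunk in this commit suggests. The helper name `format_type_mismatch` and the sample column types are hypothetical, and the sketch deliberately does not import pyspark:

```python
# Hypothetical, self-contained sketch; not the actual pyspark implementation.
# The template text is copied from the new RESULT_COLUMN_TYPES_MISMATCH entry.
MESSAGE_TEMPLATE = (
    "Column types of the returned data do not match specified schema. "
    "Mismatch: <mismatch>."
)


def format_type_mismatch(type_mismatch):
    """Render the message for a list of (name, expected, actual) triples.

    The triple layout is an assumption inferred from the format string
    used in verify_arrow_result.
    """
    mismatch = ", ".join(
        "column '{}' (expected {}, actual {})".format(name, expected, actual)
        for name, expected, actual in type_mismatch
    )
    return MESSAGE_TEMPLATE.replace("<mismatch>", mismatch)


# Illustrative input only; the column names and types are made up.
example = format_type_mismatch(
    [("id", "int32", "int64"), ("value", "string", "double")]
)
print(example)
```

Only the surrounding sentence changes in the rename; the per-column detail after `Mismatch:` keeps the same shape as before.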
    
    ### Why are the changes needed?
    
    This restores message parity between the master server and the branch-4.1 client. The scheduled cross-version Connect parity build was failing because master raises the new `RESULT_COLUMN_TYPES_MISMATCH` text while branch-4.1 (and branch-4.0) clients still assert the old "Columns do not match in their data type" text:
    
    https://github.com/apache/spark/actions/runs/25187494316
    
    Backporting keeps the Arrow result-verify error class name and message consistent across maintained branches and unblocks cross-version parity tests.
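
The parity failure described above can be illustrated with a small sketch. It assumes the client-side tests match the error text with a regular-expression search, which is what `assertRaisesRegex` does; the sample `new_message` string is made up for illustration and is not taken from the failing build:

```python
import re

# Hypothetical error text as a server on master might render it.
new_message = (
    "Column types of the returned data do not match specified schema. "
    "Mismatch: column 'id' (expected int32, actual int64)."
)

# Pattern still asserted by clients that have not picked up the rename.
old_pattern = "Columns do not match in their data type: "

# Pattern asserted after this backport.
new_pattern = (
    "Column types of the returned data do not match specified schema. "
    "Mismatch: "
)

# assertRaisesRegex uses re.search semantics, so the same check applies here.
old_matches = re.search(old_pattern, new_message) is not None
new_matches = re.search(new_pattern, new_message) is not None

print("old pattern matches:", old_matches)
print("new pattern matches:", new_matches)
```

An old client paired with a new server therefore fails its assertion even though the server behaves correctly, which is why the message is aligned across branches.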
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes (same as #55494). The user-visible error class name and message for result column type mismatches in Arrow UDFs change on branch-4.1.
    
    ### How was this patch tested?
    
    Existing tests; the cherry-pick applied cleanly with no conflicts. The assertions in `test_arrow_grouped_map.py` / `test_arrow_cogrouped_map.py` already match the new message.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No
    
    Closes #55670 from Yicong-Huang/SPARK-56584-4.1.
    
    Authored-by: Yicong Huang <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
---
 python/pyspark/errors/error-conditions.json                | 13 ++++---------
 python/pyspark/sql/tests/arrow/test_arrow_cogrouped_map.py |  6 ++++--
 python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py   |  6 ++++--
 python/pyspark/worker.py                                   |  2 +-
 4 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/python/pyspark/errors/error-conditions.json b/python/pyspark/errors/error-conditions.json
index 4cdddb7da3ef..3c264f851a2e 100644
--- a/python/pyspark/errors/error-conditions.json
+++ b/python/pyspark/errors/error-conditions.json
@@ -1002,14 +1002,14 @@
       "Number of columns of the returned data doesn't match specified schema. Expected: <expected> Actual: <actual>"
     ]
   },
-  "RESULT_ROWS_MISMATCH": {
+  "RESULT_COLUMN_TYPES_MISMATCH": {
     "message": [
-      "The number of output rows (<output_length>) must match the number of input rows (<input_length>)."
+      "Column types of the returned data do not match specified schema. Mismatch: <mismatch>."
     ]
   },
-  "RESULT_TYPE_MISMATCH_FOR_ARROW_UDF": {
+  "RESULT_ROWS_MISMATCH": {
     "message": [
-      "Columns do not match in their data type: <mismatch>."
+      "The number of output rows (<output_length>) must match the number of input rows (<input_length>)."
     ]
   },
   "REUSE_OBSERVATION": {
@@ -1017,11 +1017,6 @@
       "An Observation can be used with a DataFrame only once."
     ]
   },
-  "SCHEMA_MISMATCH_FOR_ARROW_PYTHON_UDF": {
-    "message": [
-      "Result vector from <udf_type> was not the required length: expected <expected>, got <actual>."
-    ]
-  },
   "SCHEMA_MISMATCH_FOR_PANDAS_UDF": {
     "message": [
       "Result vector from <udf_type> was not the required length: expected <expected>, got <actual>."
diff --git a/python/pyspark/sql/tests/arrow/test_arrow_cogrouped_map.py b/python/pyspark/sql/tests/arrow/test_arrow_cogrouped_map.py
index 5f256ece6e91..c52a86265363 100644
--- a/python/pyspark/sql/tests/arrow/test_arrow_cogrouped_map.py
+++ b/python/pyspark/sql/tests/arrow/test_arrow_cogrouped_map.py
@@ -151,7 +151,8 @@ class CogroupedMapInArrowTestsMixin:
                 with self.quiet():
                     with self.assertRaisesRegex(
                         PythonException,
-                        f"Columns do not match in their data type: {expected}",
+                        "Column types of the returned data do not match specified schema. "
+                        f"Mismatch: {expected}",
                     ):
                         self.cogrouped.applyInArrow(
                             lambda left, right: left, schema=schema
@@ -175,7 +176,8 @@ class CogroupedMapInArrowTestsMixin:
                     with self.quiet():
                         with self.assertRaisesRegex(
                             PythonException,
-                            f"Columns do not match in their data type: {expected}",
+                            "Column types of the returned data do not match specified schema. "
+                            f"Mismatch: {expected}",
                         ):
                             self.cogrouped.applyInArrow(
                                 lambda left, right: left, schema=schema
diff --git a/python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py b/python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py
index 09554a5fa9ea..a78dc4b1c8dc 100644
--- a/python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py
+++ b/python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py
@@ -175,7 +175,8 @@ class ApplyInArrowTestsMixin:
                     for func_variation in function_variations(lambda table: table):
                         with self.assertRaisesRegex(
                             PythonException,
-                            f"Columns do not match in their data type: {expected}",
+                            "Column types of the returned data do not match specified schema. "
+                            f"Mismatch: {expected}",
                         ):
                             df.groupby("id").applyInArrow(func_variation, schema=schema).collect()
 
@@ -200,7 +201,8 @@ class ApplyInArrowTestsMixin:
                         for func_variation in function_variations(lambda table: table):
                             with self.assertRaisesRegex(
                                 PythonException,
-                                f"Columns do not match in their data type: {expected}",
+                                "Column types of the returned data do not match specified schema. "
+                                f"Mismatch: {expected}",
                             ):
                                 df.groupby("id").applyInArrow(
                                     func_variation, schema=schema
diff --git a/python/pyspark/worker.py b/python/pyspark/worker.py
index 8570e1186c5e..6d9299ff80ac 100644
--- a/python/pyspark/worker.py
+++ b/python/pyspark/worker.py
@@ -632,7 +632,7 @@ def verify_arrow_result(result, assign_cols_by_name, expected_cols_and_types):
 
         if type_mismatch:
             raise PySparkRuntimeError(
-                errorClass="RESULT_TYPE_MISMATCH_FOR_ARROW_UDF",
+                errorClass="RESULT_COLUMN_TYPES_MISMATCH",
                 messageParameters={
                     "mismatch": ", ".join(
                         "column '{}' (expected {}, actual {})".format(name, expected, actual)


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
