This is an automated email from the ASF dual-hosted git repository.
HyukjinKwon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 9940da96f254 [SPARK-56860][PYTHON] Remove unused CogroupArrowUDFSerializer
9940da96f254 is described below
commit 9940da96f254db3ec8f1475e78b33d1fd793ae07
Author: Yicong Huang <[email protected]>
AuthorDate: Tue May 19 07:09:16 2026 +0900
[SPARK-56860][PYTHON] Remove unused CogroupArrowUDFSerializer
### What changes were proposed in this pull request?
Delete `CogroupArrowUDFSerializer` from
`python/pyspark/sql/pandas/serializers.py`.
### Why are the changes needed?
`CogroupArrowUDFSerializer` has been unused since SPARK-56312 refactored
`SQL_COGROUPED_MAP_ARROW_UDF` to use `ArrowStreamCoGroupSerializer` directly,
so the class can be deleted safely.
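The removed class overrode only `load_stream`, and that override forwarded straight to `ArrowStreamCoGroupSerializer.load_stream`. The sketch below uses hypothetical stand-in classes (`CoGroupLoader`, `GroupLoader`, `ForwardingLoader`, not PySpark's real serializers, with plain lists in place of Arrow record batches). It shows that a forwarder of this kind adds no loading behavior of its own: both paths yield the same per-group tuples.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical stand-in for one cogrouped group: (left batches, right batches).
Group = Tuple[List[str], List[str]]


class CoGroupLoader:
    """Hypothetical stand-in for a serializer that loads cogrouped data."""

    def load_stream(self, stream: Iterable[Group]) -> Iterator[Group]:
        # Yield one (left_batches, right_batches) tuple per group.
        for left, right in stream:
            yield list(left), list(right)


class GroupLoader:
    """Hypothetical unrelated base class, standing in for the old parent."""

    def load_stream(self, stream):
        raise NotImplementedError


class ForwardingLoader(GroupLoader):
    """Mirrors the removed class: its only override delegates elsewhere."""

    def load_stream(self, stream):
        # Call the other class's function explicitly, passing this instance as self.
        yield from CoGroupLoader.load_stream(self, stream)


groups = [(["l1"], ["r1", "r2"]), ([], ["r3"])]
direct = list(CoGroupLoader().load_stream(groups))
forwarded = list(ForwardingLoader().load_stream(groups))
print(direct == forwarded)  # True
```

This only illustrates the loading side. The real class also inherited its remaining methods from `ArrowStreamGroupUDFSerializer`.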
Part of SPARK-55384.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing tests: `pyspark.sql.tests.arrow.test_arrow_cogrouped_map`.
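The existing tests confirm that the remaining cogrouped-map code paths still work. A complementary check before deleting a class is to confirm that nothing else in the tree names it. The helper below, `find_references`, is hypothetical and not part of Spark's tooling. It performs a plain whole-word regex scan of `.py` files, so it will miss dynamic lookups such as `getattr` with a computed name.

```python
import re
from pathlib import Path
from typing import List, Tuple


def find_references(root: str, name: str) -> List[Tuple[str, int]]:
    """Return (path, line_number) pairs where `name` appears as a whole word,
    skipping lines that define it as a class."""
    word = re.compile(rf"\b{re.escape(name)}\b")
    definition = re.compile(rf"^\s*class\s+{re.escape(name)}\b")
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            # Count any whole-word mention except the class statement itself.
            if word.search(line) and not definition.match(line):
                hits.append((str(path), lineno))
    return hits
```

For example, `find_references("python/pyspark", "CogroupArrowUDFSerializer")` returning an empty list before deletion would be consistent with the class being unused.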
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #55868 from Yicong-Huang/SPARK-56860/cleanup/cogroup-arrow-udf-serializer.
Authored-by: Yicong Huang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
python/pyspark/sql/pandas/serializers.py | 20 --------------------
1 file changed, 20 deletions(-)
diff --git a/python/pyspark/sql/pandas/serializers.py b/python/pyspark/sql/pandas/serializers.py
index 6f22c3e84047..55d874aaa506 100644
--- a/python/pyspark/sql/pandas/serializers.py
+++ b/python/pyspark/sql/pandas/serializers.py
@@ -654,26 +654,6 @@ class ArrowStreamAggPandasUDFSerializer(ArrowStreamPandasUDFSerializer):
         return "ArrowStreamAggPandasUDFSerializer"


-class CogroupArrowUDFSerializer(ArrowStreamGroupUDFSerializer):
-    """
-    Serializes pyarrow.RecordBatch data with Arrow streaming format.
-
-    Loads Arrow record batches as `[([pa.RecordBatch], [pa.RecordBatch])]` (one tuple per group)
-    and serializes `[([pa.RecordBatch], arrow_type)]`.
-
-    Parameters
-    ----------
-    assign_cols_by_name : bool
-        If True, then DataFrames will get columns by name
-    """
-
-    def load_stream(self, stream):
-        """
-        Deserialize Cogrouped ArrowRecordBatches and yield as two `pyarrow.RecordBatch`es.
-        """
-        yield from ArrowStreamCoGroupSerializer.load_stream(self, stream)
-
-
 class CogroupPandasUDFSerializer(ArrowStreamPandasUDFSerializer):
     def load_stream(self, stream):
         """