This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 388335d0d72c [MINOR][PYTHON][DOCS] Fix a pandas UDF example
388335d0d72c is described below
commit 388335d0d72c01e74ef887a89906f4ec735fedea
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Thu Jul 31 14:31:05 2025 +0900
[MINOR][PYTHON][DOCS] Fix a pandas UDF example
### What changes were proposed in this pull request?
Fix a pandas UDF example in the `pandas_udf` docstring.
### Why are the changes needed?
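The original example output was not correct: the printed schema showed `long_column`, `string_column`, and `struct_column`, while the DataFrame is created with `long_col`, `string_col`, and `struct_col`, and the second `printSchema()` output was missing its `root` line.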
### Does this PR introduce _any_ user-facing change?
Yes, doc-only.
### How was this patch tested?
Manually checked.
### Was this patch authored or co-authored using generative AI tooling?
no
Closes #51738 from zhengruifeng/minor_pandas_example.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
python/pyspark/sql/pandas/functions.py | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/python/pyspark/sql/pandas/functions.py b/python/pyspark/sql/pandas/functions.py
index 4a2e6db3b99f..1a07ea0deac3 100644
--- a/python/pyspark/sql/pandas/functions.py
+++ b/python/pyspark/sql/pandas/functions.py
@@ -388,21 +388,26 @@ def pandas_udf(f=None, returnType=None, functionType=None):
`pandas.DataFrame` as below:
>>> @pandas_udf("col1 string, col2 long")
- >>> def func(s1: pd.Series, s2: pd.Series, s3: pd.DataFrame) -> pd.DataFrame:
+ ... def func(s1: pd.Series, s2: pd.Series, s3: pd.DataFrame) -> pd.DataFrame:
... s3['col2'] = s1 + s2.str.len()
... return s3
- ...
- >>> # Create a Spark DataFrame that has three columns including a struct column.
- ... df = spark.createDataFrame(
+
+
+ Create a Spark DataFrame that has three columns including a struct column.
+
+ >>> df = spark.createDataFrame(
... [[1, "a string", ("a nested string",)]],
... "long_col long, string_col string, struct_col struct<col1:string>")
+
>>> df.printSchema()
root
- |-- long_column: long (nullable = true)
- |-- string_column: string (nullable = true)
- |-- struct_column: struct (nullable = true)
+ |-- long_col: long (nullable = true)
+ |-- string_col: string (nullable = true)
+ |-- struct_col: struct (nullable = true)
| |-- col1: string (nullable = true)
+
>>> df.select(func("long_col", "string_col", "struct_col")).printSchema()
+ root
|-- func(long_col, string_col, struct_col): struct (nullable = true)
| |-- col1: string (nullable = true)
| |-- col2: long (nullable = true)
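
For reference, the body of `func` in the example above can be tried with plain pandas, without a Spark session. This is only a sketch under assumptions: the three input objects are hand-built stand-ins for the batches Spark would pass to the UDF (a `pd.Series` for each scalar column and a `pd.DataFrame` for the struct column), and the variable names `long_col`, `string_col`, and `struct_col` are reused from the docstring purely for illustration.

```python
import pandas as pd


def func(s1: pd.Series, s2: pd.Series, s3: pd.DataFrame) -> pd.DataFrame:
    # Same body as the docstring example: add a long column to the
    # DataFrame that represents the struct column.
    s3["col2"] = s1 + s2.str.len()
    return s3


# Hand-built stand-ins for one batch of the Spark columns (assumption:
# Spark passes scalar columns as Series and struct columns as DataFrames).
long_col = pd.Series([1])
string_col = pd.Series(["a string"])
struct_col = pd.DataFrame({"col1": ["a nested string"]})

result = func(long_col, string_col, struct_col)

# The result has the two fields declared in the UDF's return type,
# "col1 string, col2 long"; col2 is 1 + len("a string") = 9.
print(result.columns.tolist())
print(result["col2"].tolist())
```

The returned columns `col1` and `col2` correspond to the nested fields in the corrected `printSchema()` output.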