(spark) branch master updated: [SPARK-53059][PYTHON] Arrow UDF no need to depend on pandas

ruifengz Fri, 01 Aug 2025 08:24:48 -0700

This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 49dc0a7ab72a [SPARK-53059][PYTHON] Arrow UDF no need to depend on pandas
49dc0a7ab72a is described below

commit 49dc0a7ab72ad4e94b53d92e79fc66ada06dd120
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Fri Aug 1 23:24:25 2025 +0800

    [SPARK-53059][PYTHON] Arrow UDF no need to depend on pandas
    
    ### What changes were proposed in this pull request?
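    Arrow UDFs no longer need to depend on pandas. The `require_minimum_pandas_version()` and `require_minimum_pyarrow_version()` checks move out of the shared `vectorized_udf` helper: `pandas_udf` runs both checks, and `arrow_udf` runs only the PyArrow check.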
    
    ### Why are the changes needed?
    An Arrow UDF doesn't have to call `require_minimum_pandas_version`.
    
    ### Does this PR introduce _any_ user-facing change?
    no
    
    ### How was this patch tested?
    CI.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    no
    
    Closes #51767 from zhengruifeng/arrow_udf_dep.
    
    Authored-by: Ruifeng Zheng <[email protected]>
    Signed-off-by: Ruifeng Zheng <[email protected]>
---
 python/pyspark/sql/pandas/functions.py | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/python/pyspark/sql/pandas/functions.py b/python/pyspark/sql/pandas/functions.py
index e45ef049f9a9..09e283ba21da 100644
--- a/python/pyspark/sql/pandas/functions.py
+++ b/python/pyspark/sql/pandas/functions.py
@@ -322,6 +322,8 @@ def arrow_udf(f=None, returnType=None, functionType=None):
     pyspark.sql.PandasCogroupedOps.applyInArrow
     pyspark.sql.UDFRegistration.register
     """
+    require_minimum_pyarrow_version()
+
     return vectorized_udf(f, returnType, functionType, "arrow")
 
 
@@ -660,6 +662,9 @@ def pandas_udf(f=None, returnType=None, functionType=None):
     # Note: Python 3.11.9, Pandas 2.2.3 and PyArrow 17.0.0 are used.
     # Note: Timezone is KST.
     # Note: 'X' means it throws an exception during the conversion.
+    require_minimum_pandas_version()
+    require_minimum_pyarrow_version()
+
     return vectorized_udf(f, returnType, functionType, "pandas")
 
 
@@ -669,9 +674,6 @@ def vectorized_udf(
     functionType=None,
     kind: str = "pandas",
 ):
-    require_minimum_pandas_version()
-    require_minimum_pyarrow_version()
-
     assert kind in ["pandas", "arrow"], "kind should be either 'pandas' or 'arrow'"
 
     # decorator @pandas_udf(returnType, functionType)
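
The pattern in this diff, moving dependency checks out of a shared helper and into each public entry point, can be illustrated with a minimal standalone sketch. This is not PySpark code: `require_module`, `make_udf`, `arrow_style_udf`, and `pandas_style_udf` are hypothetical names made up for the illustration, and the check here only tests whether a module is installed, whereas the real `require_minimum_*_version` helpers also compare versions.

```python
import importlib.util


def require_module(name: str) -> None:
    """Hypothetical stand-in for a require_minimum_*_version check.

    Only verifies that the module can be found; no version comparison.
    """
    if importlib.util.find_spec(name) is None:
        raise ImportError(f"{name} is required but not installed")


def make_udf(f, kind: str):
    # Shared helper (plays the role of vectorized_udf): it no longer
    # performs any dependency checks itself.
    assert kind in ("pandas", "arrow"), "kind should be either 'pandas' or 'arrow'"
    return {"func": f, "kind": kind}


def arrow_style_udf(f, require=require_module):
    # Arrow entry point: only PyArrow is checked, so pandas is optional.
    require("pyarrow")
    return make_udf(f, "arrow")


def pandas_style_udf(f, require=require_module):
    # pandas entry point: both pandas and PyArrow are checked.
    require("pandas")
    require("pyarrow")
    return make_udf(f, "pandas")
```

The `require` parameter exists only so the sketch can be exercised without either library installed, for example by passing a list's `append` method to record which checks each entry point performs.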


