(spark) branch master updated: [SPARK-52954][PYTHON][TESTS][FOLLOW-UP] Alway set safe_check=True in Arrow UDFs

gurwls223 Wed, 30 Jul 2025 04:16:11 -0700

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 5cf3c3204440 [SPARK-52954][PYTHON][TESTS][FOLLOW-UP] Alway set safe_check=True in Arrow UDFs
5cf3c3204440 is described below

commit 5cf3c320444031229517acac979d4e412f392673
Author: Ruifeng Zheng <ruife...@apache.org>
AuthorDate: Wed Jul 30 20:15:12 2025 +0900

    [SPARK-52954][PYTHON][TESTS][FOLLOW-UP] Alway set safe_check=True in Arrow UDFs
    
    ### What changes were proposed in this pull request?
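    Always pass `True` as the safe-check argument of the serializer used for Arrow UDFs, instead of the `safecheck` value that was passed before.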
    
    ### Why are the changes needed?
    With the safe check always on, issues such as overflow are always detected when type coercion is needed.
    
    ### Does this PR introduce _any_ user-facing change?
    No, this feature is not yet released.
    
    ### How was this patch tested?
    Existing tests.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    no
    
    Closes #51722 from zhengruifeng/arrow_always_safecheck.
    
    Authored-by: Ruifeng Zheng <ruife...@apache.org>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/pyspark/worker.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/worker.py b/python/pyspark/worker.py
index be49e527664f..d330de85c4ee 100644
--- a/python/pyspark/worker.py
+++ b/python/pyspark/worker.py
@@ -2171,8 +2171,8 @@ def read_udfs(pickleSer, infile, eval_type):
             PythonEvalType.SQL_GROUPED_AGG_ARROW_UDF,
             PythonEvalType.SQL_WINDOW_AGG_ARROW_UDF,
         ):
-            # Arrow cast for type coercion is enabled by default
-            ser = ArrowStreamArrowUDFSerializer(timezone, safecheck, _assign_cols_by_name, True)
+            # Arrow cast and safe check are always enabled
+            ser = ArrowStreamArrowUDFSerializer(timezone, True, _assign_cols_by_name, True)
         elif (
             eval_type == PythonEvalType.SQL_ARROW_BATCHED_UDF
             and not use_legacy_pandas_udf_conversion(runner_conf)
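
The commit message says the safe check catches issues like overflow when type coercion is needed. The sketch below is a pure-Python model of that idea, not the Arrow or PySpark implementation: `cast_to_int8` and its `safe` flag are hypothetical names invented for illustration. It narrows integers to the int8 range and either raises on overflow (checked) or silently wraps around (unchecked).

```python
INT8_MIN, INT8_MAX = -128, 127


def cast_to_int8(values, safe=True):
    """Hypothetical helper: narrow Python ints to the int8 range.

    safe=True  -> raise OverflowError if a value does not fit.
    safe=False -> wrap around using two's-complement arithmetic.
    """
    result = []
    for v in values:
        if INT8_MIN <= v <= INT8_MAX:
            result.append(v)
        elif safe:
            raise OverflowError(f"value {v} does not fit in int8")
        else:
            # Keep only the low 8 bits and reinterpret them as signed.
            result.append((v + 128) % 256 - 128)
    return result


if __name__ == "__main__":
    print(cast_to_int8([1, 127]))            # values that fit pass through
    print(cast_to_int8([300], safe=False))   # unchecked: silently wraps
    try:
        cast_to_int8([300], safe=True)       # checked: overflow is reported
    except OverflowError as exc:
        print("rejected:", exc)
```

With the unchecked variant, a bad value turns into a plausible-looking but wrong result. Under this model, hard-coding the checked path as the diff above does trades a configuration knob for the guarantee that such values surface as errors.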


