(spark) branch master updated: [SPARK-55333][PYTHON] Enable `DateType` and `TimeType` in `convert_numpy`

dongjoon Fri, 13 Feb 2026 11:44:47 -0800

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 8b83069435d7 [SPARK-55333][PYTHON] Enable `DateType` and `TimeType` in `convert_numpy`
8b83069435d7 is described below

commit 8b83069435d799f66715885e74ad00c4fcd7e9e8
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Fri Feb 13 11:44:26 2026 -0800

    [SPARK-55333][PYTHON] Enable `DateType` and `TimeType` in `convert_numpy`
    
    ### What changes were proposed in this pull request?
    
    1. Enable `DateType` and `TimeType` in `convert_numpy`.
    2. Remove `date_as_object=True` from `convert_numpy`.
    
    ### Why are the changes needed?
    1. To replace `convert_legacy` step by step.
    
    2. `date_as_object=True` is passed to `pa.Array.to_pandas`, but `True` has been the default value since at least pyarrow 2.0; see
    https://arrow.apache.org/docs/2.0/python/generated/pyarrow.Array.html?highlight=to_pandas#pyarrow.Array.to_pandas
    
    The minimum supported pyarrow version is now 18.0, so the option no longer has to be set explicitly.
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    CI
    
    ### Was this patch authored or co-authored using generative AI tooling?
    No
    
    Closes #54303 from zhengruifeng/time_as_obj.
    
    Authored-by: Ruifeng Zheng <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 python/pyspark/sql/conversion.py | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/python/pyspark/sql/conversion.py b/python/pyspark/sql/conversion.py
index bad2180c7317..a6a983c940e8 100644
--- a/python/pyspark/sql/conversion.py
+++ b/python/pyspark/sql/conversion.py
@@ -1352,6 +1352,8 @@ class ArrowArrayToPandasConversion:
             ShortType,
             IntegerType,
             LongType,
+            DateType,
+            TimeType,
             TimestampType,
             TimestampNTZType,
             UserDefinedType,
@@ -1476,17 +1478,10 @@ class ArrowArrayToPandasConversion:
                 YearMonthIntervalType,
             ),
         ):
-            # TODO(SPARK-55333): Revisit date_as_object in arrow->pandas conversion
-            # If the given column is a date type column, creates a series of datetime.date directly
-            # instead of creating datetime64[ns] as intermediate data to avoid overflow caused by
-            # datetime64[ns] type handling.
-            pandas_options = {
-                "date_as_object": True,
-            }
-            series = arr.to_pandas(**pandas_options)
+            series = arr.to_pandas()
         elif isinstance(spark_type, UserDefinedType):
             udt: UserDefinedType = spark_type
-            series = arr.to_pandas(date_as_object=True)
+            series = arr.to_pandas()
             series = series.apply(
                 lambda v: v
                 if hasattr(v, "__UDT__")


