This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 8b83069435d7 [SPARK-55333][PYTHON] Enable `DateType` and `TimeType` in `convert_numpy`
8b83069435d7 is described below
commit 8b83069435d799f66715885e74ad00c4fcd7e9e8
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Fri Feb 13 11:44:26 2026 -0800
[SPARK-55333][PYTHON] Enable `DateType` and `TimeType` in `convert_numpy`
### What changes were proposed in this pull request?
1, Enable `DateType` and `TimeType` in `convert_numpy`
2, Remove `date_as_object=True` from `convert_numpy`
### Why are the changes needed?
1, to replace `convert_legacy` step by step;
2, `date_as_object=True` is an option of `pa.Array.to_pandas`, but it has been
the default value since at least pyarrow 2.0; see
https://arrow.apache.org/docs/2.0/python/generated/pyarrow.Array.html?highlight=to_pandas#pyarrow.Array.to_pandas
We no longer need to set it explicitly, since the minimum supported pyarrow
version is now 18.0
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI
### Was this patch authored or co-authored using generative AI tooling?
no
Closes #54303 from zhengruifeng/time_as_obj.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
python/pyspark/sql/conversion.py | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/python/pyspark/sql/conversion.py b/python/pyspark/sql/conversion.py
index bad2180c7317..a6a983c940e8 100644
--- a/python/pyspark/sql/conversion.py
+++ b/python/pyspark/sql/conversion.py
@@ -1352,6 +1352,8 @@ class ArrowArrayToPandasConversion:
ShortType,
IntegerType,
LongType,
+ DateType,
+ TimeType,
TimestampType,
TimestampNTZType,
UserDefinedType,
@@ -1476,17 +1478,10 @@ class ArrowArrayToPandasConversion:
YearMonthIntervalType,
),
):
- # TODO(SPARK-55333): Revisit date_as_object in arrow->pandas conversion
- # If the given column is a date type column, creates a series of datetime.date directly
- # instead of creating datetime64[ns] as intermediate data to avoid overflow caused by
- # datetime64[ns] type handling.
- pandas_options = {
- "date_as_object": True,
- }
- series = arr.to_pandas(**pandas_options)
+ series = arr.to_pandas()
elif isinstance(spark_type, UserDefinedType):
udt: UserDefinedType = spark_type
- series = arr.to_pandas(date_as_object=True)
+ series = arr.to_pandas()
series = series.apply(
lambda v: v
if hasattr(v, "__UDT__")