khuggins opened a new issue, #47558:
URL: https://github.com/apache/arrow/issues/47558
### Describe the bug, including details regarding any error messages, version, and platform.
What I expected:
Writing a dataframe to disk with `DataFrame.to_parquet()` should preserve
the types of my data.
What happened:
The type of my data changed.
Long description:
I'm storing an array of pandas Timestamp objects in a column of a pandas
dataframe. I write this dataframe to disk, read it back at a later date, and
concatenate new rows to it. However, when I read the dataframe back in via
`read_parquet`, the type of the timestamps in my array has changed from
pandas.Timestamp to numpy.datetime64.
This makes it hard to append to the dataframe on disk, because I get this
error:
```
pyarrow.lib.ArrowInvalid: ('numpy.datetime64 scalars cannot be mixed with
other Python scalar values currently', 'Conversion failed for column
<column_name> with type object')
```
The error is raised on the write that follows the read. Here's a minimal example:
```
from datetime import date, datetime

import pandas as pd

print(pd.__version__)

# One row whose "datetime" column holds a list of pandas.Timestamp objects.
df = pd.DataFrame(
    [(1, "x", date.today(), [pd.to_datetime(datetime(2018, 1, 2, 18, 53))])],
    columns=["number", "string", "date", "datetime"],
)
print(df)
print(type(df["datetime"][0][0]))

# Round trip through Parquet: the list elements come back as numpy.datetime64.
df.to_parquet("/tmp/test_datetime.parquet")
df2 = pd.read_parquet("/tmp/test_datetime.parquet")
print(df2)
print(type(df2["datetime"][0][0]))

# Writing the original and round-tripped rows together raises ArrowInvalid.
concat_df = pd.concat([df, df2])
concat_df.to_parquet("/tmp/test_datetime_concat.parquet")
```
The output of this snippet is:
```
2.2.3
number string date datetime
0 1 x 2025-09-13 [2018-01-02 18:53:00]
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
number string date datetime
0 1 x 2025-09-13 [2018-01-02T18:53:00.000000]
<class 'numpy.datetime64'>
```
followed by this error:
```
{
"name": "ArrowInvalid",
"message": "('numpy.datetime64 scalars cannot be mixed with other
Python scalar values currently', 'Conversion failed for column datetime with
type object')",
}
```
The pandas version is `2.2.3` and the pyarrow version is `18.1.0`. I'm not
certain which library is responsible, but because the error is raised on
write, I'm starting here.
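A possible stopgap, shown only as a minimal sketch: convert every element of the list column back to `pandas.Timestamp` before concatenating, so that the column no longer mixes scalar types. The helper name `to_timestamps` is hypothetical (it is not part of pandas or pyarrow), and the round-tripped row is simulated here with a NumPy array rather than read from Parquet.

```python
import numpy as np
import pandas as pd


def to_timestamps(values):
    """Convert each element (Timestamp or numpy.datetime64) to pandas.Timestamp."""
    # pd.Timestamp accepts both pandas.Timestamp and numpy.datetime64 inputs.
    return [pd.Timestamp(v) for v in values]


# A row as originally built, and a simulated copy of the same row holding
# numpy.datetime64 values, as seen after the Parquet round trip.
original_row = [pd.Timestamp("2018-01-02 18:53")]
round_tripped_row = np.array(["2018-01-02T18:53:00"], dtype="datetime64[us]")

rows = [to_timestamps(original_row), to_timestamps(round_tripped_row)]

# After normalization every element has the same scalar type.
element_types = {type(v) for row in rows for v in row}
print(element_types)
```

In the example above this would be applied as `df2["datetime"] = df2["datetime"].apply(to_timestamps)` before `pd.concat([df, df2])`. Whether that avoids the `ArrowInvalid` in every case is an assumption, not something confirmed here.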
### Component(s)
Python