[I] [Python] Timestamp - out of bounds for nanoseconds [arrow]

via GitHub Tue, 15 Oct 2024 08:36:11 -0700


MarioShuuya opened a new issue, #44420:
URL: https://github.com/apache/arrow/issues/44420


   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   **Environment**
   OS: Windows/Linux
   Python: 3.11.2
   Pyarrow: 17.0.0
   Pandas: 2.2.2
   
   **Description**
   When converting a `datetime` object whose value is below the [pandas minimum of 1677-09-21 00:12:43.145224193](https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.min.html) into a pyarrow table with a nanosecond timestamp column, pyarrow raises an "out of bounds for nanoseconds" exception.
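
To illustrate where that bound comes from, here is a minimal sketch that assumes a nanosecond timestamp is stored as a signed 64-bit count of nanoseconds since 1970-01-01. The helper `fits_in_ns` is a hypothetical name used only for this illustration and is not part of pyarrow or pandas.

```python
import datetime

# Assumption: nanosecond timestamps are int64 offsets from the Unix epoch.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
EPOCH = datetime.datetime(1970, 1, 1)


def fits_in_ns(ts: datetime.datetime) -> bool:
    """Return True if a naive datetime fits in an int64 nanosecond offset."""
    delta = ts - EPOCH
    # Use integer arithmetic only, so no precision is lost for distant dates.
    ns = (delta.days * 86_400 + delta.seconds) * 1_000_000_000
    ns += delta.microseconds * 1_000
    return INT64_MIN <= ns <= INT64_MAX


print(fits_in_ns(datetime.datetime(1677, 9, 21, 1)))  # True
print(fits_in_ns(datetime.datetime(1, 1, 1, 1)))      # False
```

Under that assumption, year 1 is roughly 6.2e19 nanoseconds before the epoch, well beyond the roughly 9.2e18 that an int64 can hold.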
   
   I found issues that may be related, but they did not solve the problem:
   
   - [ARROW-5359](https://issues.apache.org/jira/browse/ARROW-5359)
   - [ARROW-3448](https://issues.apache.org/jira/browse/ARROW-3448)
   
   **Example Code**
   ```python
   import pyarrow as pa
   import datetime
   
   schema = pa.schema([])
   schema = schema.append(pa.field("CreateAt", pa.timestamp(unit="ns")))
   
   ts = datetime.datetime(1677, 9, 21, 1)  # OK: just above the nanosecond minimum
   arrays = [[ts]]
   print(arrays)
   table = pa.Table.from_arrays(arrays, schema=schema)
   print(table)
   
   ts = datetime.datetime(1, 1, 1, 1)  # Not OK: raises the out-of-bounds error
   arrays = [[ts]]
   print(arrays)
   table = pa.Table.from_arrays(arrays, schema=schema)
   print(table)
   ```
   
   **Use Case**
   I am reading data from a database in which one column holds timestamps with ns precision. Instead of null values, the column uses `0001-01-01 00:00:00.0000000`. The goal is to store the result of the database read, an array of datetime objects, in a pyarrow table and then write it out as Parquet. This works well until I hit a timestamp that is too large or too small for pandas.
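
One possible workaround for this use case, sketched under the assumption that the sentinel is exactly `0001-01-01 00:00:00` and carries no meaning other than "no value", would be to map it to `None` before building the table. The names `SENTINEL` and `sentinel_to_none` are hypothetical and exist only for this illustration.

```python
import datetime

# Hypothetical sentinel: the database's stand-in for NULL described above.
SENTINEL = datetime.datetime(1, 1, 1)


def sentinel_to_none(values, sentinel=SENTINEL):
    """Return a copy of values with the sentinel datetime replaced by None."""
    return [None if v == sentinel else v for v in values]


rows = [datetime.datetime(2024, 10, 15, 8, 36, 11), SENTINEL]
cleaned = sentinel_to_none(rows)
print(cleaned)  # [datetime.datetime(2024, 10, 15, 8, 36, 11), None]
```

The cleaned list could then be passed to `pa.Table.from_arrays` in place of the raw values, so that the sentinel becomes a null in the nanosecond column. This does not help with genuine timestamps outside the nanosecond range; for those, a coarser unit such as `pa.timestamp("us")` may be an option if nanosecond precision is not required.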
   
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
