zeddit opened a new issue, #45503:
URL: https://github.com/apache/arrow/issues/45503

   ### Describe the enhancement requested
   
   In my case, my CSV has date/datetime fields like `20250210` (pa.date32()) and 
`2025021106000062` (pa.timestamp('ms')), which cannot be converted smoothly.
   
   So far, csv.read_csv cannot recognize the date32 data type and cannot convert 
fractional seconds, e.g. milliseconds. What I have to do instead is go through 
pandas, as below:
   
   1. change my schema to pa.string() for those date/datetime fields
   2. read the input file with csv.read_csv
   3. manually convert the columns to the types I need, e.g. 
`pa.array(pd.to_datetime(table['date'], format='%Y%m%d')).cast(pa.date32(), 
safe=True)` and `pa.array(pd.to_datetime(table['timestamp_ms'], 
format='%Y%m%d%H%M%S%f')).cast(pa.timestamp('ms'), safe=True)`
   4. reassemble the Arrow table I need with 
`pa.Table.from_arrays([converted_arrays, part_of_original_table_arrays], 
schema=original_schema)`
   
   This whole workflow (sketched below) is quite inefficient. Is there any other 
method to speed this up? Thanks.
   
   ### Component(s)
   
   Python

