malduarte opened a new issue, #17548:
URL: https://github.com/apache/datafusion/issues/17548

   ### Is your feature request related to a problem or challenge?
   
   In Apache Spark, it is possible to specify the expected timestamp format.
   
   See the `timestampFormat` and `timestampNTZFormat` [CSV options](https://spark.apache.org/docs/latest/sql-data-sources-csv.html).
   
   This works for both user-supplied and inferred schemas.
   
   In DataFusion, similar functionality is only available when parsing CSV via [SQL DDL options](https://datafusion.apache.org/user-guide/sql/format_options.html).
   
   Otherwise, this is currently not possible in DataFusion, even with a user-supplied schema. Parsing a CSV file that contains timestamps in a non-standard format results in an error. For example, suppose the CSV contains a column `created_at` with the value `2025-06-23-05.07.34.214000`.
   
   Schema definition for the timestamp column:
   ```rust
   Field::new("created_at", DataType::Timestamp(TimeUnit::Microsecond, None), true),
   ```
   Attempting to parse the CSV file fails with:
   ```
   Error: ArrowError(ParseError("Error parsing column 11 at line 1: Parser error: Error parsing timestamp from '2025-06-23-05.07.34.214000': invalid timestamp separator"), None)
   ```
   
   ### Describe the solution you'd like
   
   Users should be able to supply a [custom timestamp format](https://docs.rs/chrono/0.4.42/chrono/format/strftime/index.html#specifiers) using chrono's strftime specifiers.
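
   For illustration, the example value above can be described with the chrono format string `%Y-%m-%d-%H.%M.%S%.f`. The following is a minimal sketch that assumes the `chrono` crate (a recent 0.4.x release, such as the 0.4.42 linked above) as a dependency; `parse_with_format` is a hypothetical helper, not an existing DataFusion API.

   ```rust
   use chrono::NaiveDateTime;

   /// Hypothetical helper: parses `value` with a chrono strftime-style `format`
   /// and returns microseconds since the Unix epoch, interpreting the naive
   /// (timezone-less) value as UTC. Returns `None` if the value does not match.
   fn parse_with_format(value: &str, format: &str) -> Option<i64> {
       let dt = NaiveDateTime::parse_from_str(value, format).ok()?;
       Some(dt.and_utc().timestamp_micros())
   }
   ```

   For example, `parse_with_format("2025-06-23-05.07.34.214000", "%Y-%m-%d-%H.%M.%S%.f")` returns `Some(1_750_655_254_214_000)`. Treating the timezone-less value as UTC matches how Arrow stores `Timestamp(TimeUnit::Microsecond, None)` values.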
   
   ### Describe alternatives you've considered
   
   Extend `CsvReadOptions` to let users supply a custom timestamp format, similar to the Apache Spark approach. The format would be used to parse timestamps both during schema inference and with user-supplied schemas.
   
   Optionally, also extend `DataType::Timestamp` to carry a user-defined timestamp format. This would be more flexible, as it would allow per-column timestamp formats.
   
   ### Additional context
   
   - Currently, the only workaround is to declare columns containing non-standard timestamps as strings and convert them afterwards with extra code.
   - A timestamp format can already be supplied when parsing CSV via [SQL DDL](https://datafusion.apache.org/user-guide/sql/format_options.html).
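
   The string-then-convert workaround can be sketched without external dependencies. This is a minimal sketch that handles only the fixed layout `YYYY-MM-DD-HH.MM.SS.ffffff` from the example and interprets values as UTC. The function names are hypothetical, and a real conversion would operate on Arrow string arrays rather than plain slices. The date arithmetic uses Howard Hinnant's `days_from_civil` algorithm.

   ```rust
   /// Days since 1970-01-01 for a proleptic Gregorian date (days_from_civil).
   fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
       let y = if month <= 2 { year - 1 } else { year };
       let era = y.div_euclid(400);
       let yoe = y.rem_euclid(400); // year of era, 0..=399
       let mp = (month + 9) % 12; // March = 0, ..., February = 11
       let doy = (153 * mp + 2) / 5 + day - 1; // day of year, 0..=365
       let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
       era * 146_097 + doe - 719_468
   }

   fn days_in_month(year: i64, month: i64) -> i64 {
       let leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
       match month {
           2 if leap => 29,
           2 => 28,
           4 | 6 | 9 | 11 => 30,
           _ => 31,
       }
   }

   /// Reads an all-digit field from the byte range `start..end`.
   fn digits(s: &str, start: usize, end: usize) -> Option<i64> {
       let part = s.get(start..end)?;
       if !part.bytes().all(|b| b.is_ascii_digit()) {
           return None;
       }
       part.parse().ok()
   }

   /// Hypothetical converter: `YYYY-MM-DD-HH.MM.SS.ffffff` -> microseconds
   /// since the Unix epoch (UTC). Returns `None` for malformed or invalid input.
   fn parse_custom_timestamp_micros(s: &str) -> Option<i64> {
       let b = s.as_bytes();
       if b.len() != 26 {
           return None;
       }
       // Separator positions in the fixed layout.
       for (i, sep) in [(4, b'-'), (7, b'-'), (10, b'-'), (13, b'.'), (16, b'.'), (19, b'.')] {
           if b[i] != sep {
               return None;
           }
       }
       let (year, month, day) = (digits(s, 0, 4)?, digits(s, 5, 7)?, digits(s, 8, 10)?);
       let (hour, minute, second) = (digits(s, 11, 13)?, digits(s, 14, 16)?, digits(s, 17, 19)?);
       let micros = digits(s, 20, 26)?;
       let valid = (1..=12).contains(&month)
           && day >= 1
           && day <= days_in_month(year, month)
           && hour <= 23
           && minute <= 59
           && second <= 59;
       if !valid {
           return None;
       }
       let secs = days_from_civil(year, month, day) * 86_400 + hour * 3_600 + minute * 60 + second;
       Some(secs * 1_000_000 + micros)
   }

   /// Converts a nullable string column; unparseable values become null.
   fn convert_column(values: &[Option<&str>]) -> Vec<Option<i64>> {
       values
           .iter()
           .copied()
           .map(|v| v.and_then(parse_custom_timestamp_micros))
           .collect()
   }
   ```

   With this sketch, `parse_custom_timestamp_micros("2025-06-23-05.07.34.214000")` yields `Some(1_750_655_254_214_000)`, the microsecond value a `Timestamp(TimeUnit::Microsecond, None)` column would hold. Mapping failures to null rather than aborting is one possible design choice; an error would be equally reasonable.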


-- 
This is an automated message from the Apache Git Service.