kazu-2020 opened a new issue, #48236: URL: https://github.com/apache/arrow/issues/48236
### Describe the enhancement requested

# Background

We are implementing an ADBC adapter for Rails (https://github.com/red-data-tools/activerecord-adbc-adapter). With Rails + SQLite, timestamps are stored in the database as TEXT. As a result, the adapter currently converts each string to an Arrow Time64 value in Ruby just before producing Arrow data. This conversion is a performance bottleneck when handling large volumes of data (see [lib/activerecord_adbc_adapter/result.rb](https://github.com/red-data-tools/activerecord-adbc-adapter/blob/7eddfec33e18c43f1c038f31828698b74ae05d40/lib/activerecord_adbc_adapter/result.rb#L41-L65)).

Readable ISO 8601 examples of timestamp strings stored in SQLite:

- UTC example: `2025-11-24T12:00:00Z` (i.e., 12:00:00 UTC)
- Local example (Japan Standard Time): `2025-11-24T21:00:00+09:00` (i.e., 21:00:00 JST)

Timestamp strings both with and without offsets/time zones occur in practice. Our branch’s implementation already supports parsing strings that include offsets.

# Technical considerations

- Input strings often follow a fixed format (e.g., `YYYY-MM-DDTHH:MM:SS[.fraction][offset]`), so a dedicated lightweight parser can be faster and safer than a generic `strptime`.
- We need to define how fractional seconds are handled (number of digits), which output unit is produced (milli/micro/nano), and what happens on parse failure (return null vs. raise an error).
- The repository’s `value_parsing.h` contains [ParseTimestampISO8601](https://github.com/apache/arrow/blob/55587efbf4f272afda97bff2f33d6aaf4b4c0c8a/cpp/src/arrow/util/value_parsing.h#L664), which recognizes and applies offsets in the `Z`, `+HH`, `+HHMM`, and `+HH:MM` forms, so offset handling is already implemented.

# Proposals (two starting points for discussion)

## Option A: Extend Cast to support utf8 → time64

- Pros: Integrates naturally with existing Cast workflows and can be used wherever users currently rely on Cast.
- Considerations: Need to define cast semantics for rounding and error handling.
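The following minimal C++17 sketch of a dedicated fixed-format parser is offered as a concrete reference point for the rounding and failure questions. It is not the code from the fork branch and not an existing Arrow API. `ParseTimeOfDayUtc` and `Time64Unit` are hypothetical names. The sketch makes four assumptions:

- A missing offset means UTC.
- Offsets are applied so the result is a UTC time of day wrapped into [00:00, 24:00).
- Fractional digits finer than the output unit are truncated rather than rounded.
- Malformed input yields `std::nullopt`, which a cast could report as an error and a compute function could turn into null.

Only microsecond and nanosecond units are offered because Arrow's `time64` type supports only those two. Seconds and milliseconds belong to `time32`.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical unit selector: Arrow's time64 type only allows
// microsecond and nanosecond units.
enum class Time64Unit { kMicro, kNano };

namespace {

constexpr int64_t kNanosPerSecond = 1000000000;
constexpr int64_t kNanosPerDay = 86400 * kNanosPerSecond;

// Consumes the character `c` at s[pos]; returns false (consuming nothing) otherwise.
bool Expect(std::string_view s, std::size_t& pos, char c) {
  if (pos >= s.size() || s[pos] != c) return false;
  ++pos;
  return true;
}

// Consumes exactly `n` ASCII digits starting at s[pos] into `out`.
bool Digits(std::string_view s, std::size_t& pos, int n, int64_t& out) {
  if (s.size() - pos < static_cast<std::size_t>(n)) return false;
  int64_t v = 0;
  for (int i = 0; i < n; ++i) {
    const char c = s[pos + i];
    if (c < '0' || c > '9') return false;
    v = v * 10 + (c - '0');
  }
  pos += n;
  out = v;
  return true;
}

}  // namespace

// Hypothetical parser for YYYY-MM-DDTHH:MM:SS[.fraction][Z|(+|-)HH[[:]MM]].
// Returns the UTC time of day in `unit`, or std::nullopt on malformed input.
std::optional<int64_t> ParseTimeOfDayUtc(std::string_view s, Time64Unit unit) {
  std::size_t pos = 0;
  int64_t y, mo, d, h, mi, sec;
  if (!Digits(s, pos, 4, y) || !Expect(s, pos, '-') ||
      !Digits(s, pos, 2, mo) || !Expect(s, pos, '-') ||
      !Digits(s, pos, 2, d) || !Expect(s, pos, 'T') ||
      !Digits(s, pos, 2, h) || !Expect(s, pos, ':') ||
      !Digits(s, pos, 2, mi) || !Expect(s, pos, ':') ||
      !Digits(s, pos, 2, sec)) {
    return std::nullopt;
  }
  // The date gets only a coarse range check; it does not affect the result.
  if (mo < 1 || mo > 12 || d < 1 || d > 31 || h > 23 || mi > 59 || sec > 59) {
    return std::nullopt;
  }

  // Fraction: keep at most 9 digits (nanoseconds), drop the rest.
  int64_t frac_ns = 0;
  if (pos < s.size() && s[pos] == '.') {
    ++pos;
    int digits = 0;
    while (pos < s.size() && s[pos] >= '0' && s[pos] <= '9') {
      if (digits < 9) frac_ns = frac_ns * 10 + (s[pos] - '0');
      ++digits;
      ++pos;
    }
    if (digits == 0) return std::nullopt;
    for (int i = digits; i < 9; ++i) frac_ns *= 10;
  }

  // Offset: Z, +HH, +HHMM, or +HH:MM (and the '-' forms).
  int64_t offset_s = 0;
  if (pos < s.size()) {
    if (s[pos] == 'Z') {
      ++pos;
    } else if (s[pos] == '+' || s[pos] == '-') {
      const int64_t sign = (s[pos] == '-') ? -1 : 1;
      ++pos;
      int64_t oh = 0, om = 0;
      if (!Digits(s, pos, 2, oh)) return std::nullopt;
      if (pos < s.size()) {
        Expect(s, pos, ':');  // The colon is optional.
        if (!Digits(s, pos, 2, om)) return std::nullopt;
      }
      if (oh > 23 || om > 59) return std::nullopt;
      offset_s = sign * (oh * 3600 + om * 60);
    } else {
      return std::nullopt;
    }
  }
  if (pos != s.size()) return std::nullopt;  // Reject trailing characters.

  // UTC = local - offset, wrapped into [0, 1 day).
  int64_t ns = (h * 3600 + mi * 60 + sec - offset_s) * kNanosPerSecond + frac_ns;
  ns = ((ns % kNanosPerDay) + kNanosPerDay) % kNanosPerDay;
  return unit == Time64Unit::kMicro ? ns / 1000 : ns;  // Truncates sub-micro digits.
}
```

Under these assumptions, both examples from the Background section (`2025-11-24T12:00:00Z` and `2025-11-24T21:00:00+09:00`) yield 12:00:00 in the chosen unit. Whether to normalize to UTC like this or to keep the local wall-clock time is one of the open design questions.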
## Option B: Add a dedicated compute function (e.g., `parse_time_to_time64(utf8, options)`)

- Pros: Allows fine-grained control of format strictness, output unit, and failure behavior; easier to roll out incrementally.
- Considerations: Need to decide on an API policy that avoids duplicating functionality between Cast and a dedicated function.

### Our current work (for reference)

On our fork branch `feature/sqlite-time64-for-rails`, we implemented a lightweight parser (`ParseTimeSinceMidnight`, etc.) and a `parse_time_to_time64`-like function, and basic tests are passing. PR (fork): https://github.com/kazu-2020/arrow/pull/1

# Requests / discussion points

- Which API pattern should Arrow recommend (A: Cast extension or B: compute function)?
- Feedback on the acceptance criteria we should follow (precision/unit interface, failure semantics, separation of timezone interpretation responsibilities, etc.).
- We would appreciate review feedback on our branch to align it with Arrow’s design principles.

# References

- Rails ADBC adapter: https://github.com/red-data-tools/activerecord-adbc-adapter
- Our PR (fork): https://github.com/kazu-2020/arrow/pull/1

### Component(s)

C++
