kazu-2020 opened a new issue, #48236:
URL: https://github.com/apache/arrow/issues/48236

   ### Describe the enhancement requested
   
   # Background
   
   We are implementing an ADBC adapter for Rails 
(https://github.com/red-data-tools/activerecord-adbc-adapter). With Rails + 
SQLite, timestamps are stored in the database as TEXT, so the adapter currently 
performs a Ruby-side conversion from the string to Arrow Time64 just before 
producing Arrow data. This conversion is a performance bottleneck when handling 
large volumes of data (see: 
[lib/activerecord_adbc_adapter/result.rb](https://github.com/red-data-tools/activerecord-adbc-adapter/blob/7eddfec33e18c43f1c038f31828698b74ae05d40/lib/activerecord_adbc_adapter/result.rb#L41-L65)).
   
   Examples of ISO 8601 timestamp strings stored in SQLite:
   
   - UTC example: 2025-11-24T12:00:00Z (i.e., 12:00:00 UTC)
   - Local example (Japan Standard Time): 2025-11-24T21:00:00+09:00 (i.e., 
21:00:00 JST)
   
   In practice, timestamp strings appear both with and without a UTC offset. 
The implementation on our branch already parses strings that include an 
offset.
   
   # Technical considerations
   
   - Input strings are often in a fixed format (e.g., 
YYYY-MM-DDTHH:MM:SS[.fraction][offset]), so a dedicated lightweight parser can 
be faster and safer than a generic strptime.
   - We need to define how fractional seconds are handled (number of digits 
accepted), which output units are supported (time64 allows only micro and 
nano; milli would require time32), and what happens on parse failure (return 
null vs. raise an error).
   - The repository’s value_parsing.h contains 
[ParseTimestampISO8601](https://github.com/apache/arrow/blob/55587efbf4f272afda97bff2f33d6aaf4b4c0c8a/cpp/src/arrow/util/value_parsing.h#L664),
 which recognizes and applies offsets in Z, +HH, +HHMM, and +HH:MM forms; 
offset handling is therefore already implemented.
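
To make the points above concrete, here is a minimal standalone sketch of a fixed-format parser. It is not the code on our branch and not Arrow's ParseTimestampISO8601; the names `ParseDigits` and `ParseTimeOfDayNanos` are hypothetical. It assumes that the result is normalized to UTC and wrapped around midnight, that more than 9 fractional digits is a parse failure, and that failure is reported as an empty optional. Each of these is exactly the kind of policy that still needs to be agreed on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper: parses exactly `n` ASCII digits starting at `pos`.
inline bool ParseDigits(std::string_view s, std::size_t pos, std::size_t n,
                        int64_t* out) {
  assert(out != nullptr);
  if (pos + n > s.size()) return false;
  int64_t v = 0;
  for (std::size_t i = 0; i < n; ++i) {
    const char c = s[pos + i];
    if (c < '0' || c > '9') return false;
    v = v * 10 + (c - '0');
  }
  *out = v;
  return true;
}

// Hypothetical sketch: parses "YYYY-MM-DDTHH:MM:SS[.fraction][offset]" where
// offset is Z, +HH, +HHMM or +HH:MM (or the '-' equivalents). Returns
// nanoseconds since midnight UTC, or std::nullopt on any parse failure.
inline std::optional<int64_t> ParseTimeOfDayNanos(std::string_view s) {
  constexpr int64_t kNanosPerSecond = 1000000000LL;
  constexpr int64_t kNanosPerDay = 86400LL * kNanosPerSecond;
  int64_t y = 0, mo = 0, d = 0, h = 0, mi = 0, sec = 0;
  if (s.size() < 19) return std::nullopt;
  // The date is only checked for shape and rough range; its value is dropped.
  if (!ParseDigits(s, 0, 4, &y) || s[4] != '-' ||
      !ParseDigits(s, 5, 2, &mo) || s[7] != '-' ||
      !ParseDigits(s, 8, 2, &d) || s[10] != 'T') {
    return std::nullopt;
  }
  (void)y;
  if (mo < 1 || mo > 12 || d < 1 || d > 31) return std::nullopt;
  if (!ParseDigits(s, 11, 2, &h) || s[13] != ':' ||
      !ParseDigits(s, 14, 2, &mi) || s[16] != ':' ||
      !ParseDigits(s, 17, 2, &sec)) {
    return std::nullopt;
  }
  if (h > 23 || mi > 59 || sec > 59) return std::nullopt;

  std::size_t pos = 19;
  int64_t frac_nanos = 0;
  if (pos < s.size() && s[pos] == '.') {
    ++pos;
    int digits = 0;
    int64_t scale = kNanosPerSecond;
    while (pos < s.size() && s[pos] >= '0' && s[pos] <= '9') {
      if (digits == 9) return std::nullopt;  // assumption: reject, not round
      scale /= 10;
      frac_nanos += (s[pos] - '0') * scale;
      ++digits;
      ++pos;
    }
    if (digits == 0) return std::nullopt;
  }

  int64_t offset_seconds = 0;
  if (pos < s.size()) {
    const char sign = s[pos];
    if (sign == 'Z') {
      ++pos;
    } else if (sign == '+' || sign == '-') {
      int64_t oh = 0, om = 0;
      if (!ParseDigits(s, pos + 1, 2, &oh)) return std::nullopt;
      pos += 3;
      const bool colon = pos < s.size() && s[pos] == ':';
      if (colon) ++pos;
      if (colon || pos < s.size()) {
        if (!ParseDigits(s, pos, 2, &om)) return std::nullopt;
        pos += 2;
      }
      if (oh > 23 || om > 59) return std::nullopt;
      offset_seconds = (oh * 3600 + om * 60) * (sign == '+' ? 1 : -1);
    } else {
      return std::nullopt;
    }
    if (pos != s.size()) return std::nullopt;  // trailing characters
  }

  // Assumption: shift to UTC, then wrap into [0, 24h).
  int64_t nanos =
      (h * 3600 + mi * 60 + sec - offset_seconds) * kNanosPerSecond + frac_nanos;
  nanos %= kNanosPerDay;
  if (nanos < 0) nanos += kNanosPerDay;
  return nanos;
}
```

With these assumptions, the two example strings from the Background section (12:00:00Z and 21:00:00+09:00) map to the same value, and a string without an offset is taken as-is. Whether an offset-less string should be interpreted as UTC or left to the caller is one of the responsibilities we would like feedback on.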
   
   # Proposals (two starting points for discussion)
   
   ## Option A — Extend Cast to support utf8 → time64
   
   - Pros: Integrates naturally with existing Cast workflows and can be used 
where users currently rely on Cast.
   - Considerations: Need to define cast semantics for rounding and error 
handling.
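
As a rough illustration of the rounding question, the sketch below converts a parsed nanosecond value to the target unit and either rejects or truncates sub-unit digits. Arrow's existing `CastOptions` has an `allow_time_truncate` flag for temporal casts; whether a utf8 → time64 cast would reuse it is an open question. The `Time64Unit` enum and `NanosToTime64` function here are hypothetical and standalone, not Arrow API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the two units that time64 supports.
enum class Time64Unit { kMicro, kNano };

// Hypothetical sketch: converts nanoseconds since midnight to `unit`.
// When `allow_truncate` is false, a value that does not fit the unit exactly
// is reported as a failure (std::nullopt) instead of being truncated.
inline std::optional<int64_t> NanosToTime64(int64_t nanos, Time64Unit unit,
                                            bool allow_truncate) {
  assert(nanos >= 0);
  if (unit == Time64Unit::kNano) return nanos;
  constexpr int64_t kNanosPerMicro = 1000;
  if (!allow_truncate && nanos % kNanosPerMicro != 0) return std::nullopt;
  return nanos / kNanosPerMicro;  // truncates toward zero for nanos >= 0
}
```

In this sketch a "safe" conversion of 12:00:00.000000123 to microseconds fails, while the "unsafe" variant silently drops the last three digits. Rounding to nearest would be a third possible policy.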
   
   ## Option B — Add a dedicated compute function (e.g., 
parse_time_to_time64(utf8, options))
   
   - Pros: Allows fine-grained control of format strictness, output unit, and 
failure behavior; easier to roll out incrementally.
   - Considerations: Need to decide API policy to avoid duplicating 
functionality between Cast and a dedicated function.
   
   
   # Our current work (for reference)
   
   On our fork branch feature/sqlite-time64-for-rails, we implemented a 
lightweight parser (ParseTimeSinceMidnight, etc.) and a prototype compute 
function along the lines of parse_time_to_time64; basic tests pass. PR 
(fork): https://github.com/kazu-2020/arrow/pull/1
   
   # Requests / discussion points
   
   - Which API pattern should Arrow recommend (A: Cast extension or B: compute 
function)?
   - Feedback on acceptance criteria we should follow (precision/unit 
interface, failure semantics, separation of timezone interpretation 
responsibilities, etc.).
   - We would appreciate review feedback on our branch to align it with Arrow’s 
design principles.
   
   # References
   
   - Rails ADBC adapter: 
https://github.com/red-data-tools/activerecord-adbc-adapter
   - Our PR (fork): https://github.com/kazu-2020/arrow/pull/1
   
   
   ### Component(s)
   
   C++


