anjakefala opened a new issue, #44641: URL: https://github.com/apache/arrow/issues/44641
### Describe the enhancement requested

There has been recent work to move the [ChunkResolver](https://github.com/apache/arrow/issues/34535) into the public API. `ChunkResolver` uses an `O(log(num_chunks))` binary search to identify chunks, which is optimised for random access. For sequential row-by-row access, resolving every row this way would be inefficient, yet sometimes a user needs to do row-major processing of the data. To that end, the proposal is to add these [helper methods](https://github.com/apache/arrow/issues/34535#issuecomment-1977304538) to the `ChunkResolver` API for more efficient sequential traversal. These helper methods were written by @felipecrv:

```
/// \pre loc.chunk_index >= 0
/// \pre loc.index_in_chunk is assumed valid if chunk_index is not the last one
inline bool Valid(ChunkLocation loc) const {
  const int64_t last_chunk_index = static_cast<int64_t>(offsets_.size()) - 1;
  return loc.chunk_index + 1 < last_chunk_index ||
         (loc.chunk_index + 1 == last_chunk_index &&
          loc.index_in_chunk < offsets_[last_chunk_index]);
}

/// \pre Valid(loc)
inline ChunkLocation Next(ChunkLocation loc) const {
  const int64_t next_index_in_chunk = loc.index_in_chunk + 1;
  return (next_index_in_chunk < offsets_[loc.chunk_index + 1])
             ? ChunkLocation{loc.chunk_index, next_index_in_chunk}
             : ChunkLocation{loc.chunk_index + 1, 0};
}
```

with the resulting loop:

```
ChunkResolver resolver(batches);
for (ChunkLocation loc; resolver.Valid(loc); loc = resolver.Next(loc)) {
  // re-use loc for all the typed columns since they are split on the same offsets
}
```

### Component(s)

C++
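
For illustration, here is a minimal sketch of what such a sequential traversal could look like. It assumes the proposed `Valid()`/`Next()` helpers land with the signatures above, and that `ChunkResolver` and `ChunkLocation` are exposed as `arrow::ChunkResolver`/`arrow::ChunkLocation` from `arrow/chunk_resolver.h` per the public-API work linked above; `SumSequential` is a hypothetical helper, not part of the proposal:

```
#include <cstdint>

#include <arrow/array.h>
#include <arrow/chunk_resolver.h>
#include <arrow/chunked_array.h>

// Sum an int64 ChunkedArray row by row. Each row is visited exactly once and
// the loop only switches chunks when the current chunk is exhausted, so no
// per-row binary search (Resolve) is needed.
int64_t SumSequential(const arrow::ChunkedArray& values) {
  arrow::ChunkResolver resolver(values.chunks());
  int64_t sum = 0;
  // loc starts at the first row of the first chunk, as in the loop above;
  // Valid()/Next() are the helpers proposed in this issue (assumed to exist).
  for (arrow::ChunkLocation loc; resolver.Valid(loc); loc = resolver.Next(loc)) {
    const auto& chunk = static_cast<const arrow::Int64Array&>(
        *values.chunk(static_cast<int>(loc.chunk_index)));
    sum += chunk.Value(loc.index_in_chunk);  // null checking omitted for brevity
  }
  return sum;
}
```

The same `loc` could index every typed column of a record batch vector in a single pass, since all columns are split on the same chunk boundaries, which is the row-major use case described above.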