rtbs-dev opened a new issue, #44615: URL: https://github.com/apache/arrow/issues/44615
### Describe the enhancement requested

I'm coming from AwkwardArray and Polars, trying to vectorize the equivalent of finding the byte offsets (or character spans) of all regex matches in an array of strings.

See [this discussion](https://stackoverflow.com/questions/11918314/how-do-i-find-the-offset-of-a-matching-string-using-re2) for how to do this in re2 directly. Per the solution there, the information seems to be contained in the `re2::StringPiece` data, which [this thread](https://github.com/google/re2/issues/394#issuecomment-1290946763) indicates is preferable anyway, since it avoids duplicating memory.

I see something vaguely related brought up [here](https://github.com/apache/arrow/issues/15381), where `string_view` was vendored instead, though I don't see a way to access the view objects right now via the results of `extract_regex`. I do see that the [struct being returned](https://arrow.apache.org/docs/cpp/compute.html#string-component-extraction) is not a plain string, but adding span locations to it might break downstream users' type definitions or API contracts.

Maybe the new behavior could be added as an additional option? Alternatively, a new function `extract_regex_spans` would already make my life much easier, even if downstream libraries like Polars and AwkwardArray have to add new wrapper APIs to support the behavior.

Am I missing something obvious? Most importantly, I want to avoid looping twice over every string (first to find the match, then again to find the location of that match). That feels wasteful when the matches are discovered via their offset locations in the first place, right?

Thanks!

### Component(s)

C++
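To make the requested output concrete, here is a rough pure-Python sketch of the result shape I have in mind. It is only an illustration under assumptions: it uses the standard-library `re` module rather than RE2, it is not an existing Arrow function (the name `extract_regex_spans` is just the hypothetical one proposed above), and it loops in Python instead of being vectorized. Matching is done on the UTF-8 bytes so that the spans are byte offsets.

```python
import re
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) byte offsets, end exclusive


def extract_regex_spans(
    values: List[Optional[str]], pattern: str
) -> List[Optional[List[Span]]]:
    """Hypothetical reference behavior: byte spans of all matches per string.

    Null inputs stay null; strings without a match yield an empty list.
    """
    # Compile as a bytes pattern so match positions are UTF-8 byte offsets.
    compiled = re.compile(pattern.encode("utf-8"))
    result: List[Optional[List[Span]]] = []
    for value in values:
        if value is None:
            result.append(None)
            continue
        data = value.encode("utf-8")
        # Each match already knows its own location, so one pass is enough.
        result.append([m.span() for m in compiled.finditer(data)])
    return result


spans = extract_regex_spans(["ab12cd345", "no digits", None, "é7"], r"\d+")
print(spans)
```

Note that in the last input the digit starts at byte offset 2, not 1, because `é` occupies two bytes in UTF-8. A character-span variant would match on the `str` directly instead of on the encoded bytes.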