[I] [Docs][HTTP] Clarify when to use batch-at-a-time vs. one-shot approach for receiving data [arrow]

via GitHub Sun, 17 Mar 2024 08:00:24 -0700


ianmcook opened a new issue, #40613:
URL: https://github.com/apache/arrow/issues/40613

### Describe the usage question you have. Please include as many useful details as possible.

Among the simple HTTP GET client examples in
[`arrow-experiments/http/get_simple`](https://github.com/apache/arrow-experiments/tree/main/http/get_simple):

- Some iterate over the record batches as they stream in from the server
(i.e. "streaming" approach).
- Some just make a single function call that collects the full data (i.e.
"one-shot" approach).

For example:
- The [Python client
example](https://github.com/apache/arrow-experiments/blob/main/http/get_simple/python/client/client.py)
iterates over the batches by calling `reader.read_next_batch()`, whereas it
could simply have called `reader.read_all()`.
- The [Ruby client
example](https://github.com/apache/arrow-experiments/blob/main/http/get_simple/ruby/client/client.rb)
takes the simpler one-shot approach, whereas it could have used a
batch-at-a-time approach like the one in [this
example](https://gist.github.com/amoeba/b1ba73a1e863e689d4a2ee65601a18c5).

For many use cases, it makes no difference which approach is used, and we
should just prioritize whatever is syntactically simplest.

But for some use cases, the batch-at-a-time approach will be preferred or
needed for specific reasons, such as:
- The client wants to start processing batches _before_ the final batch is
received.
- The client wants to stream the data to a sink without accumulating it in
memory.

We should clarify this in the Arrow-over-HTTP conventions doc, and wherever
possible we should provide examples showing both approaches.

### Component(s)

Documentation
