Dandandan commented on PR #21351:
URL: https://github.com/apache/datafusion/pull/21351#issuecomment-4238357134
> > Prefetching IO / combining small IO requests (reducing spawn_blocking / thread switching overhead)
>
> Should we think about the interaction with `ObjectStore`? The GCS/S3/etc. implementations do range coalescing. When we do combining of small IO requests should we take into account the layout of that data so that we combine requests that would be adjacent in the data, thus maximizing block hit rates for these providers?
I did some more tinkering with object store recently and think that actually issuing many more requests (e.g. thousands) in parallel (ideally over HTTP/2 or 3) would be better than a lot of coalescing, because:
1. For highly concurrent queries there isn't much coalescing that can work, as the data is spread across different files or row groups / wide tables (so effectively you would end up downloading a large part of the file).
2. Multiple smaller requests finish more quickly than one request coalesced across large gaps (which also costs a lot of extra bandwidth for the bytes in between); for large requests, splitting them into smaller chunks and downloading those in parallel helps as well.
3. If you can process the data as soon as it arrives, you can hide the latency much better than having to wait (and idle) between requests; see the sketch after this list.
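
As a rough sketch of that pattern (the `fetch_ranges_eagerly` helper is hypothetical, built on the `object_store` and `futures` crates; note that `get_range`'s range type has changed between object-store releases, so this assumes a recent one):

```rust
use std::{ops::Range, sync::Arc};

use futures::stream::{self, StreamExt};
use object_store::{path::Path, ObjectStore};

/// Issue many ranged GETs concurrently and process each chunk as soon
/// as it arrives, rather than coalescing into fewer, larger reads.
/// `concurrency` caps the number of in-flight requests.
async fn fetch_ranges_eagerly(
    store: Arc<dyn ObjectStore>,
    location: Path,
    ranges: Vec<Range<u64>>,
    concurrency: usize,
) -> object_store::Result<()> {
    let mut results = stream::iter(ranges)
        .map(|range| {
            let store = Arc::clone(&store);
            let location = location.clone();
            async move {
                let bytes = store.get_range(&location, range.clone()).await?;
                Ok::<_, object_store::Error>((range, bytes))
            }
        })
        // Completion order, not submission order: start decoding whichever
        // chunk lands first instead of idling on the slowest request.
        .buffer_unordered(concurrency);

    while let Some(result) = results.next().await {
        let (range, bytes) = result?;
        // Hand off to decoding here to hide the remaining request latency.
        println!("got {} bytes for {:?}", bytes.len(), range);
    }
    Ok(())
}
```

`buffer_unordered` is the key design choice here: results stream in completion order, which is exactly what lets processing overlap with the downloads still in flight.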
HTTP/2 is disabled by default in object-store, I think mainly because forcing everything through a single TCP channel causes some head-of-line (HOL) blocking issues even with HTTP/2 (and perhaps in the h2 implementation), but this can be worked around by creating a larger connection pool (combining multiplexing with multiple connections). Using HTTP/2 avoids creating lots of connections (which is somewhat slow as well) when issuing requests in parallel, and it helps with speeding up new connections and issuing smaller requests.
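
A sketch of that workaround (the `ShardedClient` wrapper and pool size are my own; this assumes the crate's `aws` feature, and whether separately built stores really end up with separate connection pools may vary across object-store versions):

```rust
use std::sync::{
    atomic::{AtomicUsize, Ordering},
    Arc,
};

use object_store::{aws::AmazonS3Builder, ClientOptions, ObjectStore};

/// Round-robin over several independently built stores, each with its
/// own HTTP/2 client, so multiplexing is spread over multiple TCP
/// connections instead of a single (HOL-blocking-prone) channel.
struct ShardedClient {
    stores: Vec<Arc<dyn ObjectStore>>,
    next: AtomicUsize,
}

impl ShardedClient {
    fn try_new(bucket: &str, pool_size: usize) -> object_store::Result<Self> {
        let stores = (0..pool_size)
            .map(|_| -> object_store::Result<Arc<dyn ObjectStore>> {
                let store = AmazonS3Builder::from_env()
                    .with_bucket_name(bucket)
                    // HTTP/2 is off by default; opt in per client.
                    .with_client_options(ClientOptions::new().with_http2_only())
                    .build()?;
                Ok(Arc::new(store))
            })
            .collect::<object_store::Result<Vec<_>>>()?;
        Ok(Self { stores, next: AtomicUsize::new(0) })
    }

    /// Pick a store for the next request.
    fn pick(&self) -> Arc<dyn ObjectStore> {
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.stores.len();
        Arc::clone(&self.stores[i])
    }
}
```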
HTTP/3 would probably solve the HOL-blocking issue (as it doesn't run over TCP).
Splitting large requests into smaller ones also helps increase throughput (I'm not sure whether the standard object-store implementation does this); a sketch follows below.
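
For illustration, a minimal chunked-read sketch (the `get_range_chunked` helper and its knobs are assumptions, not crate API; for the crate's own vectored path, `ObjectStore::get_ranges` exists and, I believe, applies some coalescing by default):

```rust
use std::{ops::Range, sync::Arc};

use bytes::Bytes;
use futures::stream::{self, StreamExt, TryStreamExt};
use object_store::{path::Path, ObjectStore};

/// Split one large read into fixed-size chunks fetched in parallel.
/// `buffered` preserves submission order, so the returned pieces can be
/// concatenated directly to reassemble the original range.
async fn get_range_chunked(
    store: Arc<dyn ObjectStore>,
    location: &Path,
    range: Range<u64>,
    chunk_size: u64,
    concurrency: usize,
) -> object_store::Result<Vec<Bytes>> {
    let chunks: Vec<Range<u64>> = (range.start..range.end)
        .step_by(chunk_size as usize)
        .map(|start| start..(start + chunk_size).min(range.end))
        .collect();

    stream::iter(chunks)
        .map(|chunk| {
            let store = Arc::clone(&store);
            let location = location.clone();
            async move { store.get_range(&location, chunk).await }
        })
        .buffered(concurrency)
        .try_collect()
        .await
}
```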
There is also a request-rate limit per shard/prefix of 5500/s (S3) / 5000/s (GCS), so when doing many concurrent reads you need to make sure to spread the load over the key space (which the store partitions further as load increases); that's something else to consider.
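
For example (a made-up `shard=NN/` key layout; the actual partitioning is managed by the provider and ramps up with sustained load, so this only illustrates the idea of spreading hot keys over several prefixes):

```rust
use std::{
    collections::hash_map::DefaultHasher,
    hash::{Hash, Hasher},
};

/// Hash objects into a fixed set of key prefixes so the per-prefix
/// request-rate limit scales with the number of prefixes instead of
/// capping the whole workload.
fn sharded_key(key: &str, shards: u64) -> String {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    format!("shard={:02}/{}", hasher.finish() % shards, key)
}

// sharded_key("data/part-0.parquet", 16) might yield
// "shard=07/data/part-0.parquet".
```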