tustvold commented on issue #172: URL: https://github.com/apache/iceberg-rust/issues/172#issuecomment-1911993011
> I think if users are judicious and provide sufficient hints, and buffer the reads the performance difference will be negligible.

If primarily performing sequential IO I would tend to agree: the AsyncRead abstraction will be less efficient than a streaming request, but if pre-fetching is configured appropriately the end-to-end latency should be similar. However, it is with "random" IO, such as occurs when reading structured file formats like parquet, that this difference becomes more stark. Fortunately the fix is extremely simple: add an `InputFile::get_ranges` that can be called by [AsyncFileReader](https://docs.rs/parquet/latest/parquet/arrow/async_reader/trait.AsyncFileReader.html). This can then call through to vectorised IO primitives where supported.

> Of course we are open to contributions from everyone

> In iceberg's design, all file ios are hidden under the [FileIO](https://github.com/apache/iceberg-rust/blob/3b5c35ebc0b6e47bfaf74167711e7b605d994ab3/crates/iceberg/src/io.rs#L146) interface

Would you be open to a PR to allow using either OpenDAL or object_store, or would you prefer to not complicate matters at this time? I _think_ this could be achieved in a fairly unobtrusive manner.
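To make the `get_ranges` suggestion concrete, here is a minimal sketch of the idea, not the actual iceberg-rust API. The trait `RangedRead`, the type `InMemoryFile`, the helper `coalesce`, and the gap threshold of 4 bytes are all hypothetical names and values chosen for illustration. It is also synchronous and stdlib-only so that it runs on its own, whereas a real implementation would presumably be async. The default `get_ranges` issues one request per range, and a backend that supports something better can override it, here by merging nearby ranges into fewer requests.

```rust
use std::cell::Cell;
use std::ops::Range;

/// Hypothetical stand-in for a file handle with ranged reads
/// (NOT iceberg-rust's real `InputFile`).
trait RangedRead {
    /// Fetch a single byte range: one request against the backing store.
    fn get_range(&self, range: Range<usize>) -> Vec<u8>;

    /// Fetch several ranges. The default is one request per range;
    /// backends with vectorised IO can override this.
    fn get_ranges(&self, ranges: &[Range<usize>]) -> Vec<Vec<u8>> {
        ranges.iter().map(|r| self.get_range(r.clone())).collect()
    }
}

/// Merge ranges whose gap is at most `max_gap` bytes (sorted by start).
fn coalesce(ranges: &[Range<usize>], max_gap: usize) -> Vec<Range<usize>> {
    let mut sorted = ranges.to_vec();
    sorted.sort_by_key(|r| r.start);
    let mut merged: Vec<Range<usize>> = Vec::new();
    for r in sorted {
        if let Some(last) = merged.last_mut() {
            if r.start <= last.end + max_gap {
                last.end = last.end.max(r.end);
                continue;
            }
        }
        merged.push(r);
    }
    merged
}

/// Toy backend that counts how many requests it served.
struct InMemoryFile {
    data: Vec<u8>,
    requests: Cell<usize>,
}

impl RangedRead for InMemoryFile {
    fn get_range(&self, range: Range<usize>) -> Vec<u8> {
        self.requests.set(self.requests.get() + 1);
        self.data[range].to_vec()
    }

    /// Override: fetch the coalesced ranges, then slice out what was asked for.
    fn get_ranges(&self, ranges: &[Range<usize>]) -> Vec<Vec<u8>> {
        let fetched: Vec<(Range<usize>, Vec<u8>)> = coalesce(ranges, 4)
            .into_iter()
            .map(|m| (m.clone(), self.get_range(m)))
            .collect();
        ranges
            .iter()
            .map(|r| {
                let (m, bytes) = fetched
                    .iter()
                    .find(|(m, _)| m.start <= r.start && r.end <= m.end)
                    .expect("every requested range lies inside a merged range");
                bytes[(r.start - m.start)..(r.end - m.start)].to_vec()
            })
            .collect()
    }
}

fn main() {
    let file = InMemoryFile { data: (0u8..100).collect(), requests: Cell::new(0) };
    let wanted: Vec<Range<usize>> = vec![10..14, 16..20, 80..84];
    let chunks = file.get_ranges(&wanted);
    println!(
        "{} ranges served with {} requests, first chunk {:?}",
        wanted.len(),
        file.requests.get(),
        chunks[0]
    );
}
```

The point of putting `get_ranges` on the trait is that the caller, for example a parquet reader fetching several column chunks, states everything it needs up front. Whether that becomes one request, several concurrent requests, or a merged read is then left to the backend.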