tustvold commented on issue #172:
URL: https://github.com/apache/iceberg-rust/issues/172#issuecomment-1911993011

   > I think if users are judicious and provide sufficient hints, and buffer 
the reads the performance difference will be negligible.
   
   If primarily performing sequential IO I would tend to agree: the AsyncRead 
abstraction will be less efficient than a streaming request, but if 
pre-fetching is configured appropriately the end-to-end latency should be 
similar. However, it is with "random" IO, such as occurs when reading 
structured file formats like parquet, that this difference becomes more stark.
   
   Fortunately the fix is extremely simple: add `InputFile::get_ranges`, which 
can be called by 
[AsyncFileReader](https://docs.rs/parquet/latest/parquet/arrow/async_reader/trait.AsyncFileReader.html).
 This can then call through to vectorised IO primitives where supported.
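
   As a rough illustration of the shape such an API could take, here is a 
minimal sketch. Everything in it is hypothetical: `RangeRead` stands in for 
`InputFile`, `MemFile` is an in-memory stand-in for a storage backend, and the 
methods are synchronous and return `Vec<u8>` purely to keep the example 
dependency-free. A real version would presumably be async. The point is only 
that `get_ranges` can default to one request per range, while a backend that 
supports vectorised IO overrides it to serve every range in a single request.

```rust
use std::cell::Cell;
use std::io;
use std::ops::Range;

/// Hypothetical stand-in for `InputFile`; not the actual iceberg-rust API.
pub trait RangeRead {
    /// Fetch a single byte range.
    fn get_range(&self, range: Range<usize>) -> io::Result<Vec<u8>>;

    /// Fetch several byte ranges in one call.
    /// Default: fall back to one `get_range` request per range.
    fn get_ranges(&self, ranges: &[Range<usize>]) -> io::Result<Vec<Vec<u8>>> {
        ranges.iter().map(|r| self.get_range(r.clone())).collect()
    }
}

/// In-memory "file" that counts how many requests it has served.
pub struct MemFile {
    data: Vec<u8>,
    requests: Cell<usize>,
}

impl MemFile {
    pub fn new(data: Vec<u8>) -> Self {
        Self { data, requests: Cell::new(0) }
    }

    /// Number of requests issued against this file so far.
    pub fn requests(&self) -> usize {
        self.requests.get()
    }

    /// Copy out one range, failing if it is out of bounds.
    fn slice(&self, range: &Range<usize>) -> io::Result<Vec<u8>> {
        self.data
            .get(range.clone())
            .map(|s| s.to_vec())
            .ok_or_else(|| io::Error::new(io::ErrorKind::UnexpectedEof, "range out of bounds"))
    }
}

impl RangeRead for MemFile {
    fn get_range(&self, range: Range<usize>) -> io::Result<Vec<u8>> {
        self.requests.set(self.requests.get() + 1);
        self.slice(&range)
    }

    /// "Vectorised" override: all ranges are served by a single request.
    fn get_ranges(&self, ranges: &[Range<usize>]) -> io::Result<Vec<Vec<u8>>> {
        self.requests.set(self.requests.get() + 1);
        ranges.iter().map(|r| self.slice(r)).collect()
    }
}
```

   With this sketch, reading three scattered ranges (as a parquet reader might 
for a footer and a couple of column chunks) through `get_ranges` costs one 
counted request on `MemFile`, versus three if each range went through 
`get_range` separately.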
   
   > Of course we are open to contributions from everyone
   > In iceberg's design, all file ios are hidden under the 
[FileIO](https://github.com/apache/iceberg-rust/blob/3b5c35ebc0b6e47bfaf74167711e7b605d994ab3/crates/iceberg/src/io.rs#L146)
 interface
   
   Would you be open to a PR to allow using either OpenDAL or object_store, or 
would you prefer to not complicate matters at this time? I _think_ this could 
be achieved in a fairly unobtrusive manner.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
